The digitization of historical documents began in earnest in the mid-1990s, driven by a desire to make records accessible and as a way to digitally preserve fragile documents. By the early 2000s, digitization was accompanied by improved search capabilities. A researcher could search a term and locate records across a large collection of documents, a task that would otherwise have been logistically difficult, if not impossible. We are now in the third stage of digitization. As digital collections become more "complete," (1) some historians wish to identify patterns across entire collections: in effect, to "read" those collections at a macro-level scale.
"Big data" analytics are increasingly used in historical research. The History Manifesto (2014), by Jo Guldi and David Armitage, argues that big-data analytics allow historians to see patterns in large data sets that stretch over long periods of time. The literary scholar Franco Moretti coined the term "distant reading" for this style of investigation, as opposed to a "close reading" of a relatively small number of texts. To engage in distant reading is like using a world map. Most historians, on the other hand, work with "city-level maps": small-scale investigations via a close reading of a relatively small set of documents. Crucially, the questions we ask and the results we seek at the macro-scale are different from those at the micro-scale.
Big-data approaches are being employed to examine the primary sources we have been digitizing over the last two decades. What can we learn when we apply macro-scale reading techniques to secondary sources, to historiography? We ask graduate students to master "the literature of a field" by reading lots of books and articles. I want to know if we can "read" the literature of a field at a macro-scale distance to discern the shape and form of historiography. Some historians might fret that such macro-level analysis will push us too far into the quantitative realms. I would argue that it need not.
As a way to view historiographic change over time, historical journals prove to be a relatively coherent record of how scholars have understood a body of knowledge. Reading the entire corpus of a journal at a macro-scale can tell us what subjects historians have been interested in over time. Keeping in mind Moretti's technique, my goal is to "read" all of the academic articles written by historians, not just those now identified as seminal in the field.
Figure 1: FHQ III
I first applied my method to The Florida Historical Quarterly (FHQ), using the Data for Research application from JSTOR (see figure 1). The application allows us to determine the top key terms across the more than 1,500 articles published by FHQ, using a statistical procedure known as TF-IDF (term frequency–inverse document frequency). This measure weighs how often a word appears in a specific document against how common that word is across the whole collection, giving us a measure of the importance of the term as well as some idea of the subject of the article. In this visualization, the x-axis represents time, and the y-axis indicates the top one hundred key terms. The "spikes" (what might be thought of as the z-axis) reflect the number of times that a key term appears in a given year. Thus, we can observe how the importance of these terms has waxed and waned over the eighty-five-year history of the journal.
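For readers unfamiliar with the procedure, TF-IDF can be sketched in a few lines of Python. This is a minimal illustration of one common variant of the formula, not the JSTOR application's implementation, whose exact weighting is not described here. The function names (`tokenize`, `tf_idf`, `top_terms`) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

PUNCTUATION = '.,;:!?"()'


def tokenize(text):
    """Lowercase a text and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / documents containing the term).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A word that appears in every document, such as "the," receives a score of zero under this variant, while a word concentrated in a few articles scores highly. That is why the procedure surfaces subject terms rather than common vocabulary.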
In the FHQ, the top key term was "Indian," and terms related to Native Americans occurred frequently. The Seminole War looms larger in Florida's history than the Civil War. Political history is a historiographic concentration (reflected in terms such as "governor," "state," "county"). Most of the key terms referred to the eighteenth and nineteenth centuries; terms tied to Florida's twentieth-century history ("animation," "space," "Disney," "retirement") do not appear in the top one hundred. "Black" appears with frequency after the mid-1960s, and indeed terms denoting race are prominent key terms. "Woman" appears with frequency only after the mid-1990s. One of the current FHQ editors observed that via this method, I had visualized "the biography of the journal."
Figure 2: Frequency of the key term "woman" across twenty journals (1888–2013)
In another visualization (figure 2), I used the keyword search technique to determine the frequency of the term "woman" across a number of journals. I chose journals of relatively long life, and avoided specialized journals in women's or gender history. (The gap in the upper right of the visualization reflects that the twenty journals were founded at different times.) It is apparent that the late 1980s represent a historiographic takeoff point, in which the number of references to "woman" across all of the journals increases dramatically. I note that the Journal of American History has one of the highest overall frequencies of the term among the journals I surveyed. I wanted to draw comparisons between different journals, and to note whether one journal was more likely to have an interest in women as a historical subject than another. As I examine more journals and consider more key terms, I believe we will be able to visualize that the range of subjects that historians have been interested in has expanded since the 1960s. (That pattern is clearly evident in figure 1.)
Figure 3: Frequency of the key term "government" across twenty journals (1888–2013)
For comparison's sake, figure 3 is a visualization that examines the same twenty journals but tracks the key term "government." We find more frequent references to that term across all time periods and across all journals, although we can perceive a decline in frequency as we approach the present. To be clear, this method does not tell us how historians have talked about these subjects, but does give us some sense of what subjects historians have talked about and how that interest has changed over time.
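The counting behind a comparison like figures 2 and 3 can be pictured as a grid of journals by years. The sketch below is only an illustration of that idea under stated assumptions: it supposes each article is available as a (journal, year, text) record, which is a hypothetical format, and the function name `term_counts_by_journal_year` is invented for this example. It is not the keyword search tool used to produce the figures.

```python
from collections import defaultdict

PUNCTUATION = '.,;:!?"()'


def term_counts_by_journal_year(articles, term):
    """Count occurrences of a term, grouped by journal and then by year.

    `articles` is an iterable of (journal, year, text) tuples,
    an assumed record format for this sketch.
    """
    term = term.lower()
    counts = defaultdict(lambda: defaultdict(int))
    for journal, year, text in articles:
        words = [w.strip(PUNCTUATION).lower() for w in text.split()]
        counts[journal][year] += words.count(term)
    # Convert to plain dicts with years in chronological order.
    return {
        journal: dict(sorted(years.items()))
        for journal, years in counts.items()
    }
```

Each journal's row of yearly counts corresponds to one ridge in the visualization. Note that this counts only the exact word form, so "woman" and "women" would be tallied separately unless the texts were normalized first.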
It is tempting when applying big-data analytics to view digital history as a statistical, scientific, and analytic practice. Instead, I seek a move toward digital history as design, as the creation of form, and as interpretation derived from visual perception. To effectively and realistically "read" such large corpora as a historian means that we must rely on visualizations and see them as instruments of interpretive insight. Visualizations are not simply pretty pictures or illustrations in support of the real analysis carried out in words. The visualization (the visual pattern) constitutes the reading, and is itself the source of interpretive insight. In much the same way Chrétien Frédéric Guillaume Roth gave visual form to Denis Diderot's Encyclopedia by drawing a tree diagram, visualizations can make visible underlying historiographic forms. The visualization is the hermeneutic object.
I took the step of creating a model of the visualization seen in figure 1 using a 3D printer and mounted it at an exhibition at the Lumos Gallery in Columbus, Ohio. Indeed, I have similar plans for the visualizations of "woman" and "government" I described earlier, which I envision as part of a larger sculptural installation depicting the shifting configurations of sixteen subjects that have interested historians over the decades. I wish to situate these installations as comparable to other "historical sculptures," such as Maya Lin's Vietnam Veterans Memorial in Washington, D.C., and Women's Table at Yale University. I view historical sculpture both as a physical object that represents history and as a rhetorical act, as a way to perceive historical information in ways other than the traditional research paper, monograph, or other text-based performances that are standard in our craft. The literary scholar James Anderson Winn observed that the humanities "have identified themselves excessively with analytical processes based narrowly on language, thus disassociating themselves from performance in most of its guises," and that "our conception of the humanities remains largely confined to 'the pale of Words.'" (2) My visual performances are meant to be "objects to think with" that extend historiography beyond that pale. My vision for digital historiography is one based on an object-oriented hermeneutics.
David J. Staley is an associate professor of history and an adjunct associate professor of design at The Ohio State University, where he serves as director of the Harvey Goldberg Center. He is the author of Computers, Visualization, and History (second ed., 2013).
(1) I am aware, of course, that collections such as Google Books are not always as complete as we would like, which is why I leave the term in quotation marks.
(2) James Anderson Winn, The Pale of Words: Reflections on the Humanities and Performance (1998), 3, 74.