Modeling Revision

In light of Thoreau’s repeated and extensive re-workings of his initial conception, some scholars have argued that Walden’s meaning can’t be found by reading the published words alone but only through a careful examination of its progressive evolution in relation to the biographical evolution that accompanied it. Walden, on this view, is as much the record of that evolution as its triumphant end result. In the words of Robert Sattelmeyer, “Walden is, in this respect, an archetypal Romantic text, like [Walt Whitman’s] Leaves of Grass, that developed as its author developed and that preserves experience while continually re-interpreting it” (“The Remaking of Walden,” 75).

Perhaps we might say the same of any text that’s undergone significant revision, especially over an extended period of time. It’s hard to imagine that such a text would not reflect the author’s own ambivalence, vacillation, and shifts of vision and purpose against the background of an ongoing life. As we saw earlier, such is in fact the premise of both genetic scholarly editing and fluid-text editing. What makes Walden different from “any” heavily revised text isn’t that it evolves but rather, first, that we have so complete a record of its evolution and, second, that readers care about it deeply enough to want to understand how it became what it is.

Together, the record and the care brought Ronald E. Clapper, a PhD candidate in English Literature at UCLA, to the Huntington Library in the 1960s to study HM 924, the manuscript of Walden. In the years since, scholars like Sattelmeyer who’ve argued for an evolutionary interpretation of Walden have routinely relied on Clapper’s 1967 dissertation, “The Development of Walden: A Genetic Text.”

For his genetic edition of Walden, Clapper devised a set of symbols to indicate places where Thoreau made changes on a manuscript page and to distinguish the way a string of text read in one version as opposed to another. Some years later, he re-typed the dissertation on a computer and introduced additional symbols — for example, to identify text that Thoreau inserted in pencil as opposed to ink. We can recognize Clapper’s symbols as a kind of homegrown markup language. Here’s an example of his code:

Economy 2a  written: A; rewritten: C,C

I should not obtrude my affairs so much on the notice of my readers⁶ if very particular⁷ inquiries had not been made by my townsmen⁸ concerning my mode of life, which⁹ some would call impertinent, though they do not appear to me at all impertinent, but, considering the circumstances, very natural and pertinent¹⁰.


⁶ obtrude my affairs so much on the notice of my readers [A:] presume to talk so much about myself and my affairs as I shall in this <lecture> <↑book↓> <↑work↓>book [C2:] obtrude <myself and> my affairs so much on the notice of my readers

⁷ particular [A:] particular and personal

⁸ by my townsmen [A: not in ms.] [C2: interlined in pencil]

⁹ which [A:] what [C2:] <what> ↑p which↓

¹⁰ though they do not appear to me at all impertinent, but, considering the circumstances, very natural and pertinent [A:] but they are by no means impertinent to me, but, on the contrary very natural and pertinent, considering the circumstances [C2:] though they do not appear ↑p to me↓ at all impertinent <to me>, but <on the contrary> ↑considering the circumstances↓ very natural and pertinent <, considering the circumstances>

Clapper is here representing changes to a portion of the second paragraph of “Economy,” the first chapter of Walden. He labels this portion “2a” to distinguish it from two other portions (“2b” and “2c”) that have separate revision histories. Above the text, he’s indicated that paragraph 2a was “written” (that is, first added to the manuscript) in the first, or “A,” draft, then “rewritten” twice in the third, or “C,” draft.
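To make the structure of that header line concrete, here is a minimal Python sketch that reads it as a small record. The general shape of the header is an assumption inferred from this single example (chapter name, paragraph label, one "written" version, and an optional comma-separated "rewritten" list); Clapper's dissertation may use forms this pattern doesn't cover, and the function name parse_header is hypothetical.

```python
import re

# Assumed header shape, inferred from the one example above:
#   <chapter> <paragraph label>  written: <version>; rewritten: <version>,<version>,...
HEADER = re.compile(
    r"^(?P<chapter>.+?)\s+(?P<paragraph>\d+[a-z]?)\s+"
    r"written:\s*(?P<written>[A-Z])"
    r"(?:;\s*rewritten:\s*(?P<rewritten>[A-Z](?:\s*,\s*[A-Z])*))?$"
)

def parse_header(line):
    """Turn a header line into a dictionary (hypothetical helper)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    rewritten = match.group("rewritten")
    return {
        "chapter": match.group("chapter"),
        "paragraph": match.group("paragraph"),
        "written": match.group("written"),
        # One entry per rewriting, so "C,C" means rewritten twice in Version C.
        "rewritten": [v.strip() for v in rewritten.split(",")] if rewritten else [],
    }

header = parse_header("Economy 2a  written: A; rewritten: C,C")
print(header)
```

The repeated "C" in the rewritten list is what records that the passage was rewritten twice within the same draft.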

Above the line, we read the words of Walden as they were published in 1854. At each point where there are differences between the published version and the text as it reads in the A draft, the C draft, or both, Clapper places a number that corresponds to a footnote below the line. In a footnote, the string in question is repeated in italics, the symbol “[A:]” is used to introduce the wording as it appears in Version A, and the symbol “[C2:]” is used to introduce the wording as it appears in the second writing of the passage within Version C. Text that Thoreau deleted appears within angle brackets (< >), text that he inserted appears between arrows (↑ ↓), and inserted text written in pencil is preceded by an italic p.

Taking one of the simpler examples, Clapper’s markup tells us that the word which annotated in footnote 9 was what in Version A and again in Version C, but that in C, Thoreau deleted what and replaced it, in pencil, with which.
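The reading of footnote 9 just described can be sketched in a few lines of Python. This is an illustration only, not a tool Clapper or this project uses: the function names are hypothetical, and the sketch assumes that deletions and insertions are not nested inside one another (footnote 6, with its deleted insertions, would need more careful handling). Because the italics are lost in plain text, it also assumes that a leading "p " inside the arrows always means "pencil."

```python
import re

# Clapper's symbols as described above: <...> marks a deletion,
# and ↑...↓ marks an insertion.
TOKEN = re.compile(r"<(?P<deleted>[^<>]*)>|↑(?P<inserted>[^↑↓]*)↓")

def tokenize(reading):
    """Split a footnote reading into (kind, text) pairs."""
    tokens = []
    pos = 0
    for match in TOKEN.finditer(reading):
        if match.start() > pos:
            tokens.append(("kept", reading[pos:match.start()]))
        if match.group("deleted") is not None:
            tokens.append(("deleted", match.group("deleted")))
        else:
            text = match.group("inserted")
            # Assumption: a leading "p " stands for Clapper's italic p (pencil).
            if text.startswith("p "):
                tokens.append(("inserted_pencil", text[2:]))
            else:
                tokens.append(("inserted", text))
        pos = match.end()
    if pos < len(reading):
        tokens.append(("kept", reading[pos:]))
    return tokens

def before_and_after(reading):
    """Rebuild the wording before and after Thoreau's changes."""
    tokens = tokenize(reading)
    before = "".join(t for kind, t in tokens if kind in ("kept", "deleted"))
    after = "".join(t for kind, t in tokens if kind != "deleted")
    # Collapse the stray spaces left behind where a token was dropped.
    return " ".join(before.split()), " ".join(after.split())

footnote_9_c2 = "<what> ↑p which↓"
print(tokenize(footnote_9_c2))
# [('deleted', 'what'), ('kept', ' '), ('inserted_pencil', 'which')]
print(before_and_after(footnote_9_c2))
# ('what', 'which')
```

Even this small sketch has to guess at conventions that a human reader resolves without effort, which hints at why footnote-style markup is awkward for software to work with.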

We can see the change Clapper has marked up by comparing these two manuscript images, the first from Version A, the second from Version C. In the image from A, find the word what in the upper left corner. In the image from C, find what on the first line, at about the midpoint of the image, crossed out in pencil, with which written just above it and to the left of it.

Walden MS image

Walden MS image

Here’s what we see if we look at a slightly larger section of each manuscript page, first A, then C, showing (in A) the whole passage in question and (in C) the portion of the passage beginning with the words the notice of my readers:

Walden MS image

Walden MS image

Notice that there are things — that is, potential data points — on each page that Clapper’s markup doesn’t describe, such as the numbers written in the left margin, the penciled caret (‸) after the word but in C (three lines from the bottom), pointing to the inserted words considering the circumstances, the shape and direction of the lines that Thoreau uses to cross out words (compare the deletions of to me and considering the circumstances in C), or the fact that in C, the initial words of the passage are written at the bottom of the previous manuscript page:

Walden MS image

Clapper hasn’t overlooked this potential data; he captures some of it in additional explanatory footnotes outside his markup, and some of it he simply chooses to pass over in silence. Remember what we said earlier: There isn’t a single right way to model a manuscript or any other textual object, and there’s probably no way to model everything about any object worth modeling. One of the most critical tasks a scholarly editor faces is choosing what to model. The choice must be driven by the purpose of the edition. Clapper’s purpose was to model Thoreau’s revision process, not to model the physical appearance of the manuscript. For his purpose, he decided, it was enough, for the most part, to indicate where in the text-flow words were inserted and deleted; it wasn’t necessary to indicate just what kinds of marks Thoreau used to carry out these changes.

Clapper’s selective, homegrown markup makes it possible for a reader to gain some insight into how Walden evolved across Thoreau’s multiple drafts. It takes no special training to understand this markup, just a bit of patience. That’s in part because Clapper explains his symbols at the outset of his dissertation and uses them consistently, in part because most readers already understand how footnotes work as an informal markup system (even if they’ve never thought of footnotes as such a system).

What Clapper’s markup doesn’t do, even as re-typed using word-processing software, is model the manuscript data in a form that other computer applications can put to use, for example by producing visualizations or by mining the data for meaningful patterns. Clapper’s markup isn’t what we would call machine-readable: that is, it can’t readily be understood and processed by software.

To build a model of Thoreau’s revisions in a way that enables software to do useful things with what we build, we need to use a machine-readable markup language such as XML.
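As a rough illustration of what machine-readable buys us, the sketch below encodes the footnote 9 change in a small XML fragment and queries it with Python's standard-library XML parser. The element and attribute names (reading, del, add, version, medium) are assumptions chosen for this example; they are not presented as the encoding a real edition of Walden uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the footnote 9 change in Version C:
# "what" is deleted and "which" is added in pencil.
xml_source = """
<reading version="C">
  <del>what</del>
  <add medium="pencil">which</add>
</reading>
""".strip()

root = ET.fromstring(xml_source)

# Because the structure is explicit, software can ask direct questions of it.
deleted = [el.text for el in root.iter("del")]
penciled = [el.text for el in root.iter("add") if el.get("medium") == "pencil"]

print(deleted)   # ['what']
print(penciled)  # ['which']
```

The same two questions asked across an entire encoded manuscript would yield every deletion and every penciled insertion, which is the kind of pattern-finding that footnote-based markup makes difficult.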