XML/TEI and Literary Study in the Digital Age
How important are XML encoding in general, and TEI encoding in particular, to literary study in the digital age? Here are just a few of the many digital humanities projects that use some version of XML, including TEI.
- The William Blake Archive
- Rossetti Archive
- The Walt Whitman Archive
- The Mark Twain Project
- Shelley-Godwin Archive
- Emily Dickinson’s Correspondences
- The Frankenstein Variorum
- The Melville Electronic Library
- Digital Mitford
- The Seward Family Digital Archive
Understanding TEI markup
In this class, we’re not going to try to become experts in the use of TEI, but we do want to understand it well enough to appreciate how it works.
Structure of a basic TEI file
Here’s what a basic TEI file looks like. The bits in the code below that look like this —
<!-- Some text or other here. -->
— are comments. They serve no functional purpose in the file and are only there to help explain the different parts of the document. You’ll find comments like these in many XML and HTML documents, by the way, since comments are a useful way for encoders to document, for themselves and others, what they’re attempting to accomplish with their markup.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Above: the XML declaration, which must be the very first thing in the file.
     Below: other information about the encoding (which TEI schemas to validate against). -->
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
  schematypens="http://purl.oclc.org/dsdl/schematron"?>
<!-- The opening tag for root element, TEI -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The teiHeader element, which contains elements with information about both the TEI file and the document being encoded in it. -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
      </titleStmt>
      <publicationStmt>
        <p>Publication Information</p>
      </publicationStmt>
      <sourceDesc>
        <p>Information about the source</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- The text element, which contains the encoded document. -->
  <text>
    <body>
      <p>Some text here.</p>
    </body>
  </text>
<!-- The closing tag of the root element, TEI -->
</TEI>
You can find a copy of this file online; it’s named basic_tei_file.xml. To download it to your own computer, find the Raw button at the top right of the file window, right-click on it, and choose “Save Link As” from the context menu. Open the saved file in VS Code.
Alternatively, you can navigate on the command line to where you’d like to save the file, then enter
curl https://raw.githubusercontent.com/WhatTheDickens/engl340-s24/main/downloads/basic_tei_file.xml > basic_tei_file.xml
then open the file with
code basic_tei_file.xml
If all else fails, you can open a new, empty file in VS Code, copy the contents of the GitHub file from your browser, paste them in, and save the file, naming it basic_tei_file.xml.
As you can see from the comments, a valid TEI file has the following main components:
- The XML declaration and other information about the encoding
- The root element TEI (enclosed between <TEI> and </TEI>), which contains
  - The TEI header (enclosed between <teiHeader> and </teiHeader>)
  - The TEI text (enclosed between <text> and </text>)
That’s it! In the sample file, you see other elements nested inside the TEI header and the TEI text (for example, <body> is nested within <text>). But basically, the file has a two-part structure, (1) the declaration and (2) <TEI>, with <TEI> containing its own two sub-parts, <teiHeader> and <text>.
The TEI header
What kind of information goes in the TEI header? Mainly, information about the encoding project and the document being encoded. Take a look at the TEI file for the “Solitude” chapter of the fluid-text Walden that you downloaded earlier. Then look at some other examples from the teach-yourself-TEI website TEI by Example.
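To give a rough sense of what a fuller header can hold, here is a minimal sketch of a <teiHeader> for a hypothetical class project encoding the first edition of Walden (Boston: Ticknor and Fields, 1854). The element names are standard TEI, but the project title, the encoders’ names, the publisher line, and the availability statement are invented placeholders, not taken from any real edition, including the fluid-text Walden.

```xml
<teiHeader>
  <fileDesc>
    <!-- Information about the TEI file itself: its title and who made it.
         The project title and encoder names are hypothetical placeholders. -->
    <titleStmt>
      <title>Walden: A Sample Encoding (hypothetical class project)</title>
      <author>Henry David Thoreau</author>
      <respStmt>
        <resp>Encoded by</resp>
        <name>Student Encoder One</name>
        <name>Student Encoder Two</name>
      </respStmt>
    </titleStmt>
    <!-- How the TEI file itself is published or shared (placeholder values). -->
    <publicationStmt>
      <publisher>Hypothetical ENGL 340 class project</publisher>
      <availability>
        <p>For classroom use only.</p>
      </availability>
    </publicationStmt>
    <!-- Information about the source document being encoded. -->
    <sourceDesc>
      <bibl>
        <author>Henry David Thoreau</author>
        <title>Walden; or, Life in the Woods</title>
        <pubPlace>Boston</pubPlace>
        <publisher>Ticknor and Fields</publisher>
        <date>1854</date>
      </bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```

Notice the split between the two kinds of information: <titleStmt> and <publicationStmt> describe the encoded file, while <sourceDesc> describes the original printed book.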
TEI text
What goes in TEI text? This depends entirely on the textual content to be encoded and on the decisions an encoder or team of encoders has made about which features of the text are worth encoding for the purposes of their project.
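For instance, here is one invented sentence encoded two ways: a minimal encoding that records only that it is a paragraph, and a richer one that also marks a date, a person, and places with the TEI elements <date>, <persName>, and <placeName>. The sentence is made up purely for illustration; neither version is more “correct,” since which features to mark depends on what the project wants to do with the text.

```xml
<!-- Encoding 1: only the paragraph structure is recorded. -->
<p>On Monday, Sarah walked from Concord to Boston.</p>

<!-- Encoding 2: the same invented sentence, with a date, a person,
     and places marked so that they could later be searched or indexed. -->
<p>On <date>Monday</date>, <persName>Sarah</persName> walked from
  <placeName>Concord</placeName> to <placeName>Boston</placeName>.</p>
```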
Again, you can get a feel for some of the different approaches one can take to encoding by looking at the TEI by Example website. There you’ll find examples of
Particularly relevant to our focus on fluid texts in ENGL 340 are the examples of critical editing, all of which encode some text that exists in more than one version. These examples use a special set of elements to mark up the variants between a “base” or reference version of the text and one or more “witnesses,” the other versions whose readings are compared against the base.
- The <app> element introduces a place of variance from the base.
- The <lem> element contains the text as it appears in the base or reference version.
- The <rdg> element indicates how the text reads in one or more of the witnesses.
Within the opening <rdg> tag, you’ll find the attribute wit, whose value (wit="#something") identifies one or more of the versions. When a reading appears in several witnesses, their identifiers are listed together, separated by spaces.
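Here is a minimal sketch of how these elements fit together, using an invented line of text and hypothetical witness identifiers A, B, and C (real projects choose their own identifiers). The witnesses are declared once in the header with <listWit> and <witness>, and each wit value points back to one of those declarations with a leading #.

```xml
<!-- In the teiHeader: declare the witnesses (versions) once.
     The identifiers A, B, and C are hypothetical. -->
<sourceDesc>
  <listWit>
    <witness xml:id="A">Base version</witness>
    <witness xml:id="B">Second version</witness>
    <witness xml:id="C">Third version</witness>
  </listWit>
</sourceDesc>

<!-- In the text: an invented line with one place of variance. -->
<p>The river
  <app>
    <!-- lem: how the base version (A) reads -->
    <lem wit="#A">wound</lem>
    <!-- rdg: how witnesses B and C both read;
         the two identifiers are separated by a space -->
    <rdg wit="#B #C">ran</rdg>
  </app>
  slowly to the sea.</p>
```

Everything outside <app> is shared by all three versions; only the word inside it varies.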
After you’ve looked over these examples, look at the following files to see how they use the same set of TEI elements.
- Any file encoding a chapter of the fluid-text Walden.
- The file created by students in ENGL 340 in Spring 2015 encoding the variants of the Gettysburg Address. The Versioning Machine, the same software that reads TEI to produce the fluid-text edition of Walden, can also read the students’ TEI file and produce this display of the differences between the Gettysburg Address variants.
- The file encoding the variants of Emily Dickinson’s poem “Faith is a Fine Invention” that’s used to produce this display of the variants on the website of The Versioning Machine.