XML/TEI and Literary Study in the Digital Age

How important are XML encoding in general, and TEI encoding in particular, to literary study in the digital age? Here are just a few of the many digital humanities projects that use some version of XML, including TEI.

Understanding TEI markup

In this class, we’re not going to try to become experts in the use of TEI, but we do want to understand it well enough to appreciate how it works.

Structure of a basic TEI file

Here’s what a basic TEI file looks like. The bits in the code below that look like this —

    <!-- Some text or other here. -->

— are comments. They serve no functional purpose in the file and are only there to help explain the different parts of the document. You’ll find comments like these in many XML and HTML documents, by the way, since comments are a useful way for encoders to document, for themselves and others, what they’re attempting to accomplish with their markup.


    <?xml version="1.0" encoding="UTF-8"?>

    <!-- The XML declaration above must be the very first thing in the file. The lines below give other information about the encoding. -->
    <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
    <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
        schematypens="http://purl.oclc.org/dsdl/schematron"?>

    <!-- The opening tag for root element, TEI -->

    <TEI xmlns="http://www.tei-c.org/ns/1.0">

    <!-- The teiHeader element, which contains elements with information about both the TEI file and the document being encoded in it. -->

      <teiHeader>
          <fileDesc>
             <titleStmt>
                <title>Title</title>
             </titleStmt>
             <publicationStmt>
                <p>Publication Information</p>
             </publicationStmt>
             <sourceDesc>
                <p>Information about the source</p>
             </sourceDesc>
          </fileDesc>
      </teiHeader>

      <!-- The text element, which contains the encoded document. -->

      <text>
          <body>
             <p>Some text here.</p>
          </body>
      </text>

    <!-- The closing tag of the root element, TEI -->

    </TEI>

You can find a copy of this file online. It’s named basic_tei_file.xml. To download it to your own computer, find the Raw button at the top right of the file window, right-click on it, and choose “Save Link As” from the dropdown menu. Open the saved file in VS Code.

Alternatively, you can navigate on the command line to where you’d like to save the file, then enter

    curl https://raw.githubusercontent.com/WhatTheDickens/engl340-s24/main/downloads/basic_tei_file.xml > basic_tei_file.xml

then open the file with

    code basic_tei_file.xml

If all else fails, you can open a new, empty file in VS Code, copy the contents from the GitHub file in your browser, paste it in, and save the file, naming it basic_tei_file.xml.

As you can see from the comments, a valid TEI file has the following main components:

  • The XML declaration and other information about the encoding
  • The root element TEI (enclosed between <TEI> and </TEI>), which contains
    • The TEI header (enclosed between <teiHeader> and </teiHeader>)
    • The TEI text (enclosed between <text> and </text>)

That’s it! In the sample file, you see other elements nested inside the TEI header and the TEI text, respectively (for example, <body> is nested within <text>). But basically, the file has a two-part structure — (1) declaration and (2) <TEI> — with <TEI> containing its own two sub-parts, <teiHeader> and <text>.

The TEI header

What kind of information goes in the TEI header? Mainly, information about the encoding project and the document being encoded. Take a look at the TEI file for the “Solitude” chapter of the fluid-text Walden that you downloaded earlier. Then look at some other examples from the teach-yourself-TEI website TEI by Example.
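
To make this a little more concrete, here is a sketch of a header that is somewhat fuller than the one in basic_tei_file.xml. Every name, title, and date in it is a placeholder invented for illustration, not taken from any real project, and real headers often contain a good deal more than this.

    <teiHeader>
      <fileDesc>
        <!-- Who wrote the work, and who is responsible for the encoding. -->
        <titleStmt>
          <title>Sample Chapter: a digital edition</title>
          <author>Author Name</author>
          <respStmt>
            <resp>Encoded by</resp>
            <name>Encoder Name</name>
          </respStmt>
        </titleStmt>
        <!-- Who is publishing this TEI file, and on what terms. -->
        <publicationStmt>
          <publisher>Name of the project or institution</publisher>
          <date when="2024">2024</date>
          <availability>
            <p>Terms of use go here.</p>
          </availability>
        </publicationStmt>
        <!-- The printed or manuscript source that the file encodes. -->
        <sourceDesc>
          <bibl>
            <author>Author Name</author>,
            <title>Title of the Printed Source</title>
            (<pubPlace>City</pubPlace>: <publisher>Publisher</publisher>, <date>Year</date>).
          </bibl>
        </sourceDesc>
      </fileDesc>
    </teiHeader>

Notice that the three children of <fileDesc> are the same ones you saw in the basic file. The sketch simply replaces the placeholder paragraphs with more specific elements.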

TEI text

What goes in TEI text? This depends entirely on the textual content to be encoded and on the decisions that an encoder or team of encoders has made about which features of the text are worth encoding for the purposes of their project.
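
As a rough illustration of what such decisions can look like, here is a small invented fragment. It assumes an encoder who has chosen to record chapter divisions, italics, personal names, and lines of verse. The content is made up, and a different project might reasonably mark up different features or none of these.

    <text>
      <body>
        <!-- A structural division, here labeled as a chapter. -->
        <div type="chapter" n="1">
          <head>Chapter Title</head>
          <!-- Inline markup: italic type and a personal name. -->
          <p>A sentence with an <hi rend="italic">emphasized</hi> word and a
            person's name, <persName>Jane Example</persName>.</p>
          <!-- Verse: a line group containing individual lines. -->
          <lg>
            <l>A first line of verse,</l>
            <l>and a second line.</l>
          </lg>
        </div>
      </body>
    </text>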

Again, you can get a feel for some of the different approaches one can take to encoding by looking at the TEI by Example website. There you’ll find examples of

Particularly relevant to our focus on fluid texts in ENGL 340 are the examples of critical editing, all of which encode some text that exists in more than one version. These examples use a special set of elements to mark up the variants between a “base” or reference version of the text and one or more “witnesses,” that is, other versions of the text whose readings differ from the base.

  • The <app> element introduces a place of variance from the base.
  • The <lem> element contains the text as it appears in the base or reference version.
  • The <rdg> element indicates how the text reads in one or more of the witnesses.

Within the opening <rdg> tag, you’ll find the attribute wit, whose value (wit = "something") will be an identifier for one or more of the versions.
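
Here is a minimal sketch of how these three elements fit together. The sentence and its variant are invented for illustration, and the identifiers v1, v2, and v3 are hypothetical. In a real file, identifiers like these are typically declared elsewhere in the document, for example in a list of witnesses in the TEI header.

    <p>She walked to the
      <app>
        <!-- The reading in the base version, here identified as v1. -->
        <lem wit="#v1">village</lem>
        <!-- The reading shared by two other versions, v2 and v3. -->
        <rdg wit="#v2 #v3">town</rdg>
      </app>
    each morning.</p>

Read this way, the base version says “She walked to the village each morning,” while versions v2 and v3 say “town” instead of “village.” Everything outside the <app> element is text that all the versions share.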

After you’ve looked over these examples, look at the following files to see how they use the same set of TEI elements.