Annotation

The XML markup consists of two kinds of annotation. Structural markup (Markup I) contains elements such as paragraphs, headings, and tables. Project-specific markup (Markup II) contains elements relevant to network analyses, e.g. referenced grammarians.
The XML structure is defined by a documented RELAX NG schema that is provided alongside the corpus files.

Annotation Philosophy

The idea behind the annotation of this corpus is to preserve only relevant information instead of copying the grammar book's layout as closely as possible. For instance, we mark italicised elements only if they have a function in the grammar, e.g. emphasis or highlighting. Lists, tables, and tree diagrams are rebuilt as structural elements that preserve their underlying logic, not necessarily their layout.

In the following, the most important and most frequent elements are documented and exemplified. A list of all available tags and their attributes is provided here.

Markup I: Structural Annotation

Inline Elements

"bold", "italic", "underline" - Font Styles

<bold>Text</bold>
<italic>Text</italic>
<underline>Text</underline>

"footnote" - Footnote

There is a footnote* here.

*This is an addition.

There is a footnote <footnote indicator="Asterisk">This is an addition.</footnote> here.

Footnotes are inserted where the footnote indicator (e.g. Asterisk or Dagger) occurs. If a footnote continues across a page break, the pagebreak element is omitted within the footnote, because it already exists in the main text and should not be duplicated.
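Because footnotes sit inline at the position of their indicator, a reader of the corpus may want to separate them from the running text. The following is a minimal sketch using Python's standard library; the wrapping paragraph element and the function name split_footnotes are assumptions made for illustration, not part of the corpus tooling.

```python
import xml.etree.ElementTree as ET

# The sample sentence from above, wrapped in a <paragraph> so it parses
# as a single element (the wrapper is an assumption of this sketch).
SAMPLE = (
    '<paragraph>There is a footnote '
    '<footnote indicator="Asterisk">This is an addition.</footnote>'
    ' here.</paragraph>'
)


def split_footnotes(paragraph):
    """Return (main_text, footnotes) for a paragraph element.

    main_text is the running text without footnote bodies; footnotes is a
    list of (indicator, text) tuples in document order. Only footnotes that
    are direct children of the paragraph are separated out.
    """
    parts = [paragraph.text or ""]
    footnotes = []
    for child in paragraph:
        if child.tag == "footnote":
            footnotes.append((child.get("indicator"), "".join(child.itertext())))
        else:
            # Keep the text of any other inline element (e.g. <italic>).
            parts.append("".join(child.itertext()))
        # The tail is the main text that follows the child element.
        parts.append(child.tail or "")
    # Collapse the double space left where the footnote was taken out.
    main_text = " ".join("".join(parts).split())
    return main_text, footnotes


main_text, footnotes = split_footnotes(ET.fromstring(SAMPLE))
print(main_text)
print(footnotes)
```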

Tabular Elements

"list" - Lists

Lists can be “simple”, “bulleted”, or “numbered”. If the list is “numbered”, the label element is required.

<list rend="numbered">
    <head>Parts of Speech</head>
    <label>1</label><item>A Noun is a Name.</item>
    <label>2</label><item>A Verb is a Telling Word.</item>
    <label>3</label><item>An Adjective is a Noun-Marking Word.</item>
    <label>4</label><item>An Adverb is a Modifying Word.</item>
    <label>5</label><item>A Preposition is a Noun-Connecting Word.</item>
    <label>6</label><item>A Conjunction is a Sentence-Connecting Word.</item>
    <label>7</label><item>A Pronoun is a For-Name.</item>
</list>
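The pairing of label and item elements can be read back with a few lines of Python. This is a sketch under stated assumptions: it treats each label as belonging to the item that follows it, and it assumes that items in “simple” or “bulleted” lists carry no label. The function name read_list is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of the numbered list above.
SAMPLE = """<list rend="numbered">
    <head>Parts of Speech</head>
    <label>1</label><item>A Noun is a Name.</item>
    <label>2</label><item>A Verb is a Telling Word.</item>
</list>"""


def read_list(list_el):
    """Return (head, [(label, item_text), ...]) for a <list> element.

    Assumption: a <label> applies to the <item> that follows it.
    Items without a preceding label get the label None.
    """
    head = list_el.findtext("head")
    pairs = []
    pending_label = None
    for child in list_el:
        if child.tag == "label":
            pending_label = "".join(child.itertext()).strip()
        elif child.tag == "item":
            pairs.append((pending_label, "".join(child.itertext()).strip()))
            pending_label = None
    # The documentation requires labels in numbered lists.
    if list_el.get("rend") == "numbered" and any(lbl is None for lbl, _ in pairs):
        raise ValueError("numbered list contains an item without a label")
    return head, pairs


head, pairs = read_list(ET.fromstring(SAMPLE))
print(head)
for label, text in pairs:
    print(label, text)
```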

"table" - Table

<table cols="2" rows="2">
    <row>
        <cell><small_caps>Adjectives : Nouns</small_caps></cell>
        <cell><small_caps>Adverbs : Verbs</small_caps></cell>
    </row>
    <row>
        <cell><small_caps>Prepositions : Nouns</small_caps></cell>
        <cell><small_caps>Conjunctions : Verbs</small_caps></cell>
    </row>
</table>
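A table in this markup can be turned into a plain grid of cell texts. The sketch below assumes that “cols” and “rows” give the number of columns and rows and that every row holds exactly “cols” cells (no spanning cells); neither point is confirmed by the schema documentation quoted here. The name table_to_grid is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<table cols="2" rows="2">
    <row>
        <cell><small_caps>Adjectives : Nouns</small_caps></cell>
        <cell><small_caps>Adverbs : Verbs</small_caps></cell>
    </row>
    <row>
        <cell><small_caps>Prepositions : Nouns</small_caps></cell>
        <cell><small_caps>Conjunctions : Verbs</small_caps></cell>
    </row>
</table>"""


def table_to_grid(table_el):
    """Return the table as a list of rows, each a list of cell texts."""
    grid = [
        # itertext() flattens inline markup such as <small_caps>.
        ["".join(cell.itertext()).strip() for cell in row.findall("cell")]
        for row in table_el.findall("row")
    ]
    # Assumption: rows/cols are plain counts and rows are never ragged.
    rows = int(table_el.get("rows"))
    cols = int(table_el.get("cols"))
    if len(grid) != rows or any(len(r) != cols for r in grid):
        raise ValueError("declared rows/cols do not match the table content")
    return grid


grid = table_to_grid(ET.fromstring(SAMPLE))
for row in grid:
    print(" | ".join(row))
```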

"toc" - Table of Contents

A table of contents consists of a number of entries, each of which has a level, indicating its hierarchy, a section name, and an optional page number.

<paragraph>
    <toc>
        <entry level="1">
            <section_name>Introduction</section_name>
            <page_no>5</page_no>
        </entry>
        <entry level="1">
            <section_name>Etymology</section_name>
            <page_no>10</page_no>
        </entry>
            <entry level="2">
                <section_name>Nouns</section_name>
                <page_no>12</page_no>
            </entry>
        <entry level="1">
            <section_name>Syntax</section_name>
            <page_no>20</page_no>
        </entry>
    </toc>
</paragraph>
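The entries are stored as a flat sequence, with the hierarchy expressed only through the “level” attribute. A nested outline can be rebuilt from that with a stack, as in the following sketch. It assumes that levels start at 1; the function name nest_toc and the dictionary layout are illustrative choices.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<toc>
    <entry level="1"><section_name>Introduction</section_name><page_no>5</page_no></entry>
    <entry level="1"><section_name>Etymology</section_name><page_no>10</page_no></entry>
    <entry level="2"><section_name>Nouns</section_name><page_no>12</page_no></entry>
    <entry level="1"><section_name>Syntax</section_name><page_no>20</page_no></entry>
</toc>"""


def nest_toc(toc_el):
    """Turn the flat <entry> sequence into nested dictionaries."""
    root = {"name": None, "page": None, "children": []}
    # (level, node) pairs from the artificial root down to the last entry.
    stack = [(0, root)]
    for entry in toc_el.findall("entry"):
        level = int(entry.get("level"))  # assumed to be 1 or greater
        node = {
            "name": entry.findtext("section_name"),
            "page": entry.findtext("page_no"),  # None if the page number is omitted
            "children": [],
        }
        # Pop until the top of the stack is shallower: that entry is the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


outline = nest_toc(ET.fromstring(SAMPLE))
for top in outline:
    print(top["name"], [child["name"] for child in top["children"]])
```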

"tree" - Tree Diagrams

The arity indicates the maximum number of children a node can have. The child nodes can be either ordered, partially ordered, or unordered.

<tree arity="2" ord="false" order="2">
    <root ref_node="R" children="#C1 #C2">
        <label>Names of Nouns</label>
    </root>
    <leaf ref_node="C1" parent="#R">
        <label>Names of Things</label>
    </leaf>
    <leaf ref_node="C2" parent="#R">
        <label>Names of Notions</label>
    </leaf>
</tree>
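The links between nodes can be resolved and checked against the declared arity. The sketch below assumes that a value such as "#C1" points to the node whose “ref_node” is "C1", and it treats every child element of the tree as a node, since only root and leaf appear in the example. The function name read_tree is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tree arity="2" ord="false" order="2">
    <root ref_node="R" children="#C1 #C2"><label>Names of Nouns</label></root>
    <leaf ref_node="C1" parent="#R"><label>Names of Things</label></leaf>
    <leaf ref_node="C2" parent="#R"><label>Names of Notions</label></leaf>
</tree>"""


def read_tree(tree_el):
    """Return (labels, children) keyed by node id, after consistency checks."""
    arity = int(tree_el.get("arity"))
    labels, children = {}, {}
    for node in tree_el:
        node_id = node.get("ref_node")
        labels[node_id] = node.findtext("label")
        # "#C1 #C2" -> ["C1", "C2"]; nodes without the attribute have no children.
        children[node_id] = [ref.lstrip("#") for ref in node.get("children", "").split()]
    for node in tree_el:
        node_id = node.get("ref_node")
        if len(children[node_id]) > arity:
            raise ValueError(f"node {node_id} exceeds arity {arity}")
        for child_id in children[node_id]:
            if child_id not in labels:
                raise ValueError(f"node {node_id} points to unknown child {child_id}")
        parent = node.get("parent")
        if parent is not None and node_id not in children.get(parent.lstrip("#"), []):
            raise ValueError(f"node {node_id} is not listed by its parent {parent}")
    return labels, children


labels, children = read_tree(ET.fromstring(SAMPLE))
for child_id in children["R"]:
    print(labels["R"], "->", labels[child_id])
```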

Miscellaneous

"paragraph" - Paragraph

A paragraph is a self-contained structural unit which denotes one coherent line of thought or idea.

<paragraph>ALEXANDER IRELAND &amp; CO., Pall Mall Court, Manchester, propose to issue, at intervals, a SERIES OF SCHOOL BOOKS, Under the above title.</paragraph>

"heading" and "heading_undefined" - Headings

There are two types of heading elements: hierarchical ones and undefined ones. The hierarchical depth (level) is theoretically unlimited.

<heading level="1">PART I. - OF WORDS.</heading>
<heading level="2">Chapter I. Nouns.</heading>
 
<heading_undefined>An Easy English Grammar for Beginners; being a Plain Doctrine of Words and Sentences.</heading_undefined>

"pagebreak" - Pagebreaks

The pagebreak element denotes the break of the physical page in the original document. The “page_no” attribute contains the number of the new page as given in the original text.

<pagebreak page_no="12" />
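Since a pagebreak can occur in the middle of a paragraph, assigning text to pages requires walking the document in order. The following sketch does that with the standard library. The text wrapper element, the sample sentences' page assignment, and the first_page parameter (the page on which the excerpt starts) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# <text> is a hypothetical wrapper used only to make the sample parse.
SAMPLE = """<text>
    <paragraph>A Noun is a <pagebreak page_no="12" />Name.</paragraph>
    <paragraph>A Verb is a Telling Word.</paragraph>
</text>"""


def text_by_page(root, first_page=None):
    """Map each page number to the text found on that page."""
    pages = {}
    current = first_page

    def add(chunk):
        if chunk:
            pages.setdefault(current, []).append(chunk)

    def walk(el):
        nonlocal current
        if el.tag == "pagebreak":
            # Everything after this point belongs to the new page.
            current = el.get("page_no")
            return
        add(el.text)
        for child in el:
            walk(child)
            add(child.tail)  # text following the child, on the current page

    walk(root)
    # Normalise whitespace left over from indentation and line breaks.
    return {page: " ".join("".join(chunks).split()) for page, chunks in pages.items()}


pages = text_by_page(ET.fromstring(SAMPLE), first_page="11")
for page, text in pages.items():
    print(page, text)
```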

"ed_note" - Editor's Note

Attributes: type = addition | correction | omission | note

An editor's note provides additional information given by the editor. It can be a note, an addition, an omission, or a correction (e.g. of a typo).

For corrections, the corrected form stands in the running text and the uncorrected form of the word is recorded within the element, so that the correct form is counted.

if they have loved <ed_note type="correction">love</ed_note>
<!-- In this case, the wrong spelling "love" in the original is corrected to "loved". -->
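When counting words, the uncorrected forms inside correction notes have to be left out. The sketch below shows one way to do this. The wrapping paragraph element and both function names are assumptions, and notes of the other types are kept in the text unchanged because their handling is not described here.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a <paragraph> so that it parses.
SAMPLE = '<paragraph>if they have loved <ed_note type="correction">love</ed_note></paragraph>'


def corrected_text(el):
    """Text of el without the content of correction notes."""
    parts = [el.text or ""]
    for child in el:
        is_correction = child.tag == "ed_note" and child.get("type") == "correction"
        if not is_correction:
            # Other elements (including other ed_note types) are kept as they are.
            parts.append(corrected_text(child))
        # The tail belongs to the running text in either case.
        parts.append(child.tail or "")
    return "".join(parts)


def original_forms(el):
    """Uncorrected forms recorded in correction notes, in document order."""
    return [
        "".join(note.itertext())
        for note in el.iter("ed_note")
        if note.get("type") == "correction"
    ]


root = ET.fromstring(SAMPLE)
clean = corrected_text(root).strip()
print(clean)
print(original_forms(root))
```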

"graphic" - Graphic Elements

Complex graphic elements that cannot be fully represented in XML are supplied as digital images. The “desc” attribute contains a written description of the graphic element and serves as a fallback for systems that cannot display digital images.

<graphic image_id="1" url="images/0_1.jpg" desc="Exercise 36. Prepositions with Sheperds" filetype="jpg"/> 


Markup II: Project-Specific Annotation

Evaluative and Normative Utterances

The judgement element marks evaluative and normative statements by the author on other authors, grammars, society, etc. It can enclose single words, phrases, or sentences.

<judgement tendency="positive" type="praise" addressee_explicit="Kigan, John" addresse_implicit="Kigan, John">Appraisal</judgement>

References and Quotations

"quotation" - Quotations

The attribute “source_added” indicates whether “title” or “author” have been added by the editors.

<quotation author="Surname, First name" title="Title" source_added="0">A quotation</quotation>

"reference" - References

Indicates a referenced author. The element encapsulates the author's name and title.

<reference referenced="Referenced Author" referencing="Referencing Author" type="dedication" judgemental="0" source="Source (Year)">Dedication</reference>
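For network analyses, reference elements can be collected into a weighted edge list that runs from the referencing author to the referenced author. The following is a sketch only: the text and paragraph wrappers, the placeholder author names, and the function name reference_edges are assumptions, and the attribute values simply repeat those of the example above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical excerpt: placeholder names, attribute values copied from the example.
SAMPLE = """<text>
    <paragraph><reference referenced="Author B" referencing="Author A" type="dedication" judgemental="0" source="Source (Year)">Dedication</reference></paragraph>
    <paragraph><reference referenced="Author B" referencing="Author A" type="dedication" judgemental="0" source="Source (Year)">Dedication</reference></paragraph>
    <paragraph><reference referenced="Author C" referencing="Author A" type="dedication" judgemental="0" source="Source (Year)">Dedication</reference></paragraph>
</text>"""


def reference_edges(root):
    """Count (referencing, referenced) pairs over all <reference> elements."""
    return Counter(
        (ref.get("referencing"), ref.get("referenced"))
        for ref in root.iter("reference")
    )


edges = reference_edges(ET.fromstring(SAMPLE))
for (source, target), weight in edges.items():
    print(f"{source} -> {target} (weight {weight})")
```

The resulting counter can be passed on to any graph library as a list of weighted, directed edges.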