GUIDELINES FOR SGML MARKUP FOR FULL TEXT

I. Markup Used/Required by Textclass/XPAT:

The structure below lays out the basic framework of a document marked up for the FCLA FullText Collections. See fcla.textclass.dtd for the full list of elements and specific markup rules. Markup may be as simple or as detailed as you wish, as long as all of the required elements indicated below are included. Note that although the structure below is indented with tabs for readability, tabs are not needed in FullText markup.

REQUIRED tags are marked with an asterisk (*). NOTES are enclosed in curly brackets { }.

*<DLPSTEXTCLASS>
*<HEADER>
*<FILEDESC>
* <TITLESTMT>
* <TITLE TYPE="245">
* </TITLE>
   <AUTHOR>
   </AUTHOR>
* </TITLESTMT>
* <PUBLICATIONSTMT>
       {Use this element for information about the electronic publication. See SOURCEDESC for publication information about the original print version.}

* <PUBLISHER> </PUBLISHER>
<PUBPLACE></PUBPLACE>
<DATE></DATE>
<AUTHORITY>
       {This element is used for the version statement currently displayed in the DL for texts loaded there (Electronic version created [year], State University System of Florida.)}

</AUTHORITY>
* <IDNO TYPE="dlps"> entity ID </IDNO>
       {IDNO type must be "dlps". The ID should be the entity ID used by the digital library (FHP, JUV, etc.) if it is included there.}

<COPYRIGHT>
       {This field may be used for display of any copyright information for the electronic version.}

</COPYRIGHT>
<AVAILABILITY>
       {This element is optional and can be used for any restrictions or availability information that you wish to display.}

</AVAILABILITY>
* </PUBLICATIONSTMT>
* <SOURCEDESC>
* <BIBLFULL>
       {This element should be used for the original publication in print format. We use BIBLFULL rather than "BIBL", since we have made local changes to the TEILite dtd to allow for inclusion of additional bibliographic information. This bibliographic information is used in the "full citation" display.}

* <TITLE TYPE="main"> </TITLE>
<EDITIONSTMT>
<EDITION> </EDITION>
</EDITIONSTMT>
<EXTENT></EXTENT>
<PUBLICATIONSTMT>
<PUBLISHER></PUBLISHER>
<PUBPLACE></PUBPLACE>
<DATE></DATE>
</PUBLICATIONSTMT>
<RESPSTMT>
<RESP> </RESP>
<NAME> </NAME>
       {The RESPSTMT element is displayed every time the brief citation appears, so put here any credit that must always be shown. Ex.:
       <RESP>Digitized from photograph at the</RESP><NAME>Historical Museum of Southern Florida, Miami, Florida</NAME>}

</RESPSTMT>
* </BIBLFULL>
* </SOURCEDESC>
*</FILEDESC>
*<ENCODINGDESC>
*<EDITORIALDECL N="4">
       {The value "4" for the TEI encoding level is required for full-text display. Encoding level "2" is used for items that do not have full text to display, i.e., items with dirty ASCII text or items displayed through the page viewer application.}

*</EDITORIALDECL>
*</ENCODINGDESC>
*</HEADER>
*<TEXT>
*<BODY>
<PB REF="[page reference]">
<DIV1 PDF="[pdf reference, if applicable]"><HEAD> </HEAD>
<P>[text]</P>
       {Text continues with <DIVn>s, <P>s and page references until the end of the body is reached. Then the file closes with:}

*</DIVn>
*</BODY>
*</TEXT>
*</DLPSTEXTCLASS>
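Before submitting a file, it can help to confirm that every asterisked element above is present. The Python sketch below is a hypothetical pre-flight check, not an FCLA tool: the names `REQUIRED_TAGS`, `missing_required`, `has_full_text_level` and `has_dlps_idno` are illustrative, it only looks for start tags by name (so the two required TITLEs count once), and it does not replace validation against fcla.textclass.dtd.

```python
import re

# Start tags marked as REQUIRED (*) in the template. This list is an
# illustrative assumption; fcla.textclass.dtd is the authority.
REQUIRED_TAGS = [
    "DLPSTEXTCLASS", "HEADER", "FILEDESC", "TITLESTMT", "TITLE",
    "PUBLICATIONSTMT", "PUBLISHER", "IDNO", "SOURCEDESC", "BIBLFULL",
    "ENCODINGDESC", "EDITORIALDECL", "TEXT", "BODY",
]


def missing_required(sgml: str) -> list:
    """Return the required element names that have no start tag in sgml."""
    missing = []
    for tag in REQUIRED_TAGS:
        # The lookahead requires whitespace or ">" after the name, so "<TITLE"
        # does not match "<TITLESTMT". End tags never match: they start "</".
        if not re.search(r"<" + tag + r"(?=[\s>])", sgml, re.IGNORECASE):
            missing.append(tag)
    return missing


def has_full_text_level(sgml: str) -> bool:
    """True if EDITORIALDECL carries N="4", the level needed for full-text display."""
    return re.search(r'<EDITORIALDECL\s+N="4"', sgml, re.IGNORECASE) is not None


def has_dlps_idno(sgml: str) -> bool:
    """True if an IDNO element has TYPE="dlps"."""
    return re.search(r'<IDNO\s+TYPE="dlps"', sgml, re.IGNORECASE) is not None
```

For a file that contains every required element, `missing_required` returns an empty list; any names it returns point to the parts of the template still to be filled in.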

II. File naming conventions:

  1. The collection code is the project code already established for PALMM. It also becomes the full-text directory name:
    1. FGS (Florida Geological Survey Publications)
    2. FHP (Florida Heritage Collection)
    3. JUV (Literature for Children)
    4. LAW (Florida Historical Legal Documents)
    5. FLNP (Florida Newspapers)
    6. PSA (Psychoanalytic Study of Art)

  2. Text IDs within collections should be the same as the entity IDs used for titles in FHP or other PALMM collections, or be formed from:
    1. a two-character institution or project code, in uppercase letters
    2. an 8-character unique number (i.e., one not used for any other PALMM item)
      ex.: FA01234567
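The second way of forming an ID can be expressed as a pattern. A minimal Python sketch, assuming the 8-character number consists of decimal digits only (as in the FA01234567 example); `make_text_id` and `is_formed_text_id` are hypothetical helpers. IDs reused from existing entity IDs need not match this pattern, and uniqueness across PALMM cannot be checked locally.

```python
import re

# Assumed form: two uppercase letters followed by exactly 8 digits.
TEXT_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{8}")


def make_text_id(code: str, number: int) -> str:
    """Build an ID such as FA01234567 from a 2-letter code and a number."""
    code = code.upper()  # the code must be in uppercase letters
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"code must be two letters: {code!r}")
    if not 0 <= number <= 99_999_999:
        raise ValueError(f"number does not fit in 8 digits: {number}")
    # Zero-pad so the numeric part is always exactly 8 characters.
    return f"{code}{number:08d}"


def is_formed_text_id(text_id: str) -> bool:
    """True if text_id has the two-letter + 8-digit form."""
    return TEXT_ID_PATTERN.fullmatch(text_id) is not None
```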

III. General rules:

  1. The major sections/chapters of an item should be marked up at the DIV1 level. Subchapters are then at the DIV2 level, smaller sections at the DIV3 level, and so on, until the structure of the item is represented. See how the DIVs are used in the sample marked-up texts.
  2. Use fcla.textclass.dtd for the SGML markup.
  3. Entities for special characters (foreign characters, or punctuation such as "&") are defined in the file charents.frag. This is a selection of the most heavily used special characters, drawn from a group of ISO standards, and it can be supplemented as the need arises. Let FCLA know if you need a character that is not defined in that file. Do NOT use Unicode hex equivalents.
  4. When text in adjacent tags is concatenated for display, be sure to include a blank space somewhere between the two tags. For example, the line below is concatenated for display; note the space between "quarterly" and <NUM>:

       <TITLE TYPE="245">The Florida historical quarterly <NUM>volume 1 issue 1</NUM></TITLE>

     If the space were not present, the display would be: The Florida historical quarterlyvolume 1 issue 1

  5. Eliminate unnecessary white space, i.e., no tabs, extra line breaks, indentation, etc.
  6. Do not use external SGML entity file references (they cause indexing and directory problems).
  7. The normalization programs do not allow HI, DIV (and possibly other elements) without a number/level; i.e., you must use DIV1, DIV2, etc.
  8. The normalization programs do not allow <FIGDESC>, most likely because of indexing; use <P> to describe figures unless an adjacent paragraph already contains the same information.
  9. When working with a multi-volume piece, each volume and/or issue must be a separate "text" with a unique file name.
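The rule about eliminating unnecessary white space can be applied mechanically, as long as the spaces needed between concatenated tags survive. A minimal Python sketch; `tidy_whitespace` is a hypothetical helper that removes indentation, tabs and blank lines but keeps single spaces inside a line and the line breaks between non-empty lines, so it never joins text from adjacent tags.

```python
import re


def tidy_whitespace(sgml: str) -> str:
    """Strip indentation, tabs and blank lines from SGML source.

    Any run of spaces/tabs that contains a tab becomes one space, so two
    words separated only by a tab stay separated. Leading and trailing
    whitespace is removed from each line and empty lines are dropped.
    Single spaces inside a line, such as the one before <NUM> in a
    concatenated title, are left alone.
    """
    cleaned = []
    for line in sgml.splitlines():
        line = re.sub(r"[ \t]*\t[ \t]*", " ", line).strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned) + "\n" if cleaned else ""
```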

IV. Contents list at beginning of text:

The contents list is generated from the <HEAD> text of every DIV1, i.e., <DIV1 ...><HEAD>head text here</HEAD>. If a DIV1 has no <HEAD>, its TYPE attribute (TYPE=) is displayed instead. Additional DIV levels (DIV2 and DIV3) may also be used for the table of contents.
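To preview what the contents list will show, the rule can be approximated in Python. This is a sketch under simplifying assumptions, not the display software: `contents_list` is hypothetical, it uses regular expressions rather than an SGML parser, it expects the <HEAD> to come directly after the DIV1 start tag, and it ignores the DIV2 and DIV3 levels.

```python
import re

# Opening DIV1 tag (attributes captured) plus everything up to the next DIV1.
DIV1_RE = re.compile(r"<DIV1\b([^>]*)>(.*?)(?=<DIV1\b|\Z)", re.IGNORECASE | re.DOTALL)
# A HEAD that comes first inside the DIV1 (optional whitespace allowed).
HEAD_RE = re.compile(r"\s*<HEAD>(.*?)</HEAD>", re.IGNORECASE | re.DOTALL)
TYPE_RE = re.compile(r'\bTYPE\s*=\s*"([^"]*)"', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def contents_list(sgml: str) -> list:
    """Approximate the contents list: DIV1 HEAD text, else the TYPE value."""
    entries = []
    for attrs, body in DIV1_RE.findall(sgml):
        head = HEAD_RE.match(body)  # match() anchors at the start of body
        if head:
            # Drop inline markup (e.g. HI1) and surrounding spaces.
            entries.append(TAG_RE.sub("", head.group(1)).strip())
        else:
            type_attr = TYPE_RE.search(attrs)
            entries.append(type_attr.group(1) if type_attr else "")
    return entries
```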

V. Figure image references:

For figures that you wish to display with the text (automatically pulled into the display, not a link), insert a figure reference at the appropriate place in the text, in this format:

<FIGURE ID="[entity id]/[image filename]"></FIGURE>

Ex. <FIGURE ID="UF00003032/saints.gif"></FIGURE>
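When a text has many figures, the references can be generated from a list of image files to avoid typos in the ID path. A trivial Python sketch; `figure_tag` is a hypothetical helper that only formats the string and does not check that the image exists.

```python
def figure_tag(entity_id: str, image_filename: str) -> str:
    """Return <FIGURE ID="[entity id]/[image filename]"></FIGURE>."""
    for part in (entity_id, image_filename):
        # An empty part, a slash or a quote would corrupt the ID path.
        if not part or "/" in part or '"' in part:
            raise ValueError(f"invalid ID component: {part!r}")
    return f'<FIGURE ID="{entity_id}/{image_filename}"></FIGURE>'
```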

VI. Page references:

To display page images, the references for the pages must be within <PB> tags. This produces the "View page image" display, with the page number, when one is looking at the text. The path for a page image can be given in two ways, depending on where the page images are located.

  1. For document pages already loaded into FHP (or another collection in the DL), the format for the file reference is:
     <PB REF="/DLData/[2-character IG code]/[entity ID]/[filename.jpg]" SEQ="[page sequence in text]" FMT="[format]" N="[page number printed on page]">
    (No closing tag is required. IG stands for the two-character institution group code: UF, SF, NF, etc.)

           Ex.: <PB REF="/DLData/uf/UF00000282/22.jpg" SEQ="30" FMT="JPEG" N="22">

    (FCLA can provide the correct directory structure to you, if needed.)

  2. Pages not already available through the DL should be stored/ftp'ed in the same directory as the sgm file, even though their final destination will be another folder. The format for this type of reference is:
     <PB REF="/[first letter of collection code]/[collection code]/pages/[entity id]/[filename.jpg]" SEQ="[page sequence in text]" FMT="[format]" N="[page number printed on page]">
    (No closing tag is required.)

           Ex.: <PB REF="/p/psa/pages/UF00000232/10.jpg" SEQ="15" FMT="JPEG" N="10">

NOTE: The page reference for the first page of a DIV should come immediately before the markup for that DIV.
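For texts with many pages, the <PB> tags can be generated rather than typed. The Python sketch below builds both forms of reference with every attribute value quoted; the helper names are hypothetical, the IG code is passed through unchanged, and it assumes the collection code is written in lowercase in the path, as in the /p/psa/ example. Each tag still has to be placed by hand, immediately before the DIV when a page starts a new division.

```python
def _pb(ref: str, seq: int, fmt: str, n: str) -> str:
    # All attribute values quoted; PB takes no closing tag.
    return f'<PB REF="{ref}" SEQ="{seq}" FMT="{fmt}" N="{n}">'


def pb_tag_dl(ig_code: str, entity_id: str, filename: str,
              seq: int, fmt: str, n: str) -> str:
    """Page image already loaded in the DL: /DLData/[IG code]/[entity ID]/[file]."""
    return _pb(f"/DLData/{ig_code}/{entity_id}/{filename}", seq, fmt, n)


def pb_tag_local(collection_code: str, entity_id: str, filename: str,
                 seq: int, fmt: str, n: str) -> str:
    """Page image stored with the .sgm file:
    /[first letter of code]/[code]/pages/[entity ID]/[file].
    Assumes the code is lowercased in the path, as in /p/psa/."""
    code = collection_code.lower()
    return _pb(f"/{code[0]}/{code}/pages/{entity_id}/{filename}", seq, fmt, n)
```

SEQ counts page images in order while N is the number printed on the page, so the two are passed separately even when the image file is named after the printed number.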

VII. PDF References

If you wish to link to a PDF for chapters or sections, the reference to the file is included within the opening DIV tag (DIV1, DIV2, etc.) for the section. The format for PDFs already available through the DL is:

<DIV[n] [optional title or type attributes here] PDF="/DLData/[ig code]/[entity id]/[filename].pdf">
([n] stands for the division level)

       Ex. <DIV1 TITLE="Chapter 1" PDF="/DLData/UF/UF00000232/file1.pdf">

For PDFs not already stored on DLData, the format is:

<DIV[n] [optional title or type attributes here] PDF="/[first letter of collection code]/[collection code]/pdfs/[entity id]/[filename].pdf">

       Ex. <DIV1 PDF="/p/psa/pdfs/UF00000232/file3.pdf">
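The two PDF path forms differ only in their prefix, so they can be built the same way. A Python sketch with hypothetical helper names; it passes the IG code through unchanged and assumes the collection code appears in lowercase in the path (as in /p/psa/pdfs/). Optional attributes such as TITLE are written before PDF, matching the examples above.

```python
def pdf_path_dl(ig_code: str, entity_id: str, filename: str) -> str:
    """PDF already on DLData: /DLData/[ig code]/[entity id]/[filename].pdf"""
    return f"/DLData/{ig_code}/{entity_id}/{filename}.pdf"


def pdf_path_local(collection_code: str, entity_id: str, filename: str) -> str:
    """PDF not yet on DLData: /[first letter]/[code]/pdfs/[entity id]/[filename].pdf
    (code assumed lowercase in the path, as in /p/psa/pdfs/)."""
    code = collection_code.lower()
    return f"/{code[0]}/{code}/pdfs/{entity_id}/{filename}.pdf"


def div_with_pdf(level: int, pdf_path: str, **attrs: str) -> str:
    """Opening DIVn tag with optional attributes (e.g. TITLE) and a PDF link."""
    if level < 1:
        # Numbered DIVs only: a bare DIV is rejected by the normalization programs.
        raise ValueError("DIV level must be 1 or greater")
    parts = [f"DIV{level}"]
    parts += [f'{name.upper()}="{value}"' for name, value in attrs.items()]
    parts.append(f'PDF="{pdf_path}"')
    return "<" + " ".join(parts) + ">"
```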

VIII. Example Documents and Template for Markup

For an example of a simple document, see: UF00001616.sgm

For an example of a complex document, see: ftl1823.sgm

Template for marking up text