I. Markup Used/Required by Textclass/XPAT:
The structure below lays out the basic framework of a
document marked up for the FCLA FullText Collections. See
fcla.textclass.dtd for the full list of elements and specific
markup rules. Markup may be as simple or as detailed as you
wish, as long as all required fields indicated below are
included. Please note that although indentation with TABs is
used below for readability, it is not needed for FullText markup.
REQUIRED tags are marked with an asterisk. NOTES
are enclosed in curly brackets {}.
*<DLPSTEXTCLASS>
*<HEADER>
*<FILEDESC>
* <TITLESTMT>
* <TITLE TYPE="245">
* </TITLE>
<AUTHOR>
</AUTHOR>
* </TITLESTMT>
* <PUBLICATIONSTMT>
{This tagging should be used for information on the electronic
publication. See SOURCEDESC for publication information for
the original print version.}
* <PUBLISHER> </PUBLISHER>
<PUBPLACE></PUBPLACE>
<DATE></DATE>
<AUTHORITY>
{This element is used for the version statement that is
currently displayed in the DL for texts loaded there (Electronic
version created [year], State University System of Florida.)}
</AUTHORITY>
* <IDNO TYPE="dlps"> entity ID </IDNO>
{IDNO type must be "dlps". The ID should be the
entity ID used by the digital library (FHP, JUV, etc.)
if it is included there.}
<COPYRIGHT>
{This field may be used for display of any copyright
information for the electronic version.}
</COPYRIGHT>
<AVAILABILITY>
{This field is optional, and can be used for any restrictions
or availability information that you wish to display.}
</AVAILABILITY>
* </PUBLICATIONSTMT>
* <SOURCEDESC>
* <BIBLFULL>
{This element should be used for the original publication
in print format. We use BIBLFULL rather than "BIBL",
since we have made local changes to the TEILite dtd to
allow for inclusion of additional bibliographic information.
This bibliographic information is used in the "full citation" display.}
* <TITLE TYPE="main"> </TITLE>
<EDITIONSTMT>
<EDITION> </EDITION>
</EDITIONSTMT>
<EXTENT></EXTENT>
<PUBLICATIONSTMT>
<PUBLISHER></PUBLISHER>
<PUBPLACE></PUBPLACE>
<DATE></DATE>
</PUBLICATIONSTMT>
<RESPSTMT>
<RESP> </RESP>
<NAME> </NAME>
{The RESPSTMT element is displayed every time the
brief citation appears, so any credit that must always
be shown should be put here. Ex:
<RESP>Digitized from photograph at the</RESP><NAME>Historical
Museum of Southern Florida, Miami, Florida</NAME>}
</RESPSTMT>
* </BIBLFULL>
* </SOURCEDESC>
*</FILEDESC>
*<ENCODINGDESC>
*<EDITORIALDECL N="4">
{The value of "4" for the TEI encoding level is required
for full text display. Encoding level of "2" is used
for items that do not have full text to display; i.e.,
dirty ASCII or page viewer application.}
*</EDITORIALDECL>
*</ENCODINGDESC>
*</HEADER>
*<TEXT>
*<BODY>
<PB REF="[page reference]">
<DIV1 PDF="[pdf reference, if applicable]"> <HEAD> </HEAD>
<P>[text]</P></DIV1>
{Text continues with <DIVn>s and <P>s and
page references until the end of the body is reached.
Then file closes with:}
*</DIVn>
*</BODY>
*</TEXT>
*</DLPSTEXTCLASS>
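The asterisked elements in the skeleton above lend themselves to a quick presence check before a file is submitted. The following is a minimal sketch in Python, not FCLA tooling: the function name `missing_required` and the list `REQUIRED_ELEMENTS` are hypothetical, and the check only looks for start tags. It does not validate against fcla.textclass.dtd.

```python
import re

# Element names taken from the asterisked lines of the skeleton.
REQUIRED_ELEMENTS = [
    "DLPSTEXTCLASS", "HEADER", "FILEDESC", "TITLESTMT", "TITLE",
    "PUBLICATIONSTMT", "PUBLISHER", "IDNO", "SOURCEDESC", "BIBLFULL",
    "ENCODINGDESC", "EDITORIALDECL", "TEXT", "BODY",
]

def missing_required(sgml: str) -> list[str]:
    """Return the required element names that have no start tag in sgml."""
    missing = []
    for name in REQUIRED_ELEMENTS:
        # The name must be followed by whitespace or ">" so that
        # <TITLESTMT> is not mistaken for <TITLE>.
        if not re.search(rf"<{name}[\s>]", sgml, re.IGNORECASE):
            missing.append(name)
    return missing
```

An empty result means every required start tag was found; it says nothing about nesting or attribute values such as TYPE="dlps".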
II. File naming conventions:
- The collection code is the project code already
established for PALMM. It becomes the full text directory
name:
- FGS (Florida Geological Survey Publications)
- FHP (Florida Heritage Collection)
- JUV (Literature for Children)
- LAW (Florida Historical Legal Documents)
- FLNP (Florida Newspapers)
- PSA (Psychoanalytic Study of Art)
- Text IDs within collections should be the same as the
entity IDs used for titles in FHP or another PALMM collection,
or be formed from:
- a two-character institution or project code, in upper-case
letters
- an 8-character unique number (i.e., not used for any
other PALMM item)
ex.: FA01234567
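The shape of a newly formed text ID can be checked mechanically. The sketch below is in Python and assumes that the 8-character number consists of digits only, as in the example FA01234567. The name `is_valid_text_id` is hypothetical. Uniqueness across PALMM cannot be checked this way, and IDs inherited from an existing collection may follow a different form.

```python
import re

# Assumption: two upper-case letters followed by exactly eight digits.
TEXT_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{8}")

def is_valid_text_id(text_id: str) -> bool:
    """Return True if text_id has the two-letter + eight-digit form."""
    return TEXT_ID_PATTERN.fullmatch(text_id) is not None
```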
III. General rules:
- The major sections/chapters of an item should
be marked up as DIV1 level. Subchapters would then be
at the DIV2 level. And smaller sections would be at the
DIV3 level and so on, until the item structure is represented.
See how the DIVs are used in the sample marked up text.
- Use fcla.textclass.dtd for the SGML markup.
- Coding for special characters (foreign or punctuation
such as "&") is defined in the file charents.frag.
This is a selection of the most heavily used special
characters, from a group of ISO standards. It can be
supplemented as the need arises. Let FCLA know if you
need a character that is not defined in that file. Do
NOT use unicode hex equivalents.
- When text in adjacent tags is to be concatenated for
display, be sure to include a blank space somewhere between
the two tags. For example, the line below is concatenated
for display. Note the space between "quarterly" and <NUM>.
<TITLE TYPE="245">The Florida historical quarterly <NUM>volume
1 issue 1</NUM></TITLE>
If the space were not present, the display would be:
The Florida historical quarterlyvolume 1 issue 1
- Eliminate unnecessary white space - i.e., no tabs,
extra line breaks, indentation, etc.
- Do not use external SGML entity file references (because
of indexing and directory problems)
- The normalization programs don't allow HI or DIV (and
perhaps other elements) without a number/level; i.e., you must
use DIV1, DIV2, etc.
- The normalization programs don't allow <FIGDESC>,
most likely because of indexing; use <P> to
describe figures unless an adjacent paragraph has the
same information.
- When working with a multi-volume piece, each volume
and/or issue must be a separate "text", with a unique
file name.
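Several of the rules above can be screened for with simple pattern matching before a file is sent for loading. The following Python sketch is illustrative only and is not the normalization program itself. The names `RULES` and `lint_markup` are hypothetical, and it assumes that "unicode hex equivalents" means numeric character references of the form `&#x00E9;`.

```python
import re

# Each rule pairs a regular expression with the message to report.
RULES = [
    # DIV or HI followed directly by whitespace or ">" has no level number.
    (re.compile(r"<(DIV|HI)[\s>]", re.IGNORECASE),
     "DIV/HI used without a level number"),
    (re.compile(r"<FIGDESC[\s>]", re.IGNORECASE),
     "FIGDESC is not allowed"),
    # Assumption: hex equivalents look like &#x00E9;
    (re.compile(r"&#x[0-9A-Fa-f]+;"),
     "hex character reference; use a charents.frag entity"),
    (re.compile(r"\t"),
     "tab character"),
]

def lint_markup(sgml: str) -> list[str]:
    """Return one message for each rule that the markup violates."""
    return [message for pattern, message in RULES if pattern.search(sgml)]
```

The messages come back in the order the rules are listed, so an empty list means none of these particular problems was found.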
IV. Contents list at beginning of text:
The list is generated from the text of the <HEAD> that follows each DIV1 start tag:
<DIV1 ...><HEAD>head text here</HEAD>
If there is no <HEAD>, the value of the DIV TYPE attribute (TYPE=)
is displayed. Additional DIV levels (DIV2 and DIV3) may
also be used for the table of contents.
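To preview roughly what the contents list will contain, the rule above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the actual display code: it assumes the <HEAD> follows the DIV start tag directly and carries no attributes, and the name `contents_list` is hypothetical.

```python
import re

# A DIV1-DIV3 start tag and, if one follows directly, its HEAD text.
DIV_PATTERN = re.compile(
    r"<DIV([1-3])\b([^>]*)>\s*(?:<HEAD>(.*?)</HEAD>)?",
    re.IGNORECASE | re.DOTALL,
)
TYPE_PATTERN = re.compile(r'TYPE="([^"]*)"', re.IGNORECASE)

def contents_list(sgml: str) -> list[tuple[int, str]]:
    """Return (level, label) pairs in document order."""
    entries = []
    for match in DIV_PATTERN.finditer(sgml):
        level, attrs, head = match.group(1), match.group(2), match.group(3)
        if head is not None and head.strip():
            # Collapse line breaks and repeated spaces inside the head.
            label = " ".join(head.split())
        else:
            # No usable HEAD: fall back to the TYPE attribute, if any.
            type_match = TYPE_PATTERN.search(attrs)
            label = type_match.group(1) if type_match else ""
        entries.append((int(level), label))
    return entries
```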
V. Figure image references:
For figures that you wish to display with the text (automatically
pulled into the display, not a link), insert a figure reference
at the appropriate place in the text, in this format:
<FIGURE ID="[entity id]/[image filename]"></FIGURE>
Ex. <FIGURE ID="UF00003032/saints.gif"></FIGURE>
VI. Page references:
To display page images, the references for the pages must
be within <PB> tags. This produces the "View page
image" display with the page number when one is looking
at the text. The path for the page image may be given in one
of two ways, depending on the location of the page images.
- For document pages already loaded into FHP (or other
collection in the DL), the format for the file reference
is:
<PB REF="/DLData/[2 character IG code]/[entity ID]/[filename.jpg]"
SEQ="[page sequence in text]" FMT="[format]" N="[page number
printed on page]">
(No closing tag required.) (IG stands for the two-character
institution group code: UF, SF, NF, etc.)
Ex.: <PB REF="/DLData/uf/UF00000282/22.jpg" SEQ="30" FMT="JPEG" N="22">
(FCLA can provide the correct directory structure to
you, if needed.)
- Pages not already available through the DL should be
stored/ftp'ed in the same directory as the sgm file,
even though their final destination will be another folder.
The format for this type of reference is:
<PB REF="/[first letter of collection code]/[collection code]/pages/[entity id]/[filename.jpg]"
SEQ="[page sequence in text]" FMT="[format]" N="[page number printed on page]">
(No closing tag required.)
Ex.: <PB REF="/p/psa/pages/UF00000232/10.jpg" SEQ="15" FMT="JPEG" N="10">
NOTE: The page reference for the first page of a DIV
should come immediately before the markup for the
DIV.
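When page references are produced by a script rather than typed by hand, the two path forms can be assembled as follows. This Python sketch is illustrative: the function names are hypothetical, every attribute value is quoted, and lower-casing the collection code is an assumption based on the "/p/psa/" example.

```python
def dl_page_ref(ig_code: str, entity_id: str, filename: str) -> str:
    """Path for a page image already loaded into the DL."""
    return f"/DLData/{ig_code}/{entity_id}/{filename}"

def local_page_ref(collection_code: str, entity_id: str, filename: str) -> str:
    """Path for a page image delivered alongside the .sgm file."""
    # Assumption: the collection code appears in lower case in the path.
    code = collection_code.lower()
    return f"/{code[0]}/{code}/pages/{entity_id}/{filename}"

def pb_tag(ref: str, seq: int, fmt: str, n: str) -> str:
    """Build a <PB> tag; no closing tag is emitted."""
    return f'<PB REF="{ref}" SEQ="{seq}" FMT="{fmt}" N="{n}">'
```

For instance, `pb_tag(local_page_ref("PSA", "UF00000232", "10.jpg"), 15, "JPEG", "10")` produces the same tag as the second example above.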
VII. PDF References
If you wish to link to a pdf for chapters or sections,
the reference for the file is included within the DIV
tag for the section. The format for PDFs already available
through the DL is:
<DIV[n] [optional title or type attributes here]
PDF="/DLData/[ig code]/[entity id]/[filename].pdf">
([n] stands for the division level)
Ex. <DIV1 TITLE="Chapter 1" PDF="/DLData/UF/UF00000232/file1.pdf">
For PDFs not already stored on DLData, the format
is:
<DIV[n] [optional title or type attributes here]
PDF="/[first letter of collection code]/[collection code]/pdfs/[entity id]/[filename].pdf">
Ex. <DIV1 PDF="/p/psa/pdfs/UF00000232/file3.pdf">
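The two PDF path forms can be generated in the same spirit. The Python sketch below is an illustration only: the function names are hypothetical, only an optional TITLE attribute is handled, and lower-casing the collection code is an assumption based on the "/p/psa/" example.

```python
from typing import Optional

def dl_pdf_path(ig_code: str, entity_id: str, filename: str) -> str:
    """Path for a PDF already available through the DL."""
    return f"/DLData/{ig_code}/{entity_id}/{filename}.pdf"

def local_pdf_path(collection_code: str, entity_id: str, filename: str) -> str:
    """Path for a PDF not yet stored on DLData."""
    # Assumption: the collection code appears in lower case in the path.
    code = collection_code.lower()
    return f"/{code[0]}/{code}/pdfs/{entity_id}/{filename}.pdf"

def div_start_tag(level: int, pdf: str, title: Optional[str] = None) -> str:
    """Build a DIV start tag of the given level with a PDF reference."""
    attrs = f' TITLE="{title}"' if title else ""
    return f'<DIV{level}{attrs} PDF="{pdf}">'
```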
VIII. Example Documents and Template for Markup
For an example of a simple document, see: UF00001616.sgm
For an example of a complex document, see: ftl1823.sgm
Template for marking up text