Working Draft

Developed at RFB R&D
P.O. Box 7068
Missoula MT 59807
Voice: 406/728-7201
FAX: 406/728-6331
InterNet: cbfb_gwk@selway.umt.edu
Version 1.2

September 27, 1992


Publishing system conversion to support braille, large print and
electronic access for persons with print disabilities.

(NOTE: If your publishing system uses SGML or other similar generic
markup, this document is not for you.  Please contact the author
for instructions.  This paper explains how most other systems can
prepare files which can easily be used for delivery to the print
disabled.)

INTRODUCTION

The structural and content information about a book that may be
specified in the Standard Generalized Markup Language (SGML, the
ISO standard for document interchange) is ideal for braille, large
print and electronic books.  However, it is recognized that very
few in-house production publishing systems currently use SGML. 
Accordingly, it may be difficult and costly for some publishers to
provide computer files that comply with the recommended SGML
specification (formally a "document type definition," or DTD) being
established by the International Committee for Accessible Document
Design (ICADD).

For more immediate implementation, therefore, we have designed a
simple alternative: a set of tags and mappings into the standard
ASCII character set that will be extremely useful in making
documents accessible to persons with print disabilities.  ASCII
(0-127) files are acceptable if they use the tags listed below to
identify structural components.  This list provides
machine-readable encodings that can be translated automatically
into both the hierarchical "table of contents" structure useful for
electronic delivery and the minimal formatting needed to produce
meaningful braille and large print output.
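
As a rough illustration of the automatic translation described
above, the following Python sketch builds an indented outline from
the <hn> heading tags defined in the minimum tag list.  It is not
part of this specification.  The function name "outline" and the
two-space indentation are assumptions made for this example, and
real files would need fuller parsing than a regular expression
provides.

```python
import re

# Matches <h1>...</h1>, <H2>...</H2>, etc.  Tags are case insensitive.
HEADING_RE = re.compile(r"<h(\d+)>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def outline(text):
    """Return one indented line per heading found in a tagged file."""
    lines = []
    for level, title in HEADING_RE.findall(text):
        indent = "  " * (int(level) - 1)      # <h1> is not indented
        lines.append(indent + " ".join(title.split()))
    return lines


sample = "<h1>Chapter 1</h1><para>x</para><H2>Getting\nStarted</H2><h3>Tools</h3>"
for line in outline(sample):
    print(line)
```

The same list of headings could drive the navigation of an
electronic book or the contents page of a braille volume.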
 
For those not familiar with SGML, it might be easier to think of
tags as being equivalent to a style sheet.  Style sheets ensure
consistency in the publishing process and are used by most word
processing and publishing systems.
 
In most circumstances, it is expected that users will work with a
set of style sheets whose names are identical with the tag set
listed below. WordPerfect and Microsoft Word files that have
equivalent information can be used with the tag set.  For example,
bold text can be represented either with a style named "bold" or
with the WordPerfect or Microsoft Word indications for bold. 
Foreign language characters and diacritical marks can also be
entered using the proprietary WordPerfect/Word codes, the
equivalent Post Script character names, or the SGML character
entities specified at the end of this document.  The author's
name, however, must be set in a
style called "au".  It may be formatted to the screen or printed
page in any way appropriate to the book.  For the ASCII electronic
copy, all that we need is the author's name enclosed in tags: <AU>
at the start, and </AU> after the name.
 
The remainder of this paper explains how a style sheet would be
associated with the tags.  Then a table of the most essential tags
is presented.  We have also listed some styles that probably will
need to be addressed in the future for more complex publications. 
Finally, a list of mappings from common typesetting characters into
ASCII is presented.

The following scenario describes the use of the most complex
construct present in the minimum tag set.

     A style sheet for LISTS would be created.  The style indicates
     that the heading of the list would be in a particular size and
     font.  Each item in the list may be noted with a selected
     dingbat.  If there are terms and associated definitions in the
     list, the term or key word would be indicated by use of
     emphasis.  Finally, the style sheet would end at the end of the
     list.

     The equivalent information is needed by a print disabled
     reader.  The information can be conveyed by using the tags
     listed.  The <list> tag indicates the beginning of a list. 
     The <lhead> tag indicates the heading of the list.  <litem> is
     the list item that equates to the bullet.  The <term> tag
     specifies the key word associated with a definition.  The
     </..> tags indicate the end of the fields, including the
     </list> tag, which indicates the end of the list style sheet.
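
The list scenario above can be sketched in Python as follows.  This
is only an illustration under stated assumptions: the function name
"render_list" and the (term, text) pair layout are hypothetical, and
the text is assumed to contain no characters that need escaping.

```python
def render_list(heading, items):
    """Emit list markup.

    heading: list heading text, or None if the list has no heading.
    items:   sequence of (term, text) pairs; term is None for a
             plain bulleted or numbered item.
    """
    lines = ["<list>"]
    if heading:
        lines.append("<lhead>" + heading + "</lhead>")
    for term, text in items:
        if term:
            # The TERM tag is used within LITEM for term lists.
            lines.append("<litem><term>" + term + "</term> " + text + "</litem>")
        else:
            lines.append("<litem>" + text + "</litem>")
    lines.append("</list>")
    return "\n".join(lines)


print(render_list("Tools", [(None, "Hammer"), ("Awl", "a pointed tool")]))
```

Whatever dingbat or number the printed page uses, each entry is
marked only with <litem>; the visual bullet is not carried over.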


MINIMUM LIST OF TAGS

To identify a file that complies with the specifications described
herein, please include as the first line of text the following
statement.

<!DOCTYPE BOOK PUBLIC "DTP2PDD">

(O/R indicates whether the tag is required or optional.  These tags
are case insensitive; a tag may appear in either upper or lower
case.)

Begin Tag End Tag   O/R  Description

1. <anchor id=>     O    "MARK SPOT ON A PAGE"  The id assigned
                         should be a unique named location.  When
                         used in conjunction with <xref>, cross
                         references, the pair serves to create
                         links to spots in text.  These can be
                         used to mark index items, table or figure
                         citations, key points to be referenced
                         later, etc.

2. <au>   </au>     R    "AUTHOR"  Indicates the author(s).

3. <b>    </b>      R    "BOLD"  Emphasized text in bold face
                         print.   Normally headings are emphasized
                         automatically and do not need to be
                         indicated with the bold tag.

4. <box>  </box>    R    "BOXED INFORMATION"  Material set off
                         from the main flow of text.  Sidebars,
                         Historical Notes and other text that can
                         be placed at a variety of locations on
                         the page.  (Not to be confused with
                         <note> information where position in
                         relation to text is critical.)

5. <bq>   </bq>     R    "BLOCK QUOTE"  Indented quoted material,
                         not to be confused with an in-line
                         quotation.

6. <fig>  </fig>    R    "FIGURE"  Figures normally contain an
                         optional figure title, an optional figure
                         number, a graphic of some kind, an
                         optional figure caption, and an optional
                         figure description.  This information
                         would be contained within the "fig" tag. 
                         (Since figures are treated specially by
                         trained editors, simply noting the
                         beginning and end of figure information
                         is all that is required.)

7. <fn>   </fn>     R    "FOOTNOTE"  Footnote information is
                         placed within this tag.

8. <hn>   </hn>     R    "HEADING LEVEL INDICATOR"  n is replaced
                         by a number.  <h1> would be major
                         divisions of the book.  Acknowledgments,
                         dedication, prefaces, chapters,
                         appendices, etc. are examples of <h1>
                         heads.  Within a chapter, lower-level
                         headings would be given higher numeric
                         values.  <h2>, for example, would be the
                         highest level head in a chapter.  Many
                         publishers and authors call these (A)
                         heads.  (B) heads would be indicated with
                         <h3>.  As many levels as exist in the
                         book would be indicated with the <hn>
                         tag.

9. <ipp no=>        O    "INK PRINT PAGE"  Optional tag that
                         indicates the page number of the ink
                         print page associated with the files.

10. <it>  </it>     R    "ITALIC"  Emphasized text with italics
                         would be indicated with this tag.

11. <lang> </lang>  R    "LANGUAGE INDICATOR"  This tag indicates
                         that foreign language words are present
                         at this location in the text.

12. <lhead> </lhead> O   "LIST HEADING"  If a list has a heading,
                         this tag is used.

13. <list> </list>  R    "LIST OF ITEMS"  Many types of lists
                         would be indicated by this tag:  ordered
                         lists that have numbers or letters,
                         unordered lists that are simply indicated
                         with a bullet, or term lists that use the
                         "TERM" tag to indicate a term and
                         associated definition or explanation.

14. <lit> </lit>    R    "LITERAL TEXT"  Often examples or
                         computer input/output is shown in text
                         and is separated by visual differences. 
                         This tag allows us to distinguish literal
                         text from the surrounding paragraphs.

15. <litem> </litem> R   "LIST ITEM"  Each item in a list should
                         be indicated with this tag.  Ordered
                         lists, unordered lists and term lists use
                         this tag.

16. <note> </note> R     "NOTE IN TEXT"  Cautions, Warnings, Notes
                         and other information that is intended to
                         be read prior to the text.  Any type of
                         boxed information whose position in
                         relation to the text is important.  (Not
                         to be confused with the <box> tag whose
                         position is not dependent on text.)

17. <other> </other> R   "OTHER EMPHASIZED TEXT"  Emphasized text
                         that is not bold, italic, or bold &
                         italic.  Underline and all other types of
                         emphasis would be represented in this
                         way.

18. <para> </para>  R    "PARAGRAPH"  Indicates the beginning and
                         end of a paragraph.  It is acceptable to
                         show white space by using an empty
                         paragraph; i.e., <para></para> shows a
                         blank space.

19. <pp>  </pp>     O    "PRINT PAGE REFERENCE"  Numbers of pages
                         in the book are enclosed in this tag. 
                         This allows us to determine that the
                         indicated numbers refer to pages in the
                         book.

20. <term> </term> R     "TERM OR KEYWORD"  Typically used in two
                         locations.  Key words are introduced in
                         the body of the text.  The term is then
                         incorporated into the glossary. 
                         Dictionaries with a term followed by an
                         explanation, definition or series of
                         definitions also use this construct.  The
                         term is the word that is normally in bold
                         face print and in alphabetical order. 
                         The TERM tag is used within LITEM when a
                         term occurs in a list.

21. <ti>  </ti>     R    "TITLE OF THE BOOK"  This is the title of
                         the book that normally appears in the
                         front material.

22. <xref idref=>   O    "CROSS REFERENCE"  Used in conjunction
                         with the <anchor> tag.  Normally this tag
                         creates a reference to a named location. 
                         For example, in text you may see, "See
                         page 77 for a list of instructions."  The
                         computer files should contain two items. 
                         On page 77, before the list of
                         instructions, the tag <anchor id=list11>
                         is inserted.  The page making reference
                         to the list of instructions would
                         contain, "See page 77 <xref idref=list11>
                         for a list of instructions."
                         
                         (NOTE:  <anchor> and <xref> are not used
                         for paper braille or large print.  These
                         two tags are invaluable for electronic
                         text used with refreshable braille,
                         screen magnification and synthesized
                         speech.)
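
Because <xref> is only useful when its target exists, a file could
be checked before delivery.  The Python sketch below is one possible
check, not a required procedure.  The function name "check_links" is
hypothetical, and it assumes that ids are compared exactly as
written and may optionally be enclosed in double quotes.

```python
import re

# Tag names are case insensitive; ids are compared as written.
ANCHOR_RE = re.compile(r'<anchor\s+id="?([^\s">]+)"?\s*>', re.IGNORECASE)
XREF_RE = re.compile(r'<xref\s+idref="?([^\s">]+)"?\s*>', re.IGNORECASE)


def check_links(text):
    """Return (duplicate anchor ids, xref ids with no anchor)."""
    anchors = ANCHOR_RE.findall(text)
    xrefs = XREF_RE.findall(text)
    duplicates = sorted({a for a in anchors if anchors.count(a) > 1})
    dangling = sorted(set(xrefs) - set(anchors))
    return duplicates, dangling


sample = ("See page 77 <xref idref=list11> for a list of instructions. "
          "<anchor id=list11> <XREF IDREF=fig2>")
print(check_links(sample))
```

An empty result in both positions means every cross reference points
to exactly one named location.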




OPTIONAL TAG MODULES

The following optional tag modules would help greatly, but are not
found in every book.  These tags would be easy to include where
style sheets are used.

            OPTIONAL TAG MODULES THAT ASSOCIATE WITH STYLE SHEETS

     Table tags
     Outline tags
     Poetry tags
     Prose play tags
     Verse Play tags
     Level 1 Math and Science tags

CHARACTER REPRESENTATION

There are many characters in publishing systems.  Most of these
characters do not exist in ASCII 0-127.  If the files are to be
provided in WordPerfect or Microsoft Word, it is acceptable to use
the conventions found in these word processing packages.  If the
files are to be delivered in ASCII, these characters may be
represented by the equivalent Post Script mnemonic or the SGML
entity specified below.

There are three characters that appear in ASCII that are used as
"escape" characters in this document type. It will be necessary to
use the mnemonic instead of the character where they occur.

Escape character                   Mnemonic

<                                  &lt;
>                                  &gt;
&                                  &amp;
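
The substitution of the three escape characters can be sketched in
Python as shown here.  The function name "escape_text" is
hypothetical.  The ampersand is replaced first so that the
ampersands introduced by the other two mnemonics are not escaped a
second time.  The function is meant for running text only, not for
text that already contains tags.

```python
def escape_text(text):
    """Replace the three escape characters with their mnemonics."""
    return (text.replace("&", "&amp;")   # must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


print(escape_text("a < b & c > d"))
```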

Characters that do not map into the ASCII character set, but can be
represented with a mnemonic.
                                   (Note: these characters ARE
                                   case sensitive.)
Character                          Mnemonic

ÿ                                  &yuml;
ü                                  &uuml;
à                                  &agrave;

For a complete list of SGML characters see Appendix C of:

     Electronic Manuscript Preparation and Markup
     Transaction Publishers
     Copyright 1991
     ISBN 0-88738-945-7


It is also acceptable to use the Post Script character set to
represent characters.  To use a Post Script character, prefix the
Post Script mnemonic with "ps".  For example:  &ps----; where ----
is the Post Script character name.

                                   (Note: these characters ARE
                                   case sensitive.)
Character                          Mnemonic

ÿ                                  &psydieresis;
ü                                  &psudieresis;
à                                  &psagrave;
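
The two naming conventions can be produced from one small table, as
in the Python sketch below.  The table holds only the three
characters shown in this document, and the names "SGML_TO_PS",
"sgml_mnemonic" and "ps_mnemonic" are hypothetical.  A working
converter would need the complete character lists from the
references cited in this section.

```python
# SGML entity name -> Post Script character name (case sensitive).
SGML_TO_PS = {
    "yuml": "ydieresis",
    "uuml": "udieresis",
    "agrave": "agrave",
}


def sgml_mnemonic(name):
    """Build the SGML entity form, e.g. &uuml;"""
    return "&" + name + ";"


def ps_mnemonic(sgml_name):
    """Build the "ps"-prefixed Post Script form of the same character."""
    return "&ps" + SGML_TO_PS[sgml_name] + ";"


print(sgml_mnemonic("uuml"), ps_mnemonic("uuml"))
```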

For a complete list of Post Script characters see Appendix E of:

     Post Script Language Reference Manual
     2nd. Edition 
     Addison Wesley
     Copyright 1990
     ISBN 0-201-18127-4

FINAL FILE FORMAT FOR PRINT DISABLED PERSONS

It may be useful to understand the mappings that ultimately need to
take place for a document provided to persons with print
disabilities. Many typesetting characters directly map to ASCII
characters.  The following table explains this mapping.  For a
complete description of the Print Disabled Document type definition
(PDD), please contact the author. 

Mappings into the ASCII character set

Character                          Maps to (Decimal ASCII values)

Open quote                         ASCII quote (34)
Close quote                        ASCII quote (34)
Open single quote                  ASCII single quote (39)
Close single quote                 ASCII single quote (39)
Hyphen                             ASCII hyphen (45)
En dash                            ASCII hyphen (45)
Dash                               two ASCII hyphens
Em dash                            two ASCII hyphens
Soft or hard spaces                ASCII space (32)
Tabs                               Tabs present special problems
                                   for conversion.  If possible,
                                   try to preserve the ASCII tab
                                   character (9).  If that is not
                                   practical, then try to use the
                                   mnemonic &tab; or, if all else
                                   fails, use the equivalent number
                                   of ASCII space characters.
Soft returns                       ASCII space (32)
Hard returns                       ASCII pair (13)(10)
All types of bullets               ASCII * (42)
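
Part of the table above can be sketched as a Python translation
step.  Publishing systems store these characters in their own
proprietary codes, so the Unicode code points used as keys below are
stand-ins chosen for this example, and the names "ASCII_MAP" and
"to_ascii" are hypothetical.  The sketch also assumes every line
break in the input is a hard return, since plain text cannot
distinguish soft returns.

```python
# Typeset character (Unicode stand-in) -> ASCII replacement.
ASCII_MAP = {
    "\u201c": '"',    # open quote
    "\u201d": '"',    # close quote
    "\u2018": "'",    # open single quote
    "\u2019": "'",    # close single quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u00a0": " ",    # hard (non-breaking) space
    "\u2022": "*",    # bullet
}


def to_ascii(text):
    """Apply the character mappings and write hard returns as CR LF."""
    for source, replacement in ASCII_MAP.items():
        text = text.replace(source, replacement)
    # Normalise any line ending to a single LF, then to the (13)(10) pair.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "\r\n")


print(repr(to_ascii("\u201cHi\u201d \u2014 it\u2019s\n\u2022 one")))
```

Tab characters pass through unchanged, in keeping with the advice to
preserve ASCII tab (9) where possible.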
