This document describes the different steps in preprocessing for DITA topics.
The DITA Toolkit implements a multi-stage, map-driven architecture to process DITA content. Each step in the process examines some or all of the content; some steps result in temporary files used by later steps, while others result in updated copies of the DITA content. Most of the processing takes place in a temporary working directory (the source files themselves are never modified).
Most of this process is common between all output formats, and is known as the "preprocess" stage. In general, any DITA process begins with this common set of pre-processing routines, after which processing specific to the output format (such as PDF or XHTML) takes over. The stages of the pre-processing pipeline are described below, after the following note on processing order.
The order of the steps below is significant. Although the DITA specification does not mandate a specific order for processing, the toolkit has over time found that the current order best meets user expectations. Switching the order of processing may give different results.
<note conref="documentA.dita#doc/note" product="MyProd"></note>
If the conref attribute is evaluated first, then documentA must be parsed in order to retrieve the note content. That content is then stored in the current document (or in a representation of that document in memory). However, if all content with product="MyProd" is filtered out, then that work is all discarded later in the build.
The gen-list step examines the input files and creates lists of topics, images, or other content. These lists are used by later steps in the pipeline. For example, one list includes all topics that make use of the conref attribute; only those files are processed during the conref stage of the build. This step is implemented in Ant and Java.
The result of this list is a set of several list files in the temporary directory, including dita.list and dita.xml.properties. A non-exhaustive selection of each list is given in the table below.
|conreflist||Documents that contains conref attribute that need to be resolved in preprocess.|
|fullditamapandtopiclist||All of the ditamap and topic files that are referenced during the transformation. These may be referenced by href or conref attributes.|
|fullditamaplist||All of the ditamap files in dita.list|
|hrefditatopiclist||All of the topic files that are referenced with an href attribute|
|fullditatopiclist||All of the topic files in dita.list|
|imagelist||Images files that are referenced in the content|
This stage processes all referenced DITA content, and creates copies in a temporary directory for use during the remainder of the build. This step is implemented in Java.
This step copies related non-DITA resources to the output directory, including referenced HTML files and images referenced by DITAVAL files.
This step resolves "conref push" references. Conref push is a new feature in the DITA 1.2 specification. This step only processes documents that use conref push (or that are updated due to the push action. The step is implemented in Java.
This step resolves traditional conref attributes. It is implemented in XSLT. The step only runs on documents that use the conref attribute. It is implemented in XSLT.
This step pushes metadata back and forth between maps and topics. For example, index entries and copyrights in the map are pushed into affected topics, so that topics may be processed later in isolation while retaining all relevant metadata. This step is implemented in Java.
Keyref is perhaps the most significant new feature in the DITA 1.2 standard. This step examines all keys defined in the source material, and updates content appropriately. Links that make use of keys are updated so that any href value is replaced by the appropriate target; key based text replacement is also evaluated. This step is implemented in Java.
The codref element was added in DITA 1.2, and allows references to non-DITA code for use in a codeblock element. This step reads the referenced document, and adds the content to the referencing DITA topic. This step is implemented in Java.
<topicref href="other.ditamap" format="ditamap"></topicref>
This step is implemented in XSLT.
The mappull step pulls content from referenced topics into maps. For example, by default, the navigation title on a topicref is replaced by a title from the referenced topic. Short descriptions and link text are also established during this stage. Finally, cascading attributes are made explicit; for example, if a topicref does not set the toc attribute while the parent attribute sets toc="no", that toc="no" value is explicitly set for the child. This step is implemented in XSLT.
The chunk step breaks apart and assembles referenced DITA content based on the chunk attribute in maps. For example, the chunk attribute may be used to assemble all referenced topics into a single document; alternatively, it may do the opposite and break one document with multiple topics into several individual documents. This step is implemented in Java.
These steps work together to add links to topics based on the location of those topics in a map. This is where family links (such as parent, child, next, and previous) are created, as well as links based on relationship tables. The maplink step (implemented in XSLT) generates all of the links, and places them in one large file in the temporary directory. Next, the move-links step (implemented in Java) pushes all of the generated links into the proper topics.
The topicpull step is similar to the mappull step, except that it runs on topics. This is used to pull in titles for <xref> and <link> elements that do not already specify link text. The step is implemented in XSLT.