Literate Programming Using OmniMark
3. Weaving
Weaving is in essence a formatting process, where code
references are replaced by links to the actual code blocks. In
Knuth's original system, the output of the weaving step could be
used as the input to TeX. Here, we'll generate well-formed
XML as the output of the weaving step; this well-formed XML can
then be used as the input to some formatting program.
Alternatively, the well-formed XML output could be viewed
directly in a browser using a stylesheet.

The premise of literate programming is that the author has
chosen a presentation order for the program. We therefore do not
need to worry about re-ordering sections. However, we must
resolve cross-references from one program section to another.
This can be handled by using referents.

Our goal should be to simplify the task of the formatter as
much as possible; this way, we make it easier to support various
output targets. It seems reasonable, therefore, to put each
section into its own file: I find it easier to re-combine split
files than to split a monolithic file. The main file consists of
a list of section files: it will provide a driver for the
formatter.

The weaving process is begun when we hit the program
element:

<4 weaving a program> =
Generating a filename for the weaved output is handled by
<34 generating output filenames>:

Once we've parsed the document, we can generate a list of
sections and output the main document. The main document is
generated according to the template

<5 main page template> =
"<?xml version=%"1.0%"?>%n"
|| "<weaved type=%"main%">%n"
|| "<program-name>[:program name:]</program-name>%n"
|| "<sections>%n"
|| "[:section list:]%n"
|| "</sections>%n"
|| "</weaved>%n"
The [:section list:] template parameter is generated
dynamically using

<6 generate section list> =
open s as buffer
using output as s
   repeat over section-numbers & section-titles
      save-clear template-parameters
      set new template-parameters{"section name"} to key of section-titles
      set new template-parameters{"section number"} to "d" % section-numbers
      set new template-parameters{"section title"} to section-titles
      output emit-template templates{"section list item"}
   again
close s
set new template-parameters{"section list"} to s
and consists of multiple copies of the following template, one
for each section in the input document:

<7 section list item template> =
"<section><filename>[:section name:]</filename>"
|| "<number>[:section number:]</number>"
|| "<title>[:section title:]</title></section>%n"
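To make the template mechanics concrete, here is a minimal sketch, written in Python rather than OmniMark, of how the [:name:] parameters might be substituted and how the section list is assembled from repeated copies of the item template. The function emit_template and the sample data are hypothetical stand-ins for emit-template and for the section-numbers and section-titles shelves; the real template processor may well behave differently in its details.

```python
import re

# Hypothetical stand-in for emit-template: replaces each [:name:] marker
# with the value of the template parameter called "name".
def emit_template(template, params):
    return re.sub(r"\[:([^:\]]+):\]", lambda m: params[m.group(1)], template)

SECTION_LIST_ITEM = (
    "<section><filename>[:section name:]</filename>"
    "<number>[:section number:]</number>"
    "<title>[:section title:]</title></section>\n"
)

MAIN_PAGE = (
    '<?xml version="1.0"?>\n'
    '<weaved type="main">\n'
    "<program-name>[:program name:]</program-name>\n"
    "<sections>\n"
    "[:section list:]\n"
    "</sections>\n"
    "</weaved>\n"
)

def main_page(program_name, sections):
    # sections: (filename, number, title) tuples in presentation order.
    # One copy of the item template is emitted per section, and the
    # concatenation becomes the "section list" parameter of the main page.
    section_list = "".join(
        emit_template(SECTION_LIST_ITEM, {
            "section name": filename,
            "section number": str(number),
            "section title": title,
        })
        for filename, number, title in sections
    )
    return emit_template(MAIN_PAGE, {
        "program name": program_name,
        "section list": section_list,
    })
```

Calling main_page("demo", [("s1.xml", 1, "Intro")]) yields a main page with a single section entry; the file name here is made up for illustration.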
We will see shortly that the section-numbers and section-titles
shelves are built up when weaving the various sections in the
literate program (see <9 weaving a section>).

Meanwhile, the [:program name:] template parameter is
picked up from the title child of the program
element. We will need the program name for each section we
format, so we store it in the program-name global.

<8 weaving miscellaneous elements> =
element "title" when parent is "program"
   set program-name to "%sc"
Combining all of this, the steps for handling the program
element are

<4 weaving a program> +=
clear template-parameters
open new weaved-file with referents-allowed as file weaved-filename
using output as weaved-file
do
   local stream s
   output "%c"
   <6 generate section list>
   set new template-parameters{"program name"} to program-name
   output emit-template templates{"main page"}
done
close weaved-file
As mentioned earlier, it seems reasonable to put each section
of the weaved program into its own file. We therefore need to
generate a new filename each time we encounter a section
element:

<9 weaving a section> =

Sections are numbered, and their titles must be collected for
the main page.

<9 weaving a section> +=
increment section-number
new section-titles{weaved-filename}
set new section-numbers{weaved-filename} to section-number
set s with referents-allowed to "%c"
Although we create the entry on the shelf here, the section's
title will only be stored in the section-titles shelf when
we encounter the title element:

<8 weaving miscellaneous elements> +=
element "title" when parent is "section"
   set section-titles to "%sc"
Just like with the program element, once we've parsed
the contents of the section, we collect the template parameters

<9 weaving a section> +=
set new template-parameters{"program name"} to program-name
set new template-parameters{"section number"} to "d" % section-numbers
set new template-parameters{"section title"} to section-titles
set new template-parameters{"section contents"} with referents-allowed to s
open new weaved-file with referents-allowed as file weaved-filename
using output as weaved-file
   output emit-template templates{"section page"}
close weaved-file
and emit the section page template:

<10 section page template> =
"<?xml version=%"1.0%"?>%n"
|| "<weaved type=%"section%">%n"
|| "<program-name>[:program name:]</program-name>%n"
|| "<number>[:section number:]</number>%n"
|| "<title>[:section title:]</title>%n"
|| "<section>%n"
|| "[:section contents:]"
|| "</section>%n"
|| "</weaved>%n"
Oddly enough, the rule for element code is the longest
in the program, even though weaving code is simpler than
tangling it (see <28 tangling code>).

<11 weaving code> =

The reason is fairly simple, however: the formatting of a code
element changes depending on whether or not this is the
first occurrence of this particular code block. The first time
we see a code block (called, say, some code block), we must
emit a cross-reference anchor as well as the XML required to
support the following formatted herald:
<99 some code block> =
... the code goes here ...
Specifically, we use the following template the first time we
see a code block.

<12 template code identified> =
"<code-body type=%"identified%">[?key:code id?]<name>[:code name:]</name>%n"
|| "<code>%n[:code body:]%n</code>%n"
|| "</code-body>"
We also need to generate a block number.

<13 generating a block's number> =
increment block-number
set new block-numbers{code-key} to block-number
set new template-parameters{"block number"} to "d" % block-number
However, the second and subsequent occurrences of a code block
are formatted as
<99 some code block> +=
... more code goes here ...
Note that the = from the first example has turned into
+=, indicating that this block is appending code to a
previous block. More precisely, we use a template similar to
<12 template code identified>:

<14 template code identified appended> =
"<code-body type=%"identified appended%">[?key:code id?]"
|| "<name>[:code name:]</name>%n"
|| "<code>%n[:code body:]%n</code>%n"
|| "</code-body>"
The attribute type is used by the formatting program to
determine if this is the first occurrence of the code block or
not. In the case where this is a subsequent occurrence, we need
to determine what number was previously assigned to this
block; we store this in the block-numbers shelf, which is
keyed on the block identifier:

<15 determining a block's number> =
set new template-parameters{"block number"}
   to "d" % block-numbers{code-key}
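The numbering rule can be summarised in a few lines. The following is a sketch in Python, not OmniMark, under the assumption that a dictionary plays the role of the block-numbers shelf and an integer plays the role of block-number; the class name and the herald method are hypothetical, and in the real tool the herald text itself is produced by the formatter rather than the weaver.

```python
class BlockNumbers:
    """Hypothetical stand-in for the block-numbers shelf and its counter."""

    def __init__(self):
        self.numbers = {}   # code key -> block number assigned on first sight
        self.counter = 0    # last block number handed out

    def herald(self, code_key, name):
        """Return the herald a formatter might show for this occurrence."""
        if code_key in self.numbers:
            # Subsequent occurrence: reuse the stored number, mark as appended.
            return "<%d %s> +=" % (self.numbers[code_key], name)
        # First occurrence: assign the next number and remember it.
        self.counter += 1
        self.numbers[code_key] = self.counter
        return "<%d %s> =" % (self.counter, name)
```

Asking for the same key twice gives the = form the first time and the += form afterwards, with the block number unchanged.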
The templates <12 template code identified> and
<14 template code identified appended> could be combined into a
single template that has an extra parameterisation specifying
whether the current block is a first occurrence or not. However,
I prefer to have templates be self-contained as much as
possible, rather than have parts generated conditionally in the
code. Otherwise, it becomes more difficult to track down where
different parts of a template are generated.

Combining these elements, weaving identified code is
relatively simple:

<16 weaving identified code> =
local switch block-exists
local string code-key initial { "key:" || "lg" % attribute "id" }
set block-exists to referents has key code-key
                  & referents{code-key} is attached
do when block-exists
   <15 determining a block's number>
else
   <13 generating a block's number>
done
set new template-parameters{"code id"} to "lg" % attribute "id"
set new template-parameters{"filename"} to weaved-filename
set referent code-key to emit-template templates{"code pointer"}
If the code element does not provide a name
attribute, we can re-use the id attribute.

<16 weaving identified code> +=
do when attribute "name" is specified
   set referent ("name:" || "lg" % attribute "id") to attribute "name"
   set new template-parameters{"code name"} to attribute "name"
else
   set referent ("name:" || "lg" % attribute "id") to attribute "id"
   set new template-parameters{"code name"} to attribute "id"
done
output emit-template templates{block-exists -> "identified code appended"
                                             | "identified code"}
If a code element does not provide an id
attribute, then all we do is emit the code verbatim, with no
herald:

<17 weaving anonymous code> =
output emit-template templates{"anonymous code"}
which uses the template

<18 template anonymous code> =
"<code-body type=%"anonymous%">%n"
|| "<code>%n[:code body:]%n</code>%n"
|| "</code-body>%n"
In our literate programming tool, cross-references are
represented using undefined general entities (i.e., general
entities that are not defined in the document's internal DTD
subset). When an entity is encountered in the input, the
external-text-entity rule (<33 handling a cross-reference>)
fires, translating the entity into a processing instruction. In
the weaving process, this processing instruction is further
translated into a cross-reference to the appropriate code
section:

<19 weaving a code reference> =
processing-instruction "code-reference " any+ => reference-name
   <3 clear template parameters>
   set new template-parameters{"reference"} to reference-name
   output emit-template templates{"code reference"}
This uses the template

<20 template code reference> =
"<code-reference>[?key:reference?]"
|| "<name>[?name:reference?]</name></code-reference>"
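Referents are what allow a reference to be emitted before the block it points to has been seen. As a rough illustration only, the Python sketch below mimics that deferred resolution with a dictionary and a second pass over the output. The {{...}} marker syntax, both function names and the sample values are hypothetical; OmniMark performs the resolution itself, so nothing like resolve_referents appears in the actual program.

```python
import re

def emit_reference(reference):
    # Mirrors the code reference template: instead of final text, leave
    # named markers in the output, since the target block may appear later.
    return ("<code-reference>{{key:%s}}<name>{{name:%s}}</name></code-reference>"
            % (reference, reference))

def resolve_referents(text, referents):
    # Once every block has been woven, replace each marker with the value
    # that was recorded under the same name.
    return re.sub(r"\{\{([^}]+)\}\}", lambda m: referents[m.group(1)], text)

# The reference is emitted first ...
woven = emit_reference("parse-input")
# ... and the values become known only when the block itself is woven.
referents = {
    "key:parse-input": "<pointer/>",
    "name:parse-input": "parse input",
}
resolved = resolve_referents(woven, referents)
```

The point of the two-step shape is that the weaver never has to look ahead: the order in which references and definitions appear in the input does not matter.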
Apart from section titles, we have not dealt with any of the
textual elements of the document (e.g., paragraphs,
italicised sections, and so on). These elements are handled by
the formatting program (or a stylesheet); it is sufficient to
let them pass through essentially unaffected, into the XML
output.

<21 weaving formatting elements> =
element ("p" | "b" | "i" | "tt")
   <3 clear template parameters>
   set new template-parameters{"element name"} to "%lq"
   set new template-parameters{"element content"} with referents-allowed to "%sc"
   output emit-template templates{"identity"}
The same applies to the data content in a code block,
except that we need to escape the characters reserved by XML:

<22 escaping data-content> =
data-content when element is "code"
   repeat scan "%c"
   match "<"
      output "&lt;"
   match ">"
      output "&gt;"
   match "%""
      output "&quot;"
   match "'"
      output "&apos;"
   match "&"
      output "&amp;"
   match [any \ "<>%"'&"]+ => t
      output t
   again
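For comparison, the same escaping can be sketched in Python. This is only an illustration of the technique, not part of the OmniMark program; the names XML_ESCAPES and escape_code are made up for the example.

```python
# The five characters that have predefined entities in XML.
XML_ESCAPES = {
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
    "&": "&amp;",
}

def escape_code(text):
    # A single pass over the input, like the repeat scan above: each input
    # character is examined exactly once, so the "&" introduced by one
    # replacement is never escaped a second time.
    return "".join(XML_ESCAPES.get(ch, ch) for ch in text)
```

Scanning once matters: chaining five separate search-and-replace passes in the wrong order would turn "<" into "&amp;lt;" instead of "&lt;".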
To complete the weaving process, we need to declare the global
shelves mentioned earlier,

<2 global shelves> +=
global string weaved-filename variable
global stream weaved-file variable
global string weaved-filename-suffix initial { ".xml" }
global string program-name
global integer block-numbers variable
global integer block-number
global integer section-number
global string section-titles variable
global integer section-numbers variable
and declare the group that holds all the rules together:

<23 weaving> =