Wednesday, February 14, 2007

Writing up CML conventions in LaTeX

CML is a rather flexible format, but it allows its specifics to be pinned down in the form of conventions, named in the @convention attribute. Such conventions can put extra restrictions on the element hierarchy and on the expected content. For example, we may define a 'simpleMolecule' convention which only allows <atomArray> and <bondArray> inside a <molecule>, only <atom> inside <atomArray>, and only <bond> inside <bondArray>:
<molecule convention="simpleMolecule">
  <atomArray>
    <atom id="c1" elementType="C" hydrogenCount="3"/>
    <atom id="n1" elementType="N" hydrogenCount="2"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="c1 n1" order="s"/>
  </bondArray>
</molecule>
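To make the idea of a convention concrete, the hierarchy rules of 'simpleMolecule' can also be checked programmatically. The following is a minimal sketch in Python using only the standard-library xml.etree.ElementTree module. The ALLOWED_CHILDREN table and the check_simple_molecule function are hypothetical names, and the sketch assumes un-namespaced CML as in the example above (real CML documents normally use the CML namespace, which it ignores). It checks only which elements may appear where, not attributes or their values.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the 'simpleMolecule' convention:
# for each element, the set of child element names it may contain.
ALLOWED_CHILDREN = {
    "molecule": {"atomArray", "bondArray"},
    "atomArray": {"atom"},
    "bondArray": {"bond"},
    "atom": set(),
    "bond": set(),
}


def check_simple_molecule(xml_text):
    """Return a list of violations of the 'simpleMolecule' convention.

    An empty list means the element hierarchy conforms. Namespaces are
    not handled: tag names are compared exactly as they appear.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "molecule":
        return [f"root element is <{root.tag}>, expected <molecule>"]
    errors = []
    if root.get("convention") != "simpleMolecule":
        errors.append("molecule does not declare convention='simpleMolecule'")
    # Walk every element (root first) and check its direct children.
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            # Unknown element: already reported as a disallowed child
            # of its parent, so do not descend into it again.
            continue
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors


# Usage: a conforming molecule and one that breaks the convention.
ok = '<molecule convention="simpleMolecule"><atomArray><atom id="c1" elementType="C"/></atomArray></molecule>'
bad = '<molecule convention="simpleMolecule"><name>x</name></molecule>'
print(check_simple_molecule(ok))   # []
print(check_simple_molecule(bad))  # ['<name> is not allowed inside <molecule>']
```

A lookup table keeps the convention declarative, so extending it to further elements only means adding entries; in practice such rules would more likely end up in an XML Schema like the schema.xsd used for validation.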

Such conventions, of course, need to be specified. One way of doing this is to write them up in a LaTeX document containing explanations and code examples. To ensure that these code examples are actually valid CML, the following setup keeps the examples in separate files, outside the LaTeX source.

Including CML examples in the LaTeX source

Because the code examples are typeset with the LaTeX listings package, whose lstlisting environment reads its body verbatim, the \include command cannot be used to pull the example files into the PDF. Instead, a small preprocessor inserts the examples into the LaTeX source before it is compiled. This is my directory layout:
.
|-- Makefile
|-- spec.tex.in
|-- examples
|   |-- Makefile
|   |-- schema.xsd
|   `-- simple1d.valid.cml
`-- preproces.pl

The top-level Makefile builds the PDF by first running preproces.pl on the .tex.in file and then running pdflatex on the resulting .tex file. pdflatex is run several times so that cross-references, such as the listing labels, are resolved:
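.PHONY: all
# Remove a half-written spec.tex if preproces.pl dies
.DELETE_ON_ERROR:

all: spec.pdf

spec.pdf: spec.tex
	pdflatex spec.tex
	pdflatex spec.tex
	pdflatex spec.tex

# Regenerate spec.tex when the template, the script or any example changes
# ($(wildcard ...) is GNU make syntax)
spec.tex: spec.tex.in preproces.pl $(wildcard examples/*.cml)
	perl preproces.pl < spec.tex.in > spec.tex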

The LaTeX source in the .tex.in file looks like:

\begin{lstlisting}[language=XML,
caption={Simple 1D ${}^{13}$C NMR spectrum.},
label={list:simple1d}]
% INPUT: simple1d.valid.cml
\end{lstlisting}

The Perl script picks up every line starting with "% INPUT:" and replaces it with the contents of the named file from the examples directory. The full script looks like:
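#!/usr/bin/perl
# Copy STDIN to STDOUT, replacing every line of the form
# "% INPUT: <file>" with the contents of examples/<file>.

use strict;
use warnings;
use diagnostics;

while (my $line = <STDIN>) {
    if ($line =~ /^%\s+INPUT:\s+(\S+)/) {
        my $file = "examples/$1";
        die "Cannot find file '$file' to insert!\n" unless -e $file;
        open(my $input, '<', $file) or die "Cannot open '$file': $!\n";
        while (my $example = <$input>) {
            print STDOUT $example;
        }
        close($input);
    } else {
        print STDOUT $line;
    }
}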

CML Validation

Now that the examples live in separate files, yet are still pulled into the LaTeX source automatically, validating them is easy. I use a simple Makefile in the examples directory for that, which calls xmllint:

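.PHONY: all validate
all: validate

# Validate every *.valid.cml example against the XML Schema,
# stopping at the first file that fails.
validate: *.valid.cml
	@for f in *.valid.cml; do \
	  echo "** Validating $${f} against XML Schema..."; \
	  xmllint --noout --schema schema.xsd $${f} || exit 1; \
	done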