Wednesday, February 14, 2007

Writing up CML conventions in LaTeX

CML is a rather flexible format, but it allows its specifics to be pinned down in the form of conventions, which are declared with the @convention attribute. Such conventions can put extra restrictions on the hierarchy and on the expected content. For example, we may define a 'simpleMolecule' convention which only allows <atomArray> and <bondArray> inside a <molecule>, only <atom> inside <atomArray>, and only <bond> inside <bondArray>:
<molecule convention="simpleMolecule">
  <atomArray>
    <atom id="c1" elementType="C" hydrogenCount="3"/>
    <atom id="n1" elementType="N" hydrogenCount="2"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="c1 n1" order="S"/>
  </bondArray>
</molecule>
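Whether a document follows such a convention can be checked mechanically. The Python sketch below is one hypothetical way to test the three nesting rules of the 'simpleMolecule' example using only the standard library; the names ALLOWED_CHILDREN and check_simple_molecule are illustrative assumptions, not part of CML or of any CML toolkit, and attributes are not checked at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table for the 'simpleMolecule' convention: for each
# element, the local names of the child elements it may contain.
ALLOWED_CHILDREN = {
    "molecule": {"atomArray", "bondArray"},
    "atomArray": {"atom"},
    "bondArray": {"bond"},
    "atom": set(),
    "bond": set(),
}


def local_name(tag):
    """Strip an ElementTree namespace prefix such as '{http://...}molecule'."""
    return tag.rsplit("}", 1)[-1]


def check_simple_molecule(xml_text):
    """Return a list of 'simpleMolecule' violations (empty list if none)."""
    root = ET.fromstring(xml_text)
    errors = []
    if local_name(root.tag) != "molecule":
        errors.append(f"root is <{local_name(root.tag)}>, expected <molecule>")
    elif root.get("convention") != "simpleMolecule":
        errors.append("<molecule> does not declare convention='simpleMolecule'")
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(local_name(parent.tag))
        if allowed is None:
            # Unknown elements are reported by their parent (or the root check).
            continue
        for child in parent:
            if local_name(child.tag) not in allowed:
                errors.append(
                    f"<{local_name(child.tag)}> not allowed inside "
                    f"<{local_name(parent.tag)}>"
                )
    return errors


EXAMPLE = """<molecule convention="simpleMolecule">
  <atomArray>
    <atom id="c1" elementType="C" hydrogenCount="3"/>
    <atom id="n1" elementType="N" hydrogenCount="2"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="c1 n1" order="S"/>
  </bondArray>
</molecule>"""

if __name__ == "__main__":
    print(check_simple_molecule(EXAMPLE))  # prints []
```

Putting a <bond> inside <atomArray> would instead yield ["<bond> not allowed inside <atomArray>"]. Namespace prefixes are stripped so that the same rules also apply to namespaced CML documents.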

Now, such conventions need to be specified. One way of doing this is to write them up in a LaTeX document, which would include explanations and, likely, code examples. To ensure that these code examples are actually valid CML, the following setup may be used, which keeps the code examples separate from the LaTeX source.

Including CML examples in the LaTeX source

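Because the framework typesets the examples with the LaTeX listings package, it cannot use the \include command to pull the code examples into the PDF: the content of a lstlisting environment is read verbatim, so a command inside it is printed rather than executed. Instead, a small preprocessor inserts the examples into the LaTeX source before it is compiled. This is the directory layout I have: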
|-- Makefile
|-- examples
|   |-- Makefile
|   |-- schema.xsd
|   |-- simple1d.valid.cml

The top-level Makefile creates the PDF by first running the Perl preprocessor on the LaTeX source, and then running pdflatex on the generated spec.tex file; pdflatex is run three times so that cross-references are resolved:
all: spec.pdf

spec.pdf: spec.tex
	pdflatex spec.tex
	pdflatex spec.tex
	pdflatex spec.tex

	perl < > spec.tex

The relevant part of the LaTeX source looks like this:

\begin{lstlisting}[caption={Simple 1D $^{13}$C NMR spectrum.}]
% INPUT: simple1d.valid.cml
\end{lstlisting}

The Perl script picks up every line starting with "% INPUT:" and replaces it with the contents of the named file in the examples directory. The full script looks like this:

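use strict;
use warnings;
use diagnostics;

# Copy the LaTeX source from STDIN to STDOUT, replacing every line of the
# form "% INPUT: <file>" with the contents of examples/<file>.
while (my $line = <STDIN>) {
    if ($line =~ /^\%\sINPUT:\s(.*?)\s*$/) {
        my $file = "examples/$1";
        die "Cannot find file '$file' to insert!\n" unless -e $file;
        open(my $input, '<', $file) or die "Cannot open '$file': $!\n";
        print STDOUT $_ while <$input>;
        close($input);
    } else {
        print STDOUT $line;
    }
}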
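To make the marker rule explicit and testable without a LaTeX run, here is the same preprocessing logic sketched in Python as a generator over lines. This is an illustrative port, not the script used in this setup; the function name preprocess and its examples_dir parameter are assumptions.

```python
import os
import re

# Marker rule mirroring the Perl script: "% INPUT:" at the very start of a
# line, then the file name (trailing whitespace and the newline are dropped).
INPUT_RE = re.compile(r"^%\sINPUT:\s(.*?)\s*$")


def preprocess(lines, examples_dir="examples"):
    """Yield LaTeX lines, replacing each marker line with the example's lines."""
    for line in lines:
        match = INPUT_RE.match(line)
        if match is None:
            yield line  # ordinary LaTeX: copy unchanged
            continue
        path = os.path.join(examples_dir, match.group(1))
        if not os.path.exists(path):
            raise FileNotFoundError(f"Cannot find file '{path}' to insert!")
        with open(path, encoding="utf-8") as example:
            yield from example  # the example's lines, newlines included
```

Used as a filter, sys.stdout.writelines(preprocess(sys.stdin)) reproduces the STDIN-to-STDOUT behaviour of the Perl script. Note that an indented "% INPUT:" line is not matched and passes through unchanged, and a missing example raises FileNotFoundError, the counterpart of the script's die.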

CML Validation

Now that the examples are split out, yet still tightly integrated with the LaTeX source, validating the CML examples is easy. I use a simple Makefile in the examples directory for that, which relies on xmllint:

.PHONY: all validate

all: validate

validate: *.valid.cml
	@for f in *.valid.cml; do \
		echo "** Validating $${f} against XML Schema..."; \
		xmllint --noout --schema schema.xsd $${f} || exit 1; \
	done
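One caveat of this layout: the validate target only globs *.valid.cml inside the examples directory, so an example included under another name would end up in the PDF without ever being validated. A hypothetical Python helper (the name unvalidated_examples is an assumption) that reuses the "% INPUT:" marker rule to flag such references might look like this:

```python
import re

# Same "% INPUT:" marker rule as the preprocessor.
INPUT_RE = re.compile(r"^%\sINPUT:\s(.*?)\s*$")


def unvalidated_examples(latex_lines, suffix=".valid.cml"):
    """Return referenced example files that the '*.valid.cml' glob would miss."""
    missed = []
    for line in latex_lines:
        match = INPUT_RE.match(line)
        if match is None:
            continue
        name = match.group(1)
        # The glob runs inside examples/ only, so names in subdirectories
        # are missed too, not just names with a different suffix.
        if "/" in name or not name.endswith(suffix):
            missed.append(name)
    return missed
```

Running such a check over the LaTeX source before the validation step, and failing when the returned list is non-empty, would keep the included examples and the validated ones in step.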

1 comment:

Johannes Ranke said...

Hi Egon,

Couldn't you use the \input command? I use it within tabular environments, where it is not possible to use \include. \input doesn't force a pagebreak.

Best, and thanks for the great blogs!