Thursday, March 08, 2007

Chemical Formulas in CML

Molecular formulas are useful to have in CML, especially when connectivity is not known. This is of particular interest to me because I am working on Computer Aided Structure Elucidation (CASE) based on NMR and MS data, and my input is in CML.

CML provides a few options for this, all using the <formula> element (xsd). The first option is a concise formula: a formal specification in which every element that occurs in the formula is listed exactly once, followed by an explicit count, with all tokens separated by whitespace:
<formula concise="C 5 H 10 O 2"/>
Alternatively, you can use the @inline attribute and refer to some convention, as here:
<formula convention="iucr:_chemical_formula_structural" inline="Sn (C2 O4) K F"/>
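
Going back to the concise form: because it is just alternating element symbols and counts, it can be read by splitting on whitespace. The following is a minimal sketch in Python, not part of CML or any CML library. The function name parse_concise is hypothetical, and the sketch assumes integer counts and strictly alternating symbol/count pairs, rejecting anything else.

```python
def parse_concise(concise):
    """Parse a whitespace-separated concise formula into {element: count}.

    Assumption: tokens strictly alternate between an element symbol and an
    integer count, as in the example in this post.
    """
    tokens = concise.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating element symbols and counts")
    counts = {}
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        # Sum repeated symbols defensively, although each should occur once.
        counts[symbol] = counts.get(symbol, 0) + int(count)
    return counts


print(parse_concise("C 5 H 10 O 2"))  # {'C': 5, 'H': 10, 'O': 2}
```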

More flexible formats are possible too, as in this example, where <formula> elements are nested and a @count attribute on a formula multiplies the whole subgroup:
<molecule id="CuprammoniumSulfate">
  <formula id="f2" title="[Cu(NH3)4]2+ [SO4]2-">
    <formula id="f21" formalCharge="2">
      <atomArray id="aa1" elementType="Cu" count="1"/>
      <formula id="f22" count="4">
        <atomArray elementType="N H" count="1 3"/>
      </formula>
    </formula>
    <formula id="f3" formalCharge="-2">
      <atomArray id="aa2" elementType="S O" count="1 4"/>
    </formula>
  </formula>
</molecule>
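
To see how the nesting and the @count multiplier combine, here is a rough Python sketch that flattens such a nested formula into total element counts. It is only an illustration under stated assumptions: the function name total_counts is hypothetical, the XML is taken without a namespace (as in the snippet above, while real CML files usually declare one), counts are read as integers, and the title attribute is left out for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The nested formula from the example above, without namespace or title.
CML = """
<formula id="f2">
  <formula id="f21" formalCharge="2">
    <atomArray id="aa1" elementType="Cu" count="1"/>
    <formula id="f22" count="4">
      <atomArray elementType="N H" count="1 3"/>
    </formula>
  </formula>
  <formula id="f3" formalCharge="-2">
    <atomArray id="aa2" elementType="S O" count="1 4"/>
  </formula>
</formula>
"""


def total_counts(formula):
    """Recursively sum element counts, applying each formula's own @count."""
    totals = Counter()
    for child in formula:
        if child.tag == "atomArray":
            # elementType and count are parallel whitespace-separated lists.
            symbols = child.get("elementType", "").split()
            counts = child.get("count", "").split()
            for symbol, n in zip(symbols, counts):
                totals[symbol] += int(n)
        elif child.tag == "formula":
            for symbol, n in total_counts(child).items():
                totals[symbol] += n
    # A missing @count is treated as a multiplier of 1 (assumption).
    multiplier = int(formula.get("count", "1"))
    return Counter({s: n * multiplier for s, n in totals.items()})


root = ET.fromstring(CML)
print(dict(total_counts(root)))  # {'Cu': 1, 'N': 4, 'H': 12, 'S': 1, 'O': 4}
```

The result corresponds to the concise formula "Cu 1 N 4 H 12 S 1 O 4": the four NH3 groups contribute 4 nitrogen and 12 hydrogen atoms.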

You can see it in action in the chem-file project.