Monday, September 11, 2006

InChI's in CML

I've seen some incorrect uses of <identifier> for adding InChI's in CML. The bad code I have seen in the wild looks like:

<molecule>
<identifier convention="iupac:inchi">InChI=1/CH4/h1H4</identifier>
</molecule>

However, the <identifier> element does not allow content other than elements. In XML terms: it does not allow mixed content, so plain text directly inside the element is invalid. You might wonder why it allows any element at all. This is because the first InChI betas had an XML syntax (InChI still has an XML syntax too).

If you want the one-line InChI format most of us know, it needs to be put into the @value attribute. Thus:

<molecule>
<identifier convention="iupac:inchi" value="InChI=1/CH4/h1H4"/>
</molecule>
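For existing files that use the incorrect form, the fix can be automated. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name fix_inchi_identifiers is made up for this example, and it assumes the elements carry no XML namespace, as in the fragments above. Real CML files usually declare a namespace, in which case the tag lookup would need adjusting.

```python
import xml.etree.ElementTree as ET


def fix_inchi_identifiers(cml_text):
    """Move text content of <identifier> elements into the @value attribute.

    Hypothetical helper: assumes un-namespaced elements, as in the
    fragments shown in this post.
    """
    root = ET.fromstring(cml_text)
    for ident in root.iter("identifier"):
        text = (ident.text or "").strip()
        # Only rewrite when there is text and no @value to overwrite.
        if text and "value" not in ident.attrib:
            ident.set("value", text)
            ident.text = None
    return ET.tostring(root, encoding="unicode")


bad = (
    "<molecule>"
    '<identifier convention="iupac:inchi">InChI=1/CH4/h1H4</identifier>'
    "</molecule>"
)
fixed = fix_inchi_identifiers(bad)
print(fixed)
```

Identifiers that already have a @value attribute are left untouched, so running the function on a correct file should change nothing.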


Sunday, September 03, 2006

Scalar data types

The CML <scalar> element has an attribute @dataType which allows one to state the type of the scalar value. Consider the following fragment, which could define a nickname for an atom:

<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>

The CML schema suggests using the XML Schema datatypes, as defined in the XML Schema Part 2: Datatypes specification. For these data types the xsd prefix is commonly used, as in the above example.

This specification allows the content of the string to be restricted in many ways, for example to state that it represents a year:

<scalar dataType="xsd:year" title="DateOfDiscovery">1891</scalar>

It defines fourteen primitive datatypes: string, boolean, float, double, decimal, timeDuration, recurringDuration, binary, uriReference, ID, IDREF, ENTITY, NOTATION, and QName. Of these, the first five have obvious meaning. The specification also defines a number of derived datatypes, which include these useful types: language, integer, positiveInteger, negativeInteger, date, month and year. Note that these names follow an early draft of the specification; the final Recommendation renames several of them, for example year became gYear and uriReference became anyURI.
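To illustrate how software might act on @dataType, here is a minimal Python sketch that converts the text of a <scalar> into a native value. The CONVERTERS table and the scalar_value function are hypothetical names for this example, only a handful of types are covered, and treating a missing @dataType as a string is an assumption of the sketch rather than something taken from the CML schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical mapping from a few XSD type names to Python converters.
CONVERTERS = {
    "string": str,
    "boolean": lambda s: s.strip() in ("true", "1"),
    "float": float,
    "double": float,
    "decimal": Decimal,
    "integer": int,
}


def scalar_value(scalar_xml):
    """Return the content of a <scalar> converted according to @dataType."""
    elem = ET.fromstring(scalar_xml)
    # Assumption: a missing dataType is treated as a plain string.
    data_type = elem.get("dataType", "xsd:string")
    # Drop the prefix ("xsd:") and keep only the local type name.
    local_name = data_type.rpartition(":")[2]
    # Unknown types fall back to the raw string.
    convert = CONVERTERS.get(local_name, str)
    return convert(elem.text or "")


nick = scalar_value(
    '<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>'
)
count = scalar_value('<scalar dataType="xsd:integer">6</scalar>')
print(repr(nick), repr(count))
```

The sketch ignores what the prefix is actually bound to; a stricter version would resolve the prefix to its namespace before trusting the local name.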

The CML schema also defines a large set of custom data types. For example, it defines an elementTypeType which derives from xsd:string and allows only the one-, two- or three-character element name abbreviations ("C", "N", etc.), plus Dummy, Du and R.
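As a rough illustration of what such a restriction does, the Python sketch below checks whether a string has the shape of an elementTypeType value. It is a loose approximation under stated assumptions: the name looks_like_element_type is invented here, and the check only tests the shape of the string (an uppercase letter followed by up to two lowercase letters) rather than consulting the actual list of values in the schema, so it would also accept non-existent symbols such as "Xx".

```python
import re

# Shape of an element symbol: one uppercase letter, then 0-2 lowercase letters.
_SYMBOL_SHAPE = re.compile(r"[A-Z][a-z]{0,2}")
# Special values mentioned in the text above.
_SPECIAL = {"Dummy", "Du", "R"}


def looks_like_element_type(value):
    """Return True if value is shaped like an element symbol or a special value."""
    return value in _SPECIAL or _SYMBOL_SHAPE.fullmatch(value) is not None


for candidate in ("C", "Cl", "Dummy", "carbon"):
    print(candidate, looks_like_element_type(candidate))
```

Validating against the schema itself with an XML Schema validator remains the reliable route; this only shows the idea of constraining a string type.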

The use of datatypes in XML Schema is really powerful, and the CML schema allows you to define custom datatypes if needed. But I'll save that for a later time.

Saturday, September 02, 2006

CML blog from the source

Peter Murray-Rust started a blog on CML too. I do plan to continue to give small CML tips and comments in this blog.