Sunday, September 03, 2006

Scalar data types

The CML <scalar> element has an attribute @dataType which allows one to state the type of the scalar value. Consider the following fragment, which could define a nickname for an atom:

<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>

The CML schema suggests using the XML Schema datatypes, as defined in the XML Schema Part 2: Datatypes specification. The xsd prefix is commonly used for these data types, as in the above example.

Now, this specification allows restricting the content of the string in many ways. For example, one can state that the value is a year:

<scalar dataType="xsd:year" title="DateOfDiscovery">1891</scalar>
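To make the role of @dataType a bit more concrete, here is a minimal Python sketch of how a reader of such fragments might turn the text content into a typed value. The CONVERTERS table and the read_scalar function are hypothetical names made up for this illustration and are not part of CML or any CML library. The sketch compares the datatype name as a plain string, covers only a handful of types, and assumes that a missing @dataType means a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from datatype names to Python converters.
# Only a few types are covered, and the prefixed names are compared
# as plain strings rather than resolved as namespace-qualified names.
CONVERTERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:double": float,
    "xsd:boolean": lambda text: text.strip() in ("true", "1"),
    "xsd:year": int,
}

def read_scalar(xml_text):
    """Return (title, value) for a single <scalar> element."""
    element = ET.fromstring(xml_text)
    # Assumption: treat a missing dataType as a plain string.
    data_type = element.get("dataType", "xsd:string")
    # Unknown datatypes fall back to the raw string.
    convert = CONVERTERS.get(data_type, str)
    return element.get("title"), convert(element.text or "")

nick = read_scalar('<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>')
year = read_scalar('<scalar dataType="xsd:year" title="DateOfDiscovery">1891</scalar>')
print(nick)
print(year)
```

With the declared type available, the year comes back as a number that can be compared or sorted, while the nickname stays a string.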

It defines fourteen primitive datatypes: string, boolean, float, double, decimal, timeDuration, recurringDuration, binary, uriReference, ID, IDREF, ENTITY, NOTATION, and QName. Of these, the first five have an obvious meaning. The specification also defines a number of derived datatypes, which include these useful types: language, integer, positiveInteger, negativeInteger, date, month and year.

The CML schema also defines a large set of custom data types. For example, it defines an elementTypeType, which derives from xsd:string and allows only the one-, two- or three-character element symbols ("C", "N", etc.), plus Dummy, Du and R.
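The kind of restriction such a type places on a string can be imitated with a regular expression. The pattern below is an assumption made for illustration only: it accepts an uppercase letter followed by up to two lowercase letters, or the literal Dummy. The actual definition in the CML schema may well differ, and the name is_element_type is hypothetical. Note that the check is purely about shape, so it does not verify that a symbol belongs to a real element.

```python
import re

# Assumed shape of an element symbol: one uppercase letter followed by
# up to two lowercase letters, or the literal "Dummy". "Du" and "R"
# already fit the first alternative. The real CML schema may define
# elementTypeType differently.
ELEMENT_TYPE = re.compile(r"(?:[A-Z][a-z]{0,2}|Dummy)")

def is_element_type(value):
    """Return True if the whole string has the assumed symbol shape."""
    return ELEMENT_TYPE.fullmatch(value) is not None

for candidate in ["C", "N", "Du", "R", "Dummy", "carbon", "c"]:
    print(candidate, is_element_type(candidate))
```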

The use of datatypes in XML Schema is really powerful, and the CML schema allows one to define custom datatypes if needed. But I'll save that for a later time.
