The CML <scalar> element has an attribute @dataType which allows one to state what the type of the scalar value is. Consider the following fragment, which could define a nickname for an atom:
<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>
The CML schema suggests using the XML Schema datatypes, as defined in the XML Schema Part 2: Datatypes specification. For these datatypes the xsd prefix is commonly used, as in the above example.
Now, this specification allows restricting the content of the string in many ways. For example, it can state that the value is a year:
<scalar dataType="xsd:year" title="DateOfDiscovery">1891</scalar>
The specification defines fourteen primitive datatypes: string, boolean, float, double, decimal, timeDuration, recurringDuration, binary, uriReference, ID, IDREF, ENTITY, NOTATION, and QName. Of these, the first five have an obvious meaning. The specification also defines a number of derived datatypes, which include these useful types: language, integer, positiveInteger, negativeInteger, date, month and year.
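To make the idea concrete, here is a minimal Python sketch of how a program might read such a <scalar> and turn its text into a typed value based on @dataType. The function name read_scalar, the CONVERTERS table, and the fallback to xsd:string when no @dataType is given are my own assumptions for illustration. They are not part of CML or of any CML library, and only a handful of the datatypes listed above are covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few XML Schema datatype names to Python
# converters. This is an illustration only, not a complete implementation.
CONVERTERS = {
    "string": str,
    "boolean": lambda text: text.strip() in ("true", "1"),  # simplified
    "float": float,
    "double": float,
    "integer": int,
    "positiveInteger": int,
    "negativeInteger": int,
    "year": int,
}


def read_scalar(xml_text):
    """Return (title, typed value) for a single <scalar> fragment."""
    element = ET.fromstring(xml_text)
    # Assumption: treat a missing dataType attribute as a plain string.
    data_type = element.get("dataType", "xsd:string")
    # Drop the prefix, e.g. "xsd:year" becomes "year".
    local_name = data_type.split(":", 1)[-1]
    # Unknown datatypes fall back to the raw text.
    convert = CONVERTERS.get(local_name, str)
    return element.get("title"), convert(element.text or "")


fragment = '<scalar dataType="xsd:year" title="DateOfDiscovery">1891</scalar>'
print(read_scalar(fragment))  # ('DateOfDiscovery', 1891)
```

Note that the sketch only converts values and does not validate them: a positiveInteger of -3 would pass through unchanged. Real validation is the job of a schema-aware parser.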
The CML schema also defines a large set of custom datatypes. For example, it defines an elementTypeType, which derives from xsd:string and allows only the one-, two-, or three-character element symbols ("C", "N", etc.), plus Dummy, Du and R.
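The effect of such a restricted string type can be sketched in a few lines of Python. Both the pattern and the function name is_element_type below are assumptions of mine: I simply accept an uppercase letter followed by at most two lowercase letters, plus the special values. The actual CML schema may well constrain the symbols differently, for instance by enumerating them.

```python
import re

# Assumed pattern: an uppercase letter followed by up to two lowercase
# letters. This is NOT copied from the CML schema.
SYMBOL_PATTERN = re.compile(r"[A-Z][a-z]{0,2}")

# Special values mentioned in the text above.
SPECIAL_VALUES = {"Dummy", "Du", "R"}


def is_element_type(value):
    """Check a string against the assumed elementTypeType restriction."""
    if value in SPECIAL_VALUES:
        return True
    return SYMBOL_PATTERN.fullmatch(value) is not None


for candidate in ("C", "N", "Dummy", "carbon"):
    print(candidate, is_element_type(candidate))
```

The point is that the type is still a string, but only a small, well-defined subset of all strings is considered valid.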
The use of datatypes in XML Schema is really powerful, and the CML schema allows defining custom datatypes if needed. But I'll save that for a later time.