The CML <scalar> element has a @dataType attribute that states the type of the scalar value. Consider the following fragment, which could define a nickname for an atom:
<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>
The CML schema suggests using the XML Schema datatypes, as defined in the XML Schema Part 2: Datatypes specification. The xsd prefix is commonly used for these datatypes, as in the example above.
This specification allows the content of the string to be restricted in many ways. For example, the value can be declared to be a year:
<scalar dataType="xsd:year" title="DateOfDiscovery">1891</scalar>
The specification defines fourteen primitive datatypes: string, boolean, float, double, decimal, timeDuration, recurringDuration, binary, uriReference, ID, IDREF, ENTITY, NOTATION, and QName. Of these, the first five have an obvious meaning. The specification also defines a number of derived datatypes, which include these useful types: language, integer, positiveInteger, negativeInteger, date, month and year.
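To make the role of @dataType a bit more concrete, here is a minimal Python sketch of how a program reading CML might use the attribute to convert the text of a <scalar> into a native value. The CONVERTERS table and the read_scalar function are hypothetical names made up for this illustration and are not part of CML or any CML library. Treating a missing @dataType as xsd:string, and falling back to plain text for unknown types, are assumptions of this sketch as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few xsd datatype names to Python converters.
# This table is an illustration only; it is not part of CML.
CONVERTERS = {
    "xsd:string": str,
    "xsd:boolean": lambda text: text.strip() in ("true", "1"),
    "xsd:float": float,
    "xsd:double": float,
    "xsd:integer": int,
}


def read_scalar(xml_text):
    """Parse one <scalar> element and return (title, converted value)."""
    element = ET.fromstring(xml_text)
    # Assumption: a scalar without @dataType is treated as a string.
    data_type = element.get("dataType", "xsd:string")
    # Assumption: types missing from the table are left as plain text.
    convert = CONVERTERS.get(data_type, str)
    return element.get("title"), convert(element.text or "")


title, value = read_scalar(
    '<scalar dataType="xsd:string" title="NickName">carbonRulez</scalar>'
)
print(title, value)
```

Note that the attribute value is an ordinary string to the XML parser, so the sketch simply looks up the full "xsd:..." text and does not resolve the prefix to a namespace.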
The CML schema also defines a large set of custom datatypes. For example, it defines an elementTypeType, which derives from xsd:string and allows only the one-, two- or three-character element symbols ("C", "N", etc.) and the special values Dummy, Du and R.
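As a rough illustration of what such a restricted string type means in practice, the following Python sketch checks whether a value has the shape described above. It only approximates the rule: the pattern and the function name looks_like_element_type are assumptions for this example, and the actual CML schema may spell out its allowed values differently.

```python
import re

# Assumption: an element symbol is one uppercase letter followed by up to
# two lowercase letters. This approximates the rule described in the text.
ELEMENT_SYMBOL = re.compile(r"[A-Z][a-z]{0,2}")

# The three special values named in the text.
SPECIAL_VALUES = {"Dummy", "Du", "R"}


def looks_like_element_type(value):
    """Return True if value has the shape of an elementTypeType value."""
    if value in SPECIAL_VALUES:
        return True
    return ELEMENT_SYMBOL.fullmatch(value) is not None


for candidate in ("C", "N", "Dummy", "carbon", ""):
    print(repr(candidate), looks_like_element_type(candidate))
```

This checks the shape of the string only. A value such as "Xx" passes even though no such element exists, so a real validator would need the list of allowed symbols from the schema itself.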
The use of datatypes in XML Schema is really powerful, and the CML schema allows custom datatypes to be defined if needed. But I'll save that for a later time.