When doing this I found a problem which I have seen more often: the problem was that the MM2 and MMFF94 atom type lists contained '=' and other forbidden characters in the <atomType> @id's. The content of this @id attribute is defined by the idType in the XML Schema for CML 2.3. It extends the XML Schema datatype xsd:string, but is also require to match this regular expression: [A-Za-z][A-Za-z0-9\.\-_]*. That is, it has to start with a upper or lower case letter, and then consists of alphanumerics, dots (.), a hyphen (-), or an underscore (_).
Now, the atom type lists used to identifiers from the MMFF94 and MM2 force field atom type lists, giving code like:
<atomType id="N=C">
<!-- in imines -->
<atom elementType="N" formalCharge="0">
<scalar dataType="xsd:double" dictRef="cdk:maxBondOrder">2.0
<scalar dataType="xsd:double" dictRef="cdk:bondOrderSum">3.0
<scalar dataType="xsd:integer" dictRef="cdk:formalNeighbourCount">2
<scalar dataType="xsd:integer" dictRef="cdk:valency">5
</atom>
<scalar dataType="xsd:string" dictRef="cdk:hybridization">sp2
<scalar dataType="xsd:string" dictRef="cdk:DA">A
<scalar dataType="xsd:string" dictRef="cdk:sphericalMatcher">N-[1-3];[H]{0,2}+[A-Za-z]*+=[CN].*+
</atomType>
The solution is to put the original identifier, "N=C" in this example, into a <label> element and fix the @id:
<atomType id="NdoubleBondedC">
<!-- in imines -->
<label value="N=C"/>
</atomType>
It did require to fix the atom type perception too, because those were using the original identifiers to look up atom types.
No comments:
Post a Comment