ABSTRACT

Introduction
XML Technologies
XML Syntax
XML Namespaces
Validation of XML Documents
    Document Type Definition (DTD)
    W3C XML Schema
    RelaxNG
    Schematron
Processing of XML Documents
    SAX Processing
    DOM (Document Object Model)
    XSLT Transformations
XML Databases
XML Markup Languages
    Versioning
    Flexibility
Standards
    World Wide Web Consortium
    Organization for the Advancement of Structured Information Standards (OASIS)
XML and Chemical Data Mining
    Chemical Structures and Reactions
    Chemical Markup Language (CML)
    Physical Measurements
        ThermoML
        AnIML (Analytical Information Markup Language)
        UnitsML (Units Markup Language)
    Mathematical Expressions
    SBML (Systems Biology Markup Language)
    Resource Description Framework (RDF)
Conclusions and Perspectives
References

Data mining applications aim at the automatic discovery of new information in available data. Because extraction of data from unstructured text is very difficult, current applications usually work with data that is in some way structured.