XML


The Extensible Markup Language, abbreviated XML, is a markup language for representing hierarchically structured data in the form of text files. Among other things, XML is used for the platform- and implementation-independent exchange of data between computer systems, particularly over the Internet.

The XML specification published by the World Wide Web Consortium (W3C) (a Recommendation, first edition of 10 February 1998, currently the fifth edition of 26 November 2008) defines a metalanguage on whose basis application-specific languages are defined by means of structural and content constraints. These constraints are expressed in schema languages such as DTD or XML Schema. Examples of XML languages are RSS, MathML, GraphML, XHTML, XAML, Scalable Vector Graphics (SVG), GPX, and also XML Schema itself.

An XML document consists of text characters, in the simplest case in ASCII encoding, and is therefore human-readable. By definition, it does not contain binary data.


Terms

Element

The most important structural unit of an XML application is the element. The name of an XML element can largely be chosen freely. Elements can contain other elements, text and other nodes, possibly mixed. Elements carry the information in an XML document, whether that information is text, images or something else.

Well-formedness

An XML document is "well-formed" if it complies with all the rules of XML. Some examples of these rules:

  • The document has exactly one root element. The root element is always the outermost element, e.g. <html> in XHTML.
  • All elements with content have a start tag and an end tag (e.g. <entry>Entry 1</entry>). Elements without content may also be closed within a single tag that ends with /> (e.g. <entry />).
  • Start and end tags are properly nested in pairs. This means that every element must be closed before the end tag of its parent element or the start tag of a sibling element appears.
  • An element must not have several attributes with the same name.
  • Attribute values must be enclosed in quotation marks.
  • Start and end tags are case-sensitive (e.g. <entry></Entry> is not allowed).
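
Several of the rules above can be tried out with a non-validating parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample documents are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule from the list above.
samples = {
    "one root, properly nested": "<list><entry>Entry 1</entry><entry/></list>",
    "two root elements": "<a/><b/>",
    "overlapping tags": "<a><b></a></b>",
    "duplicate attribute name": '<a x="1" x="2"/>',
    "unquoted attribute value": "<a x=1/>",
    "case mismatch in tags": "<entry></Entry>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; the parser rejects each of the others with a `ParseError`.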

Validity

If XML is to be used for data exchange, it is advantageous for the format to be defined by a grammar (e.g. a Document Type Definition or an XML Schema). The standard defines an XML document as valid if it is well-formed, contains a reference to a grammar, and conforms to the format described by that grammar.

Parser

Programs or program components that read and interpret XML data and, if necessary, check it for validity are called XML parsers. A parser that checks validity is a validating parser.

Structure of an XML document

Example of an XML file

Physical architecture

  • Entities. The first entity is the main file of the XML document. Further entities are strings, possibly entire files, that are included via entity references (&name; in the document, %name; in the document type definition), as well as character references, which include individual characters by their number (&#decimal; or &#xhexadecimal;).
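
How a parser resolves such references can be illustrated with a short sketch in Python's standard `xml.etree.ElementTree` module. The document strings and the entity name `publisher` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Character references by decimal and hexadecimal number, plus the
# predefined entities &amp;, &lt; and &gt;.
root = ET.fromstring("<text>&#65;&#x42; &amp; &lt;tag&gt;</text>")
print(root.text)

# An internal entity declared in the document type definition and
# referenced in the document as &publisher; (hypothetical name).
with_entity = ET.fromstring(
    '<!DOCTYPE note [<!ENTITY publisher "the W3C">]>'
    "<note>Published by &publisher;</note>"
)
print(with_entity.text)
```

The application only ever sees the replacement text; the references themselves are consumed by the parser.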

Logical structure

The logical structure is a hierarchically organized tree. Its nodes include:

  • Elements, which are physically marked either by a matching pair of start tag (<tag-name>) and end tag (</tag-name>), or
  • by an empty-element tag (<tag-name />)

An XML document must contain exactly one element at the top level. Further elements can be nested below this document element. In addition, specifying a namespace (XML namespace) ensures that no ambiguities arise when names overlap with XML data from another vocabulary.
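
A minimal sketch of how namespaces keep equally named elements apart, using Python's standard `xml.etree.ElementTree` module. The namespace URIs and element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an element called "table".
# The example.com namespace URIs are placeholders.
doc = """<order xmlns="http://example.com/shop"
       xmlns:f="http://example.com/furniture">
  <table>order lines</table>
  <f:table>oak, four legs</f:table>
</order>"""

root = ET.fromstring(doc)

# ElementTree expands each name to {namespace-uri}local-name,
# so the two "table" elements remain distinguishable.
for child in root:
    print(child.tag, "->", child.text)
```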

For specifying the logical structure, document type definitions are being superseded by the more extensive XML Schema, which cannot define entities but offers an adequate replacement for them. In practice, processing instructions are mostly used to embed instructions in other languages in an XML document. One example is PHP, whose code can be embedded in XML documents with a PHP processing instruction, e.g. <?php echo 'Hello, World'; ?>.
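
Processing instructions are passed through to the application as nodes of their own. The sketch below uses Python's standard `xml.dom.minidom` to list the target and data of each one; the helper `find_pis` and the sample document are assumptions made for this example.

```python
from xml.dom import minidom


def find_pis(node):
    """Collect (target, data) for every processing instruction below `node`."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        found.extend(find_pis(child))
    return found


# Hypothetical document with one instruction before the root element
# and one inside it.
dom = minidom.parseString(
    "<?xml-stylesheet type='text/xsl' href='style.xsl'?>"
    "<page><?php echo 'Hello, World'; ?></page>"
)

pis = find_pis(dom)
for target, data in pis:
    print(target, "|", data.strip())
```

The XML parser does not interpret the data part; it is up to the application named by the target to do so.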

Some web browsers, including Internet Explorer (MSXML engine), Mozilla Firefox and Netscape Navigator (TransforMiiX engine), Opera (native engine) and Safari, can display XML documents directly using a built-in XML parser. This is done, for example, in conjunction with a style sheet. The transformation can convert the data into a completely different format; the target format need not even be XML.

Classification of XML documents

XML documents can be divided by intended use and degree of structuring into document-centric and data-centric documents. The boundary between these types is fluid. Mixed forms are referred to as semi-structured.

  • Document-centric: The document is modelled on a text document that human readers can largely understand even without the additional meta information. XML elements are used mainly for the semantic markup of passages, and the document is only weakly structured. Because of this weak structure, machine processing is difficult.
  • Data-centric: The document is intended primarily for machine processing. It follows a schema that describes the entities of a data model and defines how the entities relate to each other and which attributes they have. The document is thus highly structured and less suitable for direct human use.
  • Semi-structured: Semi-structured documents are a hybrid form, more strongly structured than document-centric documents but more weakly structured than data-centric documents.

It is typical of data-centric XML documents that elements have either element content or text content. Mixed content, in which an element contains both text and child elements, is typical of the other kinds of XML documents.
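
The difference between element content and mixed content can be made concrete with a small check. This is a sketch using Python's standard `xml.etree.ElementTree` module; the function name `has_mixed_content` and both sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element) -> bool:
    """True if the element has child elements and non-whitespace text of its own."""
    if len(element) == 0:
        return False
    # ElementTree stores text before the first child in .text and text
    # after each child in that child's .tail.
    own_text = [element.text or ""] + [child.tail or "" for child in element]
    return any(part.strip() for part in own_text)


data_centric = ET.fromstring(
    "<person><name>Ada</name><born>1815</born></person>"
)
document_centric = ET.fromstring(
    "<p>XML is a <em>markup</em> language.</p>"
)

print(has_mixed_content(data_centric))
print(has_mixed_content(document_centric))
```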

Processing XML

Processing criteria

Essentially, three aspects matter when accessing an XML document:

  • How is the XML file accessed: sequentially or randomly?
  • Is access to the XML data designed as "push" or "pull"? (Push means that the parser controls the program flow. Pull means that flow control is implemented in the code that calls the parser.)
  • How is the tree structure of the XML data managed: hierarchically or nested?

Programmatic access to XML documents

At the lowest level, XML documents are read by a special program component, an XML processor, also called an XML parser. It provides an API through which the application accesses the XML document.

XML processors support three basic processing models.

  • DOM: A DOM API represents an XML document as a tree structure and provides random access to the individual components of that tree. In addition to reading XML documents, DOM allows the tree to be manipulated and written back to an XML document. Because the whole tree is held in memory, DOM is very memory-intensive.
  • SAX: A SAX API represents an XML document as a sequential data stream and invokes specified callback functions for events defined in the standard. An application that uses SAX can register its own subroutines as callback functions and interpret the XML data in this way.
  • Pull API: An XML pull API processes data sequentially and offers both event-based processing and an iterator. It is highly memory-efficient and may be easier to program than a SAX API, because flow control lies with the program and not the parser.
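
The three models can be compared on one small task. The following sketch extracts the text of every `entry` element once with each model, using the Python standard library modules `xml.dom.minidom` (DOM), `xml.sax` (SAX) and `xml.dom.pulldom` (pull). The document, the element names and all function names are assumptions chosen for illustration.

```python
import xml.sax
from xml.dom import minidom, pulldom

DOC = "<list><entry>Entry 1</entry><entry>Entry 2</entry></list>"


def text_of(node):
    """Concatenate the text nodes directly below a DOM node."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def entries_dom(text):
    # DOM: the whole tree is built first, then accessed at random.
    dom = minidom.parseString(text)
    return [text_of(node) for node in dom.getElementsByTagName("entry")]


class EntryHandler(xml.sax.ContentHandler):
    # SAX (push): the parser drives and calls these methods per event.
    def __init__(self):
        super().__init__()
        self.entries = []
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "entry":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "entry":
            self.entries.append("".join(self._buffer))
            self._buffer = None


def entries_sax(text):
    handler = EntryHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.entries


def entries_pull(text):
    # Pull: the calling code drives and asks for the next event.
    entries = []
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "entry":
            events.expandNode(node)  # build only this subtree
            entries.append(text_of(node))
    return entries


print(entries_dom(DOC), entries_sax(DOC), entries_pull(DOC))
```

In the SAX variant the application has to keep its own state between callbacks (`_buffer`), whereas in the pull variant that state lives in the ordinary control flow of the loop.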

Further processing models:

  • Data binding: This option makes XML data directly available to a program as a data structure. The XML data is converted directly into objects, for example, by unmarshalling.
  • Non-extracting XML API: The data is processed very efficiently at the byte level.

Often the application code does not access the parser API directly. Instead, the XML is encapsulated further, so that the application code works with native objects or data structures that are mapped to XML. Examples of such access layers are JAXB in Java, the Data Binding Wizard in Delphi, and the XML Schema Definition Toolkit in .NET. Conversion between objects and XML is usually possible in both directions. Converting objects to XML is called serialization or marshalling.
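
The idea of such an access layer can be sketched by hand. The example below is not one of the frameworks named above; it is a hand-written mapping in Python, with a hypothetical `Person` class and element names, that shows unmarshalling (XML to object) and marshalling (object to XML).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Native object the application works with (illustrative only)."""
    name: str
    born: int


def unmarshal(text: str) -> Person:
    """Convert an XML <person> document into a Person object."""
    root = ET.fromstring(text)
    return Person(name=root.findtext("name"), born=int(root.findtext("born")))


def marshal(person: Person) -> str:
    """Convert a Person object back into XML text."""
    root = ET.Element("person")
    ET.SubElement(root, "name").text = person.name
    ET.SubElement(root, "born").text = str(person.born)
    return ET.tostring(root, encoding="unicode")


person = unmarshal("<person><name>Ada</name><born>1815</born></person>")
print(person)
print(marshal(person))
```

Real data binding frameworks typically generate this mapping code from a schema instead of requiring it to be written manually.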

XML Parser API examples

XML parser APIs are available for various programming languages such as Java, C, C++, C#, Python, Perl and PHP. Examples of parser APIs:

  • XML::Parser (Perl): An XML parser for Perl. The CPAN module XML::Simple, for example, also provides a very simple API.
  • DOM functions (PHP5): A PHP5 module for reading XML documents; SimpleXML is an alternative; for PHP4 there is DOM XML.
  • StAX (Java): A highly memory-efficient parser implementation (pull) that is also easy to program. It offers cursor and iterator processing models.
  • JAXB: Data binding for Java. For example, the corresponding Java classes can be generated from an XML schema and vice versa.
  • Apache XMLBeans: A Java data binding framework that can be used from Java 1.4.2 onwards.
  • Xerces: A validating XML parser for C++, Java and Perl, available on a large number of platforms.
  • ElementTree iterparse: A parser API for Python that iterates over subtrees. It combines the memory efficiency of a pull parser with the simplicity of a DOM parser.
  • VTD-XML: An example of a non-extracting XML API.
  • MSXML: Microsoft XML Core Services, Microsoft's software library for XML, with support for DOM, SAX, XSLT, XML schemas and other related technologies.
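
As an illustration of one entry in this list, the following sketch uses `iterparse` from Python's `xml.etree.ElementTree` to process a document element by element. The `item` elements and their `amount` attribute are invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(stream) -> int:
    """Add up the 'amount' attribute of every <item> while streaming."""
    total = 0
    # An "end" event fires once an element has been read completely.
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            total += int(elem.get("amount"))
            elem.clear()  # drop the element's content to save memory
    return total


data = io.BytesIO(
    b"<items><item amount='2'/><item amount='3'/><item amount='5'/></items>"
)
print(sum_amounts(data))
```

Each finished subtree is available as an ordinary element object, which is where the tree-like convenience comes from, while `elem.clear()` keeps memory use low for large inputs.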

Special programs called XML editors are available for creating XML documents. Special programs called XML databases are available for storing and managing XML documents.

Transformation and representation of XML documents

An XML document can be transformed into another document by means of suitable transformation languages such as XSLT or DSSSL. Often the transformation converts a document from one XML language into another, for example into XHTML so that the document can be displayed in a web browser.

Schema languages

Schema languages are used to describe the structure of XML languages. The two best known are Document Type Definition and XML Schema.

DTD

A Document Type Definition (DTD) describes the structure and grammar of XML documents. It was standardized together with XML, at a time when XML was still intended mainly for "narrative documents" (newspaper articles, books and the like) rather than as a data exchange format. For this reason a DTD cannot, for example, distinguish between text and numbers. Another disadvantage is that a DTD has to be written in its own language. In addition, DTDs do not support namespaces.

XML Schema / XSD

XML Schema (XSD, XML Schema Definition) is the modern way to describe the structure of XML documents. XML Schema also makes it possible to restrict the content of elements and attributes, e.g. to numbers, dates or text, for instance by means of regular expressions. A schema is itself an XML document, and it can describe more complex relationships, including relationships of content, than is possible with a DTD.

Other schema languages

Other schema languages are Document Structure Description, RELAX NG and Schematron.

XML Family

Infrastructure

In the context of XML, the W3C has defined many XML-based languages that provide XML expressions for frequently needed general functions, such as combining XML documents. Many XML languages make use of these basic building blocks.

  • Transformation of XML documents: XSLT, STX
  • Addressing parts of an XML tree: XPath
  • Linking XML resources: XPointer, XLink and XInclude
  • Selecting data from an XML data set: XQuery
  • Manipulating data in an XML data set: XUpdate
  • Creating electronic forms: XForms
  • Defining XML data structures: XML Schema (XSD, XML Schema Definition Language), DTD and RELAX NG
  • Signing and encrypting XML nodes: XML Signature and XML Encryption
  • Statements about the formal information content: XML Infoset
  • Formatted presentation of XML data: XSL-FO
  • Defining method or function calls in distributed systems: XML-RPC
  • Standardized attributes: XML Base and ID (DTD)
  • XML-based declarative programming language: MXML
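
To give an impression of one of these building blocks, the sketch below addresses parts of an XML tree with XPath expressions. It relies on Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath; the document and its element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
root = ET.fromstring("""<library>
  <book year="1998"><title>XML Basics</title></book>
  <book year="2008"><title>XML Fifth Edition</title></book>
  <journal year="2008"><title>Markup Monthly</title></journal>
</library>""")

# Titles of all direct <book> children of the root element.
book_titles = [t.text for t in root.findall("book/title")]

# Titles of any descendant element whose year attribute is 2008.
titles_2008 = [t.text for t in root.findall(".//*[@year='2008']/title")]

print(book_titles)
print(titles_2008)
```

Languages such as XSLT and XQuery build on XPath expressions of this kind to select the nodes they operate on.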

Languages

While XML itself emerged from SGML, a great many formal languages today use the syntax of XML. XML is thus an essential tool for creating, as the W3C envisions, an open information landscape that is understandable to both humans and machines (the Semantic Web).

The well-known document language HTML was integrated into this concept after version 4.01, so that it is now available with an XML-based definition as "Extensible Hypertext Markup Language" (XHTML). A major reason for the widespread use of XML is the availability of numerous parsers and its simple syntax: the definition of SGML comprises 500 pages, that of XML only 26.

The following lists present some of these XML languages.

Text

Graphics

  • SVG (vector graphics)
  • X3D (3D modelling language)
  • Collada (exchange format for data between different 3D programs)

Geodata

Multimedia

  • MusicXML (notation data for written music)
  • SMIL (time-synchronized multimedia content)
  • MPEG-7 (MPEG-7 metadata)
  • Laszlo (LZX)

Security

Engineering

  • AutomationML, a format for storing system planning data
  • CAEX, a format for storing hierarchical object information
  • GSDML, a format for describing automation devices that can communicate with Profinet
  • IODD, a format for describing sensors and actuators

More

There are also XML languages for web services (e.g. SOAP, WSDL and WS-*), for embedding Java code in XML documents (XSP), for synchronizing calendar data (SyncML), for mathematical formulas (MathML), for representing graphs (GraphML), for the Semantic Web (RDF, OWL, Topic Maps, UOML), for service provisioning (SPML), for exchanging messages (XMPP), for financial reports such as annual financial statements (XBRL), for the automotive industry (ODX, MSRSW, AUTOSAR templates, QDX, JADM, OTX), for automated testing of, for example, circuits (ATML), for systems biology (SBML), for agriculture (agroXML), for publishing (ONIX), for chemistry (CIDX), and many more.

A set of XML languages for office applications is brought together in the OpenDocument interchange format (OASIS Open Document Format for Office Applications).

Alternative formats

  • S-expressions ( Lisp syntax for lists)
  • YAML (YAML Ain't Markup Language)