punctum xtool - Overview
xtool is a collection of tools for XML processing, with a special focus on schema quality and namespace consistency. All tools are implemented in Java 1.4. The programmatic interface is described in the API documentation.
Currently the collection contains:
- SchemaList - linear W3C XML schema representation
- XmlnsPrefix - modification of XML namespace prefixes
- XmlnsXref - XML namespace cross-reference
- XPathSelect - Evaluation of an XPath expression
SchemaList - List the Element Tree of a W3C XML Schema
Starting with the first xs:element definition, the type hierarchy of a W3C schema file is recursively expanded. The output is a linear, indented list of the unfolded possible substructures, the leaf XML elements and their attributes.
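The recursive expansion can be illustrated with a minimal sketch. It is not the SchemaList implementation: the class name `SchemaTreeSketch` and its methods are hypothetical, it assumes a current JDK rather than Java 1.4, and it only follows named top-level complex types and inline definitions. Element references (`ref=`), attributes and recursive types are not handled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: lists the element tree of a schema as indented start tags. */
class SchemaTreeSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Expands the first top-level xs:element of the given schema text. */
    static String list(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd)));
            Map<String, Element> types = new HashMap<String, Element>();
            Element first = null;
            // Collect the named complex types and remember the first element declaration.
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element) || !XS.equals(n.getNamespaceURI())) continue;
                Element e = (Element) n;
                if ("complexType".equals(e.getLocalName())) types.put(e.getAttribute("name"), e);
                else if ("element".equals(e.getLocalName()) && first == null) first = e;
            }
            StringBuilder out = new StringBuilder();
            if (first != null) expand(first, types, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Prints one element declaration and then unfolds its type. */
    static void expand(Element decl, Map<String, Element> types, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append('<').append(decl.getAttribute("name")).append(">\n");
        String type = decl.getAttribute("type");
        type = type.substring(type.indexOf(':') + 1); // drop a prefix such as "xs:"
        // A named complex type is looked up; otherwise an inline definition is searched.
        Element body = types.containsKey(type) ? types.get(type) : decl;
        descend(body, types, depth + 1, out);
    }

    /** Walks through sequence, choice and similar containers down to nested elements. */
    static void descend(Element parent, Map<String, Element> types, int depth, StringBuilder out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !XS.equals(n.getNamespaceURI())) continue;
            Element e = (Element) n;
            if ("element".equals(e.getLocalName())) expand(e, types, depth, out);
            else descend(e, types, depth, out);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='" + XS + "'>"
                + "<xs:element name='Document' type='Doc'/>"
                + "<xs:complexType name='Doc'><xs:sequence>"
                + "<xs:element name='GrpHdr' type='Hdr'/>"
                + "</xs:sequence></xs:complexType>"
                + "<xs:complexType name='Hdr'><xs:sequence>"
                + "<xs:element name='MsgId' type='xs:string'/>"
                + "</xs:sequence></xs:complexType>"
                + "</xs:schema>";
        System.out.print(list(xsd));
    }
}
```

The sample schema in `main` is invented for the illustration. A real tool would also need a guard against types that contain themselves, since this sketch would recurse without end on such a schema.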
Optionally, values may be generated, and the tool can add comments which show the schema type, data type, restrictions and annotations attached to the elements. With value generation and a selection of the first choice, the output is a well-formed XML instance which usually validates against the input schema. The advantage of this representation is that a single XML document shows both the schema design and a real instance of that schema.
The schema list can be shown as HTML (the default), plain text, pure XML or in a tab-separated format which is suitable for MS-Excel. For HTML, the start tags carry a link showing the XPath to the element. In Excel, columns can easily be hidden or appended. Such a worksheet is then a good basis for the development of additional restrictions, mapping rules and the like.
The tool may be called on the command line or from a web page. The following optional settings may be specified:
- -c show comments with types, restrictions, patterns etc.
- -e enc source file encoding, default: UTF-8
- -e enc target file encoding, default: UTF-8
- -f show first alternative of choices only
- -m mode output mode: "html" (default), "plain", "tsv" (for MS-Excel) or "xml"
- -s show start tags only (no end tags)
- -v generate element values
These options may be combined. Typical settings are:
- -cvf generate a well-formed, commented XML instance from the schema
- -s only show the minimal indented element structure without end tags
- -cv -m tsv generate an Excel worksheet with comments and values
- -v -m xml show elements with values in the browser's XML representation (elements can be collapsed)
- -vf -m plain generate a concise XML instance file which may be stored and validated
In the web interface, the user specifies the desired options and uploads the input schema file to the application on the web server.
Example for HTML output (-cv -m html)
<Document xmlns="urn:sepade:xsd:pain.001.001.02"><!--[1..1] Document -->
  <pain.001.001.02><!--[1..1] pain.001.001.02 -->
    <GrpHdr><!--[1..1] GroupHeader20 -->
      <MsgId>Max35Text</MsgId><!--[1..1] Max35Text string 1..35 -->
      <CreDtTm>2007-06-29T05:30:00Z</CreDtTm><!--[1..1] ISODateTime dateTime -->
      <NbOfTxs>09</NbOfTxs><!--[1..1] Max15NumericText string /[0-9]{1,15}/ -->
      <CtrlSum>1</CtrlSum><!--[0..1] DecimalNumber decimal L18.17 ! SEPA AOS Can optionally be used as specification for the total amount of the file. -->
      <Grpg>GRPD</Grpg><!--[1..1] Grouping2Code string "GRPD" ! Only the GRPD option may be used.-->
      <InitgPty><!--[1..1] PartyIdentification20 ! Initiating party. -->
        <Nm>Max70Text</Nm><!--[1..1] Max70Text string 1..70 ! AT-02 Name of the originator.-->
        <PstlAdr><!--[0..1] PostalAddress5 ! AT-03 Address of the originator.-->
          <AdrLine>Max70Text</AdrLine><!--[1..2] Max70Text string 1..70 -->
          <Ctry>AZ</Ctry><!--[1..1] CountryCode string /[A-Z]{2,2}/ -->
        </PstlAdr>
        <Id><!--[0..1] Party5Choice ! AT-10 - ID of the originator. Recommendation: This field should not be used.-->
          <__unresolvedChoice__><!--[1..1] -->
            <OrgId><!--[1..1] OrganisationIdentification2 -->
              <BIC>COBADEFF</BIC><!--[0..1] BICIdentifier string /[A-Z]{6,6}[A-Z2-9][A-NP-Z0-9]([A-Z0-9]{3,3}){0,1}/ -->
              <IBEI>AZBDFHJNP0</IBEI><!--[0..1] IBEIIdentifier string /[A-Z]{2,2}[B-DF-HJ-NP-TV-XZ0-9]{7,7}[0-9]{1,1}/ -->
              <BEI>BEIADEFF</BEI><!--[0..1] BEIIdentifier string /[A-Z]{6,6}[A-Z2-9][A-NP-Z0-9]([A-Z0-9]{3,3}){0,1}/ -->
              <EANGLN>0909090909090</EANGLN><!--[0..1] EANGLNIdentifier string /[0-9]{13,13}/ -->
...
Example for output in MS-Excel
The columns of the Excel worksheet are filled as follows:
Column | Content |
---|---|
A | indented elements with generated values |
B | cardinality, multiplicity: minOccurs and maxOccurs |
C | schema type |
D | elementary XML datatype |
E | restrictions: string lengths, number ranges, patterns, value enumerations |
F | annotations attached to the element |
G | absolute XPath to this element node |
H | a single ";" in all relevant (non-descriptive) rows, useful for hiding rows |
Special Rows
There are two kinds of special rows which are useful for the description of the schema, but which will not lead to a valid XML instance.
- With option "-c", any attribute is shown on a separate line starting with "@", since attributes also have types, restrictions etc.
- Without option "-f", any <xs:choice> leads to an artificial element <__unresolvedChoice__> in order to make visible that this choice must still be resolved to yield a valid XML instance. This resolution could be realized by an XSLT stylesheet or by manual editing.
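The XSLT resolution mentioned in the last item could look roughly like the following sketch, which embeds a small stylesheet in a Java class. The class name `ChoiceResolverSketch` is hypothetical, and the sketch assumes that the alternatives are the child elements of <__unresolvedChoice__> and that the first one should be kept. Any other selection rule needs a different `select` expression.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: replaces each artificial choice element by its first alternative. */
class ChoiceResolverSketch {
    // Identity transformation plus one template for the artificial element.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // local-name() is used so that the match also works inside a default namespace.
        + "<xsl:template match=\"*[local-name()='__unresolvedChoice__']\">"
        + "<xsl:apply-templates select='*[1]'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String resolve(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample input with two alternatives below the artificial element.
        String xml = "<Id><__unresolvedChoice__>"
                + "<OrgId><BIC>COBADEFF</BIC></OrgId>"
                + "<PrvtId><Nm>x</Nm></PrvtId>"
                + "</__unresolvedChoice__></Id>";
        System.out.println(resolve(xml));
    }
}
```

Comments placed directly inside the artificial element are dropped by this stylesheet, because only the first child element is processed.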
Value Generation
The tool tries to generate validating values for the most common cases. This works rather well for the ISO 20022 message schemata relevant to SEPA (camt, pacs, pain families), but it may fail for complicated patterns or for other application areas.
The values are generated with a fixed set of rules which depend on the elementary data type, sometimes the schema type, and the restrictions. The following table shows these rules:
Datatype | Restriction, Schema Type | Generated Value |
---|---|---|
boolean | | true |
decimal | | 1 |
dateTime | | 2007-06-29T04:30:00Z (?) |
date | | 2007-06-29 |
NCName | | NCName |
string | (pattern) | (the characters from the pattern repeated up to a minimal length) |
string | (length) | (schema type name truncated or padded with letters) |
string | (enumeration) | (the first alternative) |
string | CurrencyCode | EUR |
string | IBANIdentifier | DE28500400000123456589 |
string | BICIdentifier | COBADEFF |
string | BEIIdentifier | PUTMDEEM |
string | CHIPSUniversalIdentifier | CH012345 |
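A few of these rules can be sketched as a small lookup with fallbacks. This is an illustration and not the tool's code: the class `ValueRulesSketch`, its method signatures and the padding letter "x" are assumptions, and the pattern and enumeration rules are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of some value generation rules from the table. */
class ValueRulesSketch {
    // Fixed values which depend on the schema type name (taken from the table).
    static final Map<String, String> FIXED = new HashMap<String, String>();
    static {
        FIXED.put("CurrencyCode", "EUR");
        FIXED.put("IBANIdentifier", "DE28500400000123456589");
        FIXED.put("BICIdentifier", "COBADEFF");
    }

    /** Length rule: the schema type name, truncated or padded to fit minLen..maxLen. */
    static String forLength(String typeName, int minLen, int maxLen) {
        StringBuilder value = new StringBuilder(typeName);
        if (value.length() > maxLen) value.setLength(maxLen);
        while (value.length() < minLen) value.append('x'); // padding letter is an assumption
        return value.toString();
    }

    /** Chooses a rule by schema type name first, then by elementary datatype. */
    static String generate(String dataType, String typeName, int minLen, int maxLen) {
        if (FIXED.containsKey(typeName)) return FIXED.get(typeName);
        if ("boolean".equals(dataType)) return "true";
        if ("decimal".equals(dataType)) return "1";
        if ("date".equals(dataType)) return "2007-06-29";
        return forLength(typeName, minLen, maxLen);
    }

    public static void main(String[] args) {
        System.out.println(generate("string", "Max35Text", 1, 35));
        System.out.println(generate("string", "BICIdentifier", 0, 0));
        System.out.println(generate("decimal", "DecimalNumber", 0, 0));
    }
}
```

With these rules a type such as Max35Text with a length of 1..35 yields its own name as value, which matches the MsgId element in the HTML example.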
XmlnsPrefix - Rewrite XML Namespace Prefixes in an XML file
Sometimes it is necessary to change the namespace prefixes in an XML document, to remove one (thus making it the default) or to change the default to an explicit prefix. The latter operation is rather tedious when done manually. With XmlnsPrefix one or more mappings from one prefix to another can be specified. The XML file to be modified must be uploaded to the web server, and the resulting, modified XML file is shown in the browser.
Options:
- -e enc source file encoding, default: UTF-8
- -p old1:new1 change prefix "old1" to prefix "new1" (both may be empty)
- -p old2:new2 ...
- ...
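The meaning of the old:new mappings, including the empty cases, can be shown with a short sketch that works on qualified element names only. The class `PrefixMapSketch` and its methods are hypothetical. The real tool must also rewrite the corresponding xmlns declarations, and unprefixed attributes need separate treatment because they do not belong to the default namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: applies "old:new" prefix mappings to qualified element names. */
class PrefixMapSketch {
    /** Parses values such as "sept:s2", "sw8:" (remove prefix) or ":pacs" (add prefix). */
    static Map<String, String> parse(String... specs) {
        Map<String, String> map = new LinkedHashMap<String, String>();
        for (String spec : specs) {
            int colon = spec.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("expected old:new, got " + spec);
            map.put(spec.substring(0, colon), spec.substring(colon + 1));
        }
        return map;
    }

    /** Renames an element name like "sw8:Document"; the empty prefix stands for the default. */
    static String rename(String qName, Map<String, String> map) {
        int colon = qName.indexOf(':');
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        String local = qName.substring(colon + 1);
        String mapped = map.containsKey(prefix) ? map.get(prefix) : prefix;
        return mapped.isEmpty() ? local : mapped + ":" + local;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("sw8:", "sept:s2");
        System.out.println(rename("sw8:Document", map));
        System.out.println(rename("sept:Hdr", map));
    }
}
```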
XmlnsXref - Cross-Reference of XML Namespace URIs and their Prefixes in a Set of XML Files
Even very subtle differences in the namespace URIs of XML documents, schemata and stylesheets almost inevitably lead to bad results (when namespaces are evaluated at all). It is often difficult to track namespace URI modifications in a set of related XML files. XmlnsXref takes several files (on the command line) or a ZIP file collection (in the web interface) and shows a sorted cross-reference list. For each namespace URI, the files in which the URI occurs are shown, optionally with the prefix used for that namespace URI.
Options:
- -e enc source file encoding, default: UTF-8
- -p show namespace prefixes
- -zip input file is a ZIP archive
Example:
- (missing URI)
  - (default): pacs8.xml
- http://schema.punctum.com/test
  - sept: gfis2906.xml
  - sept: pacs8.xml
  - (default): test2906.xml
- urn:S2SCTIcf:xsd:$SCTIcfBlkCredTrf
  - S2SCTIcf: gfis2906.xml
  - S2SCTIcf: pacs8.xml
  - S2SCTIcf: test2906.xml
- urn:iso:std:iso:20022:tech:xsd:S2SCTpacs.008.001.01
  - sw8: pacs8.xml
- urn:iso:std:iso:20022:tech:xsd:pacs.008.001.01
  - (default): gfis2906.xml
  - sw8: test2906.xml
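A cross-reference of this kind can be collected with the SAX callback `startPrefixMapping`, as in the following sketch. It is an assumption about one possible approach and not the XmlnsXref source: the class `NamespaceXrefSketch` is hypothetical, it takes XML text instead of files or ZIP archives, and it does not produce the "(missing URI)" group.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: collects namespace URI -> "prefix: file" entries, sorted. */
class NamespaceXrefSketch {
    final Map<String, TreeSet<String>> xref = new TreeMap<String, TreeSet<String>>();

    /** Parses one document and records every namespace declaration found in it. */
    void add(final String fileName, String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    // SAX reports the default namespace with an empty prefix.
                    String shown = prefix.isEmpty() ? "(default)" : prefix;
                    TreeSet<String> entries = xref.get(uri);
                    if (entries == null) {
                        entries = new TreeSet<String>();
                        xref.put(uri, entries);
                    }
                    entries.add(shown + ": " + fileName);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Renders the collected entries as a two-level list. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, TreeSet<String>> e : xref.entrySet()) {
            sb.append("- ").append(e.getKey()).append('\n');
            for (String entry : e.getValue()) sb.append("  - ").append(entry).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NamespaceXrefSketch sketch = new NamespaceXrefSketch();
        sketch.add("a.xml", "<sw8:Doc xmlns:sw8='urn:x'/>");
        sketch.add("b.xml", "<Doc xmlns='urn:x'/>");
        System.out.print(sketch.report());
    }
}
```

The file names and URIs in `main` are invented. Sorting comes from `TreeMap` and `TreeSet`, so URIs and entries appear in plain string order.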
XPathSelect - Evaluation of an XPath expression
Instead of writing a complete XSLT stylesheet, it is sometimes useful to apply a single XPath expression to an XML document. This tool shows the resulting node set, enclosed in a <result> element.
Options:
- -e enc target file encoding, default: UTF-8
- expr the XPath expression to be applied to the XML document
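The same idea can be sketched with the `javax.xml.xpath` API of current JDKs. This API is not available in Java 1.4, so the sketch does not show how XPathSelect itself is built. The class `XPathSelectSketch` is hypothetical, and the serialization is only meant for element nodes in the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: evaluates an XPath expression and wraps the node set in <result>. */
class XPathSelectSketch {
    static String select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            // An identity transformer serializes each selected element.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            out.write("<result>");
            for (int i = 0; i < nodes.getLength(); i++) {
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            }
            out.write("</result>");
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("<a><b>1</b><c/><b>2</b></a>", "/a/b"));
    }
}
```

Expressions that return strings, numbers or booleans instead of a node set would need a different return type in the `evaluate` call.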
Version 1.5, 2007-12-11
Questions, remarks to: Dr. Georg Fischer