Documentation/documentation: Difference between revisions

imported>Gfis
Created page with "== punctum xtool - Overview == '''[[documentation/./index.jsp|xtool]]''' is a collection of tools for XML processing with special focus on schema quality and namespace consis..."
 
imported>Gfis
No edit summary
 
== punctum xtool - Overview ==
== Overview ==


'''[[documentation/./index.jsp|xtool]]''' is a collection of tools for XML processing with special focus on schema quality and namespace consistency. All tools are implemented in Java 1.4. The programmatic interface is described in the [[documentation/docs/api/index|API documentation]].
[[documentation//index.jsp|'''numword''']] deals with the number words in natural languages. You can enter a sequence of digits, and the program will write the number word as it is spelled in the desired language. Likewise, a number word can be entered and the program will write the corresponding number as a sequence of digits.


Currently the collection contains:
Furthermore, the module maps numbers to a few calendar-related sets of words in natural languages:


* [[#SchemaList|SchemaList]] - linear W3C XML schema representation
* 1..7 to weekday names (Monday ... Sunday) and their abbreviations
* [[#XmlnsPrefix|XmlnsPrefix]] - modification of XML namespace prefixes
* 1..12 to month names (January ... December)
* [[#XmlnsXref|XmlnsXref]] - XML namespace cross-reference
* 1..4 to words for seasons (spring, summer, autumn, winter)
* [[#XPathSelect|XPathSelect]] - Evaluation of an XPath expression


== SchemaList - List the Element Tree of a W3C XML Schema ==
== Applications ==


Starting with the first xs:element definition, the type hierarchy of a W3C schema file is recursively expanded. The output is a linear, indented list of the unfolded possible substructures, the leaf XML elements and their attributes.
* Fun for children
* eLearning
* Printing of the amount in words on checks
* Internationalized calendar
* Extraction of numbers and dates from long texts like the bible


Optionally values may be generated, and the tool can generate comments which show the schema type, data type, restrictions and annotations attached to the elements. With value generation and a selection of the first choice, the output will be a well-formed XML instance which usually validates against the input schema. This representation has the big advantage that it shows both the schema design and a real instance of that schema, both combined in a single XML document.
== Possible Future Extensions ==


The schema list can be shown as HTML (the default), plain text, pure XML or in tab separated format which is suitable for MS-Excel. For HTML, the start tags have a link showing the XPath to the element. In Excel, columns can easily be hidden or appended. Such a worksheet is then a good base for the development of additional restrictions, mapping rules and the like.
* angle to compass direction (north, east ...)
* Ordinal numbers (second, third, fourth ...)
* Fractions (half, third, quarter ...)
* Multiples (twice, three times ...), or threefold
* Prefixes for positive (deca, hecto, kilo ...) and negative powers of 10 (deci, centi, milli ...)
* Genealogical hierarchy
* More languages
* Braille output
* Declension of number words (especially ordinal numbers)
* Evangelists/Gospels
* Apostles/Disciples
* Bible books
* Astrological periods (Sagittarius, Cancer ...), link to ''churchcal''
* Planets of the sun (Mercury = 1, Venus = 2, Earth = 3 ...)
* Euro countries
* Unicode characters
* RGB color values
* Nebulae (M-number) with their galactic positions
* International telephone prefix numbers and corresponding country names (49 = Germany)
* numerical country codes
* Towns by their postal codes
* ISBN book seller prefixes
* Hexadecimal, binary, octal converter
* Text -> SMS digits (multiple or T9)
* reference to Wikipedia article for the number, in specified language (done)
* show on Abacus


The tool may be called from the command line or from a web page. The following optional settings may be specified:
== Problems ==


* '''<tt>-c</tt>''' show comments with types, restrictions, patterns etc.
* For right-to-left languages (Arabic) the lists are not always displayed properly.
* '''<tt>-e enc</tt>''' source file encoding, default: UTF-8
* Testing is rather difficult if you are not a native speaker of the language.
* '''<tt>-e enc</tt>''' target file encoding, default: UTF-8
* Some languages decline the words for small numbers (1, 2, 3). The current implementation doesn't attempt to handle variations for gender, number and case properly, except for the singular/plural of million, milliard, billion etc. The dual is also not yet supported.
* '''<tt>-f</tt>''' show first alternative of choices only
* It is an interesting fact that the numbering schemes in most languages use powers of 1000, but there are exceptions where special words for (powers of) 10000 are used (Klingon, ''SinoSpeller'').
* '''<tt>-m mode</tt>''' output mode: &quot;html&quot; (default), &quot;plain&quot;, &quot;tsv&quot; (for MS-Excel) or &quot;xml&quot;
* For the languages based on powers of 1000, a fixed set of Latin prefixes is used for numbers &gt;= 1 million (cf. ''BaseSpeller.wordN000''). For some languages these prefixes must be modified (Cyrillic character set, c -&gt; k in German).
* '''<tt>-s</tt>''' show start tags only (no end tags)
* Some languages still use different number word sets for living entities, cattle, money etc. There is currently no attempt to handle these variations.
* '''<tt>-v</tt>''' generate element values


These options may be combined. Typical settings are:
See also the [[documentation/developer|hints for developers]].<br />
 
  Back to [[documentation//index.jsp|numword input form]]
* '''<tt>-cvf</tt>''' generate a well-formed, commented XML instance from the schema
* '''<tt>-s</tt>''' only show the minimal indented element structure without end tags
* '''<tt>-cv -m tsv</tt>''' generate an Excel worksheet with comments and values
* '''<tt>-v -m xml</tt>''' show elements with values in the browser's XML representation (elements can be collapsed)
* '''<tt>-vf -m plain</tt>''' generate a concise XML instance file which may be stored and validated
 
In the web interface, the user specifies the desired options and uploads the input schema file to the application on the web server.
 
=== Example for HTML output (-cv -m html) ===
 
&lt;[[documentation/data:text/plain,/Document|Document]] xmlns=&quot;urn:sepade:xsd:pain.001.001.02&quot;&gt;&lt;!--[1..1] Document --&gt;<br />
    &lt;[[documentation/data:text/plain,/Document/pain.001.001.02|pain.001.001.02]]&gt;&lt;!--[1..1] pain.001.001.02 --&gt;<br />
        &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr|GrpHdr]]&gt;&lt;!--[1..1] GroupHeader20 --&gt;<br />
            &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/MsgId|MsgId]]&gt;Max35Text&lt;/MsgId&gt;&lt;!--[1..1] Max35Text string 1..35 --&gt;<br />
            &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/CreDtTm|CreDtTm]]&gt;2007-06-29T05:30:00Z&lt;/CreDtTm&gt;&lt;!--[1..1] ISODateTime dateTime --&gt;<br />
            &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/NbOfTxs|NbOfTxs]]&gt;09&lt;/NbOfTxs&gt;&lt;!--[1..1] Max15NumericText string /[0-9]{1,15}/ --&gt;<br />
            &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/CtrlSum|CtrlSum]]&gt;1&lt;/CtrlSum&gt;&lt;!--[0..1] DecimalNumber decimal L18.17 ! SEPA AOS Can optionally be used as specification for the total amount of the file. --&gt;<br />
            &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/Grpg|Grpg]]&gt;GRPD&lt;/Grpg&gt;&lt;!--[1..1] Grouping2Code string &quot;GRPD&quot; ! Only the GRPD option may be used.--&gt;<br />
              &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty|InitgPty]]&gt;&lt;!--[1..1] PartyIdentification20 ! Initiating party. --&gt;<br />
                &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Nm|Nm]]&gt;Max70Text&lt;/Nm&gt;&lt;!--[1..1] Max70Text string 1..70 ! AT-02 Name of the originator.--&gt;<br />
                &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/PstlAdr|PstlAdr]]&gt;&lt;!--[0..1] PostalAddress5 ! AT-03 Address of the originator.--&gt;<br />
                    &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/PstlAdr/AdrLine|AdrLine]]&gt;Max70Text&lt;/AdrLine&gt;&lt;!--[1..2] Max70Text string 1..70 --&gt;<br />
                    &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/PstlAdr/Ctry|Ctry]]&gt;AZ&lt;/Ctry&gt;&lt;!--[1..1] CountryCode string /[A-Z]{2,2}/ --&gt;<br />
                &lt;/PstlAdr&gt;<br />
                &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Id|Id]]&gt;&lt;!--[0..1] Party5Choice ! AT-10 - ID of the originator. Recommendation: This field should not be used.--&gt;<br />
                    &lt;[[documentation/data:text/plain,|__unresolvedChoice__]]&gt;&lt;!--[1..1] --&gt;<br />
                      &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Id/OrgId|OrgId]]&gt;&lt;!--[1..1] OrganisationIdentification2 --&gt;<br />
                          &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Id/OrgId/BIC|BIC]]&gt;COBADEFF&lt;/BIC&gt;&lt;!--[0..1] BICIdentifier string /[A-Z]{6,6}[A-Z2-9][A-NP-Z0-9]([A-Z0-9]{3,3}){0,1}/ --&gt;<br />
                          &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Id/OrgId/IBEI|IBEI]]&gt;AZBDFHJNP0&lt;/IBEI&gt;&lt;!--[0..1] IBEIIdentifier string /[A-Z]{2,2}[B-DF-HJ-NP-TV-XZ0-9]{7,7}[0-9]{1,1}/ --&gt;<br />
                          &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Id/OrgId/BEI|BEI]]&gt;BEIADEFF&lt;/BEI&gt;&lt;!--[0..1] BEIIdentifier string /[A-Z]{6,6}[A-Z2-9][A-NP-Z0-9]([A-Z0-9]{3,3}){0,1}/ --&gt;<br />
                          &lt;[[documentation/data:text/plain,/Document/pain.001.001.02/GrpHdr/InitgPty/Id/OrgId/EANGLN|EANGLN]]&gt;0909090909090&lt;/EANGLN&gt;&lt;!--[0..1] EANGLNIdentifier string /[0-9]{13,13}/ --&gt;<br />
...
 
=== Example for output in MS-Excel ===
 
[[Image:documentation/list-mtsv.jpg|frame|none|alt=list-mtsv.jpg|caption schema list in Excel worksheet]]
 
The columns of the Excel worksheet are filled as follows:
 
<table>
<tbody>
<tr class="odd">
<td align="left">A</td>
<td align="left">indented elements with generated values</td>
</tr>
<tr class="even">
<td align="left">B</td>
<td align="left">cardinality, multiplicity: minOccurs and maxOccurs</td>
</tr>
<tr class="odd">
<td align="left">C</td>
<td align="left">schema type</td>
</tr>
<tr class="even">
<td align="left">D</td>
<td align="left">elementary XML datatype</td>
</tr>
<tr class="odd">
<td align="left">E</td>
<td align="left">restrictions: string lengths, number ranges, patterns, value enumerations</td>
</tr>
<tr class="even">
<td align="left">F</td>
<td align="left">annotations attached to the element</td>
</tr>
<tr class="odd">
<td align="left">G</td>
<td align="left">absolute XPath to this element node</td>
</tr>
<tr class="even">
<td align="left">H</td>
<td align="left">a single &quot;;&quot; in all relevant (non-descriptive) rows, useful for hiding rows</td>
</tr>
</tbody>
</table>
 
=== Special Rows ===
 
There are two sorts of special rows which are useful for the description of the schema, but which will not lead to a valid XML instance.
 
# With option <tt>&quot;-c&quot;</tt>, any attribute is shown on a separate line starting with &quot;@&quot;, since attributes also have types, restrictions etc.
# Without option <tt>&quot;-f&quot;</tt>, any <tt>&lt;xs:choice&gt;</tt> leads to an artificial element <tt>&lt;__unresolvedChoice__&gt;</tt> in order to make visible that this choice must still be resolved to yield a valid XML instance. This resolution could be realized by an XSLT stylesheet or by manual editing.
 
=== Value Generation ===
 
The tool tries to generate validating values for the most common cases. This works rather well for the [http://www.iso20022.org ISO 20022] message schemata relevant to SEPA (camt, pacs, pain families), but it will possibly fail for complicated patterns or different application areas.
 
The values are generated with a fixed set of rules which depend on the elementary data type, sometimes the schema type, and the restrictions. The following table shows these rules:
 
<table>
<thead>
<tr class="header">
<th align="left">Datatype</th>
<th align="left">Restriction, Schema Type</th>
<th align="left">Generated Value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">boolean</td>
<td align="left"> </td>
<td align="left">true</td>
</tr>
<tr class="even">
<td align="left">decimal</td>
<td align="left"> </td>
<td align="left">1</td>
</tr>
<tr class="odd">
<td align="left">dateTime</td>
<td align="left"> </td>
<td align="left">2007-06-29T04:30:00Z   (?)</td>
</tr>
<tr class="even">
<td align="left">date</td>
<td align="left"> </td>
<td align="left">2007-06-29</td>
</tr>
<tr class="odd">
<td align="left">NCName</td>
<td align="left"> </td>
<td align="left">NCName</td>
</tr>
<tr class="even">
<td align="left">decimal</td>
<td align="left"> </td>
<td align="left">1</td>
</tr>
<tr class="odd">
<td align="left">string</td>
<td align="left">(pattern)</td>
<td align="left">(the characters from the pattern repeated up to a minimal length)</td>
</tr>
<tr class="even">
<td align="left">string</td>
<td align="left">(length)</td>
<td align="left">(schema type name truncated or padded with letters)</td>
</tr>
<tr class="odd">
<td align="left">string</td>
<td align="left">(enumeration)</td>
<td align="left">(the first alternative)</td>
</tr>
<tr class="even">
<td align="left">string</td>
<td align="left">CurrencyCode</td>
<td align="left">EUR</td>
</tr>
<tr class="odd">
<td align="left">string</td>
<td align="left">IBANIdentifier</td>
<td align="left">DE28500400000123456589</td>
</tr>
<tr class="even">
<td align="left">string</td>
<td align="left">BICIdentifier</td>
<td align="left">COBADEFF</td>
</tr>
<tr class="odd">
<td align="left">string</td>
<td align="left">BEIIdentifier</td>
<td align="left">PUTMDEEM</td>
</tr>
<tr class="even">
<td align="left">string</td>
<td align="left">CHIPSUniversalIdentifier</td>
<td align="left">CH012345</td>
</tr>
</tbody>
</table>
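The fixed rules in the table above can be pictured as two lookups followed by a fallback. The following is a minimal sketch only, not the xtool source: the class name <tt>ValueRuleSketch</tt>, the method signature and the padding letter <tt>x</tt> are assumptions made for illustration, pattern-based generation is left out, and the code uses generics, so it needs a newer Java than the 1.4 mentioned in the overview.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of rule-based value generation; not the xtool implementation. */
class ValueRuleSketch {
    // Values copied from the table above, keyed by schema type name.
    private static final Map<String, String> BY_SCHEMA_TYPE = new HashMap<>();
    // Values copied from the table above, keyed by elementary datatype.
    private static final Map<String, String> BY_DATATYPE = new HashMap<>();

    static {
        BY_SCHEMA_TYPE.put("CurrencyCode", "EUR");
        BY_SCHEMA_TYPE.put("IBANIdentifier", "DE28500400000123456589");
        BY_SCHEMA_TYPE.put("BICIdentifier", "COBADEFF");
        BY_SCHEMA_TYPE.put("BEIIdentifier", "PUTMDEEM");
        BY_SCHEMA_TYPE.put("CHIPSUniversalIdentifier", "CH012345");
        BY_DATATYPE.put("boolean", "true");
        BY_DATATYPE.put("decimal", "1");
        BY_DATATYPE.put("dateTime", "2007-06-29T04:30:00Z"); // marked "(?)" in the table
        BY_DATATYPE.put("date", "2007-06-29");
        BY_DATATYPE.put("NCName", "NCName");
    }

    /**
     * Returns a sample value: a schema type rule wins over a datatype rule;
     * otherwise the schema type name is padded or truncated to the length limits.
     */
    static String generate(String datatype, String schemaType, int minLength, int maxLength) {
        String value = BY_SCHEMA_TYPE.get(schemaType);
        if (value == null) {
            value = BY_DATATYPE.get(datatype);
        }
        if (value != null) {
            return value;
        }
        StringBuilder text = new StringBuilder(schemaType);
        while (text.length() < minLength) {
            text.append('x'); // assumed padding letter, the source does not name one
        }
        if (text.length() > maxLength) {
            text.setLength(maxLength);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate("string", "Max35Text", 1, 35));
        System.out.println(generate("string", "CurrencyCode", 3, 3));
    }
}
```

With these assumptions, <tt>generate("string", "Max35Text", 1, 35)</tt> returns the type name itself, which matches the <tt>MsgId</tt> value in the HTML example.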
 
== XmlnsPrefix - Rewrite XML Namespace Prefixes in an XML file ==
 
Sometimes it is necessary to change the namespace prefixes in an XML document, to remove one (thus making it the default) or to change the default to an explicit prefix. The latter operation is rather tedious when done manually. With '''XmlnsPrefix''' one or more mappings from one prefix to another can be specified. The XML file to be modified must be uploaded to the web server, and the resulting, modified XML file is shown in the browser.
 
Options:
 
* <tt>-e enc</tt> source file encoding, default: UTF-8
* <tt>-p old1:new1</tt> change prefix &quot;old1&quot; to prefix &quot;new1&quot; (both may be empty)
* <tt>-p old2:new2</tt> ...
* <tt>...</tt>
 
== XmlnsXref - Cross-Reference of XML Namespace URIs and their Prefixes in a Set of XML Files ==
 
Even very subtle differences in the namespace URIs of XML documents, schemata and stylesheets lead almost inevitably to bad results (when namespaces are evaluated at all). Often it is difficult to track namespace URI modifications in a set of related XML files. '''XmlnsXref''' takes several files (on the command line) or a ZIP file collection (in the web interface) and shows a sorted cross-reference list. For each namespace URI the files are shown where the URI occurs, optionally with the prefix used for that namespace URI.
 
Options:
 
* <tt>-e enc</tt> source file encoding, default: UTF-8
* <tt>-p</tt> show namespace prefixes
* <tt>-zip</tt> input file is zip archive
 
==== Example: ====
 
;  (missing URI)
: (default): pacs8.xml
; http://schema.punctum.com/test
: sept: gfis2906.xml
: sept: pacs8.xml
: (default): test2906.xml
;  urn:S2SCTIcf:xsd:$SCTIcfBlkCredTrf
: S2SCTIcf: gfis2906.xml
: S2SCTIcf: pacs8.xml
: S2SCTIcf: test2906.xml
;  urn:iso:std:iso:20022:tech:xsd:S2SCTpacs.008.001.01
: sw8: pacs8.xml
;  urn:iso:std:iso:20022:tech:xsd:pacs.008.001.01
: (default): gfis2906.xml
: sw8: test2906.xml
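A cross-reference like the one above can be collected with a namespace-aware SAX parser, which reports every namespace declaration through <tt>startPrefixMapping</tt>. The sketch below is an illustration under assumptions, not the XmlnsXref source: the class name <tt>NamespaceXrefSketch</tt> and its methods are hypothetical, the XML is passed as a string instead of being read from files or a ZIP archive, and the code relies on Java 8 features.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a namespace cross-reference; not the XmlnsXref implementation. */
class NamespaceXrefSketch {
    // Namespace URI -> sorted "prefix: file" entries; TreeMap and TreeSet keep the output sorted.
    private final Map<String, Set<String>> xref = new TreeMap<>();

    /** Records every namespace declaration found in one XML document. */
    void scan(final String fileName, String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for startPrefixMapping events
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    // An empty prefix denotes the default namespace.
                    String shown = prefix.isEmpty() ? "(default)" : prefix;
                    xref.computeIfAbsent(uri, key -> new TreeSet<>()).add(shown + ": " + fileName);
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse " + fileName, e);
        }
    }

    Map<String, Set<String>> result() {
        return xref;
    }

    public static void main(String[] args) {
        NamespaceXrefSketch sketch = new NamespaceXrefSketch();
        sketch.scan("a.xml", "<r xmlns='urn:x'><p:c xmlns:p='urn:y'/></r>");
        sketch.scan("b.xml", "<q:r xmlns:q='urn:x'/>");
        for (Map.Entry<String, Set<String>> entry : sketch.result().entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

The sketch only sees declared namespaces, so it has no counterpart to the "(missing URI)" entry of the example above.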
 
== XPathSelect - Evaluation of an XPath expression ==
 
Instead of writing a complete XSLT stylesheet, it is sometimes interesting to apply a single XPath expression to an XML document. This tool shows the resulting node set, enclosed in a &lt;result&gt; element.
 
Options:
 
* <tt>-e enc</tt> target file encoding, default: UTF-8
* <tt>expr</tt> the XPath expression to be applied to the XML document
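The behaviour described above can be approximated with the standard <tt>javax.xml.xpath</tt> API. This is a minimal sketch, assuming that the expression selects element or text nodes and that no namespaces are involved; the class name <tt>XPathSelectSketch</tt> and the method <tt>select</tt> are hypothetical and not taken from xtool.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of applying one XPath expression; not the XPathSelect implementation. */
class XPathSelectSketch {
    /** Evaluates expr against xml and returns the selected nodes inside a result element. */
    static String select(String xml, String expr) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document input = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, input, XPathConstants.NODESET);

            // Copy the selected nodes into a fresh document below <result>.
            Document output = builder.newDocument();
            Element result = output.createElement("result");
            output.appendChild(result);
            for (int i = 0; i < nodes.getLength(); i++) {
                result.appendChild(output.importNode(nodes.item(i), true));
            }

            // Serialize with the identity transformer, without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(output), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("<a><b>1</b><b>2</b><c/></a>", "/a/b"));
    }
}
```

Expressions that select attribute nodes or return a number or string are rejected by this sketch with an exception; a complete tool would need to handle those result types separately.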
 
Back to the [[documentation/index.jsp|xtool input form]]
 
Version 1.5, 2007-12-11<br />
Questions, remarks to: [[documentation/mailto:punctum@punctum.com|Dr. Georg Fischer]]






[[Category:documentation]]

Latest revision as of 09:28, 2 September 2016

Overview

numword deals with the number words in natural languages. You can enter a sequence of digits, and the program will write the number word as it is spelled in the desired language. Likewise, a number word can be entered and the program will write the corresponding number as a sequence of digits.
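Both directions can be illustrated for one language with a small table of words. The sketch below is not the numword implementation: the class <tt>NumberWordSketch</tt> and its methods are hypothetical, it covers only English and the range 0 to 999999, and it ignores the per-language speller classes mentioned further down.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of number spelling and parsing; English only, 0..999999. */
class NumberWordSketch {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

    /** Writes the number word for a sequence of digits given as an int. */
    static String spell(int n) {
        if (n < 0 || n > 999999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        if (n < 20) {
            return ONES[n];
        }
        if (n < 100) {
            return TENS[n / 10] + (n % 10 == 0 ? "" : "-" + ONES[n % 10]);
        }
        if (n < 1000) {
            return ONES[n / 100] + " hundred" + (n % 100 == 0 ? "" : " " + spell(n % 100));
        }
        return spell(n / 1000) + " thousand" + (n % 1000 == 0 ? "" : " " + spell(n % 1000));
    }

    /** Reads a number word as produced by spell() and returns its value. */
    static int parse(String words) {
        Map<String, Integer> values = new HashMap<>();
        for (int i = 0; i < 20; i++) {
            values.put(ONES[i], i);
        }
        for (int i = 2; i < 10; i++) {
            values.put(TENS[i], i * 10);
        }
        int total = 0; // completed groups (thousands)
        int group = 0; // value of the group currently being read
        for (String word : words.trim().toLowerCase().split("[\\s-]+")) {
            if (word.equals("hundred")) {
                group *= 100;
            } else if (word.equals("thousand")) {
                total += group * 1000;
                group = 0;
            } else {
                Integer value = values.get(word);
                if (value == null) {
                    throw new IllegalArgumentException("unknown word: " + word);
                }
                group += value;
            }
        }
        return total + group;
    }

    public static void main(String[] args) {
        System.out.println(spell(1234));         // one thousand two hundred thirty-four
        System.out.println(parse("twenty-one")); // 21
    }
}
```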

Furthermore, the module maps numbers to a few calendar-related sets of words in natural languages:

  • 1..7 to weekday names (Monday ... Sunday) and their abbreviations
  • 1..12 to month names (January ... December)
  • 1..4 to words for seasons (spring, summer, autumn, winter)
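For weekdays and months, such a mapping can be sketched with the localized names that ship with the Java runtime. This is only an illustration and makes several assumptions: it uses <tt>java.time</tt> (Java 8 or later), the class <tt>CalendarWordSketch</tt> is hypothetical, and the season words are a hard-coded English list because the runtime offers no season names.

```java
import java.time.DayOfWeek;
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

/** Hypothetical sketch of number-to-calendar-word mappings; not the numword implementation. */
class CalendarWordSketch {
    // English only; an assumption, since java.time has no season names.
    private static final String[] SEASONS = {"spring", "summer", "autumn", "winter"};

    /** 1..7 to weekday name, with 1 = Monday as in ISO 8601. */
    static String weekday(int n, Locale locale) {
        return DayOfWeek.of(n).getDisplayName(TextStyle.FULL, locale);
    }

    /** 1..7 to the abbreviated weekday name. */
    static String weekdayAbbreviation(int n, Locale locale) {
        return DayOfWeek.of(n).getDisplayName(TextStyle.SHORT, locale);
    }

    /** 1..12 to month name. */
    static String month(int n, Locale locale) {
        return Month.of(n).getDisplayName(TextStyle.FULL, locale);
    }

    /** 1..4 to season word. */
    static String season(int n) {
        if (n < 1 || n > 4) {
            throw new IllegalArgumentException("season must be 1..4: " + n);
        }
        return SEASONS[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(weekday(1, Locale.ENGLISH)); // Monday
        System.out.println(month(12, Locale.ENGLISH));  // December
        System.out.println(season(3));                  // autumn
    }
}
```

Passing another <tt>Locale</tt> yields the names in that language, as far as the runtime's locale data covers it.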

Applications

  • Fun for children
  • eLearning
  • Printing of the amount in words on checks
  • Internationalized calendar
  • Extraction of numbers and dates from long texts like the bible

Possible Future Extensions

  • angle to compass direction (north, east ...)
  • Ordinal numbers (second, third, fourth ...)
  • Fractions (half, third, quarter ...)
  • Multiples (twice, three times ...), or threefold
  • Prefixes for positive (deca, hecto, kilo ...) and negative powers of 10 (deci, centi, milli ...)
  • Genealogical hierarchy
  • More languages
  • Braille output
  • Declension of number words (especially ordinal numbers)
  • Evangelists/Gospels
  • Apostles/Disciples
  • Bible books
  • Astrological periods (Sagittarius, Cancer ...), link to churchcal
  • Planets of the sun (Mercury = 1, Venus = 2, Earth = 3 ...)
  • Euro countries
  • Unicode characters
  • RGB color values
  • Nebulae (M-number) with their galactic positions
  • International telephone prefix numbers and corresponding country names (49 = Germany)
  • numerical country codes
  • Towns by their postal codes
  • ISBN book seller prefixes
  • Hexadecimal, binary, octal converter
  • Text -> SMS digits (multiple or T9)
  • reference to Wikipedia article for the number, in specified language (done)
  • show on Abacus

Problems

  • For right-to-left languages (Arabic) the lists are not always displayed properly.
  • Testing is rather difficult if you are not a native speaker of the language.
  • Some languages decline the words for small numbers (1, 2, 3). The current implementation doesn't attempt to handle variations for gender, number and case properly, except for the singular/plural of million, milliard, billion etc. The dual is also not yet supported.
  • It is an interesting fact that the numbering schemes in most languages use powers of 1000, but there are exceptions where special words for (powers of) 10000 are used (Klingon, SinoSpeller).
  • For the languages based on powers of 1000, a fixed set of Latin prefixes is used for numbers >= 1 million (cf. BaseSpeller.wordN000). For some languages these prefixes must be modified (Cyrillic character set, c -> k in German).
  • Some languages still use different number word sets for living entities, cattle, money etc. There is currently no attempt to handle these variations.
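The difference between numbering schemes based on powers of 1000 and those based on powers of 10000 comes down to how the digits are grouped before each group is spelled. The following sketch only shows that grouping step; the class <tt>DigitGroupSketch</tt> is hypothetical and is not related to BaseSpeller or SinoSpeller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of digit grouping; not taken from the numword sources. */
class DigitGroupSketch {
    /** Splits a digit string into groups of groupSize digits, counted from the right. */
    static List<String> groups(String digits, int groupSize) {
        List<String> result = new ArrayList<>();
        for (int end = digits.length(); end > 0; end -= groupSize) {
            int start = Math.max(0, end - groupSize);
            result.add(0, digits.substring(start, end)); // prepend to keep reading order
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("1234567", 3)); // [1, 234, 567]   powers of 1000
        System.out.println(groups("1234567", 4)); // [123, 4567]     powers of 10000
    }
}
```

Each group would then be spelled like a small number and followed by the word for its power, which is where the language-specific prefixes come in.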

See also the hints for developers.

Back to numword input form