Xtrans: Difference between revisions

imported>Gfis

Revision as of 19:09, 6 September 2016

Bugs

General Problems

  • Most transformers convert from the raw (specific) format to an XMLized representation. A few do the opposite: they convert general binary or text files to the specific format, which is then wrapped into XML. Examples are Base64, Quoted-Printable and Morse code.
  • Most transformers store values in XML elements, but in some cases it seemed easier to store them in attributes of elements. DTA and Datev are examples of the latter.
  • For formats with many different tags (SWIFT, for example) the question arises whether such tags are syntax or data. These tags can be converted to id attributes of a generalized XML "field" element, or a separate element can be generated for each such tag. The SwiftTransformer made the latter decision.
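
The two ways of mapping a format tag to XML can be compared side by side. The following is only a sketch: the class and method names, the sample tag "20" and the "F" prefix for element names are assumptions made for illustration and are not taken from the SwiftTransformer. The prefix is needed in the second variant because an XML element name cannot start with a digit.

```java
public class TagMappingSketch {

    /** Escapes the characters that are special in XML text and attribute values. */
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Variant 1: the tag is treated as data and stored in an id attribute. */
    public static String asGenericField(String tag, String value) {
        return "<field id=\"" + escape(tag) + "\">" + escape(value) + "</field>";
    }

    /** Variant 2: the tag is treated as syntax and becomes its own element.
     *  The "F" prefix is a hypothetical naming choice, and the tag is assumed
     *  to consist of letters and digits only. */
    public static String asOwnElement(String tag, String value) {
        String name = "F" + tag;
        return "<" + name + ">" + escape(value) + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(asGenericField("20", "REF123"));
        System.out.println(asOwnElement("20", "REF123"));
    }
}
```

The first variant keeps the XML vocabulary small and tolerates unknown tags; the second makes each tag addressable by name, at the cost of one element type per tag.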

Test

  • Not all format conversions are precisely reversible.
  • There are only a few test cases.

Incomplete Transformers

  • general.XMLTransformer - insufficient serialization of entities; serializer should be replaced by Apache's
  • general.CountingTransformer - cannot generate, but serializes any XML to a sorted list with counts for all elements, and the accumulated length of their direct character content
  • net.URITransformer - the set of supported URI schemes is incomplete, and serializing is not implemented.
  • organizer.LDIFTransformer - not well tested, and serializing is not implemented.
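
The counting described for general.CountingTransformer can be illustrated with a small SAX handler. This is a sketch under assumptions, not the Xtrans code: the class name ElementCountSketch, the tab-separated output and the decision to count whitespace as character content are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCountSketch extends DefaultHandler {

    /** Element name -> { occurrences, accumulated length of direct character content }. */
    private final Map<String, int[]> stats = new TreeMap<String, int[]>();
    /** Names of the currently open elements, innermost first. */
    private final Deque<String> open = new ArrayDeque<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        int[] entry = stats.get(qName);
        if (entry == null) {
            entry = new int[2];
            stats.put(qName, entry);
        }
        entry[0]++;
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Characters are credited to the innermost open element only.
        if (!open.isEmpty()) {
            stats.get(open.peek())[1] += length;
        }
    }

    /** Returns one line "name TAB count TAB length" per element name, sorted by name. */
    public static String count(String xml) {
        ElementCountSketch handler = new ElementCountSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse input", e);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, int[]> e : handler.stats.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()[0])
              .append('\t').append(e.getValue()[1]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(count("<a><b>xy</b><b>z</b>!</a>"));
    }
}
```

The TreeMap provides the sorted order; because such a summary discards the document structure, it is plausible that no generator can exist for this direction.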

Hints for Developers

Xtrans currently processes only a limited set of formats. You are encouraged to:

  • play with the format transformer classes,
  • email any suggestions for improvement,
  • contribute patches for corrections,
  • contribute new transformer classes.

Coding conventions

Please try to remain close to the current programming style:

  • Write Javadoc comments before all methods and public members.
  • Note that the Java sources are compiled with UTF-8 source encoding:
   <javac  srcdir="${src.home}" destdir="${build.classes}" listfiles="yes"
           encoding="utf8"
           source="1.4" target="1.4"
           debug="${javac.debug}" debuglevel="${javac.debuglevel}">
Determine the proper accents and non-ASCII characters, and write them in Unicode in the Java source files. Use a Unicode-enabled editor that handles UTF-8 properly; write some Unicode characters in the header comment so that the editor can detect the UTF-8 encoding.
  • Use reliable sources for the format definition, such as RFCs (https://tools.ietf.org/html/) or ISO standards, and cite them in the Javadoc header of the class.

Reversibility

The transformers should try to serialize XML to exactly the same specific format from which they are able to generate XML. The Ant test targets perform a "generate - serialize - binary compare" sequence to check the reversibility of the transformation.

Some formats don't have a well-defined canonical representation. In JCL, for example, the line breaks and the spaces for field separation are lost in the XML representation, and cannot be reproduced exactly by the serializer. In these cases, repeated "generate - serialize" sequences should eventually converge to an identical result.
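
Both checks can be sketched in a few lines of Java. Everything here is hypothetical and only illustrates the idea: the Codec interface stands in for a transformer and is not the Xtrans API, and the COLLAPSING codec is a toy that loses whitespace in the same way as described for JCL.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripSketch {

    /** Hypothetical stand-in for a transformer; not the Xtrans interface. */
    public interface Codec {
        /** Converts the raw (specific) format to XML text. */
        String generate(byte[] raw);
        /** Converts XML text back to the raw (specific) format. */
        byte[] serialize(String xml);
    }

    /** Strict check: generate - serialize - binary compare. */
    public static boolean isReversible(Codec codec, byte[] raw) {
        return Arrays.equals(raw, codec.serialize(codec.generate(raw)));
    }

    /** Weaker check: repeated generate - serialize passes reach a stable result. */
    public static boolean converges(Codec codec, byte[] raw, int maxPasses) {
        byte[] current = raw;
        for (int pass = 0; pass < maxPasses; pass++) {
            byte[] next = codec.serialize(codec.generate(current));
            if (Arrays.equals(current, next)) {
                return true;
            }
            current = next;
        }
        return false;
    }

    /** Toy codec that collapses runs of whitespace; input is assumed to be
     *  ASCII text without XML markup characters. */
    public static final Codec COLLAPSING = new Codec() {
        public String generate(byte[] raw) {
            String text = new String(raw, StandardCharsets.US_ASCII);
            return "<t>" + text.trim().replaceAll("\\s+", " ") + "</t>";
        }
        public byte[] serialize(String xml) {
            // Strip the leading "<t>" and the trailing "</t>".
            String text = xml.substring(3, xml.length() - 4);
            return text.getBytes(StandardCharsets.US_ASCII);
        }
    };

    public static void main(String[] args) {
        byte[] raw = "A  B".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isReversible(COLLAPSING, raw));
        System.out.println(converges(COLLAPSING, raw, 3));
    }
}
```

For the input "A  B" the strict check fails because the double space is lost, while the second pass already reproduces the output of the first, so the weaker check succeeds.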

Future Extensions

  • more text processing formats:
    • (La)TeX - similar to RTF
    • dot-instruction-oriented formats: IBM DCF, nroff, troff, perldoc
    • binary formats like IBM DCA/RFT, Siemens Hit, WordPerfect
    • common tagset for text processing features
  • raster image processing formats:
    • TIFF
    • EXIF - at least the header
    • GIF, BMP etc.
  • vector image processing formats with target SVG:
    • WMF
    • Flash?
    • RTF DO, AmiPro, WordPerfect Graphics ...
  • ZIP file tree pseudo transformer