NumWord/Hints
Hints for Developers
numword handles numbers in only a limited number of Western European languages. English, German and French were tested rather thoroughly, but mistakes may remain in any of the language classes.
You are strongly encouraged to:
- play with the language modules,
- email any hints for improvement,
- contribute patches for corrections,
- contribute new language classes.
Please try to remain close to the current programming style:
- Determine the correct 3-letter ISO 639 code for the language, and name the new Java class accordingly. Also look up the 2-letter language code, and pass both codes to the method setIso636 in the constructor.
- Copy the Java source file for an existing language which is close to the new language.
- Extend BaseSpeller or one of the generic subclasses like SlavicSpeller.
- Enter the new language class in the constructor of NumberSpeller. It will automatically appear in the list box of the web application.
- Try to put any fixed strings for parts of number words in the arrays wordN... or put them into the morpheme hash map with putMorphem. That way you automatically get the parsing of number words as soon as you finish the generation of number words for the language.
- Write Javadoc comments before all methods and public members.
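The advice above about keeping fixed strings in shared tables can be illustrated with a small sketch. It is not numword code: ToyEnglishSpeller, its spell and parse methods, and the signature given to putMorphem here are all assumptions made for illustration, and the real BaseSpeller may look quite different. The sketch is written for a current JDK; it uses generics and a for-each loop, which a source="1.4" build would not accept.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in (NOT numword's BaseSpeller): one table drives spelling and parsing. */
class ToyEnglishSpeller {
    /** Maps a word part to its value (used for parsing). */
    private final Map<String, Integer> morphems = new HashMap<String, Integer>();
    /** Maps a value to its word part (used for generation). */
    private final Map<Integer, String> words = new HashMap<Integer, String>();

    ToyEnglishSpeller() {
        String[] units = {"zero", "one", "two", "three", "four",
                          "five", "six", "seven", "eight", "nine"};
        String[] teens = {"ten", "eleven", "twelve", "thirteen", "fourteen",
                          "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
        String[] tens = {"twenty", "thirty", "forty", "fifty",
                         "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < units.length; i++) putMorphem(units[i], i);
        for (int i = 0; i < teens.length; i++) putMorphem(teens[i], 10 + i);
        for (int i = 0; i < tens.length; i++) putMorphem(tens[i], 20 + 10 * i);
    }

    /** Hypothetical signature: registers a word part in both directions at once. */
    private void putMorphem(String word, int value) {
        morphems.put(word, Integer.valueOf(value));
        words.put(Integer.valueOf(value), word);
    }

    /** Spells a number in the range 0..99. */
    String spell(int n) {
        if (n < 0 || n > 99) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        if (n < 20 || n % 10 == 0) {
            return words.get(Integer.valueOf(n));
        }
        return words.get(Integer.valueOf(n - n % 10)) + "-" + words.get(Integer.valueOf(n % 10));
    }

    /** Parses by summing the values of the hyphen-separated parts (no grammar check). */
    int parse(String text) {
        int sum = 0;
        for (String part : text.split("-")) {
            Integer value = morphems.get(part);
            if (value == null) {
                throw new IllegalArgumentException("unknown word part: " + part);
            }
            sum += value.intValue();
        }
        return sum;
    }
}
```

With this toy class, spell(21) gives "twenty-one" and parse("twenty-one") gives 21. Because every word part is registered only once, a corrected spelling takes effect for both directions at the same time.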
Note that all Java sources for this application are compiled with:
<javac srcdir="${src.home}" destdir="${build.classes}" listfiles="yes" encoding="utf8" source="1.4" target="1.4" debug="${javac.debug}" debuglevel="${javac.debuglevel}">
Determine the proper accents and non-ASCII characters, and write them in Unicode in the Java source files. Consult the Unicode tables as they are displayed in your browser. Use a Unicode-enabled editor that handles UTF-8 properly. Put some non-ASCII characters in the header comment so that the editor can detect the UTF-8 encoding.
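One way to check that an accented character really arrived in the compiled class as intended is to dump a string in escaped form. The following helper is a hypothetical sketch, not part of numword; the class name UnicodeCheck and both method names are assumptions. It uses StandardCharsets, so it needs Java 7 or later rather than the 1.4 source level mentioned above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of numword) for inspecting non-ASCII characters. */
class UnicodeCheck {

    /** Returns s with every non-ASCII char replaced by a 4-digit backslash-u escape. */
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);                 // plain ASCII is copied unchanged
            } else {
                String hex = Integer.toHexString(c);
                sb.append("\\u");
                for (int k = hex.length(); k < 4; k++) {
                    sb.append('0');           // pad to four hex digits
                }
                sb.append(hex);
            }
        }
        return sb.toString();
    }

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For the German word for five, written with u-umlaut, toEscapes yields f\u00fcnf and utf8Length yields 5. If the dump shows two escapes where one accented letter was typed (for example \u00c3\u00bc), the source file was probably read with the wrong encoding.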
- Use reliable sources for the word lists (much of the current material is from Wikipedia). Avoid dialects. Pick the number words for 3, 4, 5, 6, 7, 8, 9, 10 and search for them all together on the Internet to find pages with number word lists.
- Determine whether the language uses the million, milliard ... (French) or the million, billion ... (American) scheme. Determine how spaces, hyphens, particles like "and" etc. are inserted in the wording for bigger numbers. Decide whether to use "one hundred, one thousand" or "hundred, thousand" only.
- Generate the list of all test numbers and check whether parsing the generated words yields the same numbers.
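The difference between the two naming schemes for large numbers mentioned in the list above can be sketched as follows. This is an illustration only, not numword code: the class ScaleNames, its method names and the English-looking word endings are assumptions, and a real language class would take the words from its own tables. The terms "short scale" and "long scale" are used here for the American and French schemes, respectively.

```java
/** Hypothetical sketch (not numword code) of the two naming schemes for powers of 1000. */
class ScaleNames {
    /** Latin prefixes: m-illion, b-illion, tr-illion, ... */
    private static final String[] PREFIX = {"m", "b", "tr", "quadr", "quint", "sext"};

    /**
     * Short scale (million, billion, trillion ...): a new word for every
     * factor of 1000. Names 10^(3*group) for group 2..7.
     */
    static String shortScale(int group) {
        return PREFIX[group - 2] + "illion";
    }

    /**
     * Long scale (million, milliard, billion, billiard ...): a new prefix for
     * every factor of 1,000,000, with "-illiard" filling the odd groups.
     * Names 10^(3*group) for group 2..13.
     */
    static String longScale(int group) {
        String prefix = PREFIX[(group - 2) / 2];
        return prefix + (group % 2 == 0 ? "illion" : "illiard");
    }
}
```

Here group 3 stands for 10^9, so shortScale(3) gives "billion" while longScale(3) gives "milliard"; for 10^12, longScale(4) gives "billion" and shortScale(4) gives "trillion".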