NumWord: Difference between revisions

imported>Gfis
from documentation.html
 
imported>Gfis
cli usage
 
(4 intermediate revisions by the same user not shown)
Latest revision as of 13:41, 11 September 2016

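NumWord generates and parses number words in natural languages. You can enter a sequence of digits, and the program will write the number word as it is spelled in the desired language. Likewise, you can enter a number word, and the program will write the corresponding number as a sequence of digits. For example, it tells you that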

  • 194706 is einhundertvierundneunzigtausendsiebenhundertsechs in German, or that
  • mille neuf cent quarante-sept is 1947 in French.
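
To illustrate the digits-to-word direction, here is a minimal Java sketch that spells German cardinals from 0 to 999999. It is not NumWord's implementation: the class `GermanSpellerSketch` and its methods are hypothetical names chosen for this example, and the rules cover only the basic compound pattern (units before tens joined by "und", "ein" inside compounds, "eins" at the end). Non-ASCII letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
// Hypothetical helper, not part of NumWord: spells 0..999999 in German.
class GermanSpellerSketch {
    // Index 1 is "ein" (the compound form); a final "s" is added where needed.
    private static final String[] UNITS = {
        "", "ein", "zwei", "drei", "vier", "f\u00fcnf", "sechs", "sieben", "acht", "neun",
        "zehn", "elf", "zw\u00f6lf", "dreizehn", "vierzehn", "f\u00fcnfzehn", "sechzehn",
        "siebzehn", "achtzehn", "neunzehn"
    };
    private static final String[] TENS = {
        "", "", "zwanzig", "drei\u00dfig", "vierzig", "f\u00fcnfzig", "sechzig", "siebzig",
        "achtzig", "neunzig"
    };

    // Spells 0..999; returns the empty string for 0.
    private static String below1000(int n) {
        StringBuilder sb = new StringBuilder();
        if (n >= 100) {
            sb.append(UNITS[n / 100]).append("hundert");
            n %= 100;
        }
        if (n >= 20) {
            if (n % 10 != 0) {
                sb.append(UNITS[n % 10]).append("und"); // units come before tens
            }
            sb.append(TENS[n / 10]);
        } else {
            sb.append(UNITS[n]);
        }
        return sb.toString();
    }

    static String spell(int n) {
        if (n < 0 || n > 999_999) {
            throw new IllegalArgumentException("supported range is 0..999999");
        }
        if (n == 0) {
            return "null";
        }
        StringBuilder sb = new StringBuilder();
        if (n >= 1000) {
            sb.append(below1000(n / 1000)).append("tausend");
        }
        sb.append(below1000(n % 1000));
        if (n % 100 == 1) {
            sb.append('s'); // a trailing "ein" becomes "eins"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(spell(194706));
    }
}
```

With these rules, `GermanSpellerSketch.spell(194706)` yields the German word quoted in the example above.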

The program can be used for vocabulary trainers, calendar programs, spelling clocks, writing out check amounts in words, and the like.

In Western languages, the higher numbers use common Latin prefixes for powers of 10^3 or 10^6: "mill", "bill", "trill" and so on. In German, the program spells numbers up to (10^6)^20 = "Vigintilliarden".
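
The prefix scheme can be sketched for the German long scale, where 10^(6n) is named with the suffix "-illion" and 10^(6n+3) with "-illiarde". The following Java sketch is an illustration under that assumption, not NumWord's code: the class `LongScaleSketch`, its method, and the prefix table (limited to the first ten prefixes, in German spelling) are hypothetical.

```java
// Hypothetical helper, not part of NumWord: German long-scale names for powers of ten.
class LongScaleSketch {
    // Latin prefixes in German spelling for n = 1..10 (note k and z instead of c).
    private static final String[] PREFIXES = {
        "M", "B", "Tr", "Quadr", "Quint", "Sext", "Sept", "Okt", "Non", "Dez"
    };

    // Returns the name of 10^exponent; exponent must be a multiple of 3, from 6 to 63.
    static String name(int exponent) {
        if (exponent < 6 || exponent % 3 != 0 || exponent / 6 > PREFIXES.length) {
            throw new IllegalArgumentException("unsupported exponent: " + exponent);
        }
        String stem = PREFIXES[exponent / 6 - 1] + "illi";
        // Even multiples of 3 (6, 12, ...) end in "-on", the others in "-arde".
        return stem + (exponent % 6 == 0 ? "on" : "arde");
    }

    public static void main(String[] args) {
        for (int e = 6; e <= 18; e += 3) {
            System.out.println("10^" + e + " = " + name(e));
        }
    }
}
```

For instance, `LongScaleSketch.name(9)` gives "Milliarde" and `LongScaleSketch.name(12)` gives "Billion".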

Implemented Languages

ara - Arabic (ديسمبر); cze - Czech (čeština); chi - Chinese (中文 (zhōngwén)); dan - Danish (Dansk); deu - German (Deutsch); eng - English; epo - Esperanto; est - Estonian; fin - Finnish; fra - French (Français); gle - Irish (Gaeilge); geo - Georgian; gre - Greek (Ελληνικά); hun - Hungarian (Magyar); ice - Icelandic (íslenska); ita - Italian (Italiano); jpn - Japanese; kor - Korean; lat - Latin (Latinum); lav - Latvian (Latviešu); lit - Lithuanian (Lietuvių); nld - Dutch (Nederlands); nor - Norwegian (Norsk); pol - Polish (Polski); por - Portuguese (Português); roh - Rumantsch Grischun; ron - Romanian (Română); rus - Russian (Русский); slo - Slovak; slv - Slovenian (Slovenščina); spa - Spanish (Español); swe - Swedish (Svenska); tha - Thai (ไทย); tlh - Klingon (tlhIngan-Hol); tur - Turkish (Türkçe); vie - Vietnamese (tiếng Việt).

Furthermore, there is roman for Roman numerals (MCMXLVII = 1947) and braille for Braille codes, and most Unicode symbols can be displayed.
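
The Roman numeral case is small enough to sketch in full. The following Java example uses the standard subtractive notation for 1 to 3999; the class `RomanSketch` and its methods are hypothetical names for this illustration and do not describe how NumWord's roman speller is written.

```java
// Hypothetical helper, not part of NumWord: Roman numerals for 1..3999.
class RomanSketch {
    private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS = {
        "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"
    };

    // Greedily emits the largest symbol that still fits.
    static String toRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("supported range is 1..3999");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    // Greedily consumes symbols, then rejects anything that is not the canonical spelling.
    static int fromRoman(String s) {
        int total = 0;
        int pos = 0;
        for (int i = 0; i < VALUES.length; i++) {
            while (s.startsWith(SYMBOLS[i], pos)) {
                total += VALUES[i];
                pos += SYMBOLS[i].length();
            }
        }
        if (pos != s.length() || total < 1 || total > 3999 || !toRoman(total).equals(s)) {
            throw new IllegalArgumentException("not a canonical Roman numeral: " + s);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toRoman(1947) + " = " + fromRoman("MCMXLVII"));
    }
}
```

The round-trip check in `fromRoman` is a design choice of this sketch: it keeps the parser short while still rejecting non-canonical spellings such as "IIII".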

Categories which can be spelled

The application maps numbers to the following categories of words in natural languages:

  • Digits as word, 0 and up (spell the number as a word)
  • Word as digits (parse a number word and return the number as digits)
  • Month 1-12 (January - December)
  • Month's abbreviation 1-12
  • Weekday 1-7 (Monday - Sunday)
  • Weekday's abbreviation 1-7
  • Season 1-4 (spring, summer, autumn, winter)
  • Greeting 1-4 (morning, noon, evening, night)
  • Time of day 00:00 - 24:00 h
    • official Time - variant 1
    • Time - variant 2
    • Time - variant 3
  • Cardinal direction 0-360 degrees
  • Planet in the Solar System -1, 0, 1-8 (Moon, Sun, Mercury - Neptune)

Generally, when the field Digits/Word is empty or zero, a set of test cases is displayed.
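
For comparison, the month and weekday categories can be approximated with the JDK's own `java.time` classes, which use the same numbering (months 1-12, weekdays 1-7 starting with Monday). This is only a sketch of the mapping idea: NumWord ships its own word lists, the class `CalendarWordsSketch` is a hypothetical name, and the words returned for a given locale depend on the locale data of the installed JDK.

```java
import java.time.DayOfWeek;
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical helper, not part of NumWord: calendar words from the JDK locale data.
class CalendarWordsSketch {
    // number is 1..12; Month.of throws DateTimeException otherwise.
    static String monthName(int number, Locale locale) {
        return Month.of(number).getDisplayName(TextStyle.FULL, locale);
    }

    // number is 1..7, where 1 is Monday (ISO-8601 numbering).
    static String weekdayName(int number, Locale locale) {
        return DayOfWeek.of(number).getDisplayName(TextStyle.FULL, locale);
    }

    public static void main(String[] args) {
        System.out.println(monthName(12, Locale.ENGLISH));
        System.out.println(weekdayName(1, Locale.ENGLISH));
    }
}
```

Passing `TextStyle.SHORT` instead of `TextStyle.FULL` would give abbreviations, although their length is chosen by the locale data and need not match the 3-letter and 2-letter forms that NumWord prints.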

Applications

  • Fun for children
  • eLearning
  • Printing of the amount in words on checks
  • Internationalized calendar
  • Extraction of numbers and dates from long texts like the Bible

Commandline Usage

usage:  java org.teherba.numword.NumwordCommand [-l iso [-c|-m[3]|-s|-w[2]|-g] [number]]
        java org.teherba.numword.NumwordCommand -l iso (-f|-t) filename
        java org.teherba.numword.NumwordCommand -l iso -p number-word
        java org.teherba.numword.NumwordCommand -l iso [-m[3]|-s|-w[2]] -p [month|weekday|season]
  -f    find and replace number words in text file
  -c    print cardinal number word (default)
  -d    print compass direction (parameter is degrees, 270 = West)
  -g    print greeting corresponding to day's time (6, 12, 18, 24, 0)
  -hi   print hour of clock (h0 = official, h1,h2,h3 variants)
  -m    print 12 month names
  -m3   print 12 months' abbreviations (3 letters)
  -s    print 4 seasons of the year
  -w    print 7 weekdays
  -w2   print 7 weekdays' abbreviations (2 letters)
  -p    parse word, return digits or cardinal number
  -t    test against file consisting of lines with: digits tab number-word
  -l    ara,ar      ara - Arabic (ديسمبر)
        cze,ces,cs  cze - Czech (čeština)
        chi,zho,zh  chi - Chinese 中文 (zhōngwén)
        dan,da      dan - Danish (Dansk)
        deu,ger,de  deu - German (Deutsch)
        eng,en      eng - English
        epo,eo      epo - Esperanto
        est,et      est - Estonian
        fin,fi      fin - Finnish
        fra,fr      fra - French (Français)
        gle,ga,ir   gle - Irish (Gaeilge)
        geo,ka,kat  geo - Georgian
        gre,ell,el  gre - Greek (Ελληνικά)
        hun,mag,hu  hun - Hungarian (Magyar)
        ice,isl,is  ice - Icelandic (íslenska)
        ita,it      ita - Italian (Italiano)
        jpn,ja      jpn - Japanese
        kor,ko      kor - Korean
        lat,la      lat - Latin (Latinum)
        lav,lv      lav - Latvian (Latviešu)
        lit,lt      lit - Lithuanian (Lietuvių)
        nld,nl      nld - Dutch (Nederlands)
        nor,no      nor - Norwegian (Norsk)
        pol,pl      pol - Polish (Polski)
        por,pt      por - Portuguese (Português)
        roh,rm      roh - Rumantsch Grischun
        ron,ro,rum  ron - Romanian (Română)
        rus,ru      rus - Russian (Русский)
        slo,slk,sk  slo - Slovak
        slv,sl      slv - Slovenian (Slovenščina)
        spa,es      spa - Spanish (Español)
        swe,sv      swe - Swedish (Svenska)
        tha,th      tha - Thai (ไทย)
        tlh         tlh - Klingon (tlhIngan-Hol)
        tur,tr      tur - Turkish (Türkçe)
        vie,vi      vie - Vietnamese (tiếng Việt)
        braille     braille - Braille Code
        morse       morse - Morse Code
        roman       roman - Roman Numbers
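
The -d option in the usage text above maps an angle in degrees to a compass direction (270 = West). A minimal Java sketch of such a mapping follows. It assumes an eight-point compass rose with English names and rounding to the nearest point; the usage text does not say how many points NumWord distinguishes, and the class `CompassSketch` is a hypothetical name.

```java
// Hypothetical helper, not part of NumWord: degrees to an eight-point compass direction.
class CompassSketch {
    private static final String[] POINTS = {
        "North", "Northeast", "East", "Southeast", "South", "Southwest", "West", "Northwest"
    };

    static String direction(int degrees) {
        // Normalize to 0..359 so that negative angles and full turns work too.
        int normalized = ((degrees % 360) + 360) % 360;
        // Integer form of round(normalized / 45.0): each point covers 45 degrees.
        int sector = ((2 * normalized + 45) / 90) % POINTS.length;
        return POINTS[sector];
    }

    public static void main(String[] args) {
        System.out.println(direction(270));
    }
}
```

Here `CompassSketch.direction(270)` returns "West", in line with the example given for -d.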

Possible Future Extensions

  • angle to compass direction (north, east ...)
  • Ordinal numbers (second, third, fourth ...)
  • Fractions (half, third, quarter ...)
  • Multiples (twice, three times ...), or threefold
  • Prefixes for positive (deca, hecto, kilo ...) and negative powers of 10 (deci, centi, milli ...)
  • Genealogical hierarchy
  • More languages
  • Declension of number words (especially ordinal numbers)
  • Evangelists/Gospels
  • Apostles/Disciples
  • Bible books
  • Astrological periods (Sagittarius, Cancer ...), link to churchcal
  • Euro countries
  • RGB color values
  • Nebulae (M-number) with their galactic positions
  • International telephone prefix numbers and corresponding country names (49 = Germany)
  • numerical country codes
  • Towns by their postal codes
  • ISBN book seller prefixes
  • Hexadecimal, binary, octal converter
  • Text -> SMS digits (multiple or T9)
  • reference to Wikipedia article for the number, in specified language (done)
  • show on Abacus

Problems

  • For right-to-left languages (Arabic) the lists are not always displayed properly.
  • Testing is rather difficult if you are not a native speaker of the language.
  • Some languages decline the words for small numbers (1, 2, 3). The current implementation does not attempt to handle variations for gender, number and case properly, except for the singular/plural of million, milliard, billion etc. The dual is also not yet supported.
  • Most languages base their numbering schemes on powers of 1000, but there are exceptions that use special words for (powers of) 10000 (Klingon, SinoSpeller).
  • For the languages based on powers of 1000, a fixed set of Latin prefixes is used for numbers >= 1 million (cf. BaseSpeller.wordN000). For some languages these prefixes must be modified (Cyrillic character set, c -> k in German).
  • Some languages still use different sets of number words for living entities, cattle, money etc. There is currently no attempt to handle these variations.
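
The difference between schemes based on powers of 1000 and those based on powers of 10000 shows up in how the digits are grouped before each group is spelled. The Java sketch below illustrates only this grouping step; the class `DigitGroupsSketch` and its method are hypothetical and are not taken from BaseSpeller or SinoSpeller.

```java
import java.util.Arrays;

// Hypothetical helper, not part of NumWord: splits a number into digit groups.
class DigitGroupsSketch {
    // Returns the groups of groupSize decimal digits, most significant group first.
    static long[] groups(long n, int groupSize) {
        if (n < 0 || groupSize < 1 || groupSize > 18) {
            throw new IllegalArgumentException("need n >= 0 and 1 <= groupSize <= 18");
        }
        long base = 1;
        for (int i = 0; i < groupSize; i++) {
            base *= 10; // at most 10^18, which still fits into a long
        }
        int count = 1;
        for (long rest = n / base; rest > 0; rest /= base) {
            count++;
        }
        long[] result = new long[count];
        for (int i = count - 1; i >= 0; i--) {
            result[i] = n % base;
            n /= base;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups(194706L, 3))); // powers of 1000
        System.out.println(Arrays.toString(groups(194706L, 4))); // powers of 10000
    }
}
```

For 194706, grouping by three digits gives 194 and 706, while grouping by four digits gives 19 and 4706, so the same number needs different scale words in the two kinds of languages.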

See also the hints and bugs for developers.