NumWord

From tehowiki

NumWord generates and parses number words in natural languages. You can enter a sequence of digits, and the program will write the number word as it is spelled in the desired language. For example, it tells you that

  • 194706 is einhundertvierundneunzigtausendsiebenhundertsechs in German, or that
  • mille neuf cent quarante-sept is 1947 in French.

The program can be used for vocabulary trainers, calendar programs, spelling clocks, check amounts in writing and the like.
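The German example above can be reproduced with a small sketch. This is not NumWord's actual code: the class name GermanSpellerSketch, its method names and the limit of 999999 are assumptions made for illustration only. It follows the usual German rules (units before tens joined by "und", "ein" inside compounds, "eins" at the end of a word).

```java
/** Hypothetical sketch (not part of NumWord): spells 0..999999 in German. */
final class GermanSpellerSketch {
    // Non-ASCII letters are written as Unicode escapes so that the file
    // compiles regardless of the platform's source encoding.
    private static final String[] UNITS = {
        "", "ein", "zwei", "drei", "vier", "f\u00fcnf", "sechs", "sieben", "acht",
        "neun", "zehn", "elf", "zw\u00f6lf", "dreizehn", "vierzehn", "f\u00fcnfzehn",
        "sechzehn", "siebzehn", "achtzehn", "neunzehn"
    };
    private static final String[] TENS = {
        "", "", "zwanzig", "drei\u00dfig", "vierzig", "f\u00fcnfzig", "sechzig",
        "siebzig", "achtzig", "neunzig"
    };

    /** Spells 0..999 in compound form: 0 is empty, 1 is "ein" (as in "einhundert"). */
    private static String below1000(int n) {
        StringBuilder sb = new StringBuilder();
        if (n >= 100) {
            sb.append(UNITS[n / 100]).append("hundert");
            n %= 100;
        }
        if (n >= 20) {
            int unit = n % 10;
            if (unit > 0) {
                sb.append(UNITS[unit]).append("und"); // units come before tens
            }
            sb.append(TENS[n / 10]);
        } else {
            sb.append(UNITS[n]);
        }
        return sb.toString();
    }

    static String spell(int n) {
        if (n < 0 || n > 999_999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        if (n == 0) {
            return "null";
        }
        StringBuilder sb = new StringBuilder();
        if (n >= 1000) {
            sb.append(below1000(n / 1000)).append("tausend");
        }
        sb.append(below1000(n % 1000));
        if (n % 100 == 1) {
            sb.append('s'); // a final 1 is "eins": 1, 101, 1001
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints einhundertvierundneunzigtausendsiebenhundertsechs
        System.out.println(spell(194706));
    }
}
```

The sketch ignores everything that makes the real task hard: declension, numbers of a million and above, and alternative spellings.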

In Western languages, the higher numbers use common Latin prefixes for powers of 10^3 or 10^6: "mill", "bill", "trill" and so on. In German, the program spells numbers up to (10^6)^20 = "Vigintilliarden".
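As an illustration of the prefix scheme, the following sketch builds German names for powers of ten on the long scale, where "-illion" and "-illiarde" alternate every factor of 1000. The class LongScaleSketch, its method and the cut-off after ten prefixes are assumptions for this example. They are not taken from the NumWord sources, which cover a larger range.

```java
/** Hypothetical sketch: German long-scale names for 10^6, 10^9, ... 10^63. */
final class LongScaleSketch {
    // Latin prefixes in German spelling (k and z instead of c); only the first ten
    private static final String[] PREFIX = {
        "M", "B", "Tr", "Quadr", "Quint", "Sext", "Sept", "Okt", "Non", "Dez"
    };

    /**
     * Returns the name of 10^exponent.
     * The exponent must be a multiple of 3 between 6 and 63.
     */
    static String name(int exponent, boolean plural) {
        if (exponent < 6 || exponent % 3 != 0 || exponent / 6 > PREFIX.length) {
            throw new IllegalArgumentException("unsupported exponent: " + exponent);
        }
        String stem = PREFIX[exponent / 6 - 1] + "ill";
        if (exponent % 6 == 3) {
            // odd multiples of 3: Milliarde, Billiarde, ...
            return stem + "iarde" + (plural ? "n" : "");
        }
        // multiples of 6: Million, Billion, ...
        return stem + "ion" + (plural ? "en" : "");
    }

    public static void main(String[] args) {
        for (int exponent = 6; exponent <= 18; exponent += 3) {
            System.out.println("10^" + exponent + " = " + name(exponent, false));
        }
    }
}
```

The plural flag covers only the singular/plural distinction ("eine Million", "zwei Millionen").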

Implemented Languages

  • ara - Arabic (ديسمبر)
  • cze - Czech (čeština)
  • chi - Chinese (中文 (zhōngwén))
  • dan - Danish (Dansk)
  • deu - German (Deutsch)
  • eng - English
  • epo - Esperanto
  • est - Estonian
  • fin - Finnish
  • fra - French (Français)
  • gle - Irish (Gaeilge)
  • geo - Georgian
  • gre - Greek (Ελληνικά)
  • hun - Hungarian (Magyar)
  • ice - Icelandic (íslenska)
  • ita - Italian (Italiano)
  • jpn - Japanese
  • kor - Korean
  • lat - Latin (Latinum)
  • lav - Latvian (Latviešu)
  • lit - Lithuanian (Lietuvių)
  • nld - Dutch (Nederlands)
  • nor - Norwegian (Norsk)
  • pol - Polish (Polski)
  • por - Portuguese (Português)
  • roh - Rumantsch Grischun
  • ron - Romanian (Română)
  • rus - Russian (Русский)
  • slo - Slovak
  • slv - Slovenian (Slovenščina)
  • spa - Spanish (Español)
  • swe - Swedish (Svenska)
  • tha - Thai (ไทย)
  • tlh - Klingon (tlhIngan-Hol)
  • tur - Turkish (Türkçe)
  • vie - Vietnamese (tiếng Việt)

Furthermore, there is roman for Roman numerals (MCMXLVII = 1947) and braille for Braille codes, and most Unicode symbols can be displayed.
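The Roman example can be checked with a short sketch of the common greedy conversion. The class RomanSketch and the range 1 to 3999 are assumptions for illustration. How NumWord's own roman speller handles larger values or nonstandard forms is not described here.

```java
/** Hypothetical sketch: converts between 1..3999 and standard Roman numerals. */
final class RomanSketch {
    private static final int[] VALUES =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    /** Greedy generation: repeatedly take the largest symbol that still fits. */
    static String toRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    /** Parses a numeral and rejects anything that is not in canonical form. */
    static int fromRoman(String s) {
        int total = 0;
        int pos = 0;
        for (int i = 0; i < SYMBOLS.length; i++) {
            while (s.startsWith(SYMBOLS[i], pos)) {
                total += VALUES[i];
                pos += SYMBOLS[i].length();
            }
        }
        // Round-trip check: only the canonical spelling of the value is accepted
        if (pos != s.length() || total < 1 || total > 3999 || !toRoman(total).equals(s)) {
            throw new IllegalArgumentException("not a standard Roman numeral: " + s);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toRoman(1947));          // prints MCMXLVII
        System.out.println(fromRoman("MCMXLVII"));  // prints 1947
    }
}
```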

Categories which can be spelled

The application maps numbers to the following categories of words in natural languages:

  • Digits as Word 0- (spell the number as word)
  • Word as Digits (parse number word and return the number as digits)
  • Month 1-12 (january - december)
  • Month's abbreviation 1-12
  • Weekday 1-7 (monday - sunday)
  • Weekday's abbreviation 1-7
  • Season 1-4 (spring, summer, autumn, winter)
  • Greeting 1-4 (morning, noon, evening, night)
  • Time of day 00:00 - 24:00 h
    • official Time - variant 1
    • Time - variant 2
    • Time - variant 3
  • Cardinal direction 0-360 degrees
  • Planet in the Solar System -1, 0, 1-8 (Moon, Sun, Mercury - Neptune)

Generally, when the field Digits/Word is empty or zero, a set of test cases is displayed.
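Most of these categories are table lookups from a small number range to a word. As one example, the mapping from degrees to a cardinal direction might be sketched as follows. The class CompassSketch, the English names and the choice of eight sectors are assumptions. The page only states that 0-360 degrees are accepted and that 270 is West.

```java
/** Hypothetical sketch: maps an angle in degrees to one of eight compass points. */
final class CompassSketch {
    private static final String[] POINTS = {
        "North", "Northeast", "East", "Southeast",
        "South", "Southwest", "West", "Northwest"
    };

    static String direction(int degrees) {
        // Normalize to 0..359 so that 360 and negative angles also work
        int d = Math.floorMod(degrees, 360);
        // Each sector is 45 degrees wide and centered on its compass point;
        // (2 * d + 45) / 90 equals floor((d + 22.5) / 45) in integer arithmetic
        int sector = ((2 * d + 45) / 90) % POINTS.length;
        return POINTS[sector];
    }

    public static void main(String[] args) {
        System.out.println(direction(270)); // prints West
    }
}
```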

Applications

  • Fun for children
  • eLearning
  • Printing of the amount in words on checks
  • Internationalized calendar
  • Extraction of numbers and dates from long texts like the Bible

Commandline Usage

usage:  java org.teherba.numword.NumwordCommand [-l iso [-c|-m[3]|-s|-w[2]|-g] [number]]
        java org.teherba.numword.NumwordCommand -l iso (-f|-t) filename
        java org.teherba.numword.NumwordCommand -l iso -p number-word
        java org.teherba.numword.NumwordCommand -l iso [-m[3]|-s|-w[2]] -p [month|weekday|season]
  -f    find and replace number words in text file
  -c    print cardinal number word (default)
  -d    print compass direction (parameter is degrees, 270 = West)
  -g    print greeting corresponding to day's time (6, 12, 18, 24, 0)
  -hi   print hour of clock (h0 = official, h1,h2,h3 variants)
  -m    print 12 month names
  -m3   print 12 months' abbreviations (3 letters)
  -s    print 4 seasons of the year
  -w    print 7 weekdays
  -w2   print 7 weekdays' abbreviations (2 letters)
  -p    parse word, return digits or cardinal number
  -t    test against file consisting of lines with: digits tab number-word
  -l    ara,ar  ara - Arabic (ديسمبر)
    cze,ces,cs  cze - Czech (čeština)
    chi,zho,zh  chi - Chinese 中文 (zhōngwén)
    dan,da  dan - Danish (Dansk)
    deu,ger,de  deu - German (Deutsch)
    eng,en  eng - English
    epo,eo  epo - Esperanto
    est,et  est - Estonian
    fin,fi  fin - Finnish
    fra,fr  fra - French (Français)
    gle,ga,ir   gle - Irish (Gaeilge)
    geo,ka,kat  geo - Georgian
    gre,ell,el  gre - Greek (Ελληνικά)
    hun,mag,hu  hun - Hungarian (Magyar)
    ice,isl,is  ice - Icelandic (íslenska)
    ita,it  ita - Italian (Italiano)
    jpn,ja  jpn - Japanese
    kor,ko  kor - Korean
    lat,la  lat - Latin (Latinum)
    lav,lv  lav - Latvian (Latviešu)
    lit,lt  lit - Lithuanian (Lietuvių)
    nld,nl  nld - Dutch (Nederlands)
    nor,no  nor - Norwegian (Norsk)
    pol,pl  pol - Polish (Polski)
    por,pt  por - Portuguese (Português)
    roh,rm  roh - Rumantsch Grischun
    ron,ro,rum  ron - Romanian (Română)
    rus,ru  rus - Russian (Русский)
    slo,slk,sk  slo - Slovak
    slv,sl  slv - Slovenian (Slovenščina)
    spa,es  spa - Spanish (Español)
    swe,sv  swe - Swedish (Svenska)
    tha,th  tha - Thai (ไทย)
    tlh tlh - Klingon (tlhIngan-Hol)
    tur,tr  tur - Turkish (Türkçe)
    vie,vi  vie - Vietnamese (tiếng Việt)
    braille braille - Braille Code
    morse   morse - Morse Code
    roman   roman - Roman Numbers

Possible Future Extensions

  • angle to compass direction (north, east ...)
  • Ordinal numbers (second, third, fourth ...)
  • Fractions (half, third, quarter ...)
  • Multiples (twice, three times ...), or threefold
  • Prefixes for positive (deca, hecto, kilo ...) and negative powers of 10 (deci, centi, milli ...)
  • Genealogical hierarchy
  • More languages
  • Declension of number words (especially ordinal numbers)
  • Evangelists/Gospels
  • Apostles/Disciples
  • Bible books
  • Astrological periods (Sagittarius, Cancer ...), link to churchcal
  • Euro countries
  • RGB color values
  • Nebulae (M-number) with their galactic positions
  • International telephone prefix numbers and corresponding country names (49 = Germany)
  • numerical country codes
  • Towns by their postal codes
  • ISBN book seller prefixes
  • Hexadecimal, binary, octal converter
  • Text -> SMS digits (multiple or T9)
  • reference to Wikipedia article for the number, in specified language (done)
  • show on Abacus

Problems

  • For right-to-left languages (Arabic) the lists are not always displayed properly.
  • Testing is rather difficult if you are not a native speaker of the language.
  • Some languages decline the words for small numbers (1, 2, 3). The current implementation doesn't attempt to handle variations for gender, number and case properly, except for the singular/plural of million, milliard, billion etc. The dual is also not yet supported.
  • It is an interesting fact that the numbering schemes in most languages use powers of 1000, but there are exceptions where special words for (powers of) 10000 are used (Klingon, SinoSpeller).
  • For the languages based on powers of 1000, a fixed set of Latin prefixes is used for numbers >= 1 million (cf. BaseSpeller.wordN000). For some languages these prefixes must be modified (Cyrillic character set, c -> k in German).
  • Some languages still use different number word sets for living entities, cattle, money etc. There is currently no attempt to handle these variations.
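The difference between numbering schemes based on powers of 1000 and those based on powers of 10000 shows up in how the digit string is cut into groups before each group is spelled. A minimal sketch, assuming a hypothetical helper class DigitGroupSketch that is not part of NumWord:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits a digit string into groups, counted from the right. */
final class DigitGroupSketch {
    /** size = 3 for schemes based on 1000, size = 4 for schemes based on 10000. */
    static List<String> groups(String digits, int size) {
        if (size < 1 || !digits.matches("[0-9]+")) {
            throw new IllegalArgumentException("bad input: " + digits);
        }
        List<String> result = new ArrayList<>();
        // The leading group may be shorter than the others
        int first = digits.length() % size;
        if (first > 0) {
            result.add(digits.substring(0, first));
        }
        for (int i = first; i < digits.length(); i += size) {
            result.add(digits.substring(i, i + size));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("194706", 3)); // prints [194, 706]
        System.out.println(groups("194706", 4)); // prints [19, 4706]
    }
}
```

Each group is then spelled on its own and followed by the word for its power, which is why 194706 reads as 194 thousands plus 706 in a thousands-based scheme, but as 19 ten-thousands plus 4706 in a ten-thousands-based one.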

See also the hints and bugs for developers.