NumWord: Difference between revisions
imported>Gfis Unicode for native language denotation |
imported>Gfis cli usage |
Latest revision as of 13:41, 11 September 2016
NumWord generates and parses number words in natural languages. You can enter a sequence of digits, and the program will write the number word as it is spelled in the desired language. For example, it tells you that
- 194706 is einhundertvierundneunzigtausendsiebenhundertsechs in German, or that
- mille neuf cent quarante-sept is 1947 in French.
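The German example above can be reproduced with a few lines of table-driven code. The following is only a minimal sketch for the range 0 to 999999; the class `GermanSpellerSketch` and its methods are hypothetical names chosen for illustration and are not taken from the NumWord sources.

```java
/** Illustrative sketch only: spells 0..999999 in German. Not NumWord's actual code. */
final class GermanSpellerSketch {
    // Words for 1..19 in the form used inside compounds (1 is "ein", not "eins").
    private static final String[] UNITS = {
        "", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun",
        "zehn", "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn", "sechzehn",
        "siebzehn", "achtzehn", "neunzehn"
    };
    private static final String[] TENS = {
        "", "", "zwanzig", "dreißig", "vierzig", "fünfzig", "sechzig", "siebzig",
        "achtzig", "neunzig"
    };

    /** Spells 0..999 as a compound part; 0 yields the empty string. */
    private static String below1000(int n) {
        StringBuilder sb = new StringBuilder();
        if (n >= 100) {
            sb.append(UNITS[n / 100]).append("hundert");
            n %= 100;
        }
        if (n >= 20) {
            // German puts the unit before the ten: "vierundneunzig" = 4 and 90.
            if (n % 10 != 0) {
                sb.append(UNITS[n % 10]).append("und");
            }
            sb.append(TENS[n / 10]);
        } else {
            sb.append(UNITS[n]);
        }
        return sb.toString();
    }

    static String spell(int n) {
        if (n < 0 || n > 999_999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        if (n == 0) {
            return "null";
        }
        StringBuilder sb = new StringBuilder();
        if (n >= 1000) {
            sb.append(below1000(n / 1000)).append("tausend");
        }
        sb.append(below1000(n % 1000));
        // A number whose last two digits are 01 ends in "ein" and takes a final "s".
        if (n % 100 == 1) {
            sb.append('s');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(spell(194706)); // einhundertvierundneunzigtausendsiebenhundertsechs
        System.out.println(spell(1947));   // eintausendneunhundertsiebenundvierzig
    }
}
```

Parsing a number word back into digits is the harder direction and is not attempted in this sketch.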
The program can be used for vocabulary trainers, calendar programs, spelling clocks, check amounts in writing and the like.
In Western languages, the higher numbers use common Latin prefixes for powers of 10^3 or 10^6: "mill", "bill", "trill" and so on. In German, the program spells numbers up to (10^6)^20 = "Vigintilliarden".
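The difference between stepping by 10^3 and by 10^6 can be sketched as follows. This is an assumption-laden illustration, not the program's implementation: the class `LatinPrefixSketch`, its methods, and the English spelling of the stems are all chosen here for the example, and the table stops at the tenth stem although the program is said to go further.

```java
/** Illustrative sketch only: names for large powers of ten built from Latin stems. */
final class LatinPrefixSketch {
    // Stems of the 1st..10th "-illion" in English spelling (hypothetical table).
    private static final String[] STEMS = {
        "m", "b", "tr", "quadr", "quint", "sext", "sept", "oct", "non", "dec"
    };

    /** Short scale: every further power of 10^3 gets a new stem (10^9 = billion). */
    static String shortScale(int exponent) {
        if (exponent % 3 != 0 || exponent < 6 || exponent > 33) {
            throw new IllegalArgumentException("unsupported exponent: " + exponent);
        }
        int index = exponent / 3 - 1;            // 10^6 -> 1st stem
        return STEMS[index - 1] + "illion";
    }

    /**
     * Long scale: a new stem only for every power of 10^6; the step of 10^3 in
     * between reuses the stem with the ending "-illiard" (10^9 = milliard).
     */
    static String longScale(int exponent) {
        if (exponent % 3 != 0 || exponent < 6 || exponent > 63) {
            throw new IllegalArgumentException("unsupported exponent: " + exponent);
        }
        int index = exponent / 6;                // 10^6 and 10^9 -> 1st stem
        String suffix = (exponent % 6 == 0) ? "illion" : "illiard";
        return STEMS[index - 1] + suffix;
    }

    public static void main(String[] args) {
        System.out.println(shortScale(9));  // billion
        System.out.println(longScale(9));   // milliard
        System.out.println(longScale(12));  // billion
    }
}
```

A real speller would additionally adapt the stems and endings to each language, for example "Milliarde" in German.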
Implemented Languages
- ara - Arabic (ديسمبر)
- cze - Czech (čeština)
- chi - Chinese (中文 (zhōngwén))
- dan - Danish (Dansk)
- deu - German (Deutsch)
- eng - English
- epo - Esperanto
- est - Estonian
- fin - Finnish
- fra - French (Français)
- gle - Irish (Gaeilge)
- geo - Georgian
- gre - Greek (Ελληνικά)
- hun - Hungarian (Magyar)
- ice - Icelandic (íslenska)
- ita - Italian (Italiano)
- jpn - Japanese
- kor - Korean
- lat - Latin (Latinum)
- lav - Latvian (Latviešu)
- lit - Lithuanian (Lietuvių)
- nld - Dutch (Nederlands)
- nor - Norwegian (Norsk)
- pol - Polish (Polski)
- por - Portuguese (Português)
- roh - Rumantsch Grischun
- ron - Romanian (Română)
- rus - Russian (Русский)
- slo - Slovak
- slv - Slovenian (Slovenščina)
- spa - Spanish (Español)
- swe - Swedish (Svenska)
- tha - Thai (ไทย)
- tlh - Klingon (tlhIngan-Hol)
- tur - Turkish (Türkçe)
- vie - Vietnamese (tiếng Việt)
Furthermore, there is roman for Roman numerals (MCMXLVII = 1947), braille for Braille codes, and most Unicode symbols can be displayed.
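The Roman numeral example (MCMXLVII = 1947) follows the usual subtractive notation. A minimal sketch of both directions is shown here; the class `RomanSketch` and the limit of 3999 are assumptions made for the illustration and say nothing about how the program itself handles the roman code.

```java
/** Illustrative sketch only: converts between integers 1..3999 and Roman numerals. */
final class RomanSketch {
    // Values and symbols in descending order, including the subtractive pairs.
    private static final int[] VALUES =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    static String toRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            // Greedily emit the largest symbol that still fits.
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    static int fromRoman(String s) {
        int total = 0;
        int pos = 0;
        for (int i = 0; i < VALUES.length; i++) {
            while (s.startsWith(SYMBOLS[i], pos)) {
                total += VALUES[i];
                pos += SYMBOLS[i].length();
            }
        }
        // Accept only the canonical spelling: re-encoding must give the input back.
        if (pos != s.length() || total < 1 || total > 3999 || !toRoman(total).equals(s)) {
            throw new IllegalArgumentException("not a canonical Roman numeral: " + s);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toRoman(1947));         // MCMXLVII
        System.out.println(fromRoman("MCMXLVII")); // 1947
    }
}
```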
Categories which can be spelled
The application maps numbers to the following categories of words in natural languages:
- Digits as Word 0- (spell the number as word)
- Word as Digits (parse number word and return the number as digits)
- Month 1-12 (January - December)
- Month's abbreviation 1-12
- Weekday 1-7 (Monday - Sunday)
- Weekday's abbreviation 1-7
- Season 1-4 (spring, summer, autumn, winter)
- Greeting 1-4 (morning, noon, evening, night)
- Time of day 00:00 - 24:00 h
  - official time - variant 1
  - Time - variant 2
  - Time - variant 3
- Cardinal direction 0-360 degrees
- Planet in the Solar System -1, 0, 1-8 (Moon, Sun, Mercury - Neptune)
Generally, when the field Digits/Word is empty or zero, a set of test cases is displayed.
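Most of these categories are simple lookups from a number into a word table. As one example, the mapping from degrees to a compass direction can be sketched as below. The choice of eight points, the English names and the class `CompassSketch` are assumptions for illustration; the source only states that the range is 0-360 degrees.

```java
/** Illustrative sketch only: maps an angle in degrees to one of eight compass points. */
final class CompassSketch {
    private static final String[] POINTS = {
        "north", "northeast", "east", "southeast",
        "south", "southwest", "west", "northwest"
    };

    static String direction(int degrees) {
        // Normalize to 0..359 so that negative angles and 360 are accepted.
        int normalized = ((degrees % 360) + 360) % 360;
        // Each point covers a 45-degree sector centered on its own bearing;
        // (2d + 45) / 90 is integer arithmetic for rounding d / 45.
        int sector = ((normalized * 2 + 45) / 90) % 8;
        return POINTS[sector];
    }

    public static void main(String[] args) {
        System.out.println(direction(270)); // west
        System.out.println(direction(360)); // north
    }
}
```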
Applications
- Fun for children
- eLearning
- Printing of the amount in words on checks
- Internationalized calendar
- Extraction of numbers and dates from long texts like the Bible
Commandline Usage
usage: java org.teherba.numword.NumwordCommand [-l iso [-c|-m[3]|-s|-w[2]|-g] [number]]
       java org.teherba.numword.NumwordCommand -l iso (-f|-t) filename
       java org.teherba.numword.NumwordCommand -l iso -p number-word
       java org.teherba.numword.NumwordCommand -l iso [-m[3]|-s|-w[2]] -p [month|weekday|season]
    -f   find and replace number words in text file
    -c   print cardinal number word (default)
    -d   print compass direction (parameter is degrees, 270 = West)
    -g   print greeting corresponding to day's time (6, 12, 18, 24, 0)
    -hi  print hour of clock (h0 = official, h1,h2,h3 variants)
    -m   print 12 month names
    -m3  print 12 months' abbreviations (3 letters)
    -s   print 4 seasons of the year
    -w   print 7 weekdays
    -w2  print 7 weekdays' abbreviations (2 letters)
    -p   parse word, return digits or cardinal number
    -t   test against file consisting of lines with: digits tab number-word
    -l   ara,ar      ara - Arabic (ديسمبر)
         cze,ces,cs  cze - Czech (čeština)
         chi,zho,zh  chi - Chinese 中文 (zhōngwén)
         dan,da      dan - Danish (Dansk)
         deu,ger,de  deu - German (Deutsch)
         eng,en      eng - English
         epo,eo      epo - Esperanto
         est,et      est - Estonian
         fin,fi      fin - Finnish
         fra,fr      fra - French (Français)
         gle,ga,ir   gle - Irish (Gaeilge)
         geo,ka,kat  geo - Georgian
         gre,ell,el  gre - Greek (Ελληνικά)
         hun,mag,hu  hun - Hungarian (Magyar)
         ice,isl,is  ice - Icelandic (íslenska)
         ita,it      ita - Italian (Italiano)
         jpn,ja      jpn - Japanese
         kor,ko      kor - Korean
         lat,la      lat - Latin (Latinum)
         lav,lv      lav - Latvian (Latviešu)
         lit,lt      lit - Lithuanian (Lietuvių)
         nld,nl      nld - Dutch (Nederlands)
         nor,no      nor - Norwegian (Norsk)
         pol,pl      pol - Polish (Polski)
         por,pt      por - Portuguese (Português)
         roh,rm      roh - Rumantsch Grischun
         ron,ro,rum  ron - Romanian (Română)
         rus,ru      rus - Russian (Русский)
         slo,slk,sk  slo - Slovak
         slv,sl      slv - Slovenian (Slovenščina)
         spa,es      spa - Spanish (Español)
         swe,sv      swe - Swedish (Svenska)
         tha,th      tha - Thai (ไทย)
         tlh         tlh - Klingon (tlhIngan-Hol)
         tur,tr      tur - Turkish (Türkçe)
         vie,vi      vie - Vietnamese (tiếng Việt)
         braille     braille - Braille Code
         morse       morse - Morse Code
         roman       roman - Roman Numbers
Possible Future Extensions
- angle to compass direction (north, east ...)
- Ordinal numbers (second, third, fourth ...)
- Fractions (half, third, quarter ...)
- Multiples (twice, three times ...), or threefold
- Prefixes for positive (deca, hecto, kilo ...) and negative powers of 10 (deci, centi, milli ...)
- Genealogical hierarchy
- More languages
- Declension of number words (especially ordinal numbers)
- Evangelists/Gospels
- Apostles/Disciples
- Bible books
- Astrological periods (Sagittarius, Cancer ...), link to churchcal
- Euro countries
- RGB color values
- Nebulae (M-number) with their galactic positions
- International telephone prefix numbers and corresponding country names (49 = Germany)
- numerical country codes
- Towns by their postal codes
- ISBN book seller prefixes
- Hexadecimal, binary, octal converter
- Text -> SMS digits (multiple or T9)
- reference to Wikipedia article for the number, in specified language (done)
- show on Abacus
Problems
- For right-to-left languages (Arabic) the lists are not always displayed properly.
- Testing is rather difficult if you are not a native speaker of the language.
- Some languages decline the words for small numbers (1, 2, 3). The current implementation doesn't attempt to handle variations for gender, number and case properly, except for singular/plural of million, milliard, billion etc. The dual is also not yet supported.
- It is an interesting fact that the numbering schemes in most languages use powers of 1000, but there are exceptions where special words for (powers of) 10000 are used (Klingon, SinoSpeller).
- For the languages based on powers of 1000, a fixed set of Latin prefixes is used for numbers >= 1 million (cf. BaseSpeller.wordN000). For some languages these prefixes must be modified (Cyrillic character set, c -> k in German).
- Some languages still use different number word sets for living entities, cattle, money etc. There is currently no attempt to handle these variations.
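The distinction between schemes based on powers of 1000 and those based on powers of 10000 comes down to how the digits are grouped before each group is spelled. The following sketch shows only that grouping step; the class `DigitGroupSketch` and its method are hypothetical and are not related to the BaseSpeller or SinoSpeller classes mentioned above.

```java
/** Illustrative sketch only: splits a number into groups of a given base. */
final class DigitGroupSketch {
    /**
     * Returns the groups of n for the given base (1000 or 10000 in practice),
     * most significant group first.
     */
    static int[] groups(long n, int base) {
        if (n < 0 || base < 2) {
            throw new IllegalArgumentException("n must be >= 0 and base >= 2");
        }
        // First count the groups, then fill the array from the right.
        int count = 1;
        for (long rest = n / base; rest > 0; rest /= base) {
            count++;
        }
        int[] result = new int[count];
        for (int i = count - 1; i >= 0; i--) {
            result[i] = (int) (n % base);
            n /= base;
        }
        return result;
    }

    public static void main(String[] args) {
        // 194706 as thousands: 194 | 706; as myriads: 19 | 4706
        System.out.println(java.util.Arrays.toString(groups(194706L, 1000)));  // [194, 706]
        System.out.println(java.util.Arrays.toString(groups(194706L, 10000))); // [19, 4706]
    }
}
```

Each group would then be spelled with the words for numbers below the base, followed by the word for the corresponding power.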