There are two versions of the application: one for desktop (Windows 98/2000/XP) and one for Pocket PC (PPC 2000/2002). Minimum hardware requirements: 3 MB of disk space and 2 MB of memory (when the Large rhyming dictionary is used).
Remarks for PocketRhymes:
The large lexical database (1.7 million wordforms) was built from the Grammatical Dictionary of the Russian Language by A. A. Zalizniak (ZGD), which contains the base forms (lemmas) of ~100,000 words with a full morphological description (including stress patterns).
The morphological module developed by Maxim Ushakov was used to synthesize all accented wordforms. It generates regular paradigms well but does not support the synthesis of grammatical exceptions (for example, personal pronouns and cardinal numerals). When the synthesizer could not build a full paradigm, only the lemma was added to the dictionary. Hypothetical wordforms and forms that are difficult to produce (for example, singular short forms of adjectives in -ский, the plural of дно, etc.) were not included in the dictionary (see ZGD, p. 8).
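The fallback described above (store the lemma alone when no full paradigm can be built) can be sketched in Python. This is a minimal illustration, not Maxim Ushakov's module: the `synthesize` callable and its convention of returning `None` on failure are assumptions.

```python
from typing import Callable, Iterable, Optional

# Hypothetical synthesizer interface: returns the accented wordforms of a
# lemma, or None when it cannot build a full paradigm.
Synthesizer = Callable[[str], Optional[list[str]]]


def build_wordforms(lemmas: Iterable[str], synthesize: Synthesizer) -> dict[str, list[str]]:
    """Map each lemma to its wordforms, falling back to the lemma alone."""
    result: dict[str, list[str]] = {}
    for lemma in lemmas:
        paradigm = synthesize(lemma)
        # No full paradigm (e.g. an irregular pronoun): keep only the lemma.
        result[lemma] = paradigm if paradigm else [lemma]
    return result
```

Because synthesis happens once at build time, a failed paradigm costs nothing at search time; the word is simply findable only in its base form.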
Each wordform was then split into a pretonic part (prefix) and a posttonic part (clausula), and lists of unique clausulas and unique prefixes were built. As a rule, the prefix list is less than 10% of the dictionary size, so it is loaded entirely into memory to speed up rhyme search.
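A minimal Python sketch of the prefix/clausula split and the unique-string lists, assuming a hypothetical notation in which an apostrophe follows the stressed vowel (the real dictionary's stress encoding is not described here). The clausula is taken to start at the stressed vowel, consistent with the requirement that rhyming clausulas share the stressed vowel; each wordform record then holds only two integer ids.

```python
STRESS_MARK = "'"  # hypothetical notation: apostrophe right after the stressed vowel


def split_wordform(form: str) -> tuple[str, str]:
    """Split an accented wordform into (prefix, clausula).

    The prefix is everything before the stressed vowel; the clausula runs
    from the stressed vowel to the end of the word (stress mark removed).
    """
    mark = form.find(STRESS_MARK)
    if mark < 1:
        raise ValueError(f"no stress mark after a letter in {form!r}")
    return form[: mark - 1], form[mark - 1 :].replace(STRESS_MARK, "")


def intern_string(value: str, table: list[str], ids: dict[str, int]) -> int:
    """Return the id of value, appending it to table the first time it is seen."""
    if value not in ids:
        ids[value] = len(table)
        table.append(value)
    return ids[value]


def build_index(forms: list[str]) -> tuple[list[str], list[str], list[tuple[int, int]]]:
    """Build unique prefix/clausula lists and (prefix_id, clausula_id) records."""
    prefixes: list[str] = []
    clausulas: list[str] = []
    prefix_ids: dict[str, int] = {}
    clausula_ids: dict[str, int] = {}
    records: list[tuple[int, int]] = []
    for form in forms:
        prefix, clausula = split_wordform(form)
        records.append((intern_string(prefix, prefixes, prefix_ids),
                        intern_string(clausula, clausulas, clausula_ids)))
    return prefixes, clausulas, records
```

Storing each distinct prefix and clausula only once is one straightforward way to obtain the compact lists the text describes.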
When generating the rhyming database, additional limits were applied: on wordform and clausula length, on the number of syllables in the wordform and in the clausula, and on the number of rich rhymes per clausula (see Statistics of Large rhyming dictionary). In particular, our dictionary does not include rhymes of five or more syllables, since such rhymes are practically never used in Russian versification.
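The syllable limit can be illustrated with a short Python filter. In Russian spelling each vowel letter marks one syllable, so counting vowels gives the syllable count. The four-syllable maximum follows from the text; `MAX_WORD_LENGTH` is a hypothetical placeholder, and the limit on rich rhymes per clausula is omitted because its definition is not given here.

```python
VOWELS = frozenset("аеёиоуыэюя")

MAX_CLAUSULA_SYLLABLES = 4  # rhymes of five or more syllables are excluded
MAX_WORD_LENGTH = 30        # hypothetical placeholder, not the real limit


def syllables(text: str) -> int:
    """Count syllables as vowel letters (one vowel letter per syllable)."""
    return sum(ch in VOWELS for ch in text.lower())


def keep_wordform(wordform: str, clausula: str) -> bool:
    """Decide whether a wordform/clausula pair enters the rhyming database."""
    return (syllables(clausula) <= MAX_CLAUSULA_SYLLABLES
            and len(wordform) <= MAX_WORD_LENGTH)
```

For example, the clausula of обраба́тываемого (атываемого) has six syllables and would be dropped.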
In addition to the prefix and clausula, the part of speech (POS) of each wordform was stored. The following parts of speech are distinguished, with these abbreviations:
| Part of speech | Abbreviation | Russian term |
|---|---|---|
| nouns | n | существительные |
| adjectives | a | прилагательные |
| pronouns | pn | местоимения |
| verbs | v | глаголы |
| participles | p | причастия |
| verbal adverbs | va | деепричастия |
| numerals | nm | числительные |
| adverbs | ad | наречия |
| others (conjunctions, particles, interjections, prepositions, predicatives) | ~ | союзы, частицы, междометия, предлоги, предикативы |
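One way to hold the POS codes from the table together with the prefix and clausula ids is sketched below in Python; the `WordformRecord` layout is a hypothetical illustration, not the actual storage format.

```python
from dataclasses import dataclass
from enum import Enum


class POS(Enum):
    """Part-of-speech codes, using the abbreviations from the table."""
    NOUN = "n"
    ADJECTIVE = "a"
    PRONOUN = "pn"
    VERB = "v"
    PARTICIPLE = "p"
    VERBAL_ADVERB = "va"
    NUMERAL = "nm"
    ADVERB = "ad"
    OTHER = "~"  # conjunctions, particles, interjections, prepositions, predicatives


@dataclass(frozen=True)
class WordformRecord:
    """Hypothetical compact record: ids into the prefix/clausula lists plus POS."""
    prefix_id: int
    clausula_id: int
    pos: POS
```

`POS("va")` looks up a member by its abbreviation, which is convenient when reading codes from a text dump.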
Adverbs are represented only selectively in Zalizniak's dictionary, since most adverbs are formed from adjectives and coincide with their short forms (темно, легко, красиво, etc.). We therefore duplicated ~900 short adjectives as adverbs.
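The duplication step can be sketched as follows, assuming wordforms are stored as `(form, pos_code)` pairs using the codes from the table and that the ~900 short forms to duplicate are supplied as a ready-made set (how they were chosen is not described, so `adverb_forms` is hypothetical).

```python
def add_adverbs(records: list[tuple[str, str]],
                adverb_forms: set[str]) -> list[tuple[str, str]]:
    """Append an adverb ("ad") copy of each selected short adjective form."""
    result = list(records)
    present = set(records)
    for form, pos in records:
        candidate = (form, "ad")
        # Only adjective forms from the selected list are duplicated, once each.
        if pos == "a" and form in adverb_forms and candidate not in present:
            result.append(candidate)
            present.add(candidate)
    return result
```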
Note that the morphological synthesizer is used only at the dictionary-building stage and is not part of the application. Generating paradigms for all words takes several minutes, which is why we decided against synthesizing wordforms at run time (during rhyme search).
Changes in Rhyming dictionary
The Rhymes 2.0 package no longer includes rhyming dictionaries with frequency-based wordform selection: the need for such selection disappeared once the dictionary storage format was optimized.
Two methods were used to pack the dictionary: internal packing of duplicate strings and structures, and block compression with the zlib library. The block archive structure makes it possible to load into memory only the data needed to search for the current rhyme (see Criteria of rhyming quality). Although a high compression ratio was achieved (~5 times), it had no effect on the resulting search speed.
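The block-compression half of the scheme can be sketched with Python's standard `zlib` module: each block is compressed independently and an index records its offset and size, so a single block can be decompressed without touching the rest. The block keys and contents are hypothetical; the actual block layout of Rhymes is not described.

```python
import zlib


def pack_blocks(blocks: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Compress each block separately; return the archive and an offset index."""
    archive = bytearray()
    index: dict[str, tuple[int, int]] = {}
    for key, data in blocks.items():
        packed = zlib.compress(data, 9)
        index[key] = (len(archive), len(packed))  # (offset, compressed size)
        archive += packed
    return bytes(archive), index


def load_block(archive: bytes, index: dict[str, tuple[int, int]], key: str) -> bytes:
    """Decompress only the block needed for the current search."""
    offset, size = index[key]
    return zlib.decompress(archive[offset:offset + size])
```

Compressing blocks separately gives up a little ratio compared with one stream, but it keeps random access cheap, which fits the observation that packing did not affect search speed.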
Many errors were corrected in the electronic version of Zalizniak's dictionary (from the Starling linguistic package by S. A. Starostin). The 4th edition of Zalizniak's dictionary (Moscow: Russkii yazyk, 2003) was used for the revision. In addition, some errors in the wordform synthesis algorithm were corrected. Now
Some of the errors were detected automatically by comparison with the morphological dictionary of the Dialing system. Unfortunately, Dialing morphology does not currently contain basic phonetic information (stress marks and the letter ё). Therefore stress marks were verified against electronic versions of the Orthographic Dictionary (ed. V. V. Lopatin) and the Dictionary of Accents (common nouns) by M. V. Zarva.
As a result of re-parsing Zalizniak's entries, the lexicon gained grammatical homonyms (~250 lemmas) and accent variants (~180 lemmas) from entries such as
бондарь мо 2a [//бондарь мо 2b].
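Splitting such an entry into its two variants might look like this in Python. The entry shape (word, grammatical tag, inflection type, optional `[//...]` variant) is inferred from the single example above and is an assumption; real ZGD entries take many more forms.

```python
import re

# Assumed entry shape: "<word> <tag...> <type> [//<variant entry>]"
VARIANT = re.compile(r"^(?P<main>[^\[]+?)\s*\[//(?P<alt>[^\]]+)\]\s*$")


def parse_entry(entry: str) -> list[tuple[str, str, str]]:
    """Return (word, grammatical tag, inflection type) for each variant."""
    match = VARIANT.match(entry)
    parts = [match["main"], match["alt"]] if match else [entry]
    result = []
    for part in parts:
        tokens = part.split()
        # First token is the word, last is the type, everything between is the tag.
        result.append((tokens[0], " ".join(tokens[1:-1]), tokens[-1]))
    return result
```

Each returned tuple can then be sent through paradigm synthesis as a separate lemma.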
We also added ~240 new lemmas from the 4th edition (mostly adjectives).
On the whole, the synthesis of paradigms with irregular inflection (ребёнок – pl. дети, быть – 1st person sg. есть, and the like; there are ~1200 such lemmas in Zalizniak's dictionary) has not changed substantially and still produces non-existent wordforms. To clean them up, we used the Dialing morphological database (whereas Rhymes 1.* used frequency dictionaries for this cleaning).
The procedure was as follows. If a Zalizniak lemma was present in Dialing, we checked every wordform of that lemma for presence in Dialing. The Dialing morphological dictionary lacks some rarely used wordforms (above all noun plurals and short forms of adjectives and participles) that Zalizniak's morphology produces freely. To avoid losing these forms, we restricted the check to irregular paradigms only. As a result, pronouns (1.6%) and participles (0.3%) underwent the most cleaning; erroneous wordforms in other parts of speech amount to less than 0.1%. Note that Dialing makes it possible to detect spelling errors but not stress errors.
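The cleaning procedure can be sketched in Python as follows. `reference` stands in for the Dialing dictionary (a hypothetical mapping from lemma to its set of wordforms). Because Dialing lacks stress marks and ё, forms are compared after removing the stress mark and replacing ё with е, which is also why only spelling errors, not stress errors, can be caught. The apostrophe stress notation is an assumption.

```python
def normalize(form: str) -> str:
    """Drop the (hypothetical) apostrophe stress mark and map ё to е."""
    return form.replace("'", "").replace("ё", "е")


def clean_paradigms(paradigms: dict[str, list[str]],
                    irregular: set[str],
                    reference: dict[str, set[str]]) -> dict[str, list[str]]:
    """Filter irregular paradigms against the reference; keep the rest intact."""
    cleaned: dict[str, list[str]] = {}
    for lemma, forms in paradigms.items():
        key = normalize(lemma)
        if lemma in irregular and key in reference:
            allowed = reference[key]
            cleaned[lemma] = [f for f in forms if normalize(f) in allowed]
        else:
            # Regular paradigms are not checked, so rare valid forms survive.
            cleaned[lemma] = list(forms)
    return cleaned
```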
Rhymes are searched for only among dictionary words whose clausulas have the same number of syllables and the same stressed vowel phoneme as the pattern word (гонять/гнать qualify, unlike гнуть/гнать).
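The candidate restriction amounts to grouping clausulas by a key of (syllable count, stressed vowel). In the Python sketch below, the letter-to-phoneme mapping (я→а, ё→о, ю→у, е→э, with и and ы kept apart) is a simplifying assumption; it is what makes гонять and гнать share a key while гнуть falls into a different group.

```python
VOWELS = "аеёиоуыэюя"
# Hypothetical letter-to-phoneme mapping for the stressed vowel.
PHONEME = str.maketrans("яёюе", "аоуэ")


def rhyme_key(clausula: str) -> tuple[int, str]:
    """(syllable count, stressed vowel phoneme); the clausula starts at the stressed vowel."""
    syllables = sum(ch in VOWELS for ch in clausula)
    return syllables, clausula[0].translate(PHONEME)


def group_by_key(clausulas: list[str]) -> dict[tuple[int, str], list[str]]:
    """Bucket clausulas so a query only scans the bucket of the pattern's key."""
    groups: dict[tuple[int, str], list[str]] = {}
    for clausula in clausulas:
        groups.setdefault(rhyme_key(clausula), []).append(clausula)
    return groups
```

With the key computed once per clausula, a query only has to scan the group matching the pattern's key.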
The rhyming quality of a word pair is evaluated by phonetically comparing their clausulas and the pretonic phoneme. At present the comparison is fairly primitive and does not take into account many features of Russian phonetics (such as the verb ending -тся = [ца], дождь = [дож'ж'], and others). Common kinds of phonetic differences are assigned numerical weights. The rhyming quality is then estimated as a penalty equal to the sum of these weights: the higher the penalty, the worse the rhyme, and a perfect rhyme has zero penalty.
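A penalty of this kind can be sketched in Python. The weights and difference categories below are hypothetical, and the comparison works on letters rather than phonemes, so it is even cruder than the one described; it only illustrates the principle that the penalty is a sum of weights and a perfect match scores zero.

```python
VOWELS = frozenset("аеёиоуыэюя")

# Hypothetical weights; the values actually used by Rhymes are not given.
WEIGHTS = {"vowel": 2, "consonant": 1, "length": 1, "pretonic": 1}


def penalty(prefix_a: str, clausula_a: str, prefix_b: str, clausula_b: str,
            weights: dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of all differences; 0 means a perfect rhyme."""
    total = abs(len(clausula_a) - len(clausula_b)) * weights["length"]
    # Compare clausulas letter by letter, aligned at the end of the word.
    for x, y in zip(reversed(clausula_a), reversed(clausula_b)):
        if x != y:
            total += weights["vowel"] if x in VOWELS or y in VOWELS else weights["consonant"]
    # Compare the pretonic sound: the last letter before the stressed vowel.
    if prefix_a[-1:] != prefix_b[-1:]:
        total += weights["pretonic"]
    return total
```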
The lexical database of synonyms was compiled by L. I. Kolodyadjnaya on the basis of the Dictionary of Synonyms of the Russian Language (ed. A. P. Evgenjeva). It contains ~22,000 words in 5,500 synonym sets (entries).
The application displays all entries in which the pattern was found as a tree view. The headword (dominant) is shown in brackets, and its synonyms and antonyms appear as child nodes. The icon next to a word shows its relation to the pattern (synonym, antonym, initial word). A Roman numeral next to the dominant is the homonym number; an Arabic numeral is the number of the synset among those with the same dominant. The original order of entries and words is preserved in the display.
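The tree display can be modeled with a small Python data structure. The `Synset` fields and the text markers used in place of icons (`*` for the initial word, `=` for a synonym, `≠` for an antonym) are hypothetical; the sketch only shows filtering by pattern, the bracketed dominant with its numerals, and the preserved entry order.

```python
from dataclasses import dataclass, field

ROMAN = ("", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X")


@dataclass
class Synset:
    dominant: str
    homonym: int  # 0 = no homonym number, otherwise shown as a Roman numeral
    number: int   # synset number among synsets with the same dominant
    synonyms: list[str] = field(default_factory=list)
    antonyms: list[str] = field(default_factory=list)


def render(synsets: list[Synset], pattern: str) -> list[str]:
    """Text tree of all synsets containing the pattern, in original order."""
    lines: list[str] = []
    for s in synsets:
        if pattern not in (s.dominant, *s.synonyms, *s.antonyms):
            continue
        header = [f"[{s.dominant}]"]
        if s.homonym:
            header.append(ROMAN[s.homonym])
        header.append(str(s.number))
        lines.append(" ".join(header))
        for word in s.synonyms:
            lines.append(f"  {'*' if word == pattern else '='} {word}")
        for word in s.antonyms:
            lines.append(f"  {'*' if word == pattern else '≠'} {word}")
    return lines
```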
Detailed statistics for all dictionary databases are available in the descriptions on the download page.
This project would hardly have continued if Maxim Ushakov had not placed his excellent morphological package at our disposal. We gratefully acknowledge Boris Smilga and Sergey A. Starostin for the electronic version of the Grammatical Dictionary, Sergey Sharov and Sergey Slepov for the Russian frequency dictionaries, and Mike Matsnev for his wonderful program HaaliReader, whose open source code greatly eased porting Rhymes to Pocket PC. Special thanks to the AOT team, Alexey Sokirko and Dmitry Pankratov, who took an active part in comparing the Zalizniak and Dialing morphologies.
| Send your comments and remarks to support(a)rifmovnik.ru | Last modified: 30 April 2010 |