Electronic Word Lists

Title

Electronic Word Lists

Subject

Electronic Word Lists: Mari, Mordvin and Udmurt. With SFOu WordListTool 1.3. Lexica Societatis Fenno-Ugricae XXXI:1. 2007. ISBN 978-952-5150-98-8.  

Electronic Word Lists: Komi, Chuvash and Tatar. With SFOu WordListTool 1.4. Ed. Jorma Luutonen et al. Lexica Societatis Fenno-Ugricae XXXI:2. 2016. ISBN 978-952-5667-79-0.

Description

History and purpose

In the past, large word lists with word class labels have been published by the Finno-Ugrian Society in the form of reverse dictionaries. The main purpose of such word lists is to serve as a source for the study of derivation and word structure. Reverse dictionaries of Mari (2002) and Mordvin (2004) were produced as joint projects by the staff of the Research Unit for Volgaic Languages at the University of Turku and their partners from the Middle Volga region. By-products of the projects were computer files containing large vocabularies of the aforementioned languages.

Although traditional printed dictionaries are user-friendly and suitable for most simple research tasks, it was clear that the computer files would be more useful research material for users who want to search for words matching a combination of search parameters or to compare different languages.

On the basis of such considerations, a new project was launched. It had two goals: a) to supplement the collection of electronic vocabularies with a new word list of the Udmurt language, and b) to produce a computer program specifically designed for handling such word lists. To achieve the first goal, an electronic word list, similar to those already available for Mari and Mordvin, was developed on the basis of the Udmurt reverse dictionary that was printed in Izhevsk in 1992. To achieve the second goal, an agreement was made between the Finno-Ugrian Society (Helsinki) and Turku University of Applied Sciences to produce a user interface program for electronic word lists. The program development work was carried out in the years 2005–2007 and resulted in the SFOu WordListTool program.

Introduction

The Finno-Ugrian Society has published electronic word lists of the following languages: Mari, Mordvin, Udmurt, Komi, Chuvash and Tatar. The electronic word lists are intended to be sources for the study of word derivation and word structure. The total number of entries in the six lists is ca. 327,000. Each entry word is provided with labels that indicate language, word class and dictionary sources. SFOu WordListTool is a computer program that has been specially developed for handling such lists. The user interface of the program is available in English, Russian and Finnish.

There are two versions of each word list, one with normal alphabetisation (beginning from the first letter of the word) and the other with reverse alphabetical order (beginning from the end of the word). The names of the files are as follows:

• mari_alph.txt
• mordva_alph.txt
• udmurt_alph.txt
• komi_alph.txt
• chuvash_alph.txt
• tatar_alph.txt

• mari_rev.txt
• mordva_rev.txt
• udmurt_rev.txt
• komi_rev.txt
• chuvash_rev.txt
• tatar_rev.txt
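The reverse alphabetisation used in the *_rev.txt files (ordering words from their last letter backwards) can be illustrated with a minimal Python sketch. The function name and the sample words are invented for this illustration and do not come from the word lists, and the comparison is a plain Unicode code point comparison, which is only an assumption: the published files may follow language-specific alphabet orders.

```python
def reverse_sort(words):
    """Sort words by comparing them from the last character backwards.

    Assumption: plain code point order; the real lists may use a
    language-specific alphabet order instead.
    """
    return sorted(words, key=lambda w: w[::-1])


# Invented sample words, not taken from the word lists.
print(reverse_sort(["cat", "bat", "dog", "log"]))
# → ['dog', 'log', 'bat', 'cat']
```

In such an ordering, words with the same ending (for example, the same derivational suffix) appear next to each other, which is what makes reverse lists useful for the study of derivation.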

The character encoding of the files is Unicode (UTF-8). In the files, the material is arranged in four columns:

1) the word
2) language
3) word class
4) sources

The meanings of the words are not given in the word lists. Technically, the files are plain-text comma-separated values (CSV) files. This simply means that a comma character (,) separates the fields for different types of information (word, language, word class, sources) in each line of the file.
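For users who prefer to process the files with their own scripts, the following Python sketch shows one possible way to read the four columns and to search by a combination of parameters. It assumes that every line has four comma-separated fields in the order given above and that there is no header row; the function names, the sample rows and the labels in them (xx, V, N, S1, S2) are invented for the illustration and are not the labels used in the actual files.

```python
import csv
import io


def read_word_list(lines):
    """Yield one dict per line of a word list.

    Assumptions: four fields per line (word, language, word class,
    sources) and no header row. Shorter lines are skipped.
    """
    for row in csv.reader(lines):
        if len(row) < 4:
            continue
        word, language, word_class, sources = (f.strip() for f in row[:4])
        yield {"word": word, "language": language,
               "word_class": word_class, "sources": sources}


def search(entries, ending="", word_class=None):
    """Return entries whose word ends in `ending` and, if given, has `word_class`."""
    return [e for e in entries
            if e["word"].endswith(ending)
            and (word_class is None or e["word_class"] == word_class)]


# Invented sample rows; the real labels are described in the documentation.
SAMPLE = """\
walking,xx,V,S1
talking,xx,V,S1 S2
ceiling,xx,N,S2
table,xx,N,S1
"""

entries = list(read_word_list(io.StringIO(SAMPLE)))
hits = search(entries, ending="ing", word_class="V")
print([e["word"] for e in hits])
# → ['walking', 'talking']

# With a real file, the sample could be replaced by, for example:
# with open("mari_alph.txt", encoding="utf-8-sig", newline="") as f:
#     entries = list(read_word_list(f))
```

The encoding "utf-8-sig" reads ordinary UTF-8 and also tolerates a byte order mark at the start of the file, should one be present.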

The word list files can be handled with word processors that can cope with Unicode characters, or with programs such as Microsoft Excel. It is, however, recommended that users work with the SFOu WordListTool program, which has been developed specifically for these kinds of word lists.

The documentation of the word lists

The following documents contain detailed descriptions of the word lists:

• Mari, Mordvin and Udmurt: Descriptions_en.pdf (in Russian Descriptions_ru.pdf)
• Komi, Chuvash and Tatar: Descriptions_en_2016.pdf (in Russian Descriptions_ru_2016.pdf)
• There is also a Finnish description of the Tatar word list: Tat_sanalista_kuvaus.pdf

Summary descriptions, as well as advice for the use of the word lists, can be found in the following documents:

• Mari, Mordvin and Udmurt: Booklet_en.pdf (in Russian Booklet_ru.pdf)
• Komi, Chuvash and Tatar: QuickStartManual_en.pdf
  (in Finnish QuickStartManual_fi.pdf, in Russian QuickStartManual_ru.pdf)

Although the Booklet and QuickStartManual documents were written as manuals for using the word lists with the help of the SFOu WordListTool program, they also contain practical information for those not using this program.

The word list and program packages

The word lists, the accompanying documentation and the SFOu WordListTool program are here made available in three packages. The files with names ending in .exe are self-extracting archives. Please read the instructions and licences before extracting the materials. Note that the new word lists licence (2016) also covers the lists in the earlier package (2007).

1) Electronic Word Lists: Komi, Chuvash and Tatar. With SFOu WordListTool 1.4. Lexica Societatis Fenno-Ugricae XXXI:2 (2016)

The first package contains the word lists of the Komi, Chuvash and Tatar languages, as well as the new Windows version of the SFOu WordListTool program. System requirements: 50 MB of free hard disk space, 512 MB of memory, Windows 2000/XP/Vista/7/8. The contents of this package constitute the publication Lexica Societatis Fenno-Ugricae XXXI:2 (ISBN 978-952-5667-79-0).

Instructions_WLT1.4.pdf
WordLists_licence_2016.pdf
SFOuWLT_licence_2016.pdf
SFOu_WLT_1.4_Win.exe

2) Only word lists and their documentation: Mari, Mordvin, Udmurt, Komi, Chuvash and Tatar (2007 + 2016)

The second package combines the word lists of the 2007 and 2016 publications (Lexica Societatis Fenno-Ugricae XXXI:1-2) and makes them available without an accompanying program. The materials include the word lists of the Mari, Mordvin, Udmurt, Komi, Chuvash and Tatar languages, together with all relevant documents.

Instructions_Lists.pdf
WordLists_licence_2016.pdf
Wordlists&Documents.exe

3) Electronic Word Lists: Mari, Mordvin and Udmurt. With SFOu WordListTool 1.3. Lexica Societatis Fenno-Ugricae XXXI:1 (2007)

The Mari, Mordvin and Udmurt word lists were originally published on a CD in 2007. This package included the first version (1.3) of the SFOu WordListTool program. The contents of the aforementioned CD are now made available through the internet. The licences are shown during the installation procedure.

Instructions_WLT1.3.pdf
SFOu WordListTool 1.3 CD contents.exe


Electronic Word Lists: Mari, Mordvin and Udmurt. With SFOu WordListTool 1.3. Ed. Jorma Luutonen et al. Lexica Societatis Fenno-Ugricae XXXI:1. ISBN 978-952-5150-98-8. Helsinki 2007.
Electronic Word Lists: Komi, Chuvash and Tatar. With SFOu WordListTool 1.4. Ed. Jorma Luutonen et al. Lexica Societatis Fenno-Ugricae XXXI:2. ISBN 978-952-5667-79-0. Helsinki 2016.

Creator

Jorma Luutonen et al. (ed.)

Publisher

Suomalais-Ugrilainen Seura

Date

2007–2016

Language

Finnish, English, Russian; Mari, Mordvin, Udmurt, Komi, Chuvash, Tatar