Corpus Investigations of Medieval Slavonic Manuscripts: Statistically Important N-Grams (Collocations) of Old Russian Chronicles
Annotation
PII
S207987840009440-3-1
DOI
10.18254/S207987840009440-3
Publication type
Article
Status
Published
Authors
Viktor Baranov 
Affiliation: Izhevsk State Technical University
Address: Russian Federation, Izhevsk
Abstract

The paper reviews the current state of Slavonic historical text corpora and the requirements they must meet for processing, searching and displaying linguistic data. The main causes of slow progress in this field are the high labor cost of manually creating and tagging machine-readable transcriptions and the need to develop specialized corpus managers that provide access to the data and its visualization. One currently important use of corpus data is its analysis by quantitative and statistical methods. The paper describes the functionality of the historical corpus “Manuscript”, which comprises medieval Slavonic manuscripts of the 10th to 15th centuries (manuscripts.ru). The capabilities of its n-gram module for revealing grammatically and semantically fixed expressions that characterize the subject matter of a text are demonstrated on a subcorpus of three Old Russian chronicles (Laurentian, Hypatian, Radzivilovsky). The statistical measures Mutual Information (MI) and T-score yield lists of relatively rare and of more frequent set expressions, respectively. MI lists include proper names, paired names, and set biblical and Slavonic-bookish subordinating constructions. T-score lists give information on events, goals, persons, results and their characteristics. The paper concludes that statistical measures are effective for automatically finding semantically and thematically important expressions in historical sources.
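
The contrast between the two measures can be illustrated with a minimal sketch. It assumes the formulas commonly used in corpus linguistics, MI = log2(f(x,y) · N / (f(x) · f(y))) and T-score = (f(x,y) − f(x) · f(y) / N) / sqrt(f(x,y)); the abstract does not state which exact formulas the n-gram module of “Manuscript” implements. The function name `bigram_scores` and the toy token list are hypothetical and are not taken from the corpus.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {bigram: (frequency, MI, T-score)} for adjacent token pairs.

    Assumed formulas (standard in corpus linguistics, not confirmed
    as those of the "Manuscript" n-gram module):
      expected = f(x) * f(y) / N
      MI       = log2(f(x,y) / expected)
      T-score  = (f(x,y) - expected) / sqrt(f(x,y))
    """
    n = len(tokens)                              # corpus size N in tokens
    unigrams = Counter(tokens)                   # f(x)
    bigrams = Counter(zip(tokens, tokens[1:]))   # f(x,y) for adjacent pairs
    scores = {}
    for (x, y), f_xy in bigrams.items():
        expected = unigrams[x] * unigrams[y] / n
        mi = math.log2(f_xy / expected)
        t_score = (f_xy - expected) / math.sqrt(f_xy)
        scores[(x, y)] = (f_xy, mi, t_score)
    return scores


# Toy input with placeholder tokens (hypothetical, not corpus data).
tokens = "x y x y x y z w".split()
scores = bigram_scores(tokens)

# Rank the same bigrams by each measure.
top_by_mi = max(scores, key=lambda pair: scores[pair][1])
top_by_t = max(scores, key=lambda pair: scores[pair][2])
print("top by MI:", top_by_mi)
print("top by T-score:", top_by_t)
```

On this toy input, MI ranks the pair that occurs once but whose members occur nowhere else highest, while T-score ranks the most frequent pair highest. This matches the abstract's observation that MI lists surface relatively rare expressions and T-score lists surface more frequent ones.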

Keywords
Russian chronicle, linguistic statistics, n-gram, collocation
Received
19.10.2019
Publication date
12.05.2020
Number of characters
64254