Non-standardised varieties are a major area of ongoing linguistic research at the Slavic seminar. This work also involves developing annotated linguistic corpora, with a particular focus on dialectal and historical collections and on spoken-language corpora. The current selection of corpora is available at https://gitlab.uzh.ch/uzh-slavic-corpora
Macedonian Spoken Corpus
Pre-Standardized Balkan Slavic Literature
The corpus includes various Balkan Slavic texts from the 15th to the 19th century. The annotated section comprises 20 shorter texts with full morphological and syntactic annotation (48k tokens). The raw section contains 14 sources digitized in full, either manually or automatically (ca. 1M tokens).
Contact: Ivan Šimko
Serbian Forms of Address
Map Task Corpus of heritage BCMS
The corpus consists of 30 short transcripts of elicited map task conversations between heritage speakers of BCMS living in German-speaking Switzerland. It is searchable on an interactive platform that supports querying by various types of annotation and metadata, as well as custom annotations.
Contact: Dolores Lemmenmeier-Batinić