Zurich Corpora of Slavic Varieties (ZuCoSlaV)

A major area of ongoing linguistic research at the Slavic Seminar is the study of non-standardised varieties. This work also involves developing annotated linguistic corpora, with a particular focus on dialectal and historical collections and spoken-language corpora. The current selection of corpora is available at https://gitlab.uzh.ch/uzh-slavic-corpora

Macedonian Spoken Corpus

The corpus comprises transcriptions of audio recordings collected during a series of field research trips to the Prespa, Bitola and Debar regions in 2012, 2014, 2016 and 2019.

Contact: Anastasia Escher

Pre-Standardized Balkan Slavic Literature

The corpus includes various Balkan Slavic texts from the 15th to the 19th century. The annotated section comprises 20 shorter texts with full morphological and syntactic annotation (48k tokens). The raw section contains 14 sources digitized in full, either manually or automatically (ca. 1M tokens).

Contact: Ivan Šimko

Torlak

The corpus contains transcripts of interviews about traditional culture and history with speakers of Torlak from the Timok area. It comprises 500,697 tokens, corresponding to 80 hours of recordings.

Contact: Teodora Vuković

Serbian Forms of Address

The corpus contains transcripts of interviews about the forms of address that Serbian speakers use in colloquial and formal settings. It consists of 171,552 tokens, corresponding to about 19 hours of recordings.

PhD project of Sonja Ulrich

Contact: Dolores Lemmenmeier-Batinić, Sonja Ulrich