Irina Sekerina, Psychology and Linguistics, College of Staten Island and CUNY, irina.sekerina@csi.cuny.edu |
Aleksandra Skorobogatova, Psychology, CUNY Graduate Center, as.skorobogatova@gmail.com |
Anna Smirnova Henriques, LIAAC, Pontifícia Univ. Católica de São Paulo, anna.smirnova.liaac@gmail.com |
Participants: | 16 |
Type of Study: | interview |
Location: | São Paulo, Brazil |
Media type: | audio |
DOI: | doi:10.21415/CJV6-JY66 |
Sekerina, I. A., Skorobogatova, A. S., & Smirnova Henriques, A. (2025). Brazilian Portuguese-Russian Corpus (BraPoRus). Retrieved from https://biling.talkbank.org/ doi:10.21415/CJV6-JY66
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Brazilian Portuguese-Russian Corpus (BraPoRus) is a collection of spoken heritage Russian as it is still used in Brazil today by approximately 1,500 elderly bilingual speakers of heritage Russian and Brazilian Portuguese. Now in their 70s and 80s, they speak an isolated variety of Russian that has been “frozen” for 100 years and can be described as moribund. They grew up in Russian-speaking families and are literate in Russian, but they have never visited Russia. As of 2025, 34 participants from this population have been enrolled, and their naturalistic speech samples have been collected. BraPoRus (v.1.0) contains data from 16 participants who performed Task 1, “Monologue about family history”. For each participant, one 15-minute segment was extracted from a semi-structured interview that lasted one hour on average.
The participants were recorded remotely, as video (.mp4) or audio (.mp3), via Zoom or smartphone during the COVID-19 pandemic. All video files were converted to audio. The audio files were processed with BatchAlign2 (Liu & MacWhinney, 2024): they were transcribed into .cha format in Cyrillic orthography, split into utterances, aligned with the audio sources, tagged morphologically (%mor) and syntactically (%gra), and translated into English (%xtra). At this point, the morphological and syntactic tagging has not been manually checked and may contain unresolved ambiguities.
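As a rough illustration of the tier layout described above, the following sketch groups each speaker line in a .cha transcript with the dependent tiers that follow it. It is a minimal sketch rather than the corpus authors' tooling. It assumes only the general CHAT conventions: `@` marks header lines, `*` marks a speaker's utterance, `%` marks a dependent tier, and a leading tab marks a continuation line. The `SAMPLE` text and the `parse_chat` helper are hypothetical; the sample is invented for this example, is not taken from BraPoRus, and uses schematic %mor content.

```python
from typing import Dict, List, Optional

# Invented sample in CHAT layout; NOT taken from BraPoRus.
# The %mor content is schematic and only shows where the tier sits.
SAMPLE = (
    "@Begin\n"
    "@Languages:\trus\n"
    "*PAR:\tя родился в Бразилии .\n"
    "%mor:\tpron|я verb|родиться adp|в propn|Бразилия .\n"
    "%xtra:\tI was born in Brazil .\n"
    "*PAR:\tмои родители говорили\n"
    "\tпо-русски .\n"
    "%xtra:\tMy parents spoke Russian .\n"
    "@End\n"
)


def parse_chat(text: str) -> List[Dict[str, str]]:
    """Group each main-tier utterance with its dependent tiers."""
    utterances: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    last_key: Optional[str] = None  # tier that a continuation line extends

    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("\t"):
            # Continuation of the previous main or dependent tier.
            if current is not None and last_key is not None:
                current[last_key] += " " + line.strip()
            continue
        if line.startswith("@"):
            # Header lines are skipped in this sketch.
            last_key = None
            continue
        label, _, content = line.partition(":")
        content = content.strip()
        if line.startswith("*"):
            # Main tier: start a new utterance record.
            current = {"speaker": label[1:], "text": content}
            utterances.append(current)
            last_key = "text"
        elif line.startswith("%") and current is not None:
            # Dependent tier (e.g. %mor, %gra, %xtra) of the current utterance.
            current[label] = content
            last_key = label
    return utterances


if __name__ == "__main__":
    for utt in parse_chat(SAMPLE):
        print(utt["speaker"], "|", utt["text"], "|", utt.get("%xtra", ""))
```

Keeping each utterance as a plain dictionary keyed by tier name makes it easy to pair the Cyrillic text with its English translation, or to skip the %mor and %gra tiers until they have been manually checked. The sketch does not handle the time-alignment markers; for real analyses, a dedicated CHAT reader is likely a safer choice.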
Information regarding each session is provided in this table.
Each participant signed an informed consent form and allowed their data to be publicly shared. Data collection was approved by the Ethics Committee of Pontifícia Universidade Católica de São Paulo (CAAE 09079219.9.0000.5482).