BilingBank BraPoRus Corpus

Irina Sekerina
Psychology and Linguistics
College of Staten Island and CUNY
irina.sekerina@csi.cuny.edu

Aleksandra Skorobogatova
Psychology
CUNY Graduate Center
as.skorobogatova@gmail.com

Anna Smirnova Henriques
LIAAC
Pontifícia Univ. Católica de São Paulo
anna.smirnova.liaac@gmail.com

Participants:	16
Type of Study:	interview
Location:	São Paulo, Brazil
Media type:	audio
DOI:	doi:10.21415/CJV6-JY66

Citation information

Sekerina, I. A., Skorobogatova, A. S., & Smirnova Henriques, A. (2025). Brazilian Portuguese-Russian Corpus (BraPoRus). Retrieved from https://biling.talkbank.org/ doi:10.21415/CJV6-JY66

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above reference.

Project Description

The Brazilian Portuguese-Russian Corpus (BraPoRus) documents the spoken heritage Russian still used in Brazil today by approximately 1,500 elderly bilingual speakers of heritage Russian and Brazilian Portuguese. Now in their 70s and 80s, they speak an isolated variety of Russian that has been “frozen” for 100 years and can be described as moribund. They grew up in Russian-speaking families and are literate in Russian, but they have never visited Russia. As of 2025, 34 participants from this population have been enrolled, and samples of their naturalistic speech have been collected. BraPoRus v.1.0 contains 16 participants who performed Task 1, “Monologue about family history.” For each participant, one 15-minute segment was extracted from a semi-structured interview that lasted about one hour on average.

The participants were recorded remotely during the COVID-19 pandemic, either on video (.mp4) or audio (.mp3), via Zoom or on a smartphone. All video files were converted to audio. The audio files were then processed with BatchAlign2 (Liu & MacWhinney, 2024): they were transcribed into CHAT (.cha) format in Cyrillic orthography, segmented into utterances, aligned with the audio, tagged morphologically (%mor) and syntactically (%gra), and translated into English (%xtra). The morphological and syntactic tagging has not yet been manually checked and may contain unresolved ambiguities.
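In the resulting CHAT files, each transcribed utterance is a main tier beginning with `*` and a speaker code, followed by dependent tiers such as %mor, %gra and %xtra. For readers who want to extract the English translations or alignment times programmatically, the following is a minimal sketch that groups each main tier with its dependent tiers. It is a hypothetical helper written for illustration only (`parse_chat` and `Utterance` are our own names, not part of BatchAlign2 or CLAN). It assumes the usual CHAT conventions of `@` header lines, tab-initial continuation lines, and media bullets of the form `\x15start_end\x15` in milliseconds, and it ignores much of the full CHAT specification.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Assumed CHAT media-alignment bullet: \x15<start_ms>_<end_ms>\x15
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")


@dataclass
class Utterance:
    speaker: str                      # e.g. "PAR" from a "*PAR:" main tier
    text: str                         # transcribed utterance (Cyrillic here)
    start_ms: Optional[int] = None    # alignment start, if a bullet is present
    end_ms: Optional[int] = None      # alignment end, if a bullet is present
    tiers: Dict[str, str] = field(default_factory=dict)  # "mor", "gra", "xtra", ...


def parse_chat(lines: Iterable[str]) -> List[Utterance]:
    """Hypothetical helper: group each *SPK: main tier with the %tier: lines after it."""
    utterances: List[Utterance] = []
    current: Optional[Utterance] = None
    last_key: Optional[str] = None  # "*" for the main tier, tier name for %tiers

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("@"):
            last_key = None  # header line (@Begin, @Participants, ...): ignored
            continue
        if line.startswith("\t"):
            # A tab-initial line continues the previous main or dependent tier.
            if current is not None and last_key == "*":
                current.text += " " + line.strip()
            elif current is not None and last_key is not None:
                current.tiers[last_key] += " " + line.strip()
            continue
        tag, _, content = line.partition(":")  # split at the first colon only
        content = content.strip()
        if tag.startswith("*"):
            current = Utterance(speaker=tag[1:], text=content)
            utterances.append(current)
            last_key = "*"
        elif tag.startswith("%") and current is not None:
            current.tiers[tag[1:]] = content
            last_key = tag[1:]

    # Pull the time alignment out of each main tier once continuations are joined.
    for utt in utterances:
        match = BULLET.search(utt.text)
        if match:
            utt.start_ms, utt.end_ms = int(match.group(1)), int(match.group(2))
            utt.text = BULLET.sub("", utt.text).strip()
    return utterances


if __name__ == "__main__":
    # Hypothetical, hand-written lines for illustration; not taken from BraPoRus.
    sample = [
        "@Begin",
        "*PAR:\tя родилась в Бразилии . \x151200_3400\x15",
        "%gra:\t1|2|NSUBJ 2|0|ROOT 3|4|CASE 4|2|OBL 5|2|PUNCT",
        "%xtra:\tI was born in Brazil .",
        "@End",
    ]
    for utt in parse_chat(sample):
        print(utt.speaker, utt.start_ms, utt.end_ms, utt.text)
        print("  English:", utt.tiers.get("xtra"))
```

Because the %mor and %gra tiers have not been manually checked, anything computed from them with a helper like this inherits their unresolved ambiguities; the %xtra translations are likewise machine-generated. For serious work, an established CHAT reader is preferable to this simplified sketch.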

Information regarding each session is provided in this table.

Each participant signed an informed consent form and allowed their data to be publicly shared. Data collection was approved by the Ethics Committee of Pontifícia Universidade Católica de São Paulo (CAAE 09079219.9.0000.5482).

Acknowledgements

We are very grateful to the participants and their families. Irina Sekerina was partially supported by two PSC-CUNY grants (TRADB #66406-00-01 and TRADB #68423-00-01) and a Fulbright Core Program award to Brazil. Anna Smirnova Henriques was partially supported by a PNPD/CAPES postdoctoral fellowship (Programa Nacional de Pós-Doutorado da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, process number 88882.315378/2019-01) for her project “Russian speakers and Brazilian Portuguese: an interdisciplinary study project”, and Aleksandra S. Skorobogatova was supported by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, process number 2022/01119-0) for her undergraduate research project “Corpus of heritage Russian in Brazil and narrative analysis in Russian-Brazilian Portuguese bilinguals”. During corpus data collection, both Brazilian researchers were affiliated with the Pontifícia Universidade Católica de São Paulo and were co-supervised by Dr. Sandra Madureira of the Laboratório Integrado de Análise Acústica e Cognição (LIAAC).