ESRC Centre for Research on Bilingualism
Type of Study: naturalistic
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Patagonia corpus of Welsh-Spanish bilingual speech was recorded in late 2009 and transcribed from 2010 to 2011 as part of a research project funded by the Economic and Social Research Council (ESRC). The main theoretical aim of the project was to test alternative models of code-switching with Welsh-Spanish data.
The corpus consists of 43 audio recordings and their corresponding transcripts of informal conversation between two or more speakers, involving a total of 94 speakers from Patagonia, Argentina. Participants were recruited via a social network approach: as only a very small percentage of the inhabitants of Patagonia are fluent in both Spanish and Welsh, names of bilingual speakers were sought from local contacts in advance of the fieldworkers’ visit. In total, the corpus consists of 195,190 words of text from just under 21 hours of recorded conversation. The transcriptions (in CHAT format) are linked to the digitized recordings through sound links at the end of each main tier. Most recordings were in stereo, and were made using Marantz, Zoom or Microtrack digital audio recorders.
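The sound links mentioned above can be read programmatically. The following is a minimal sketch, assuming the transcripts use the common CHAT media-bullet convention in which each main tier ends with `start_end` offsets in milliseconds wrapped in a pair of NAK control characters (U+0015). Older bullet variants that also embed a file name are not handled. The function name `parse_main_tier`, the speaker code `MAR`, and the sample utterance are illustrative and are not taken from the corpus.

```python
import re

# Assumption: a media bullet is "\x15<start>_<end>\x15" (milliseconds)
# placed at the very end of a main tier line.
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15\s*$")

# A CHAT main tier starts with "*", a speaker code, a colon, and a tab.
MAIN_TIER = re.compile(r"^\*([A-Z0-9]+):\t(.*)$")


def parse_main_tier(line):
    """Return speaker, text, and sound-link offsets for one main tier line.

    Returns None for lines that are not main tiers (headers such as
    "@Begin", dependent tiers such as "%eng:", and so on).
    """
    match = MAIN_TIER.match(line.rstrip("\n"))
    if match is None:
        return None
    speaker, rest = match.groups()
    bullet = BULLET.search(rest)
    if bullet is None:
        # Main tier without a sound link.
        return {"speaker": speaker, "text": rest.strip(),
                "start_ms": None, "end_ms": None}
    return {
        "speaker": speaker,
        "text": rest[:bullet.start()].strip(),
        "start_ms": int(bullet.group(1)),
        "end_ms": int(bullet.group(2)),
    }


# Hypothetical main tier, invented for illustration only.
sample = "*MAR:\tbuenos d\u00edas . \x152500_4100\x15"
tier = parse_main_tier(sample)
```

Applied to every line of a transcript, a function like this yields per-utterance offsets that can be used to cut the matching audio segment from the digitized recording.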
The recordings were made at a place convenient for the speakers, e.g. at their homes or workplaces. After setting up the equipment, the researcher would leave the speakers to talk freely with one another. In some cases the researcher re-entered briefly during the recording. This is noted in the transcripts, and speech by the researcher is usually not transcribed. The first five minutes of every recording after the point when the researcher left the room have been deleted, in case the participants’ speech was initially affected by the presence of the recorder.
At the end of each recording, all participants were asked to fill in questionnaires giving their age, gender, places where they had lived, etc., in order to provide background information for sociolinguistic analysis. They were also asked to sign consent forms giving permission for their recording and its transcript to be used for research purposes and to be submitted to online linguistic archives. The consent form included the provision that the names of speakers and other people named in the recording would be replaced by pseudonyms in the transcript. In the case of children aged 16 or younger, a consent form was also signed by a parent or guardian.
Conditions of use: The corpus is being made available under the GNU General Public License version 3 or later (http://gnu.org/copyleft/gpl.html). Researchers who use it are requested to subscribe to the TalkBank Code of Ethics and acknowledge the corpus as set out below.
Acknowledgments: Please refer to the corpus as the Bangor Patagonia corpus, and provide a link to the website by which you accessed the corpus. We request that a copy of any publications that make use of this corpus be sent to us at the above address.
Canonical version of the data: The most up-to-date version of the data, as well as more detailed documentation, is available at http://bangortalk.org.uk.