Bringing improved person metadata into digital archives
Presentation given by Jeff Good and Pierpaolo Di Carlo at the Language Documentation and Archiving conference #LDA2022
Commonly used metadata standards for language documentation center on the description of language resources rather than the people involved in their creation. OLAC metadata (Simons & Bird 2008), for example, only allows for the standardized encoding of an individual’s role in the creation of a resource. The IMDI standard (Broeder & Wittenburg 2006) is more expansive, allowing properties such as an individual’s age and level of education to be specified, but it is still quite limited: among other things, it lacks the ability to specify individuals’ kinship relations and the ways they are categorized in a local cultural system (e.g., by clan). Moreover, these standards do not allow for systematic description of individuals’ multilingual repertoires, even though this can be crucial to understanding patterns of language use. Tools to support the creation of documentary metadata show comparable limitations. For instance, lameta (Hatton et al. 2021) allows individuals to be associated with multiple languages but does not require information about how they use them. As a result, language archives lack information that is both of interest to community members and important for research.
This talk will propose solutions to these problems by drawing on insights from three domains: (i) data from work on the documentation of small-scale multilingualism (Pakendorf et al. 2021), (ii) research on the development of content management systems for Indigenous language communities (Christen 2015), and (iii) existing metadata schemes for encoding information about people, such as Friend of a Friend (FOAF; Graves et al. 2009). In particular, it will develop a conceptual model for metadata about individuals in documentary contexts and propose a sample RDF/XML encoding of data collected via a sociolinguistic questionnaire in a documentary project to demonstrate how metadata collection practices can be improved in this domain.
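To make the idea of a person-centered RDF/XML encoding concrete, here is a minimal Python sketch. It is not the encoding proposed in the talk. It assumes a simple person record with a name, a clan, kinship links, and a multilingual repertoire, and serializes it using the real FOAF terms `foaf:Person` and `foaf:name`. Everything in the `ex:` namespace (`ex:clan`, `ex:languageUse`, `ex:language`, `ex:proficiency`, `ex:domain`, and the kinship property names) is hypothetical, as are the class names, the function name, and all sample values, which are placeholders rather than real data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical namespace for properties that FOAF does not cover.
EX = "http://example.org/person-metadata#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)
ET.register_namespace("ex", EX)


@dataclass
class LanguageUse:
    """One entry in a person's repertoire: which language, how well, and where."""
    language: str                      # placeholder label or language code
    proficiency: str                   # e.g. a questionnaire self-rating
    domains: List[str] = field(default_factory=list)


@dataclass
class Person:
    person_id: str
    name: str
    clan: Optional[str] = None         # local cultural category
    kin: Dict[str, str] = field(default_factory=dict)   # relation -> person_id
    repertoire: List[LanguageUse] = field(default_factory=list)


def person_to_rdf(person: Person) -> ET.Element:
    """Build an rdf:RDF element describing one person (assumed schema)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    node = ET.SubElement(
        root, f"{{{FOAF}}}Person", {f"{{{RDF}}}about": f"#{person.person_id}"}
    )
    ET.SubElement(node, f"{{{FOAF}}}name").text = person.name
    if person.clan:
        ET.SubElement(node, f"{{{EX}}}clan").text = person.clan
    # Kinship relations point at other person records by identifier.
    for relation, other_id in person.kin.items():
        ET.SubElement(
            node, f"{{{EX}}}{relation}", {f"{{{RDF}}}resource": f"#{other_id}"}
        )
    # Each repertoire entry becomes a nested description, so usage details
    # stay attached to the language they describe.
    for use in person.repertoire:
        prop = ET.SubElement(node, f"{{{EX}}}languageUse")
        desc = ET.SubElement(prop, f"{{{RDF}}}Description")
        ET.SubElement(desc, f"{{{EX}}}language").text = use.language
        ET.SubElement(desc, f"{{{EX}}}proficiency").text = use.proficiency
        for domain in use.domains:
            ET.SubElement(desc, f"{{{EX}}}domain").text = domain
    return root


# Placeholder values only; not drawn from any real questionnaire.
example = Person(
    person_id="p1",
    name="Speaker A",
    clan="Clan X",
    kin={"childOf": "p2"},
    repertoire=[
        LanguageUse("lang-1", "fluent", ["home", "market"]),
        LanguageUse("lang-2", "passive"),
    ],
)
print(ET.tostring(person_to_rdf(example), encoding="unicode"))
```

Nesting each repertoire entry, rather than listing languages flatly, is one way to record how a person uses each language and not just which languages they are associated with. A real schema would need agreed vocabularies for kinship terms, domains, and proficiency.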