£1.8m for online resource of contemporary Welsh language

14 October 2015

University to lead multi-institution project to develop first mass corpus

Cardiff University is to play a key role in developing the first ever large-scale corpus of the Welsh language, compiling an initial data set of 10 million Welsh words.

The University’s School of English, Communication and Philosophy has secured £1.8 million in funding from the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC) for the interdisciplinary, collaborative project, entitled The National Corpus of Contemporary Welsh, or Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC).

Commencing in March 2016, the project will run for three and a half years. It will draw on expertise from Swansea, Bangor and Lancaster Universities and break new ground as both a language resource and a model of corpus construction.

The corpus (a large collection of texts, or a body of written or spoken material for linguistic analysis) will represent Welsh language use across all communication types. This will include spoken, written and digital language, encompassing different genres, language varieties (regional and social) and contexts.

Contributors will be drawn from the 562,000 Welsh speakers in the UK, who will take part through digital crowdsourcing technologies and community collaboration.

Further detail on the project’s construction and the ways in which users will be able to participate will be shared once the project is live in 2016.

Dr Dawn Knight, from Cardiff University’s School of English, Communication and Philosophy, who is leading the project, said: “What we hope to achieve is the development of the first large-scale living and evolving corpus, representing the Welsh language across communication types and informed by real, current, users of the language.

“We will be engaging with the public in a number of ways, and using new technologies to do so. This is a project about the past, present and future use of the Welsh language and will inform us about variation and change in real language use, such as regional differences or use of mutations over time.

“The project will have a positive impact on the work of translators, publishers, policy-makers, language technology developers and academics, and a bespoke toolkit will be constructed for teachers and learners, integrating basic corpus functionalities for the exploration of language use.”

The range of stakeholders for the project, including the Welsh Government, Welsh Joint Education Committee, Welsh for Adults, Gwasg y Lolfa and University of Wales Dictionary, reflects the linguistic, cultural and social relevance of the project.