St David’s Day kick off for Welsh language project

2 March 2016


As Welsh people around the world celebrated St David’s Day, a new project that aims to document contemporary use of the Welsh language got under way.

Led by Cardiff University’s School of English, Communication and Philosophy, in collaboration with Swansea, Bangor and Lancaster Universities, the £1.8m National Corpus of Contemporary Welsh, or Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC), project commenced on 1st March 2016. CorCenCC is funded by the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC).

The funding enables the development of the first ever large-scale corpus of the Welsh language. A corpus is a collection of texts, or a body of written or spoken material, assembled for linguistic analysis.

Contributors will be drawn from the 562,000 Welsh speakers in Wales and will take part via digital crowdsourcing technologies. The completed corpus will be freely available online to anyone wishing to view or use it.

The community-driven project is the first of its kind in that it aims to capture and represent Welsh language use across all communication types, including spoken, written and digital media, from people of all backgrounds. This will allow individuals to identify and explore the Welsh language as it is actually used, rather than relying on more formal accounts of how it 'should' be used.

Dr Dawn Knight, from Cardiff University’s School of English, Communication and Philosophy, who is leading the project, said: “Since securing the funding in late 2015 the project team across the partner universities has been planning towards the go-live date. We have recruited five excellent research assistants and will shortly be announcing the involvement of high-profile project ambassadors.

“This is an exciting time for us and we look forward to beginning the work of building the corpus. We will be launching our innovative data collection app, and reaching out to contributors from demographically diverse backgrounds, so that the data gathered represents the richness and variety of the Welsh language.”

The corpus project has linguistic, cultural and social relevance. Engaging with the public through new technologies will play a significant role in tracking variation and change in real language use, such as regional differences or the use of mutations over time.

The project will have a positive impact on the work of translators, publishers, policy-makers, language technology developers and academics, and a bespoke toolkit will be constructed for teachers and learners, integrating basic corpus functionalities for the exploration of language use.

The interdisciplinary, collaborative project will run for three and a half years. Key stakeholders include the Welsh Government, Welsh Joint Education Committee, Welsh for Adults, Gwasg y Lolfa and University of Wales Dictionary.