National Corpus of Contemporary Welsh

1 March 2017

Dr Dawn Knight

A major new project to record the Welsh language and explore the ways in which it is used today is underway.

The Corpws Cenedlaethol Cymraeg Cyfoes (National Corpus of Contemporary Welsh, CorCenCC) project aims to develop the first ever large-scale collection of Welsh words representing the full range of language used by people in everyday life.

It was officially launched on 28th February 2017 at the Pierhead Building in Cardiff.

Breaking new ground

The launch event, attended by Alun Davies AM, Minister for Lifelong Learning and the Welsh Language, gave guests the chance to find out more about the project. A collaboration between Cardiff, Swansea, Lancaster and Bangor universities, CorCenCC is breaking new ground in creating a large-scale, open-access corpus of contemporary Welsh.

CorCenCC is backed by high-profile ambassadors: poet Damian Walford-Davies, musician and presenter Cerys Matthews, broadcaster Nia Parry and international rugby referee Nigel Owens. The project is community-driven and uses mobile and digital technologies to enable public collaboration.

A new data collection app, which enables Welsh speakers from all walks of life to contribute to the project, was demonstrated at the event. CorCenCC partners and ambassadors also shared their impressions of how the resource will impact on their research and on the Welsh language community more widely.

'The first large-scale living and evolving corpus'

The research team aims for the corpus to contain 10 million words of Welsh, providing concrete evidence about modern Welsh language use for academic researchers, teachers, language learners, dictionary makers, translators, and anyone interested in how Welsh is used across different speakers and genres.

Dr Dawn Knight, project lead from Cardiff University’s School of English, Communication and Philosophy, said: “What we aim to achieve is the development of the first large-scale living and evolving corpus, representing the Welsh language across communication types and informed by real, current users of the language. We will be engaging with the public in a number of ways, and using new technologies to do so, including the CorCenCC crowdsourcing app...”

“The use of crowdsourced corpus data is relatively unheard of, and represents a new direction to complement more traditional language collection methods.”

Steve Morris of Swansea University added: “This is a project about the past, present and future use of the Welsh language and will inform us about variation and change in real language use, such as regional differences or use of mutations over time...”

“By putting speakers themselves in charge of their contributions to the corpus, they can be sure that the recordings they share will be the most natural and accurate representation possible of their everyday Welsh.”

Minister for Lifelong Learning and the Welsh Language, Alun Davies, said: “I am very pleased to attend the launch of this exciting project today. Not only will this work give us a real record of how Welsh is actually being used, but it will also feed into our aim of developing the role of the Welsh language in technology which will be key if we are to meet our target of a million Welsh speakers by 2050.”

CorCenCC is funded by the Economic and Social Research Council and the Arts and Humanities Research Council. The project also involves the Welsh Government; the National Assembly for Wales; the National Library of Wales; WJEC-CBAC; Welsh for Adults; S4C; the BBC; y Lolfa; and the Dictionary of the Welsh Language. Additional funding for the launch was received from the British Council; the School of English, Communication and Philosophy (ENCAP), Cardiff University; and the Research Institute for Arts and Humanities (RIAH), Swansea University.
