Indexing the world's species
Using big data to help save flora and fauna from extinction.
The loss of biodiversity is an issue of global concern and has prompted global campaigns to halt the rate of species extinction.
A major hurdle for any such initiative was the lack of a definitive list of the world's species. Species data was scattered across hundreds of local databases, created and interpreted differently by many scientists. No uniform, agreed catalogue existed.
Achieving consistent data
Since 1999, members of the School of Computer Science & Informatics have conducted research on the distributed data management infrastructure and associated tools for creating the Catalogue of Life, performing quality checking through conflict resolution techniques, and delivering its species data to users.
Because the Catalogue is assembled from species records prepared and updated by many groups of experts around the world, the infrastructure enabling the Catalogue uses a federated approach that resolves the problems associated with diverse and often conflicting information sources from multiple content providers.
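The idea of reconciling records from several providers can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the Catalogue of Life's actual software: the `SpeciesRecord` type, the provider names, the `status` values and the `federate` function are all hypothetical. It groups records by a normalised scientific name and flags any name on which providers disagree, so that the disagreement can be passed to experts instead of being resolved silently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesRecord:
    provider: str          # hypothetical source database name
    scientific_name: str   # Latin name as supplied by the provider
    status: str            # "accepted" or "synonym" (illustrative values)


def normalise(name: str) -> str:
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(name.split()).lower()


def federate(records):
    """Group records by normalised name; split agreed from conflicting entries."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[normalise(rec.scientific_name)].append(rec)

    agreed, conflicts = {}, {}
    for name, recs in grouped.items():
        statuses = {r.status for r in recs}
        if len(statuses) == 1:
            # All providers give the same status for this name.
            agreed[name] = statuses.pop()
        else:
            # Providers disagree: flag for expert review rather than guessing.
            conflicts[name] = sorted((r.provider, r.status) for r in recs)
    return agreed, conflicts


# Made-up example records from two hypothetical provider databases.
records = [
    SpeciesRecord("db_a", "Panthera leo", "accepted"),
    SpeciesRecord("db_b", "panthera  leo", "accepted"),
    SpeciesRecord("db_a", "Felis leo", "synonym"),
    SpeciesRecord("db_b", "Felis leo", "accepted"),
]
agreed, conflicts = federate(records)
print(agreed)
print(conflicts)
```

Keeping conflicts separate from agreed entries reflects the point made above: with many independent expert groups contributing, disagreement is expected, and the infrastructure has to surface it instead of hiding it.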
Our research has produced a scalable architecture that incorporates tools for preparing the Catalogue and for maintaining its consistency.
Using this software, the Catalogue of Life has expanded every year. It began as a prototype built on this research, with 12 databases and 200,000 species, and now draws on 132 databases covering 1.6 million species, with 30,000 web users per month.
This federated database is the most complete set of species data anywhere in the world, comprising 1.6 million entries.
It is utilised by governments across the globe for nature conservation, import control and predicting the effects of climate change.
The Catalogue is endorsed by the UN Convention on Biological Diversity (CBD). It is the world's most authoritative source of peer-reviewed information about the names (Latin scientific names and common names) of the world's species of plants, animals, fungi and micro-organisms.
Its coverage has grown from 600,000 species in the late 1990s to 1,600,000 species now. It was used in the preparation of the IUCN Red List to check information about species being added to the endangered species list. Other users include charities, specialists, scientists, publishers, students and members of the public worldwide.
The Catalogue of Life underpins the Global Biodiversity Information Facility (GBIF), the global NGO set up to make biodiversity data from all countries available with compatible species naming. It also underpins the Encyclopedia of Life (EoL).
- Jones, A. C., White, R. J. and Orme, E. 2011. Identifying and relating biological concepts in the Catalogue of Life. Journal of Biomedical Semantics 2 (1), 7. (10.1186/2041-1480-2-7)
- Jones, A. C. et al. 2010. Evolution of the Catalogue of Life Architecture. Lecture Notes in Computer Science 6279, pp.485-496. (10.1007/978-3-642-15384-6_52)
- Xu, X. et al. 2002. Design and performance evaluation of a web-based multi-tier federated system for a catalogue of life. Presented at: 4th International Workshop on Web Information and Data Management, McLean, VA, USA, 8 November 2002. Proceedings of the 4th International Workshop on Web Information and Data Management. New York, NY: ACM, pp.104-107. (10.1145/584931.584954)
- Embury, S. M. et al. 2001. Adapting integrity enforcement techniques for data reconciliation. Information Systems 26 (8), pp.657-689. (10.1016/S0306-4379(01)00044-8)
- Xu, X. et al. 2001. Experiences with a Hybrid Implementation of a Globally Distributed Federated Database System. Lecture Notes in Computer Science 2118, pp.212-222. (10.1007/3-540-47714-4_20)
- Jones, A. C. et al. 2000. Techniques for effective integration, maintenance and evolution of species databases. Presented at: 12th International Conference on Scientific and Statistical Database Management, Berlin, Germany, 26-28 July 2000. Published in: Günther, O. and Lenz, H. eds. Proceedings: 12th International Conference on Scientific and Statistical Database Management, 2000, Berlin, Germany, 26-28 Jul 2000. Los Alamitos, CA: IEEE, pp.3-13. (10.1109/SSDM.2000.869774)