Qualitas Corpus Clone Collection

The Qualitas Corpus Clone Collection (Collection) is part of the Qualitas Corpus (Corpus), a curated collection of software systems intended for empirical studies of code artefacts. The Collection consists of data describing possible code clones (code fragments that are in some way similar to each other) found in most systems in the Corpus. The hope is that the accuracy of this data will be established (that is, error bounds will be provided) and, ideally, improved over time. All data provided should include its provenance, that is, where the values came from, which gives some idea of how much the data can be trusted.


24 January 2013: The first release for the Collection is planned for 1 May 2013.


Collection Catalogue
Download the collection
Structure of the Collection
Description of data
Provenance information
Citing the collection
Development status and plans
History
FAQ
Glossary

Manuscripts and Publications

Related Information