Acquiring the Qualitas Corpus

The Qualitas Corpus is available by contacting Ewan Tempero. Identify the distribution required and email him.

There are several distributions of the corpus available. These include earlier releases, as well as different variants of the current release. They are listed below with the most recent releases first.

20130901

20130901r

The recent versions release: all 112 systems, but only the most recent version of each system that we have. (Note that for some systems, mainly those that appear to be no longer active, the version we have can be quite old.) This is intended for breadth studies. It is 3.31 GiB uninstalled and 9.65 GiB installed (not including the JRE).

20130901e

The evolution release: the 15 systems for which we have 10 or more versions, a total of 579 versions. This is intended for studies on software evolution. It is 16.75 GiB uninstalled and 62.98 GiB installed (not including the JRE).

20130901f

The complete corpus consists of 754 versions from 112 systems. It is 20.34 GiB uninstalled and 73.80 GiB installed (not including the JRE). It is not available as a single distribution. The 'r' and 'e' distributions together contain almost all of the complete corpus; this 'f' distribution contains the remainder, that is, whatever is in neither the 'r' nor the 'e' distribution (0.84 GiB). To get the complete corpus, download each of the 'r', 'e', and 'f' distributions and unpack the archive files in the same place. The installation will then work as for each individual distribution.
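
As a rough illustration of the unpacking step, the following Python sketch extracts several archives into one shared directory. The archive file names and the target directory shown in the comments are hypothetical, and the sketch assumes the archives are in a format that Python's shutil.unpack_archive can recognise from the file extension (for example .tar, .tar.gz, or .zip). Check the files you actually receive before relying on it.

```python
import shutil
from pathlib import Path


def unpack_all(archives, target_dir):
    """Unpack every archive in `archives` into the same `target_dir`.

    Returns the list of archive file names that were unpacked, in order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    unpacked = []
    for archive in archives:
        archive = Path(archive)
        if not archive.is_file():
            raise FileNotFoundError(f"missing archive: {archive}")
        # The archive format is inferred from the file extension.
        shutil.unpack_archive(str(archive), str(target))
        unpacked.append(archive.name)
    return unpacked


# Hypothetical usage; these file names are assumptions, not the real ones:
# unpack_all(
#     ["corpus-r.tar", "corpus-e.tar", "corpus-f.tar"],
#     "QualitasCorpus",
# )
```

Because all three archives are extracted into the same directory, their contents are merged into a single tree, which is what the instruction to unpack "in the same place" asks for. The order of extraction should not matter if the distributions do not overlap in conflicting ways.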

Past

These are distributions that have been made available in the past. They are not directly available now, but can be provided if required.

20120401r

The recent versions release: all 111 systems, but only the most recent version of each system that we have. (Note that for some systems, mainly those that appear to be no longer active, the version we have can be quite old.) This is intended for breadth studies. It is 3.27 GiB uninstalled and 9.57 GiB installed (not including the JRE).

20120401e

The evolution release: the 14 systems for which we have 10 or more versions, a total of 486 versions. This is intended for studies on software evolution. It is 12.12 GiB uninstalled and 45.64 GiB installed (not including the JRE).

20120401f

The complete corpus consists of 661 versions from 111 systems. It is 15.69 GiB uninstalled and 56.20 GiB installed (not including the JRE). It is not available as a single distribution. The 'r' and 'e' distributions together contain almost all of the complete corpus; this 'f' distribution contains the remainder, that is, whatever is in neither the 'r' nor the 'e' distribution. To get the complete corpus, download each of the 'r', 'e', and 'f' distributions and unpack the archive files in the same place. The installation will then work as for each individual distribution.

20101126r

The recent versions release: all 106 systems, but only the most recent version of each system that we have. (Note that for some systems, mainly those that appear to be no longer active, the version we have can be quite old.) This is intended for breadth studies. It is 2.9 GiB uninstalled and 8.5 GiB installed (not including the JRE).

20101126e

The evolution release: the 13 systems for which we have 10 or more versions, a total of 414 versions. This is intended for studies on software evolution. It is 9.3 GiB uninstalled and 31.9 GiB installed (not including the JRE).

20101126

The complete corpus consists of 585 versions from 106 systems. It is 12.5 GiB uninstalled and 42.0 GiB installed (not including the JRE). This is not distributed by default, but is available by request.

20100719

This is the complete corpus, with all 100 systems and every version of each system that we have. (9.42 GiB/10.12 GB distributed, 32.80 GiB installed)

20100719r

The recent releases: all 100 systems, but only the most recent version of each system that we have. (Note that for some systems the version we have can be quite old.) Useful if you only do breadth studies, not studies of system evolution, and so don't need the complete distribution. The contents of this distribution should be a proper subset of the complete distribution. The separate identifier is therefore only necessary to identify the distribution being used in case there are issues. (1.39 GiB/1.48 GB distributed, 4.52 GiB installed)

20090202

Replaced by 20100719. Useful if you want to replicate studies based on this release.

This is the complete corpus, with all 100 systems and every version of each system that we have. This release has two distributions available: 20090202 (this one) and 20090202r.

20090202r

Replaced by 20100719r. Useful if you want to replicate studies based on this release.

The recent releases: all 100 systems, but only the most recent version of each system that we have. (Note that for some systems the version we have can be quite old.) Useful if you only do breadth studies, not studies of system evolution, and so don't need the complete distribution. The contents of this distribution should be a proper subset of the complete distribution. The separate identifier is therefore only necessary to identify the distribution being used in case there are issues. (1.2 GiB distributed, 3.5 GiB installed)

20080603c

The corrected version of the 20080603 release. Useful if you really want to reproduce studies done on this version, but don't want to have to find the relevant versions of systems from the complete corpus or acquire the complete corpus. (2.8GiB distributed, 8.6GiB installed)

20080603

The original, uncorrected 20080603 release. Useful only if you need exactly what was used in studies on this release. (2.8 GiB distributed, 8.6 GiB installed)