Qualitas Corpus Metadata

A crucial part of the Qualitas Corpus, and what really makes it a curated collection rather than just a bunch of downloaded files, is the information that has been gathered on the different systems and versions, generally referred to as metadata. There are different types of metadata, and they are presented in different ways, as described below.

Attributes
Each system or sysver has a number of attributes whose values are recorded in the summary.csv and .properties files described below.
summary.csv
This is a single file for the whole corpus, found in the top-level metadata directory. It contains one tab-separated entry per sysver, gathering the values of that sysver's attributes into one place. For system-level attributes, every sysver of the same system has the same value. The values in a sysver's entry are also found in that sysver's .properties file.

This is new as of release 20100719 and was restructured for 20120401.
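Because summary.csv holds one tab-separated entry per sysver, it can be loaded into a list of attribute maps, one per sysver. The following is a minimal sketch, not part of the corpus distribution: the class name SummaryReader is hypothetical, and it assumes the first line is a header naming the attribute columns (the actual column names are not listed on this page).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: reads the tab-separated summary file into one map per sysver,
// keyed by attribute name. ASSUMPTION: the first line is a header naming each
// attribute column; the actual columns of summary.csv are not listed here.
class SummaryReader {

    static List<Map<String, String>> read(Reader in) {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String headerLine = br.readLine();
            if (headerLine == null) {
                return rows; // empty file: no sysver entries
            }
            String[] header = headerLine.split("\t", -1);
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                // limit -1 keeps trailing empty fields (attributes with no value)
                String[] fields = line.split("\t", -1);
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < header.length; i++) {
                    row.put(header[i], i < fields.length ? fields[i] : "");
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

For example, passing a FileReader opened on summary.csv in the corpus's top-level metadata directory yields one map per sysver; system-level attributes will repeat with identical values across the maps of sysvers belonging to the same system. The sketch splits on tabs, as the description above states, despite the .csv extension.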

.properties file
This file exists for every sysver and contains the attribute values for that version. It is formatted so that it can easily be read and written using java.util.Properties.

As of 20120401, this file contains the same information as is found in summary.csv.
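Since each .properties file uses the standard java.util.Properties text format, Properties.load can read it directly. The minimal sketch below also checks that a loaded file agrees with the sysver's summary.csv entry, represented as a Map. The class and method names are hypothetical, and it assumes the property keys are the same strings as the summary.csv column names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;

// Minimal sketch: loads a sysver's .properties file and compares it with that
// sysver's summary.csv entry. ASSUMPTION: property keys match the summary.csv
// column names exactly.
class SysverProperties {

    static Properties load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in); // standard java.util.Properties key=value format
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True if props holds exactly the attributes in row, with the same values.
    static boolean matchesSummaryRow(Properties props, Map<String, String> row) {
        if (props.size() != row.size()) {
            return false; // one side has attributes the other lacks
        }
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (!Objects.equals(props.getProperty(e.getKey()), e.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing sizes first and then every key means a mismatch in either direction (a missing attribute, an extra attribute, or a differing value) is reported as a disagreement.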

contents.csv
This file exists for every sysver (in the sysver's metadata directory) and contains the details of every Java type found in the bin or src (or both) for that version.

This is new as of release 20100719.
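The idea that each type is recorded whether it appears in bin, in src, or in both can be illustrated as a set union with a per-type location. This is a minimal sketch of that idea only: it does not parse contents.csv, whose column layout is not described here, and the names TypeLocations, Location and classify are hypothetical.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Minimal sketch: given the type names seen in bin and in src, record one
// entry per type saying where it was found. Hypothetical names throughout;
// this is not the format of contents.csv.
class TypeLocations {

    enum Location { BIN_ONLY, SRC_ONLY, BOTH }

    static SortedMap<String, Location> classify(Set<String> binTypes, Set<String> srcTypes) {
        Set<String> all = new TreeSet<>(binTypes);
        all.addAll(srcTypes); // union: every type found in either place
        SortedMap<String, Location> result = new TreeMap<>();
        for (String type : all) {
            boolean inBin = binTypes.contains(type);
            boolean inSrc = srcTypes.contains(type);
            Location loc = (inBin && inSrc) ? Location.BOTH
                    : inBin ? Location.BIN_ONLY
                    : Location.SRC_ONLY;
            result.put(type, loc);
        }
        return result;
    }
}
```

Using sorted collections keeps the output ordered by type name, which makes results from different versions easy to compare.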

Third-party library use
This describes which systems in the corpus use third-party libraries and, of those, which do or do not distribute those libraries. This is new as of release 20100719.