Qualitas Corpus Metadata
A crucial part of the Qualitas Corpus, and what makes it a curated
collection rather than just a set of downloaded files, is the information
that has been gathered about its systems and their versions, generally
referred to as metadata. There are several types of metadata, and they
are presented in different ways, as described below.
- Attributes
- Each system or sysver has a
number of attributes whose values are recorded, specifically
in the summary.csv and .properties files described
below.
- summary.csv
- This is a single file for the whole corpus that gathers the attribute
values of every sysver in one place, as one tab-separated entry per
sysver. Where an attribute is a system-level attribute, every sysver of
that system has the same value for it. The file is found in the
top-level metadata directory, and the values it holds for a given
sysver are also found in that sysver's .properties file.
The summary.csv file is new as of release 20100719
and was restructured for release 20120401.
- .properties file
- This file exists for every sysver and
contains the attribute values for that version. It is formatted so
that it can easily be managed using java.util.Properties.
As of release 20120401, it contains the same information as that
sysver's entry in summary.csv.
- contents.csv
- This file exists for every sysver (in the
sysver metadata directory) and contains the
details for every Java type found in either the
bin or src (or both) for that version.
This is new as of
release 20100719.
- Third-party library use
- This describes which systems in the corpus
use third-party libraries and, of those, which
do or do not distribute those libraries.
This is new as of release 20100719.
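As a rough illustration of the summary.csv layout described above, the Java sketch below reads the file into one map per sysver. It assumes, beyond what the description states, that the first line is a header naming the attributes, that fields contain no embedded tabs or quoting, that the file is UTF-8, and that it lives at metadata/summary.csv; the class name SummaryReader is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns the tab-separated summary.csv into one map per sysver.
// Assumes the first line is a header of attribute names and fields contain no tabs.
public class SummaryReader {

    public static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split("\t", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                // Missing trailing fields are recorded as empty strings.
                row.put(header[i], i < fields.length ? fields[i] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real path as the first argument.
        Path summary = Paths.get(args.length > 0 ? args[0] : "metadata/summary.csv");
        List<Map<String, String>> rows =
                parse(Files.readAllLines(summary, StandardCharsets.UTF_8));
        System.out.println(rows.size() + " sysver entries");
    }
}
```

Keying each row by header name rather than column position should keep the sketch usable across layout changes such as the 20120401 restructuring, provided the attribute names themselves stay the same.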
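Because each sysver's .properties file is meant to be managed with java.util.Properties, loading one needs only the standard Properties.load call. The sketch below also compares a loaded file with a summary.csv row, since as of release 20120401 the two should hold the same information. It assumes the property keys use the same names as the summary.csv header columns and that the file is UTF-8; the class name SysverProperties and the default file name are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: loads a sysver's .properties file with java.util.Properties
// and reports attributes whose values differ from that sysver's summary.csv entry.
// Assumes property keys match the summary.csv header names.
public class SysverProperties {

    public static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader); // standard key=value (or key: value) properties syntax
        return props;
    }

    // Returns attribute -> "properties value | summary value" for every mismatch;
    // a side that lacks the attribute is shown as null.
    public static Map<String, String> differences(Properties props, Map<String, String> summaryRow) {
        Map<String, String> diffs = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            String fromProps = props.getProperty(key);
            String fromSummary = summaryRow.get(key);
            if (!fromProps.equals(fromSummary)) {
                diffs.put(key, fromProps + " | " + fromSummary);
            }
        }
        for (String key : summaryRow.keySet()) {
            if (props.getProperty(key) == null) {
                diffs.put(key, "null | " + summaryRow.get(key));
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "sysver.properties");
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            Properties props = load(reader);
            for (String key : props.stringPropertyNames()) {
                System.out.println(key + " = " + props.getProperty(key));
            }
        }
    }
}
```

An empty result from differences would be consistent with the statement that, as of 20120401, the .properties file and the summary.csv entry carry the same information.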
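The contents.csv entry above does not give that file's delimiter or column names, so this sketch of tallying its rows takes both as parameters. It assumes a header row, no quoted fields, and UTF-8 text; the class name ContentsTally and any column name passed to it are hypothetical, not taken from the corpus documentation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper: counts the rows of a sysver's contents.csv by the value
// of one column. Delimiter and column name are parameters because this sketch
// does not assume the file's exact layout; a header row is assumed.
public class ContentsTally {

    public static Map<String, Integer> countBy(List<String> lines, String delimiter, String column) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by column value
        if (lines.isEmpty()) {
            return counts;
        }
        String sep = Pattern.quote(delimiter); // treat the delimiter literally, not as a regex
        String[] header = lines.get(0).split(sep, -1);
        int index = Arrays.asList(header).indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("no column named " + column);
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(sep, -1);
            String value = index < fields.length ? fields[index] : "";
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: ContentsTally <contents.csv> <delimiter> <column>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        countBy(lines, args[1], args[2])
                .forEach((value, n) -> System.out.println(value + "\t" + n));
    }
}
```

For example, if the file had a column recording whether a type was found in bin, src, or both, tallying by that column would summarise where a version's types come from; check the actual header before relying on any column name.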