Qualitas Corpus Clone Collection: Confidence Level

Confidence Level is an ordinal-scale metric that indicates how much confidence there is in the reliability of a measurement. It is primarily intended for the "clone/not clone" designation given to a candidate pair.

A confidence level reflects the amount of evidence, and the agreement of said evidence, in support of the measurement. Generally, the more evidence there is, the higher the confidence. However, if there is a lot of evidence but it is inconsistent (for example, some says a candidate pair is a clone pair but other evidence says it is not), then that will lower the confidence level. Another interpretation of confidence level is that it indicates how likely the measurement is to change in light of more evidence: the higher the level, the less likely a change.

A confidence level provides little information about the error in the measurement. It could be that a measurement has the lowest level of confidence but is absolutely correct; it is just that there is no other supporting evidence.

The current measurement values are:

Lowest
There is some evidence in support of the measurement, but there's also some reason to doubt the measurement (typically inconsistent evidence).
Low
There is some evidence (typically the output of a tool) in support of the measurement but there is no corroborating evidence at all.
Medium
There is more than one source of evidence that supports the measurement.
High
There are many and varied sources of evidence that support the measurement, or there is some other information about the entity being measured that gives high confidence in the measurement.
Highest
There is no expectation that the measurement will change.
Comment: I have thought about having a value lower than "Lowest", something like "Inconsistent", to indicate that what evidence there is is not in agreement. I'm unconvinced it is needed for now, given the suggested measurement protocol below.
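
Because the metric is ordinal, the five values can be ranked against each other, but the distance between them has no meaning. One way this might be represented in code is sketched below. The class name `ConfidenceLevel`, the numeric ranks, and the helper `at_least` are assumptions made for illustration only; they are not part of the collection or its tooling.

```python
from enum import Enum
from functools import total_ordering


@total_ordering
class ConfidenceLevel(Enum):
    """Hypothetical encoding of the five values listed above.

    The integers only fix the ranking; they are not meant to be
    added, averaged, or otherwise treated as quantities.
    """
    LOWEST = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    HIGHEST = 5

    def __lt__(self, other):
        # Only compare against other confidence levels.
        if not isinstance(other, ConfidenceLevel):
            return NotImplemented
        return self.value < other.value


def at_least(level, threshold):
    """Return True if `level` ranks at or above `threshold` (hypothetical helper)."""
    return level >= threshold


# Ranking is meaningful: pick the weakest level among several measurements.
levels = [ConfidenceLevel.HIGH, ConfidenceLevel.LOWEST, ConfidenceLevel.MEDIUM]
weakest = min(levels)
```

A plain `Enum` is used here rather than `IntEnum` so that arithmetic such as `LOW + LOW` is rejected, which matches the ordinal nature of the scale.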

Towards a measurement protocol

The confidence level metric is (a) ordinal and (b) somewhat subjective. Below are some notes on how to assign a confidence level.