Fold Path |
A path prefix that is elided when displaying paths
|
|
mete-cmcd gives absolute paths to the source files in the
specification of code
fragments. This means there is a lot of long, repeated
information. The fold path parameter says what can be sensibly elided
in the code fragment specifications.
|
Difference Threshold |
The largest value of the normalised CM difference that is
considered a clone pair
|
|
mete-cmcd computes a "difference" score between each pair of
methods, where 0 means no difference (other than whitespace and
comments) and larger scores mean more different. This parameter
indicates the maximum difference score used in the data set to
determine whether a candidate
pair is a clone pair.
|
Minimum AST nodes |
The definition of 'small' (in AST nodes) for omitting small methods.
|
|
mete-cmcd is implemented using
an ANTLR parser, and does its
analysis by walking the ASTs produced by the parser. Frequently,
clone detectors do not report matches of "small" code fragments, as
small fragments can often look very similar just due to their size
(e.g., two fragments consisting of a declaration of an integer
variable and its initialisation). mete-cmcd determines whether
or not a code fragment is small by counting the nodes
in its AST. This parameter indicates the size, in AST nodes, of the
smallest code fragment that is considered.
|
Size ratio threshold |
If the method size (measured as number of nodes in AST) ratio is
more than this then not clones.
|
|
If code fragments are of quite different sizes, then it is
unlikely that one is a clone of another. This parameter value is the
ratio used to determine whether or not to proceed with the
difference computation.
|
Text difference threshold |
If the method texts differ by more than this ratio then not clones.
|
|
The means by which mete-cmcd determines the difference score
can produce a small score for what are clearly quite different
methods. To avoid such false positives, a simple text comparison is
done between the methods first. If the difference is greater than
this parameter value, then the fragments are considered not clones.
|
Comments ignored |
Whether to include comments when comparing methods |
|
When determining the text difference, this parameter indicates whether
or not comments should be considered. Mainly this is useful for
performance.
|
Sysver |
Identification of what was analysed (typically a corpus System Version)
|
|
What code base is being analysed (identified using
the Corpus identifier)
|
Files |
Number of files analysed
|
|
Not all files in the corpus are analysed. The ones that are analysed
are those for which
the contents.csv
file indicates that the file contains source code that is considered to be
developed for the system under analysis, and for which there is both source
and binary (byte code) versions in the corpus.
|
Methods |
Number of methods, not counting constructors or methods that are too small.
|
|
Since mete-cmcd identifies clone pairs at the method level of
granularity, it is useful to know how many methods were considered.
|
ELOC (Methods only) |
ELOC for methods, not counting constructors or methods that are too small. ELOC is lines of code, not counting lines that are blank, contain only comments, or only braces.
|
|
The sum of the ELOC of methods that were analysed.
|
Clone pairs |
Number of clone pairs
|
|
What it says.
|
Clusters |
Number of clusters
|
|
What it says.
|
Code clones |
ELOC of code that is in a clone pair (proportion of ELOC).
|
|
The sum of the ELOC over all code fragments that appear in a clone
pair (and, in parentheses, the proportion with respect to the
ELOC (Methods only) measurement).
|
Cloned code |
Sum of ELOC for code in a cluster minus the size of smallest
fragment, summed over all clusters. (proportion of ELOC)
|
|
Sum of the ELOC(cloned) values over all
clusters (and, in parentheses, the proportion with respect to the
ELOC (Methods only) measurement).
|
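The ELOC definition used above (lines of code, not counting lines that are blank, contain only comments, or contain only braces) can be illustrated with a minimal Python sketch. This is not mete-cmcd's implementation: the function name `count_eloc` is hypothetical, and the comment handling is deliberately simplified, assuming that comment markers never occur inside string literals.

```python
def count_eloc(source: str) -> int:
    """Count lines that are not blank, comment-only, or brace-only.

    Simplifying assumption: comment markers never occur inside string literals.
    """
    eloc = 0
    in_block_comment = False
    for line in source.splitlines():
        code = []  # characters of this line that lie outside comments
        i = 0
        while i < len(line):
            if in_block_comment:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # block comment continues on the next line
                else:
                    in_block_comment = False
                    i = end + 2
            elif line.startswith("//", i):
                break  # the rest of the line is a comment
            elif line.startswith("/*", i):
                in_block_comment = True
                i += 2
            else:
                code.append(line[i])
                i += 1
        # Nothing left after removing braces and whitespace from both ends
        # means the line was blank, comment-only, or brace-only.
        if "".join(code).strip("{} \t"):
            eloc += 1
    return eloc


sample = """int f(int x) {
    // a comment

    /* block
       comment */
    int y = x + 1; // trailing
    return y;
}
"""
print(count_eloc(sample))
```

Whether the tool counts a method's signature line towards its ELOC is not stated here, so the sketch simply applies the three exclusions to whatever text it is given.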
To be considered a clone pair (that is, listed in this section), a
pair of code fragments must meet the various conditions described above
(see Parameters) and neither code fragment
can be a Java constructor.
Each line of the clone pair information describes one clone pair, and
is divided into the following fields:-
Cluster |
A unique ID identifying the cluster the clone pair belongs to.
|
|
Each cluster has an ID that is unique within
each file. Further information about the cluster a code fragment
belongs to is given in the Cluster information
below.
|
File1 |
The name of the file (sans foldpath prefix) containing the lexically first method in the clone pair.
|
|
This value prefixed by the foldpath gives the absolute path to the
source file containing the "lexically first" code fragment. While there
is no inherent order to the fragments in a clone pair, it is useful
for presentation purposes (e.g. sorting clone pairs) to have a
well-defined order of fragments. The one chosen is the lexical ordering
determined by the method name.
|
Method1 |
The name of the lexically first method in the clone pair.
|
|
mete-cmcd works at the method granularity, that is, all code
fragments it considers are methods. These methods are identified by
the (Java) name of the method, the types of the parameters, and the
fully-qualified name of the class the method is declared in. All of
this information is given in this field.
|
Location1 |
The beginning and ending line numbers in the file where the lexically first method can be found.
|
|
The line numbers in the source file that bound the "first" code
fragment. These are the physical line numbers referring to the file
exactly as it appears. No normalisation or transformation of any
kind is assumed.
|
ELOC1 |
The number of lines of code in the lexically first method.
|
|
The beginning and ending line numbers give one indication of code
fragment size, but that count includes such things as blank lines,
and so may overstate the amount of code.
The ELOC metric is a lines-of-code
variant that does not count blank lines, lines consisting only of
comments, or lines consisting only of braces.
|
Nodes1 |
The number of nodes in the AST for the lexically first method.
|
|
This is a size measurement based on the AST for the code fragment.
In order for a pair of code fragments to be listed, this value has
to be at least as large as the Minimum AST nodes
parameter value.
|
File2 |
The name of the file (sans foldpath prefix) containing the
lexically second method in the clone pair.
|
|
Same as for File1 but for the other code fragment.
|
Method2 |
The name of the lexically second method in the clone pair.
|
|
Same as for Method1 but for the other code fragment.
|
Location2 |
The beginning and ending line numbers in the file where the lexically second method can be found.
|
|
Same as for Location1 but for the other code fragment.
|
ELOC2 |
The number of lines of code in the lexically second method.
|
|
Same as for ELOC1 but for the other code fragment.
|
Nodes2 |
The number of nodes in the AST for the lexically second method.
|
|
Same as for Nodes1 but for the other code fragment.
|
Diff |
The normalised difference score between the two methods.
|
|
The difference score used to determine whether or not
a pair of code fragments is a clone pair. That is, to be listed
as a clone pair, this value has to be smaller than the
Difference Threshold parameter
value.
|
RawDiff |
The raw difference score between the two methods.
|
|
This is the difference score produced by the basic algorithm used
by mete-cmcd. However this value is sensitive to the size of
the code fragment, so the actual difference score used is normalised
by the code fragment size.
|
Cluster |
A unique ID identifying the cluster the clone pair belongs to.
|
|
This ID is unique only within the dataset. It may match an ID in
another dataset. It has no meaning other than to identify a
cluster.
|
Pairs |
Number of clone pairs in cluster
|
|
This is one indication of cluster size (that is, number of "edges").
|
Methods |
Number of distinct methods in cluster
|
|
This is another indication of cluster size (number of "vertices")
|
ELOC |
Sum of ELOC for all methods in cluster
|
|
This provides one indication of how much "cloned code" there is.
|
ELOC(cloned) |
ELOC for all but the smallest method
|
|
If a clone pair was formed by one code fragment being copied (and then
perhaps modified), then in any cluster there is a fragment that is
the "original". And technically, the original fragment is not a clone.
So the "cloned code" is all code in the cluster other than the original.
Because there is no way for mete-cmcd to tell which is the
original, but also because all code fragments are roughly the same
size (due to the Size ratio threshold), a
good indication of how much code has been cloned can be given by just
picking one fragment in the cluster as a proxy for the original. In order
to ensure the same answer is given every time, the smallest fragment is
chosen.
|
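The cluster fields described above can be derived from the clone pair records, as the following Python sketch suggests. It is an illustration rather than the tool's own code: the function `summarise_cluster` and its tuple layout are hypothetical, the method names are abbreviated, and the input values are taken from the C110 row of the sample dataset.

```python
from typing import Dict, Iterable, Tuple

# One clone pair as (method1, eloc1, method2, eloc2). Hypothetical layout.
Pair = Tuple[str, int, str, int]


def summarise_cluster(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Compute Pairs, Methods, ELOC and ELOC(cloned) for one cluster.

    Assumes the cluster contains at least one clone pair.
    """
    pairs = list(pairs)
    eloc_by_method: Dict[str, int] = {}
    for method1, eloc1, method2, eloc2 in pairs:
        # A method may appear in several pairs; count it only once.
        eloc_by_method[method1] = eloc1
        eloc_by_method[method2] = eloc2
    total = sum(eloc_by_method.values())
    return {
        "Pairs": len(pairs),              # number of "edges"
        "Methods": len(eloc_by_method),   # number of "vertices"
        "ELOC": total,
        # The smallest fragment stands in for the unknown original.
        "ELOC(cloned)": total - min(eloc_by_method.values()),
    }


c110 = [("AntClassLoader.getCertificates(File, String)", 9,
         "AntClassLoader.getJarManifest(File)", 7)]
print(summarise_cluster(c110))
```

For the C110 input this gives 1 pair, 2 methods, an ELOC of 16 and an ELOC(cloned) of 9, which agrees with the C110 row of the sample Cluster Information.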
Data from clone analysis.
Tool: mete-cmcd: 2013-01-29T1615
Timestamp: Wed Jan 30 10:56:32 NZDT 2013
Parameters
Fold Path: /opt/qualitas/QualitasCorpus-20120401/Systems/ant/ant-1.8.2/src A path prefix that is elided when displaying paths
Difference Threshold: 45 The largest value of the normalised CM difference that is considered a clone pair
Minimum AST nodes: 50 The definition of 'small' (in AST nodes) for omitting small methods.
Text difference threshold: 0.5 If the method texts differ by more than this ratio then not clones.
Size ratio threshold: 0.65 If the method size (measure as number of nodes in AST) ratio is more than this then not clones.
Comments ignored: true Whether to include comments when comparing method text.
Global Values
Sysver: ant-1.8.2 Identification of what was analysed (typically a corpus System Version)
Files: 843 Number of files analysed
Methods: 2974 Number of methods, not counting constructors or methods that are too small.
ELOC (Methods only): 49791 ELOC for methods, not counting constructors or methods that are too small. ELOC is lines of code, not counting lines that are blank, contain only comments, or only braces.
Clone pairs: 963 Number of clone pairs
Clusters: 299 Number of clusters
Code clones: 10302 (0.21) ELOC of code that is in a clone pair (proportion of ELOC).
Cloned code: 6571 (0.13) ELOC of code in clone pair minus size of smallest fragment (proportion of ELOC)
Clone Pair Information
# 1. Cluster A unique ID identifying the cluster the clone pair belongs to.
# 2. File1 The name of the file (sans foldpath prefix) containing the lexically first method in the clone pair.
# 3. Method1 The name of the lexically first method in the clone pair.
# 4. Location1 The beginning and ending line numbers in the file where the lexically first method can be found.
# 5. ELOC1 The number of lines of code in the lexically first method.
# 6. Nodes1 The number of nodes in the AST for the the lexically first method.
# 7. File2 The name of the file (foldpath common prefix) containing the lexically second method in the clone pair.
# 8. Method2 The name of the lexically second method in the clone pair.
# 9. Location2 The beginning and ending line numbers in the file where the lexically second method can be found.
# 10. ELOC2 The number of lines of code in the lexically second method.
# 11. Nodes2 The number of nodes in the AST for the the lexically second method.
# 12. Diff The normalised difference score between the two methods.
# 13. RawDiff The raw difference score between the two methods.
Cluster File1 Method1 Location1 ELOC1 Nodes1 File2 Method2 Location2 ELOC2 Nodes2 Diff RawDiff
C89 /apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java org.apache.tools.ant.AntClassLoader.forceLoadClass(String) (645,654) 6 61 /apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java org.apache.tools.ant.AntClassLoader.forceLoadSystemClass(String) (672,681) 6 61 0.00 0.00
C110 /apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java org.apache.tools.ant.AntClassLoader.getCertificates(File, String) (1191,1202) 9 95 /apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java org.apache.tools.ant.AntClassLoader.getJarManifest(File) (1169,1178) 7 66 31.29 31.29
C87 /apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java org.apache.tools.ant.AntClassLoader.getResource(String) (868,904) 23 256 /apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java org.apache.tools.ant.AntClassLoader.getResourceAsStream(String) (692,722) 23 183 30.01 30.01
...
C1 /apache-ant-1.8.2/src/main/org/apache/tools/ant/types/resources/selectors/ResourceSelectorContainer.java org.apache.tools.ant.types.resources.selectors.ResourceSelectorContainer.dieOnCircularReference(Stack, Project) (110,126) 12 114 /apache-ant-1.8.2/src/main/org/apache/tools/ant/types/selectors/BaseSelectorContainer.java org.apache.tools.ant.types.selectors.BaseSelectorContainer.dieOnCircularReference(Stack, Project) (331,347) 12 115 0.00 0.00
C1 /apache-ant-1.8.2/src/main/org/apache/tools/ant/types/selectors/AbstractSelectorContainer.java org.apache.tools.ant.types.selectors.AbstractSelectorContainer.dieOnCircularReference(Stack, Project) (325,340) 11 113 /apache-ant-1.8.2/src/main/org/apache/tools/ant/types/selectors/BaseSelectorContainer.java org.apache.tools.ant.types.selectors.BaseSelectorContainer.dieOnCircularReference(Stack, Project) (331,347) 12 115 0.00 0.00
Cluster Information
# 1. Cluster A unique ID identifying the cluster the clone pair belongs to.
# 2. Pairs Number of clone pairs in cluster
# 3. Methods Number of distinct methods in cluster
# 4. ELOC Sum of ELOC for all methods in cluster
# 5. ELOC(cloned) ELOC for all but the smallest method
Cluster Pairs Methods ELOC ELOC(cloned)
C110 1 2 16 9
C89 1 2 12 6
C87 1 2 46 23
...
C1 161 27 300 293
|
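Records such as those in the Clone Pair Information listing above can be read programmatically. The Python sketch below parses one record into its 13 fields and restores the absolute path by prefixing the Fold Path. It assumes the fields are tab-separated, which the listing here does not confirm; the names `CloneFragment`, `ClonePair`, `parse_clone_pair` and `absolute_path` are hypothetical and not part of mete-cmcd.

```python
from typing import NamedTuple, Sequence, Tuple


class CloneFragment(NamedTuple):
    file: str                  # path with the fold path prefix elided
    method: str                # qualified method name with parameter types
    location: Tuple[int, int]  # (beginning line, ending line)
    eloc: int
    nodes: int


class ClonePair(NamedTuple):
    cluster: str
    first: CloneFragment
    second: CloneFragment
    diff: float
    raw_diff: float


def _fragment(fields: Sequence[str]) -> CloneFragment:
    file, method, location, eloc, nodes = fields
    begin, end = location.strip("()").split(",")
    return CloneFragment(file, method, (int(begin), int(end)), int(eloc), int(nodes))


def parse_clone_pair(line: str) -> ClonePair:
    """Parse one clone pair record. ASSUMPTION: fields are tab-separated."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 fields, got {len(f)}")
    return ClonePair(f[0], _fragment(f[1:6]), _fragment(f[6:11]),
                     float(f[11]), float(f[12]))


def absolute_path(fold_path: str, file: str) -> str:
    """Prefix the elided fold path to a File1/File2 value."""
    return fold_path + file


# The C89 record from the sample, rebuilt here with tab separators.
record = "\t".join([
    "C89",
    "/apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java",
    "org.apache.tools.ant.AntClassLoader.forceLoadClass(String)",
    "(645,654)", "6", "61",
    "/apache-ant-1.8.2/src/main/org/apache/tools/ant/AntClassLoader.java",
    "org.apache.tools.ant.AntClassLoader.forceLoadSystemClass(String)",
    "(672,681)", "6", "61", "0.00", "0.00",
])
pair = parse_clone_pair(record)
fold_path = "/opt/qualitas/QualitasCorpus-20120401/Systems/ant/ant-1.8.2/src"
print(absolute_path(fold_path, pair.first.file), pair.first.location)
```

Splitting on tabs rather than on arbitrary whitespace matters because the Method fields can contain spaces, as in `getCertificates(File, String)`.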