Difference between revisions of "Class Metrics"

From OntoMetrics
Revision as of 11:42, 1 June 2016

Class Metrics examine the classes and relationships of ontologies.

Class Connectivity

This metric is intended to indicate which classes are central in the ontology, based on the instance relationship graph (where nodes represent instances and edges represent the relationships between them). It works in tandem with the importance metric described below to give a better understanding of how focal a class is. It can be used to understand the nature of the ontology by showing which classes play a central role compared to other classes.

The connectivity of a class (Conn(Ci)) is defined as the total number of relationships instances of the class have with instances of other classes (NIREL).

Conn(Ci)=|NIREL(Ci)|
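
The count can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relationships are given as (subject, predicate, object) triples, `class_of` is a hypothetical mapping from each instance to a single class, and a relationship is counted in either direction. None of these names come from OntoMetrics.

```python
def connectivity(cls, triples, class_of):
    """Count relationships linking instances of `cls` to instances of other classes."""
    count = 0
    for subj, _pred, obj in triples:
        subj_cls = class_of.get(subj)
        obj_cls = class_of.get(obj)
        if subj_cls is None or obj_cls is None:
            continue  # skip literals and untyped resources
        # Assumption: edges inside the same class do not count.
        if subj_cls != obj_cls and cls in (subj_cls, obj_cls):
            count += 1
    return count


# Hypothetical example data.
class_of = {"alice": "Person", "bob": "Person", "acme": "Company", "berlin": "City"}
triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("alice", "knows", "bob"),        # Person-to-Person, not counted
    ("acme", "locatedIn", "berlin"),
]
```

With this data, `connectivity("Company", triples, class_of)` is 3, which would mark Company as the most connected class.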

Class Fullness

This metric is a per-class refinement of the knowledgebase (KB) average population metric mentioned above. It is mainly useful to an ontology developer who wants to know how well the data extraction performed relative to the expected number of instances of each class. It helps direct the extraction process toward resources that will add instances to classes that are not yet full.

Formally, the fullness (F) of a class Ci is defined as the actual number of instances that belong to the subtree rooted at Ci (Ci(I)) compared to the expected number of instances that belong to the subtree rooted at Ci (Ci`(I)).

F = |Ci(I)| / |Ci`(I)|

The result of the formula will be a percentage representing the actual coverage of instances compared to the expected coverage. In most cases, this measure is an indication of how well the instance extraction process performed. For example, a KB where most classes have a low F would require more data extraction. On the other hand, a KB where most classes are almost full would indicate that it reflects more closely the knowledge encoded in the schema.
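
A minimal sketch of the ratio, assuming the class hierarchy is a parent-to-children dictionary and that actual and expected instance counts are known per class. The names `children`, `actual` and `expected` are hypothetical; where the expected counts come from is not specified here.

```python
def subtree(root, children):
    """Return the root class together with all of its descendants."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def fullness(root, children, actual, expected):
    """Actual over expected instance count for the subtree rooted at `root`."""
    nodes = subtree(root, children)
    actual_total = sum(actual.get(c, 0) for c in nodes)
    expected_total = sum(expected.get(c, 0) for c in nodes)
    if expected_total == 0:
        raise ValueError("no expected instances recorded for this subtree")
    return actual_total / expected_total


# Hypothetical example data.
children = {"Publication": ["Article", "Book"]}
actual = {"Article": 30, "Book": 15}
expected = {"Article": 60, "Book": 20}
```

Here `fullness("Publication", children, actual, expected)` is 45/80, so the subtree is a little over half full and Article (30/60) is the class that needs more extraction.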

Class Importance

This metric calculates the percentage of instances that belong to classes in the inheritance subtree rooted at the current class with respect to the total number of instances. It helps identify which areas of the schema are in focus when instances are added to the knowledgebase (KB). Although this measure doesn’t consider the domain characteristics, it can still give an idea of which parts of the ontology are focal and which parts are on the edges.

The importance of a class (Imp(Ci)) is defined as the number of instances that belong to the inheritance subtree rooted at Ci in the KB (INST(Ci)), expressed as a percentage of the total number of class instances in the KB (KB(CI)).

Imp(Ci)= |INST(Ci)| / |KB(CI)|
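
The following sketch computes this share, assuming a parent-to-children dictionary for the hierarchy and a hypothetical `counts` dictionary of instances asserted directly under each class (so no instance is counted twice).

```python
def subtree(root, children):
    """Return the root class together with all of its descendants."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def importance(root, children, counts):
    """Share of all KB instances that fall in the subtree rooted at `root`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the knowledgebase has no instances")
    in_subtree = sum(counts.get(c, 0) for c in subtree(root, children))
    return in_subtree / total


# Hypothetical example data.
children = {"Thing": ["Person", "Organization"], "Person": ["Student"]}
counts = {"Person": 20, "Student": 30, "Organization": 50}
```

With these counts, the Person subtree (Person plus Student) holds 50 of 100 instances, so `importance("Person", children, counts)` is 0.5.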

Class Inheritance Richness

This measure applies the schema IRS metric mentioned above to a single class and describes how information is distributed across the classes in that class's subtree. It is a good indication of how well knowledge is grouped into categories and subcategories under this class.

Formally, the inheritance richness (IRC) of class Ci is defined as the average number of subclasses per class in the subtree rooted at Ci. The number of subclasses of a class Cj in the subtree is |HC(C1,Cj)| and the number of nodes in the subtree is |C'|.

IRC = (SUM over Cj ∈ C' of |HC(C1,Cj)|) / |C'|

The result of the formula will be a real number representing the average number of classes per schema level. The interpretation of the results of this metric depends highly on the nature of the ontology. Classes in an ontology that represents a very specific domain will have low IRC values, while classes in an ontology that represents a wide domain will usually have higher IRC values.
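
As a rough sketch of the averaging step, assuming a parent-to-children dictionary, that only direct subclasses are counted for each class, and that the subtree includes its root (all three are assumptions of this example):

```python
def subtree(root, children):
    """Return the root class together with all of its descendants."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def inheritance_richness(root, children):
    """Average number of direct subclasses per class in the subtree of `root`."""
    nodes = subtree(root, children)
    total_subclasses = sum(len(children.get(c, ())) for c in nodes)
    return total_subclasses / len(nodes)


# Hypothetical hierarchy: Vehicle > (Car > (Sedan, Suv), Bike).
children = {"Vehicle": ["Car", "Bike"], "Car": ["Sedan", "Suv"]}
```

For the Vehicle subtree there are five classes and four subclass links, giving 4/5; a leaf class such as Bike always scores 0.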


Class Readability

This metric indicates the existence of human-readable descriptions in the ontology, such as comments, labels, or captions. It is a useful indicator when the ontology is going to be queried and the results listed to users. Formally, the readability (Rd) of a class Ci is defined as the sum of the number of attributes that are comments and the number of attributes that are labels the class has.

Rd = |A, A = rdfs:comment| + |A, A = rdfs:label|

The result of the formula will be an integer representing the availability of human-readable information for the instances of the current class.
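
A minimal sketch of the count, assuming the ontology is available as plain (subject, predicate, object) tuples with prefixed names as strings. A real implementation would normally go through an RDF library; the tuple representation and the `ex:` names are only for illustration.

```python
READABLE_PROPERTIES = {"rdfs:comment", "rdfs:label"}


def readability(cls, triples):
    """Number of rdfs:comment and rdfs:label annotations attached to `cls`."""
    return sum(
        1
        for subj, pred, _obj in triples
        if subj == cls and pred in READABLE_PROPERTIES
    )


# Hypothetical example data.
triples = [
    ("ex:Person", "rdfs:label", "Person"),
    ("ex:Person", "rdfs:label", "Persona"),
    ("ex:Person", "rdfs:comment", "A human being."),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),  # not a label or comment
    ("ex:Agent", "rdfs:label", "Agent"),
]
```

Here `readability("ex:Person", triples)` is 3 (two labels and one comment).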

Class Relationship Richness

This is an important metric reflecting how many of the relationships defined for the class in the schema are actually being used at the instance level. This is another good indication of the utilization of the knowledge modeled in the schema.

The relationship richness (RR) of a class Ci is defined as the percentage of the number of relationships that are being used by instances Ii that belong to Ci (P(Ii,Ij)) compared to the number of relationships that are defined for Ci at the schema level (P(Ci,Cj)).
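
One possible reading of this definition is sketched below. It assumes that "relationships" are counted as distinct properties, that `schema_props` (hypothetical) lists the properties defined for each class, and that `class_of` (hypothetical) maps each instance to its class.

```python
def relationship_richness(cls, schema_props, triples, class_of):
    """Fraction of the properties defined for `cls` that its instances actually use."""
    defined = schema_props.get(cls, set())
    if not defined:
        raise ValueError("no relationships are defined for this class")
    used = {
        pred
        for subj, pred, _obj in triples
        if class_of.get(subj) == cls and pred in defined
    }
    return len(used) / len(defined)


# Hypothetical example data.
schema_props = {"Person": {"worksFor", "knows", "livesIn", "authorOf"}}
class_of = {"alice": "Person", "bob": "Person", "acme": "Company"}
triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("alice", "knows", "bob"),
]
```

In this example two of the four properties defined for Person are used, so the value is 0.5.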