Class Metrics
Class Metrics examine the classes and relationships of ontologies.
Contents
Class Connectivity
This metric is intended to give an indication of what classes are central in the ontology based on the instance relationship graph (where nodes represent instances and edges represent the relationships between them). This measure works in tandem with the importance metric mentioned next to create a better understanding of how focal some classes function. This measure can be used to understand the nature of the ontology by indicating which classes play a central role compared to other classes.
The connectivity of a class (Conn(Ci)) is defined as the total number of relationships instances of the class have with instances of other classes (NIREL).
Conn(Ci)=|NIREL(Ci)|
Class Fulness
This metric details the knowledgebase (KB) average population metric mentioned above. It would be mainly used by an ontology developer interested in knowing how well the data extraction was with respect to the expected number of instances of each class. This is helpful in directing the extraction process to any resources that will add instances belonging to classes that are not full.
Formally, the fullness (F) of a class Ci is defined as the actual number of instances that belong to the subtree rooted at Ci (Ci (I)) compared to the expected number of instances that belong to the subtree rooted at Ci (Ci`(I)).
F=|Ci (I)|/Ci`(I)
The result of the formula will be a percentage representing the actual coverage of instances compared to the expected coverage. In most cases, this measure is an indication of how well the instance extraction process performed. For example, a KB where most classes have a low F would require more data extraction. On the other hand, a KB where most classes are almost full would indicate that it reflects more closely the knowledge encoded in the schema.
Class Importance
This metric calculates the percentage of instances that belong to classes at the inheritance subtree rooted at the current class with respect to the total number of instances. This metric is important in that it will help in identifying which areas of the schema are in focus when the instances are added to the knowledgebase (KB). Although this measure doesn’t consider the domain characteristics, it can still be used to give an idea on what parts of the ontology are considered focal and what parts are on the edges.
The importance of a class (Imp(Ci)) is defined as the percentage of the number of instances that belong to the inheritance subtree rooted at Ci in the KB (inst(Ci)) compared to the total number of class instances in the KB (CI).
Imp(Ci)= |INST(Ci)| / |KB(CI)|
Class Inheritance Richness
This measure details the schema IRS metric mentioned above and describes the distribution of information in the current class subtree per class. This measure is a good indication of how well knowledge is grouped into different categories and subcategories under this class.
Formally, the inheritance richness (IRc) of class Ci is defined as the average number of subclasses per class in the subtree. The number of subclasses for a class Ci is defined as |HC(C1,Ci)| and the number of nodes in the subtree is |C'|.
IRC = ((SUM Cj⋲ C') |HC(C1,Ci)|) / |C'|
The result of the formula will be a real number representing the average number of classes per schema level. The interpretation of the results of this metric depends highly on the nature of the ontology. Classes in an ontology that represents a very specific domain will have low IRC values, while classes in an ontology that represents a wide domain will usually have higher IRC values.
Class Readability
This metric indicates the existence of human readable descriptions in the ontology, such as comments, labels, or captions. This metric can be a good indication if the ontology is going to be queried and the results listed to users. Formally, the readability (Rd) of a class Ci is defined as the sum of the number attributes that are comments and the number of attributes that are labels the class has.
Rd = |A, A = rdfs:comment| + |A, A=refs:label|
The result of the formula will be an integer representing the availability of human-readable information for the instances of the current class.
Class Relationship Richness
This is an important metric reflecting how much of the relationships defined for the class in the schema are actually being used at the instances level. This is another good indication of the utilization of the knowledge modeled in the schema.
The relationship richness (RR) of a class Ci is defined as the percentage of the number of relationships that are being used by instances Ii that belong to Ci (P(Ii,Ij)) compared to the number of relationships that are defined for Ci at the schema level (P(Ci,Cj)).
Sources
- Samir Tartir, I. Budak Arpinar, Amit P. Sheth: Ontological Evaluation and Validation
In: Theory and Applications of Ontology: Computer Applications 2010, pp 115-130
http://link.springer.com/chapter/10.1007%2F978-90-481-8847-5_5 - Samir Tartir, I. Budak Arpinar, Michael Moore, Amit P. Sheth, and Boanerges Aleman-meza:
Ontoqa: Metric-based ontology quality analysis.
In: IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, 2005.
http://cobweb.cs.uga.edu/~budak/papers/ontoqa.pdf