Thursday, July 14, 2011

Ontology in Computer Science by Sinuhe Arroyo

Reshared under Creative Commons Attribution-Noncommercial 3.0 License
(There should not be any exchange of money between the receiver of this document and giver or distributor of this document)

Source Knol: Ontology
by Sinuhe Arroyo

The core concept behind the Semantic Web is the representation of data in a machine-interpretable way. Ontologies provide the means to realize such a representation. They are formal and consensual specifications of conceptualizations, providing a shared and common understanding of a domain in the form of machine-processable semantics for data and information, which can be communicated among agents (organizations, individuals, and software) [10]. Ontologies put in place the means to describe the basic categories and relationships of things by defining entities and types of entities within their framework.

Ontologies bring together two essential aspects that are necessary to enhance the Web with semantic technology. Firstly, they provide machine processability by defining formal information semantics. Secondly, they provide machine-human understanding through their ability to specify conceptualizations of the real world. By these means, ontologies link machine-processable content with human meaning, using a consensual terminology as the connecting element [10].

This knol explores the concepts and ideas behind metadata glossaries. It describes the most relevant paradigms, showing their respective pros and cons, while evolving towards the concept of ontology. It pays special attention to the relation between ontologies and knowledge bases and to the differences between lightweight and heavyweight ontologies. Furthermore, the knol introduces relevant ontology languages, i.e. RDF, OWL, and WSML, examining their main features, applications and core characteristics.

Metadata Glossary: From Controlled Vocabularies to Ontologies
Controlled vocabularies
A controlled vocabulary is a finite list of preferred terms used for the purpose of easing content retrieval. Controlled vocabularies consist of pre-defined, authorized terms, which is in sharp contrast to natural language vocabularies that typically evolve freely without restrictions. Controlled vocabularies can be used for categorizing content, building labeling systems or defining database schemas among others. A catalogue is a good example of a controlled vocabulary.
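
As a minimal sketch of this idea, the following Python snippet treats a controlled vocabulary as a fixed set of authorized terms and checks proposed labels against it. The terms and the function name validate_labels are illustrative assumptions, not part of any standard.

```python
# Hypothetical controlled vocabulary for labeling articles; the terms are illustrative.
CONTROLLED_VOCABULARY = frozenset({"ontology", "taxonomy", "thesaurus", "metadata"})


def validate_labels(labels):
    """Split proposed labels into authorized terms and rejected free-text terms."""
    normalized = [label.strip().lower() for label in labels]
    accepted = [term for term in normalized if term in CONTROLLED_VOCABULARY]
    rejected = [term for term in normalized if term not in CONTROLLED_VOCABULARY]
    return accepted, rejected


accepted, rejected = validate_labels(["Ontology", "web stuff", " metadata "])
print(accepted)  # ['ontology', 'metadata']
print(rejected)  # ['web stuff']
```

Unlike free tagging, any label outside the authorized list is rejected rather than silently added, which is what keeps retrieval predictable.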

Taxonomies
Taxonomy as a discipline is the science of classification. The term has its etymological roots in the Greek words “taxis”, meaning “order”, and “nomos”, meaning “law” or “science”.

In our context, a taxonomy is best defined as a set of controlled vocabulary terms. Each individual vocabulary term is known as a “taxon” (plural “taxa”). Taxa identify units of meaningful content in a given domain. Taxonomies are usually arranged in a hierarchical structure, grouping kinds of things into some order (e.g. an alphabetical list).

A good example of a taxonomy is the Wikispecies [1] project, which aims at creating a directory of species. Its “Taxonavigation” section depicts the path in the taxonomy leading to each species.
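
The hierarchical arrangement of taxa, and the idea of a path leading to a species, can be sketched in Python as follows. The fragment below is deliberately abbreviated (many intermediate ranks are skipped), and the names PARENT and taxonavigation are assumptions made for illustration; this is not how Wikispecies itself is implemented.

```python
# Child -> parent links for an abbreviated, illustrative fragment of a species taxonomy.
PARENT = {
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Primates": "Mammalia",
    "Hominidae": "Primates",
    "Homo": "Hominidae",
    "Homo sapiens": "Homo",
}


def taxonavigation(taxon):
    """Return the path from the root of the hierarchy down to the given taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


print(" > ".join(taxonavigation("Homo sapiens")))
# Animalia > Chordata > Mammalia > Primates > Hominidae > Homo > Homo sapiens
```

Note that the only relation encoded here is "is a kind of"; this single hierarchical relation is what distinguishes a taxonomy from the richer structures discussed later.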

Thesauri
The term "thesaurus" has its etymological root in the ancient Greek word “θησαυρός”, which evolved into the Latin word “thesaurus”. In both cultures, thesaurus meant "storehouse" or "treasury", in the sense of a repository of words [1]. A thesaurus is therefore similar to a dictionary, with the differences that it does not provide word definitions, its scope is limited to a particular domain, its entry terms may be single-word or multi-word, and it offers limited cross-referencing among the contained terms, e.g. synonyms and antonyms [2], [3].

A thesaurus should not be considered an exhaustive list of terms. Rather, it is intended to help differentiate among similar meanings, so that the most appropriate term for the intended purpose can be chosen. Thesauri also include scope notes, which are textual annotations used to clarify the meaning and context of terms.

In a nutshell, a thesaurus can be defined as a taxonomy expressed using natural language that makes explicit a limited set of relations among the codified terms.

The AGROVOC Thesaurus [4] developed by the Food and Agriculture Organization of the United Nations (FAO) is a good example of a thesaurus.
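
To make the description of a thesaurus more concrete, here is a minimal Python sketch in which each entry carries a scope note and a small, fixed set of relations (synonyms, antonyms, and one broader term). The class ThesaurusEntry, the function preferred_term and the two sample entries are hypothetical; they are not taken from AGROVOC.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThesaurusEntry:
    term: str
    scope_note: str = ""                      # textual clarification of meaning and context
    synonyms: set = field(default_factory=set)
    antonyms: set = field(default_factory=set)
    broader: Optional[str] = None             # link to the more general term, if any


# Hypothetical entries for illustration only.
entries = {
    "maize": ThesaurusEntry(
        "maize",
        scope_note="The cereal crop, not the colour.",
        synonyms={"corn"},
        broader="cereals",
    ),
    "cereals": ThesaurusEntry("cereals"),
}


def preferred_term(word, thesaurus):
    """Map a word to its preferred entry term, following synonym cross-references."""
    word = word.lower()
    if word in thesaurus:
        return word
    for term, entry in thesaurus.items():
        if word in entry.synonyms:
            return term
    return None


print(preferred_term("Corn", entries))  # maize
```

The set of relation types is fixed in advance by the entry structure, which reflects the "limited set of relations" that separates a thesaurus from an ontology.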

Ontologies
In philosophy, ontology is the study of being or existence. It constitutes the basic subject matter of metaphysics [3], which has the objective of explaining existence in a systematic manner, by dealing with the types and structures of objects, properties, events, processes and relations pertaining to each part of reality.

More recently, the term ontology was adopted in computer science, where ontologies are similar to taxonomies in that they represent relations among terms. However, ontologies offer a much richer mechanism for representing the meaning of relationships among concepts (i.e. terms) and attributes. This is why they are nowadays the preferred mechanism for representing knowledge.

In 1993 Gruber provided one of the most widely adopted definitions of Ontology.

“An ontology is an explicit specification of a conceptualization”.

Gruber’s definition was further extended by Borst in 1997. In his work Construction of Engineering Ontologies [5] he defines ontology as follows.

“Ontologies are defined as a formal specification of a shared conceptualization”.

Studer, Benjamins and Fensel [6] further refined and explained this definition in 1998. In their work, the authors defined an ontology as:
“a formal, explicit specification of a shared conceptualization”.

Formal: Refers to the fact that an ontology should be machine-readable.

Explicit: Means that the type of concepts used, and the restrictions on their use are explicitly defined.

Shared: Reflects the notion that the ontology captures consensual knowledge, that is, it is not the privilege of some individual, but accepted by a group.

Conceptualization: Refers to an abstract model of some phenomenon in the world by having identified the relevant concepts of that phenomenon.

Ontologies and knowledge bases
The relation between ontologies and knowledge bases is a controversial topic. It is not clear whether an ontology can only contain the abstract schema (example: concept Person) or also the concrete instances of the abstract concepts (example: Tim Berners-Lee). When drawing a clear line between abstract schema definitions and the instance level, one runs into the problem that in some cases instances are required in order to specify abstract concepts. An example that illustrates this problem is the definition of the concept “New Yorker” as the concept of persons living in New York: in this case, the instance “New York” of city is required in order to specify the concept.
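
The “New Yorker” problem can be illustrated with a small Python sketch. Classes stand in for the abstract schema and objects for the instance level; all names (City, Person, NEW_YORK, is_new_yorker, and the sample persons) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


# Schema level: abstract concepts.
@dataclass(frozen=True)
class City:
    name: str


@dataclass(frozen=True)
class Person:
    name: str
    lives_in: City


# Instance level: concrete individuals.
NEW_YORK = City("New York")
LONDON = City("London")


def is_new_yorker(person):
    """The concept 'New Yorker' cannot be stated without the instance NEW_YORK."""
    return person.lives_in == NEW_YORK


alice = Person("Alice", NEW_YORK)  # hypothetical individuals
bob = Person("Bob", LONDON)
print(is_new_yorker(alice), is_new_yorker(bob))  # True False
```

Even in this toy setting, the definition of a schema-level concept has to mention a concrete individual, which is why a strict separation between ontology and knowledge base is hard to maintain.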

A number of authors have tackled this problem and identified the limits and relationships among existing definitions.

Two definitions, the first provided by Bernaras et al. [8] and the second by Swartout [7], clearly identify the relationship between ontologies and knowledge bases.

"An ontology provides the means for describing explicitly the conceptualization behind the knowledge represented in a knowledge base".

“An ontology is a set of structured terms that describes some domain or topic. The idea is that an ontology provides a skeletal structure for a knowledge base”.


Lightweight vs. heavyweight ontologies
Depending on the richness of their axiomatization, one can distinguish between heavyweight and lightweight ontologies. Those that make intensive use of axioms to model knowledge and restrict domain semantics are referred to as heavyweight ontologies [10]. Those that make little or no use of axioms to model knowledge and clarify the meaning of concepts in the domain are referred to as lightweight ontologies. Lightweight ontologies are a subclass of heavyweight ontologies: a lightweight ontology is typically predominantly a taxonomy, with very few cross-taxonomical links (also known as “properties”) and very few logical relations between the classes. Davies, Fensel et al. [9] emphasize the importance of such lightweight ontologies:

“We expect the majority of the ontologies on the Semantic Web to be lightweight. […] Our experiences to date in a variety of Semantic Web applications (knowledge management, document retrieval, communities of practice, data integration) all point to lightweight ontologies as the most commonly occurring type.”

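The difference between the two kinds of ontology can be sketched in Python under some simplifying assumptions. The subclass hierarchy alone plays the role of a lightweight ontology; adding a disjointness axiom and checking it is one small step towards a heavyweight one. The class names, the DISJOINT list and both functions are hypothetical, and real ontology languages express such axioms declaratively rather than in procedural code.

```python
# Lightweight part: a plain subclass hierarchy (hypothetical classes).
SUBCLASS_OF = {
    "Student": "Person",
    "Professor": "Person",
    "Person": "Agent",
    "Organization": "Agent",
}

# Heavyweight addition: an axiom that restricts the domain semantics.
DISJOINT = [("Person", "Organization")]


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def violates_disjointness(asserted_classes):
    """Check whether an individual's asserted classes break a disjointness axiom."""
    inferred = set()
    for cls in asserted_classes:
        inferred |= ancestors(cls)
    return any(a in inferred and b in inferred for a, b in DISJOINT)


print(violates_disjointness({"Student", "Professor"}))     # False
print(violates_disjointness({"Student", "Organization"}))  # True
```

With only SUBCLASS_OF, nothing prevents an individual from being both a student and an organization; the axiom is what allows the inconsistency to be detected.
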
Ontologies and folksonomies


Ontology Languages
Resource Description Framework
Web Ontology Language
OWL Lite
OWL DL
OWL Full
Web Service Modeling Language
WSML-Core
WSML-DL
WSML-Flight
WSML-Rule
WSML-Full




References

1. http://species.wikimedia.org
2. National Information Standards Organization (NISO). (2003). (ANSI/NISO Z39.19-2003, 2003: 1). http://www.niso.org/home.
3. Wikipedia. www.wikipedia.org
4. Food and Agriculture Organization of the United Nations (FAO). AGROVOC Thesaurus. (1980). http://www.fao.org/aims/ag_intro.htm.
5. Borst, W. N. (1997). Construction of Engineering Ontologies. Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands.
6. Studer, R., Benjamins, V. R., and Fensel, D. (1998). Knowledge engineering: principles and methods. Data and Knowledge Engineering 25(1-2):161-197.
7. Swartout, B., Patil, R., Knight, K., and Russ, T. (1996). Toward distributed use of large-scale ontologies. In Proceedings of 10th Knowledge Acquisition for Knowledge-Based Systems Workshop. Banff, Canada.
8. Bernaras, A., Laresgoiti, I., and Corera, J. (1996). Building and reusing ontologies for electrical network applications. In: Wahlster, W. (ed) European Conference on Artificial Intelligence (ECAI’96). Budapest, Hungary. John Wiley and Sons, Chichest...
9. Davies, J., D. Fensel, et al., Eds. (2002). Towards the Semantic Web: Ontology-driven Knowledge Management, Wiley.
10. Fensel, D. (2001). Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce, Springer-Verlag, Berlin, 2001.
