META-data and web ranking by John Gekas
Web ranking methodologies applied by search engines rely on a variety of ranking factors; one of the earliest and most fundamental of these is the HTML META-tag. In a few words, META-tags hold descriptive keywords that state what the corresponding web page is about. Web authors and webmasters have been using META-tags since the dawn of the search engine era to provide descriptive keywords for the search engines' indexing systems (also known as web crawlers, web robots and so on). They are so easy to use that their widespread adoption is no wonder: any kind of descriptive content can be placed in META-tags, and no real knowledge of web authoring or web programming is required to add them to a web page. In fact, their use has been so versatile that they have also been subject to abuse (keyword spamming and so on).
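As a rough illustration of what an indexing system sees, the sketch below pulls the keyword list out of a page's META-tags using only the Python standard library. The sample page, the class name MetaKeywordParser and the function extract_meta_keywords are made up for this example; a real crawler would do considerably more (and, given keyword spamming, would probably not trust these keywords blindly).

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated keywords of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "keywords":
            return
        for keyword in attr_map.get("content", "").split(","):
            keyword = keyword.strip().lower()
            if keyword:
                self.keywords.append(keyword)


def extract_meta_keywords(html):
    """Return the META keywords of an HTML document, in document order."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


# Hypothetical sample page, used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Racing news</title>
<meta name="keywords" content="cars, racing, F1 news">
</head><body><p>Latest results.</p></body></html>"""

if __name__ == "__main__":
    print(extract_meta_keywords(SAMPLE_PAGE))
```

Note that the keywords never appear in the rendered page; they live only in the document head, which is exactly what makes them information about the page rather than part of it.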
Intro to semantics
The reliability and robustness of HTML META-tags as a ranking signal is certainly open to dispute. Their use, however, provides a practical, hands-on example of how important semantics are in the context of web ranking, and in computer science in general. In layman’s terms, semantics, in the sense used here, can be described as information about information. In other words, it is information that is not meant to be displayed itself; rather, it is attached to a primary information source (say, a web page) and its purpose is to describe the content of that primary source. As an example, again, think of the META-tags: these keywords are not displayed as part of any web page; rather, they describe what the content of their corresponding web page consists of.
The area of semantics, of course, goes a long way beyond simple keyword descriptions. The ultimate goal of semantic technology is to provide information that can be “understood” not by humans, but by software programs. Since “understanding”, as the term is used in the context of human intelligence, cannot be applied to computers, we can define “understanding” here as placing a notion or term in context relative to a broader collection of notions.
To understand this concept, let us take another look at HTML META-tags. Suppose that a particular web page, specializing in automobile racing news (such as F1 news, for example), has included the terms “car” and “racing” in its META-tags. The web page would then be ranked highly by search engines for search queries that contain these terms. However, if we assume that a (hypothetical) search engine has compiled a data model of notions/concepts in order to return more relevant search results, then the term “car” would be placed as a subclass of the term, say, “automobile”. In other words, the model would state that a car is a kind of automobile, but that “car” is the narrower term. Similarly, the term “racing” would be defined as a subclass of the broader term “sports”. This way, the search engine would be able to rank the specific web page in relation to the broader or narrower terms that are related to the page’s META-tags, and thus provide more accurate search results.
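The idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the BROADER_TERM table holds only the two subclass links from the example above, and the scoring rule (1.0 for an exact keyword match, halved for a match one level up, and so on) is invented, not how any real search engine is known to work. The sketch also only walks towards broader terms; narrower terms would need the reverse lookup.

```python
# Hypothetical concept model: each term maps to its broader (parent) term.
BROADER_TERM = {
    "car": "automobile",
    "racing": "sports",
}


def broader_terms(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    # The second condition guards against accidental cycles in the table.
    while term in BROADER_TERM and BROADER_TERM[term] not in chain:
        term = BROADER_TERM[term]
        chain.append(term)
    return chain


def match_score(page_keywords, query_term):
    """Score a page against one query term.

    An exact keyword match scores 1.0; a match on a broader term scores
    1 / (1 + number of subclass steps). No match scores 0.0.
    """
    best = 0.0
    for keyword in page_keywords:
        candidates = [keyword] + broader_terms(keyword)
        if query_term in candidates:
            distance = candidates.index(query_term)
            best = max(best, 1.0 / (1 + distance))
    return best


if __name__ == "__main__":
    page = ["car", "racing"]
    for query in ("racing", "sports", "automobile", "cooking"):
        print(query, match_score(page, query))
```

With this model, a query for “sports” still finds the racing page, only with a lower score than a query for “racing” itself, whereas plain keyword matching would miss it entirely.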
The data model mentioned above is the heart of web-based semantics, and is called an ontology. An ontology can be defined as “…a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts” (according to the Wikipedia definition). Ontologies are usually written in one of several purpose-specific XML formats, and can also be represented as graph networks. They represent concepts in terms of classes, properties and the relations among them. In more detail, common components of ontologies include:
· Individuals: instances or objects (the basic or “ground level” objects)
· Classes: sets, collections, concepts, classes in programming, types of objects, or kinds of things.
· Attributes: aspects, properties, features, characteristics, or parameters that objects (and classes) can have
· Relations: ways in which classes and individuals can be related to one another
· Function terms: complex structures formed from certain relations that can be used in place of an individual term in a statement
· Restrictions: formally stated descriptions of what must be true in order for some assertion to be accepted as input
· Rules: statements in the form of an if-then (antecedent-consequent) sentence that describe the logical inferences that can be drawn from an assertion in a particular form
· Axioms: assertions (including rules) in a logical form that together comprise the overall theory that the ontology describes in its domain of application. This definition differs from that of “axioms” in generative grammar and formal logic. In those disciplines, axioms include only statements asserted as a priori knowledge. As used here, “axioms” also include the theory derived from axiomatic statements.
· Events: the changing of attributes or relations.
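To make a few of these components concrete, here is a minimal sketch that stores a toy ontology as a set of (subject, predicate, object) triples, which is one common way of viewing an ontology as a graph. The predicate names, the individual monaco_grand_prix and the class formula_one are all invented for this example and do not come from any particular ontology language. The infer function applies two rules until nothing new can be derived: a subclass of a subclass is a subclass, and an instance of a class is an instance of its broader classes.

```python
# Toy ontology as (subject, predicate, object) triples. All names are
# hypothetical and chosen only to match the racing example in the text.
TRIPLES = {
    # Classes and their relations
    ("car", "subclass_of", "automobile"),
    ("racing", "subclass_of", "sports"),
    ("formula_one", "subclass_of", "racing"),
    # An individual and one of its attributes
    ("monaco_grand_prix", "instance_of", "formula_one"),
    ("monaco_grand_prix", "location", "Monaco"),
}

INHERITED = ("subclass_of", "instance_of")


def infer(triples):
    """Return the triples plus everything derivable from two rules.

    Rule 1: if A subclass_of B and B subclass_of C, then A subclass_of C.
    Rule 2: if X instance_of B and B subclass_of C, then X instance_of C.
    The rules are re-applied until no new fact appears (a fixed point).
    """
    facts = set(triples)
    while True:
        derived = set()
        for subject, predicate, middle in facts:
            if predicate not in INHERITED:
                continue
            for start, link, broader in facts:
                if start == middle and link == "subclass_of":
                    derived.add((subject, predicate, broader))
        if derived <= facts:
            return facts
        facts |= derived


if __name__ == "__main__":
    closed = infer(TRIPLES)
    print(("monaco_grand_prix", "instance_of", "sports") in closed)
    print(sorted(closed - TRIPLES))
```

The facts printed on the last line are never stated in TRIPLES; they follow from the rules, which is the sense in which a program can place a term in context within a broader collection of notions.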
An example ontology schema is represented as a graph in the following figure:
Ontologies provide a highly structured body of domain knowledge whose purpose is not limited to web search and web ranking. Semantics also play an important role in expert systems, answering engines, data mining and artificial intelligence. A closer look at those areas is beyond the scope of this introductory article, but certain aspects may be the focus of future ones.
Here is a list of useful informational material on semantics and ontologies:
Ontology formats / languages:
Ontology engineering platforms / tools: