Web Ranking Algorithms and Applications

Disclaimer: the following posting for the Webandrank blog is based on the research paper titled “Web Service Ranking in Service Networks” by John Gekas, accepted at the 3rd European Semantic Web Conference (ESWC 2006), as well as other copyrighted material, referenced at the end of the posting. For more information on the presented subjects and the accompanying research papers, please visit https://sites.google.com/site/johngekas/Publications

Web Ranking Algorithms and Applications – an introduction

Web Ranking basics

From the SEO marketing point of view, it is assumed that the ranking of web sites / documents depends on two main factors:
1.    Relevance of the website’s content to the search terms in question, and
2.    Popularity of the website.

In other words, the more relevant a website’s content is deemed to be for specific search terms, the higher it will rank. Furthermore, the more popular a website is (i.e. the more traffic it generates and/or the more links it has), the more “authoritative” it is considered to be, and thus the higher it is ranked.

The above notions sound intuitive, but how do they actually work? How exactly can the content of a website (or any other piece of information, for that matter) be considered “relevant” to specific search terms? In fact, web ranking has been the focus not only of the SEO marketing community, but of computer science research in general. To be more precise, by the term “web ranking” we do not strictly refer to the ranking of website content in relation to search engine query terms; we rather refer to the relevance ranking of any informational source, which may or may not be in relation to a particular query. In this sense, web ranking can be applied not only to web site content, but also to RSS feeds, web services and any other informational sources (see Note 1). The purpose, of course, is similar to that of SEO: to provide better, more meaningful results to informational queries.

Two prominent examples of web ranking algorithms are Page and Brin’s PageRank algorithm [1], and Kleinberg’s HITS algorithm [2]. The PageRank algorithm was designed and developed by Larry Page and Sergey Brin in 1998, and formed the algorithmic basis of Google’s prototype search engine system. PageRank employs a link-based approach to web ranking: the more links that point to a web page (and the more important the pages those links come from), the more useful the page is considered to be, and thus the higher it is ranked. HITS, another fundamental link-based ranking algorithm, was developed by Jon Kleinberg while working for IBM; it distinguishes between web pages described as hubs (pages with many outgoing links pointing to other pages) and authorities (pages with many incoming links from other pages).
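To make the link-based intuition more concrete, here is a minimal sketch of PageRank’s iterative computation in Python. The toy graph, the damping factor of 0.85 and the fixed number of iterations are illustrative choices, and many practical details of the original algorithm are omitted:

```python
# Minimal PageRank sketch: rank flows along incoming links.
# The example graph and the damping factor are illustrative only.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Dangling page: spread its rank evenly over the network.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
print(pagerank(web))  # "C" ends up highest: it has the most incoming links
```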

Both ranking algorithms described above have been employed in search engine systems and play a prominent role in search engine technology. By inspection of their fundamental principles, we can note the following:

1.    Both algorithms regard relevance in terms of connectivity, rather than text content: documents are ranked higher the more well-connected they are within the network (World Wide Web).
2.    Both algorithms are global by nature: they do not compute ranking weights in relation to a particular search query; rather, they estimate ranking weights of web pages based on their position within the network.

We can see that web ranking in this context amounts to analysing a web-based document’s connectivity properties within a network of inter-connected resources, and using this information to rank the document’s usefulness or relevance within the network. In this sense, web ranking employs algorithmic techniques from graph theory, network analysis and link analysis.

Network analysis is a rich field of research, employed in a variety of application areas, such as social network analysis, bibliometrics (citation analysis), and so on. A good introduction to the subject can be found in the Wikipedia articles [3] and [4]. In terms of web ranking, the most important connectivity attribute that is employed is the node degree: the number of nodes a specific node is connected (linked) to within the network. In the case of web pages, a page’s degree is the number of links from, or to, that page. Node degree can be further distinguished into in-degree (incoming links) and out-degree (outgoing links).
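As a quick illustration of these connectivity measures, the following sketch computes in-degree and out-degree over a made-up link network represented as an adjacency list:

```python
# Toy link network: each node maps to the nodes it links to (made-up example).
links = {
    "page1": ["page2", "page3"],
    "page2": ["page3"],
    "page3": [],
    "page4": ["page3", "page1"],
}

# Out-degree: how many links leave each page.
out_degree = {page: len(targets) for page, targets in links.items()}

# In-degree: how many links point at each page.
in_degree = {page: 0 for page in links}
for targets in links.values():
    for target in targets:
        in_degree[target] += 1

print("out-degree:", out_degree)
print("in-degree: ", in_degree)
```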

Web Ranking in alternative application areas – web service networks

The exact same principles are applied when employing web ranking methodologies on different kinds of networks, such as networks of web services [5], [6], [7]. By the term web service we refer to remotely accessible software components that can be invoked and executed over the web. Web services can also be composed into workflows, which consist of the subsequent execution of more than one web service. Web services, like any other software program, accept specific types of input, based on which they provide specific types of output. For example, let us consider a fictional weather report web service: such a service may accept longitude and latitude coordinates as input, and return a current weather report for the given coordinates (in XML format).

Clearly, the notion of a web link, as used in the context of web documents on the World Wide Web, does not directly apply in the case of web services. Instead, a link between two web services Wi and Wj can be defined as follows:

We assume there is a link L(Wi, Wj) between Wi and Wj if and only if the output data provided by Wi share at least one common data item with the input data required by Wj. In this case, web services Wi and Wj can be considered compatible and can be consecutively placed in a composed workflow.
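Assuming each web service is described simply by sets of named input and output data types (the service descriptions below are fictional), this link definition can be sketched as follows; the resulting directed graph is the service network on which the ranks discussed below are computed:

```python
# Fictional service descriptions: each service declares the data types it
# consumes (inputs) and produces (outputs).
services = {
    "GeoLocator":    {"inputs": {"Address"},               "outputs": {"Longitude", "Latitude"}},
    "WeatherReport": {"inputs": {"Longitude", "Latitude"}, "outputs": {"WeatherXML"}},
    "MapRenderer":   {"inputs": {"Longitude", "Latitude"}, "outputs": {"MapImage"}},
}

def linked(w_i, w_j):
    """Link from w_i to w_j iff some output of w_i matches an input of w_j."""
    return bool(services[w_i]["outputs"] & services[w_j]["inputs"])

# Build the service network as a directed graph (adjacency list).
network = {
    name: [other for other in services if other != name and linked(name, other)]
    for name in services
}
print(network)
# {'GeoLocator': ['WeatherReport', 'MapRenderer'], 'WeatherReport': [], 'MapRenderer': []}
```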

In this context, the web ranking mechanisms that can be employed can be categorised in two different ways: (a) local and global, depending on whether local or global network knowledge is needed for their estimation, and (b) absolute and relative, depending on whether the measurement is of absolute scope or refers to a particular search query. This categorisation scheme of the employed web ranking mechanisms is shown in the following Figure:

[Figure: categorisation of the web service ranking mechanisms along two axes: local vs. global, and absolute vs. relative]

We will not go into a detailed description of each of the above ranks, as this is out of the scope of this high-level presentation. However, it would be useful to briefly describe some of them and some of the cases they can be useful in:

Degree-based Ranks: The ADR, RDR and PR ranks are based on the degree of a web service, i.e. the number of services a web service can “feed”, expressed as a normalised percentage. RDR shows the part of the ADR that belongs to the semantic data type category specified in the request. ADR and RDR are both simple and “light” estimates of how important a service is, since they can be calculated directly, without requiring global knowledge of the service network. PR (similar to the PageRank algorithm used by the Google search engine) is, of course, global by nature, making it “heavier” to estimate, but also richer in informational value.
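The exact formulas are given in the referenced papers [5], [6]; purely as an illustration of the idea, an absolute degree rank can be sketched as a service’s out-degree normalised by the number of other services in the network. The simplified formula below is an assumption made for illustration, not the paper’s definition:

```python
# Illustrative absolute degree rank: the fraction of the other services in the
# network that a given service can "feed" (link to). Simplified assumption,
# not the exact ADR formula from the referenced papers.
def absolute_degree_rank(network, service):
    others = len(network) - 1
    return len(network[service]) / others if others else 0.0

network = {
    "GeoLocator": ["WeatherReport", "MapRenderer"],
    "WeatherReport": [],
    "MapRenderer": [],
}
for name in network:
    print(name, round(absolute_degree_rank(network, name), 2))
# GeoLocator 1.0, WeatherReport 0.0, MapRenderer 0.0
```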

Hubs-Authorities-based Ranks: HAR and HITS both examine the relation between the number of services that link to a specific service (in-degree) and the number of services that service links to (out-degree). This is important, since there are web services and semantic data types for which one degree type tends to be much higher than the other.
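The classic HITS iteration, in which hub and authority scores reinforce each other, can be sketched as follows (the toy network is made up, and the ranks used in the paper may differ in their details):

```python
# Minimal HITS sketch: hub and authority scores reinforce each other.
def hits(links, iterations=50):
    nodes = list(links)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes linking to you.
        auth = {n: sum(hub[m] for m in nodes if n in links[m]) for n in nodes}
        # Hub score: sum of authority scores of the nodes you link to.
        hub = {n: sum(auth[t] for t in links[n]) for n in nodes}
        # Normalise so the values do not grow without bound.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / auth_norm for n, v in auth.items()}
        hub = {n: v / hub_norm for n, v in hub.items()}
    return hub, auth

toy = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["B", "C"]}
hub, auth = hits(toy)
print("hubs:       ", {n: round(v, 2) for n, v in hub.items()})
print("authorities:", {n: round(v, 2) for n, v in auth.items()})
```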

Non-Functional Ranks: A number of the presented service ranks focus on the non-functional aspects of service composition. For instance, QR (which is calculated with regard to a specific Quality of Service, or QoS, attribute) takes values ranging from 0 to 1: this rank examines the specified QoS value of the services a service links to, estimates an average, and places it on a normalised [0, 1] scale relative to the range of QoS values found within the service network. Such a rank can be extremely useful when a service composition request explicitly declares QoS restrictions. The FR and WR ranks, on the other hand, examine how many alternative routes exist between two web services, which can be useful when we are interested in the reliability of a workflow solution (i.e. if a part of the solution becomes unavailable, will the whole solution fail?).
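As a rough sketch of the QR idea (again a simplifying assumption for illustration, not the paper’s exact formula): average the QoS values of the services a service links to, then rescale the average against the minimum and maximum QoS values observed in the network:

```python
# Illustrative quality rank sketch (not the paper's exact QR formula):
# average the QoS values of the services a service links to, then
# normalise against the QoS range observed across the whole network.
def quality_rank(network, qos, service):
    targets = network[service]
    if not targets:
        return 0.0
    avg = sum(qos[t] for t in targets) / len(targets)
    lo, hi = min(qos.values()), max(qos.values())
    return (avg - lo) / (hi - lo) if hi > lo else 1.0

network = {"A": ["B", "C"], "B": ["C"], "C": []}
response_time_score = {"A": 0.4, "B": 0.9, "C": 0.6}   # fictional QoS values
print(round(quality_rank(network, response_time_score, "A"), 2))  # 0.7
```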

Non-Connectivity Ranks: Even though most of the ranks presented here are based on connectivity aspects of the web service network, this does not always have to be the case. A useful rank that is not related to connectivity degrees is the SSR (semantic similarity rank): this rank evaluates how semantically related the data provided by two web services are (see Note 2). For this purpose, we assume semantic data items are defined in some form of classification/ontology, and SSR is estimated as the graph distance, in the ontology graph, between the data types in question.
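A minimal sketch of the graph-distance idea, assuming the ontology is represented as an undirected graph of data types (the tiny ontology fragment below is fictional):

```python
from collections import deque

# Fictional ontology fragment: each data type maps to its neighbouring
# concepts (e.g. parent/child relations) in the ontology graph.
ontology = {
    "Coordinate": ["Longitude", "Latitude", "Location"],
    "Longitude": ["Coordinate"],
    "Latitude": ["Coordinate"],
    "Location": ["Coordinate", "Address"],
    "Address": ["Location"],
    "ProductCode": ["ProductInfo"],
    "ProductInfo": ["ProductCode"],
}

def ontology_distance(start, goal):
    """Shortest path length between two data types (BFS); None if unreachable."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in ontology.get(node, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None   # no path: the types are unrelated in this ontology fragment

print(ontology_distance("Longitude", "Address"))      # 3
print(ontology_distance("Longitude", "ProductCode"))  # None
```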

Notes

1 Web ranking could in theory be applied to any web-based data source, provided that sufficient metadata exists that semantically describes the source’s informational content. The use of compatible semantic metadata models (e.g. ontologies), or the automatic inference of metadata information from data sources based on specific heuristics, is outside the scope of the current presentation and will be discussed in a different posting.

2 Semantic similarity refers to how similar two data types are, when viewed as part of an overall semantic metadata model (ontology). For instance, it is logical to assume that web service input parameters ProductCode and ProductQuantity (accepted as input on a fictional transactional web service), are not very similar to parameters Longitude and Latitude (utilized by a fictional weather web service). How semantic similarity is algorithmically measured is out of the scope of this presentation.

Bibliography

[1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW Conference, volume 7, 1998.

[2] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999. Also appears as IBM Research Report RJ 10076, May 1997. http://www.cs.cornell.edu/home/kleinber/auth.ps.

[3] http://en.wikipedia.org/wiki/Social_network

[4] http://en.wikipedia.org/wiki/Network_theory

[5] John Gekas, Web Service Ranking in Service Networks, 3rd European Semantic Web Conference (ESWC ’06), June 11-14 2006, Budva, Montenegro.

[6] John Gekas, Maria Fasli: Automatic Web Service Composition Based on Graph Network Analysis Metrics. OTM Conferences (2) 2005: 1571-1587.

[7] Silvestri F., Puppin D., Laforenza D., Orlando S.: Toward a Search Engine for Software Components. IEEE Web Intelligence, Beijing, China, September 20-24, 2004.