XO Communications' Blog

Semantic Web - From Data Silos to “Web of Data”?

January 11, 2013 | By

Imagine a world where the documents you retrieve from web sites have no links to any other document on any site. In terms of data on the Web today, this is where we are. Data on different sites sit in silos and are not linked. The way we make sense of it is by retrieving data from different sites, understanding the context and analyzing it to draw conclusions. A person can tell that the word “Mustang” in an automobile database is not the same as the word “Mustang” retrieved from an animal kingdom database. Today, search engines generally do not attach meaning to the word; instead they return a set of links to other sites where additional information can be found. Most data currently on the Web is in this form, and it is difficult to use on a large scale because there is no global system for publishing data in a way that anyone can easily process.

The vision of the Semantic Web (SW) was conceived by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML. In his 1998 paper “Semantic Web Roadmap”, Berners-Lee articulated his vision: “The Semantic Web is a web of data, in some ways like a global database”. He describes the rationale for developing it as follows: “The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well-defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web”.

SW is essentially a framework for linking the metadata (data about data) of data stored in disparate databases on the Web, so that machines can query these databases and return enriched results. SW is an extension of the existing World Wide Web. It provides a standardized way of expressing the relationships between web pages, allowing machines to understand the meaning of hyperlinked information. Because SW links various databases, machines can find information and relations that would not be available from any single database. Instead of data sitting in web silos, there will be a layer where the data is stored, a layer that maps and abstracts it, and a layer for Web data applications. Work on SW “is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners”, and the activity is carried out in several W3C groups.
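To make the idea of cross-database linking concrete, here is a minimal Python sketch, assuming two hypothetical datasets that identify the same company with the same URI (all example.org URIs and vocabulary terms are made up for illustration). Neither dataset alone can say where the Mustang's maker is headquartered; once the triples are merged, following the shared URI answers the question.

```python
# Sketch: linking two independent datasets through a shared URI.
# All URIs and vocabulary terms below are hypothetical examples.

FORD = "http://data.example.org/id/Ford_Motor_Company"
MUSTANG_CAR = "http://cars.example.org/id/mustang"
MAKER = "http://cars.example.org/vocab/maker"
HQ = "http://companies.example.org/vocab/headquarters"

# Dataset published by an automobile site: it knows who makes the car.
car_data = {(MUSTANG_CAR, MAKER, FORD)}

# Dataset published by a company registry: it knows where Ford is based.
company_data = {(FORD, HQ, "Dearborn, Michigan")}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def maker_headquarters(graph, car):
    """Follow car -> maker -> headquarters across whatever triples are loaded."""
    return {hq for maker in objects(graph, car, MAKER)
               for hq in objects(graph, maker, HQ)}


# Neither dataset alone can answer the question...
print(maker_headquarters(car_data, MUSTANG_CAR))      # set()
print(maker_headquarters(company_data, MUSTANG_CAR))  # set()

# ...but because both use the same URI for Ford, merging is a plain set union.
linked = car_data | company_data
print(maker_headquarters(linked, MUSTANG_CAR))        # {'Dearborn, Michigan'}
```

Because both sources use the same identifier for Ford, no schema mapping is needed here; sources that used different identifiers would first need an explicit equivalence link (for example `owl:sameAs`).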

How

A segment of the Semantic Web pertaining to Yo-Yo Ma. Image courtesy of WWW2003

Resource Description Framework (RDF) was proposed as the underpinning for linking. An RDF triple is a labeled connection between two resources. A triple consists of a subject, a predicate (also called an attribute or property) and an object. The subject and predicate are identified by Uniform Resource Identifiers (URIs), although the subject may also be an unnamed blank node; the object can be a URI or a literal value such as a string or number. This linking structure forms a directed, labeled graph, where the nodes represent resources and the edges represent the named links between them. The figure above, from a paper titled “Semantic Search” presented at WWW2003, shows a segment of the Semantic Web pertaining to Yo-Yo Ma, the renowned cellist.
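The triple model can be sketched in a few lines of Python. The URIs under http://example.org/ and the predicate names (plays, name, instrumentFamily) are hypothetical and are not taken from the figure; the point is only to show triples stored as labeled edges of a directed graph, with objects that are either resources or literals.

```python
from collections import defaultdict
from typing import NamedTuple, Union


class URI(str):
    """Marks a string as a resource identifier rather than a literal value."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # an object may be another resource or a literal


EX = "http://example.org/"  # hypothetical namespace

triples = [
    Triple(URI(EX + "YoYoMa"), URI(EX + "plays"), URI(EX + "Cello")),
    Triple(URI(EX + "YoYoMa"), URI(EX + "name"), "Yo-Yo Ma"),
    Triple(URI(EX + "Cello"), URI(EX + "instrumentFamily"), URI(EX + "Strings")),
]

# Each triple becomes a directed edge subject -> object labeled by the predicate.
edges = defaultdict(list)
for t in triples:
    edges[t.subject].append((t.predicate, t.obj))


def neighbours(node):
    """Outgoing (label, target) pairs for a node in the directed graph."""
    return edges.get(node, [])


for label, target in neighbours(URI(EX + "YoYoMa")):
    kind = "resource" if isinstance(target, URI) else "literal"
    print(f"{label} -> {target} ({kind})")
```

Keeping URIs and literals as distinct types mirrors the rule that only resources can have outgoing edges; a literal such as "Yo-Yo Ma" is always a leaf of the graph.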

RDF itself is an abstract data model. To send triples over the network and reconstruct them on the receiving side, they must be serialized, and XML was the first standard choice (RDF/XML). Other serialization syntaxes, such as Turtle, N3 and JSON-based formats, offer alternative notations for describing the same triples.
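As an illustration of serialization, the sketch below writes triples in N-Triples, a simple line-based W3C syntax that is a subset of Turtle, and parses them back. It handles only an assumed, simplified subset (no blank nodes, language tags, datatypes or escape sequences), so it is meant to show the round trip, not to process real-world data.

```python
import re

# One simplified N-Triples statement: <subject> <predicate> (<object> | "literal") .
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$')


def serialize(triples):
    """Write (s, p, o, o_is_uri) tuples as N-Triples text, one statement per line."""
    out = []
    for s, p, o, o_is_uri in triples:
        obj = f"<{o}>" if o_is_uri else f'"{o}"'
        out.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(out) + "\n"


def parse(text):
    """Rebuild the (s, p, o, o_is_uri) tuples from text produced by serialize()."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        s, p, o_uri, o_lit = m.groups()
        if o_uri is not None:
            triples.append((s, p, o_uri, True))
        else:
            triples.append((s, p, o_lit, False))
    return triples


# Hypothetical example data: one resource-valued and one literal-valued object.
example = [
    ("http://example.org/YoYoMa", "http://example.org/plays", "http://example.org/Cello", True),
    ("http://example.org/YoYoMa", "http://example.org/name", "Yo-Yo Ma", False),
]
text = serialize(example)
print(text, end="")
assert parse(text) == example  # the receiver reconstructs the same triples
```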

To make it easy for existing sites to publish RDF-based data, W3C developed RDFa (RDF in attributes). Instead of placing the structured data in a separate section of the document, RDFa embeds it directly in HTML pages: it is a set of attribute extensions to (X)HTML, defined by W3C, that allows RDF to be encoded within an (X)HTML page.
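A rough idea of how RDFa attributes carry triples inside ordinary HTML is given by the Python sketch below, which uses the standard library's `html.parser`. Person, name and jobTitle are real Schema.org terms, but the page, the example.org subject URI and the `TinyRDFaParser` class are hypothetical; the parser understands only `vocab`, `about`, `typeof`, `property` and `content` on flat markup and is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TinyRDFaParser(HTMLParser):
    """Extracts triples from a small subset of RDFa attributes (flat markup only)."""

    def __init__(self):
        super().__init__()
        self.vocab = ""        # prefix applied to typeof/property terms
        self.subject = None    # most recent @about value
        self.pending = None    # (tag, predicate) whose element text is being read
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.vocab = a.get("vocab", self.vocab)
        if "about" in a:
            self.subject = a["about"]
        if "typeof" in a and self.subject:
            self.triples.append((self.subject, RDF_TYPE, self.vocab + a["typeof"]))
        if "property" in a and self.subject:
            predicate = self.vocab + a["property"]
            if "content" in a:  # value given explicitly in an attribute
                self.triples.append((self.subject, predicate, a["content"]))
            else:               # value is the element's text content
                self.pending, self.text = (tag, predicate), []

    def handle_data(self, data):
        if self.pending:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending and self.pending[0] == tag:
            value = "".join(self.text).strip()
            self.triples.append((self.subject, self.pending[1], value))
            self.pending = None


PAGE = """
<div vocab="http://schema.org/" about="http://example.org/YoYoMa" typeof="Person">
  <h1 property="name">Yo-Yo Ma</h1>
  <p>Known as a <span property="jobTitle">cellist</span>.</p>
</div>
"""

parser = TinyRDFaParser()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```

The visible page is unchanged for human readers; the extra attributes only tell a machine which text is the subject's name and which is its job title.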

Other candidates, Microformats (not a new W3C standard, but built on existing standards) and Microdata for HTML5 (a W3C specification), define or use a set of attributes to augment presentation-oriented (X)HTML documents with structured data.

Reality

While the vision is to link web databases globally, what is happening in practice? The SW vision has been slow to materialize. However, industry leaders are incorporating some aspects of SW into their products.

In 2010, W3C welcomed the adoption of RDFa by Facebook and Best Buy.

In 2010, Google acquired Metaweb, maker of Freebase and a leader in Semantic Web technology. A Wall Street Journal article on 14 March 2012 stated that “Google is undergoing a major, long-term overhaul of its search-engine, using what’s called Semantic Web search to enhance the current system in the coming years. The move, starting over the next few months, will impact the way people can use the search engine as well as how the search engine examines sites across the Web before ranking them in search results”. According to people familiar with the initiative, Google users will be able to browse through the company’s upcoming “Knowledge Graph”, its ever-expanding database of information about “entities” (people, places and things), the “attributes” of those entities and how different entities are connected to one another.

In June 2011, Google, Microsoft and Yahoo! proposed a common markup vocabulary, Schema.org, based on the Microdata format, simplifying the job of webmasters who want to give meaning to their web page content. Google says it chose Microdata because it wanted to “focus on just one format” and “a single format will improve consistency across search engines relying on the data”. For webmasters, the benefit is that a single markup works with both Google and Bing, so they do not have to provide separate markup for different search engines. Other search companies are expected to adapt their engines to support Schema.org.
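For comparison, here is how similar information looks in Microdata with the Schema.org vocabulary, together with a very small extractor written against Python's `html.parser`. Person, name, jobTitle and url are real Schema.org terms; the example.org URL and the `TinyMicrodataParser` class are illustrative assumptions, and the parser ignores nested items, `itemid` and `itemref`.

```python
from html.parser import HTMLParser


class TinyMicrodataParser(HTMLParser):
    """Collects itemprop values for top-level Microdata items (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.capture = None  # (tag, property name) whose text is being read
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a:  # boolean attribute: present with value None
            self.items.append({"type": a.get("itemtype"), "properties": {}})
        elif "itemprop" in a and self.items:
            if "content" in a:  # e.g. <meta itemprop=... content=...>
                self.items[-1]["properties"][a["itemprop"]] = a["content"]
            else:
                self.capture, self.buffer = (tag, a["itemprop"]), []

    def handle_data(self, data):
        if self.capture:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.capture and self.capture[0] == tag:
            name = self.capture[1]
            self.items[-1]["properties"][name] = "".join(self.buffer).strip()
            self.capture = None


PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Tim Berners-Lee</span>,
  <span itemprop="jobTitle">Director</span> of the W3C
  <meta itemprop="url" content="http://example.org/timbl">
</div>
"""

parser = TinyMicrodataParser()
parser.feed(PAGE)
print(parser.items)
```

Because the vocabulary (the `itemtype` URL and property names) is shared, any search engine that reads Schema.org can interpret this markup the same way, which is the consistency argument quoted above.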

In May 2012, Google introduced its Knowledge Graph, announced on the official Google blog as “Introducing the Knowledge Graph: things, not strings”. The post noted that “For more than four decades, search has essentially been about matching keywords to queries” and that “We’re proud of our first baby step—the Knowledge Graph—which will enable us to make search more intelligent, moving us closer to the ‘Star Trek computer’”. “The new Knowledge Graph project, rolling out to English-language Google Search users over the next few days, provides more data snippets alongside its query results than the search engine currently provides. The results are based on Google’s new database of 500 million people, places, and things”, says Jack Menzel, Product Management Director of Search at Google. Menzel says there are 3.5 billion attributes and connections between these things in the database. The Knowledge Graph provides data to users without requiring them to visit the sites the data may come from. Google has since expanded the Knowledge Graph to Italian, French, Japanese and Russian.
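The “things, not strings” idea can be illustrated with a toy entity store: a query string is resolved to the distinct things it may denote, each with its own attributes and links, instead of to a list of pages. The entity IDs, attribute names and data structures below are hypothetical and do not reflect how Google actually implements the Knowledge Graph.

```python
# Toy entity store; IDs and attribute names are hypothetical illustrations.
ENTITIES = {
    "e:ford_mustang": {"name": "Mustang", "type": "Automobile", "manufacturer": "Ford"},
    "e:mustang_horse": {"name": "Mustang", "type": "Animal", "habitat": "Western United States"},
    "e:ford": {"name": "Ford Motor Company", "type": "Organization"},
}

# Connections between things: (subject id, relation, object id).
RELATIONS = [("e:ford_mustang", "madeBy", "e:ford")]


def build_name_index(entities):
    """Map a lower-cased name string to the set of entity IDs it may denote."""
    index = {}
    for eid, attrs in entities.items():
        index.setdefault(attrs["name"].lower(), set()).add(eid)
    return index


NAME_INDEX = build_name_index(ENTITIES)


def lookup(query):
    """Return one card per candidate thing, with its attributes and outgoing links."""
    cards = []
    for eid in sorted(NAME_INDEX.get(query.lower(), ())):
        links = [(rel, ENTITIES[obj]["name"]) for s, rel, obj in RELATIONS if s == eid]
        cards.append({"id": eid, **ENTITIES[eid], "links": links})
    return cards


for card in lookup("mustang"):
    print(card["id"], card["type"], card["links"])
```

The string “Mustang” yields two separate things, so the car and the horse are no longer conflated, which is exactly the distinction a person makes in the opening example.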

Facebook’s Open Graph Protocol is built on the W3C SW standards RDF and RDFa. Facebook, like Google and Yahoo, can consume RDFa. In 2008, Microsoft acquired the semantic search startup Powerset. Bing has access to semantically indexed Wikipedia content, which it uses to deliver special types of search results for faster answers. Microsoft is reported to be developing an entity engine that attaches relevant terms and relationships to objects and then tags relevant Web pages with the object’s ID. Some SW proponents are surprised that leading companies use SW principles in their newly developed products and enhancements but do not acknowledge that they use SW technology.

Internet giants such as Google, Yahoo, Facebook and Microsoft, which have already collected large volumes of data, are implementing semantic search technologies and semantic web crawlers. Instead of returning a brute-force list of web pages, they are linking their databases and injecting semantics into searches, so that a query yields the information the searcher intended to find rather than sending the searcher through a string of sites to locate it.

Major web search engines like Google and Bing incorporate some elements of semantic search. Guha et al. distinguish two major forms of search: navigational and research. In navigational search, the user treats the search engine as a navigation tool to reach a particular intended document. Semantic search is not applicable to navigational searches. In research search, the user gives the search engine a phrase intended to denote an object about which the user is trying to gather or research information. There is no particular document the user knows about and is trying to reach; rather, the user is trying to locate a number of documents that together will provide the information they are looking for. Semantic search lends itself well to this approach, which is closely related to exploratory search. Rather than relying on ranking algorithms such as Google’s PageRank to predict relevance, semantic search uses semantics, the study of meaning in language, to produce highly relevant search results. In most cases, the goal is to deliver the information the user queried rather than have the user sort through a list of loosely related keyword results.
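The difference between string matching and the entity-oriented research search described above can be sketched as follows. This assumes, hypothetically, that documents have already been annotated with entity IDs (the annotation step itself is not shown), similar to the entity tagging attributed to Bing above; the corpus, headlines and IDs are made up and this is not Guha et al.'s system.

```python
# Hypothetical corpus: each document carries the entity IDs an annotator
# (assumed, not shown) attached to it.
DOCS = {
    "d1": ("Test drive: the new Mustang convertible", {"e:ford_mustang"}),
    "d2": ("Mustang herds roam the Nevada range", {"e:mustang_horse"}),
    "d3": ("Bureau counts wild horse populations", {"e:mustang_horse"}),
    "d4": ("Mustang sales climb at Ford dealers", {"e:ford_mustang", "e:ford"}),
}


def keyword_search(term):
    """String matching: every document whose text contains the term."""
    return sorted(d for d, (text, _) in DOCS.items() if term.lower() in text.lower())


def research_search(entity_id):
    """Entity matching: every document annotated with the intended thing."""
    return sorted(d for d, (_, ents) in DOCS.items() if entity_id in ents)


print(keyword_search("mustang"))            # ['d1', 'd2', 'd4']
print(research_search("e:mustang_horse"))   # ['d2', 'd3']
```

Keyword matching mixes car and horse pages and misses d3, which never uses the word; matching on the entity returns only the documents about the intended thing, including d3.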

Semantic Web and Big Data

Experts expect SW and Big Data to intersect soon, since both deal with data (structured and unstructured) and the relationships between them to elicit insights. An article on Semanticweb.com titled “Big Data & Semantic Web: An Ideal Marriage” argues that the rise of Big Data could help spur the adoption of Semantic Web technologies. YarcData, a Cray company specializing in Big Data graph analytics, announced that it will join W3C to promote SW technologies.

What do you think is the future for the realization of Tim Berners-Lee’s vision of Semantic Web?


Category: Industry Trends

About Ramani Pandurangan: I've spent the last ten years overseeing the architecture, design, testing and implementation of voice over IP (VoIP) technologies and platforms across the XO network. My role includes support of the company's award-winning business and wholesale VoIP services, and the transformation of the XO voice network from a circuit-switched environment to one based on VoIP and softswitch platforms. I'm an active speaker and panelist on the telecommunications and networking event circuit and participate in several standards bodies, including ATS and ITU. I graduated from the University of Waterloo with a Master's degree in Computer Science. I also hold a Master's in Electrical Engineering from the University of New Brunswick and an MBA from McGill University. I enjoy tennis, hockey (Montreal Canadiens fan), music, and Star Trek.
