Google, Twitter and Facebook build the semantic web

02 August 2010 by Jim Giles
Magazine issue 2771.
A TRULY meaningful way of interacting with the web may finally be here, and it is called the semantic web. The idea was proposed over a decade ago by Tim Berners-Lee, among others. Now a triumvirate of internet heavyweights - Google, Twitter and Facebook - are making it real.
The defining characteristic of the semantic web is that information should be stored in a machine-readable format. Crucially, that would allow computers to handle information in ways we would find more useful, because they would be processing the concepts within documents rather than just the documents themselves.
Imagine bookmarking a story about Barack Obama: your computer will store the URL, but it has no way of knowing whether the content relates to politics or, say, cookery. If, however, each web page were tagged with information about its content, we could ask the web questions and expect sensible answers.
It is a wildly attractive idea but there have been few practical examples. That's about to change.
Google's acquisition this month of Metaweb, a San Francisco-based semantic search company, is a step in the right direction. Metaweb owns Freebase, which is an open-source database. Why would Google want Freebase? Partly because it contains information on more than 12 million web "entities", from people to scientific theories. But mostly because of the way in which Freebase accumulates its knowledge - it is almost as if a person were doing it, making links between pieces of information in a way that makes sense to them.
Freebase entries, culled from sources such as Wikipedia, are tagged so that computers can understand what each is about and link them together. Freebase lists, for example, that one entry for "Chicago" is about a city and another describes the hit musical. Entries are also linked to other relevant entries, such as other towns or shows.
Freebase's tags and links will help Google develop smarter searches. For example, you may be able to request a list of "colleges on the US west coast with tuition fees under $30,000", or "actors over 40 who have won at least one Oscar", Jack Menzel, Google's director of product management, wrote in a blog post.
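To make the idea concrete, here is a minimal sketch of how typed, tagged entries turn a question like the college example into a simple filter. It is not Freebase's actual data model or query language: the `Entity` class, the `find` helper, the property names and every figure in the sample data are assumptions invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    types: set                      # what the entry is about, e.g. {"city"}
    props: dict = field(default_factory=dict)


# Hypothetical sample data; the colleges and tuition figures are made up.
ENTITIES = [
    Entity("Chicago", {"city"}, {"region": "midwest"}),
    Entity("Chicago", {"musical"}),
    Entity("College A", {"college"}, {"region": "west coast", "tuition_usd": 28000}),
    Entity("College B", {"college"}, {"region": "west coast", "tuition_usd": 41000}),
    Entity("College C", {"college"}, {"region": "east coast", "tuition_usd": 25000}),
]


def find(entities, type_, **conditions):
    """Return entities of the given type whose properties pass every condition."""
    results = []
    for e in entities:
        if type_ not in e.types:
            continue
        if all(k in e.props and test(e.props[k]) for k, test in conditions.items()):
            results.append(e)
    return results


cheap_west_coast = find(
    ENTITIES,
    "college",
    region=lambda r: r == "west coast",
    tuition_usd=lambda t: t < 30000,
)
print([e.name for e in cheap_west_coast])  # ['College A']
```

Because each entry carries a type, the two "Chicago" entries never get confused: asking for cities returns only the city, and the musical is left out.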
Google isn't alone. Recently details emerged of Twitter's "annotations", a system that allows tweets to be tagged with information that will not appear in the message but can be read by computers. A tweet about a film, for example, might let you link straight to a movie trailer or the Amazon page for its DVD. A test version may be launched this summer.
Meanwhile, Facebook's changes to its Open Graph protocol also have a semantic element. The protocol allows web developers whose sites are devoted to specific topics, such as a restaurant, to add tags and a "like" button to their site. The tags tell Facebook's servers what the page is about - perhaps including the restaurant location - and when one of its users clicks the button, a link is established between that site and their Facebook profile.
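As a rough illustration of what such tags look like to a machine, the sketch below reads Open Graph-style `<meta>` tags out of a page using Python's standard `html.parser` module. The sample page, the restaurant name and the coordinates are invented, and the location property names are an assumption based on the protocol's `og:` prefix convention; this shows one way a consumer might read the tags, not how Facebook's servers do it.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            self.tags[prop] = attrs["content"]


# Hypothetical restaurant page; every value here is made up.
PAGE = """<html><head>
<meta property="og:title" content="Example Bistro" />
<meta property="og:type" content="restaurant" />
<meta property="og:latitude" content="37.77" />
<meta property="og:longitude" content="-122.42" />
<meta name="description" content="ignored: not an Open Graph tag" />
</head><body></body></html>"""

parser = OpenGraphParser()
parser.feed(PAGE)
og_tags = parser.tags
print(og_tags)
```

The tags never appear in the visible page, yet a program reading them learns that the page describes a restaurant and where it is.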
The moves by Facebook and Twitter could change the very nature of how we interact with the web. Software writers will be able to build applications that search for bars and restaurants your Facebook friends have enjoyed, or movies and books your Twitter contacts say were over-hyped. Facebook's involvement should help overcome one of the biggest hurdles faced by the semantic web - persuading website owners to tag their content (see "Solving the chicken-and-egg problem").
Joshua Shinavier, a PhD student at Rensselaer Polytechnic Institute in Troy, New York, has developed an application that runs searches of tweets using the location data they contain.
Shinavier's software, which he plans to release next week, uses the website Geonames to convert the latitude and longitude information in the tags into place names. It then looks up those places in DBpedia, a version of Wikipedia built along similar lines to Freebase. The combination of DBpedia and Geonames will make it possible to search for all tweets made from specific types of places, such as college towns or coastal regions.
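The two-step lookup described above can be sketched as follows. This is not Shinavier's code and it calls neither Geonames nor DBpedia: two small in-memory tables stand in for those services, the coordinates are approximate, and the place-type labels and sample tweets are assumptions made for illustration.

```python
import math

# Stand-in for Geonames: place name -> approximate (latitude, longitude).
PLACES = {
    "Troy, New York": (42.73, -73.69),
    "Santa Cruz, California": (36.97, -122.03),
}

# Stand-in for DBpedia: place name -> illustrative type labels.
PLACE_TYPES = {
    "Troy, New York": {"college town"},
    "Santa Cruz, California": {"college town", "coastal"},
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def nearest_place(coords):
    """Step 1: turn coordinates into the name of the closest known place."""
    return min(PLACES, key=lambda name: distance_km(coords, PLACES[name]))


def tweets_from(tweets, place_type):
    """Step 2: keep tweets whose resolved place carries the requested type."""
    return [t for t in tweets
            if place_type in PLACE_TYPES[nearest_place(t["coords"])]]


# Hypothetical geotagged tweets.
TWEETS = [
    {"text": "Finals week at last", "coords": (42.73, -73.68)},
    {"text": "Surf is up", "coords": (36.96, -122.02)},
]

coastal = tweets_from(TWEETS, "coastal")
college = tweets_from(TWEETS, "college town")
print([t["text"] for t in coastal])
```

The point of the design is that the tweet itself only carries numbers; the meaning ("this was sent from a coastal college town") comes from chaining two linked data sources.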
While users may find that the semantic web can help them get to grips with some complex questions, its main attraction may be for advertisers. "The whole play is about advertising," says Alex Iskold of Adaptive Blue, a New York-based start-up that focuses on semantic technologies. "Better data will mean better ads."
So advertisers may seize on the capabilities promised by tools like Shinavier's to probe consumer tastes in specific regions. Facebook's semantic tags will also appeal to advertisers, who can use them to explore the connections between users and interests.
Berners-Lee's vision may finally be here, but it comes with something he did not ask for - adverts finely tailored to our likes and dislikes. And those of our friends.
Solving the chicken-and-egg problem
You could argue that the semantic web is a classic example of the chicken-and-egg problem. The only way to create a web that's intuitive for users and where the pages are comprehensible to computers, too, is for web pages to be tagged. But without tags on web pages, there is no incentive to build applications that can use them. And without the apps in place there is no reason to tag websites.
Facebook is working on it: websites that include the social networking site's "like" button and appropriate tags now get links from Facebook pages. So powerful is Facebook that many other sites are expected to provide the appropriate tags. In much the same way that web developers have tweaked sites to improve their Google ranking, playing along with Facebook should improve their visibility.
"This is why we're all so excited," says Alex Iskold of Adaptive Blue, a New York-based start-up that focuses on semantic technologies. "The incentive problem has been solved."
Facebook's like button doesn't solve the incentive problem completely, though. If you can find a way to attach tags to users' blogs and tweets, you have a much richer source of data.
One of Adaptive Blue's products, an entertainment recommendation system called GetGlue, may help. Instead of forcing users to generate tweet tags manually, websites can use GetGlue to automatically produce tags based on URLs contained in tweets. If a message contains a link to movie bible IMDb's page for Inception, for example, GetGlue will tag the message appropriately.
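The general idea of tagging a message from the links it contains can be sketched in a few lines. This is not GetGlue's implementation: the `SITE_CATEGORIES` table, the `tag_tweet` function and the placeholder URL are all assumptions, and a real system would presumably also identify which film the page is about.

```python
import re
from urllib.parse import urlparse

# Hypothetical mapping from website to the kind of thing its pages describe.
SITE_CATEGORIES = {
    "imdb.com": "movie",
    "amazon.com": "product",
}

URL_PATTERN = re.compile(r"https?://\S+")


def tag_tweet(text):
    """Return one tag per recognised link found in the message text."""
    tags = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        category = SITE_CATEGORIES.get(host)
        if category:
            tags.append({"category": category, "url": url})
    return tags


# The path below is a placeholder, not a real IMDb address.
tweet = "Loved Inception http://www.imdb.com/title/example/"
tags = tag_tweet(tweet)
print(tags)
```

The user writes nothing extra: the tag is derived entirely from the link, which is what removes the burden of manual tagging.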