May 17, 2001
The Semantic Web
A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities
By Tim Berners-Lee, James Hendler and Ora Lassila
The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring.
[Illustration by Miguel Salmeron]
At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)
In a few minutes the agent presented them with a plan. Pete didn't like it: University Hospital was all the way across town from Mom's place, and he'd be driving back in the middle of rush hour. He set his own agent to redo the search with stricter preferences about location and time. Lucy's agent, having complete trust in Pete's agent in the context of the present task, automatically assisted by supplying access certificates and shortcuts to the data it had already sorted through.
Almost instantly the new plan was presented: a much closer clinic and earlier times, but there were two warning notes. First, Pete would have to reschedule a couple of his less important appointments. He checked what they were; not a problem. The other was something about the insurance company's list failing to include this provider under physical therapists: "Service type and insurance plan status securely verified by other means," the agent reassured him. "(Details?)"
Lucy registered her assent at about the same moment Pete was muttering, "Spare me the details," and it was all set. (Of course, Pete couldn't resist the details and later that night had his agent explain how it had found that provider even though it wasn't on the proper list.)
Expressing Meaning
Pete and Lucy could use their agents to carry out all these tasks thanks not to the World Wide Web of today but rather the Semantic Web that it will evolve into tomorrow. Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing (here a header, there a link to another page), but in general, computers have no reliable way to process the semantics: this is the home page of the Hartman and Strauss Physio Clinic; this link goes to Dr. Hartman's curriculum vitae.
The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to the clinic's Web page will know not just that the page has keywords such as "treatment, medicine, physical, therapy" (as might be encoded today) but also that Dr. Hartman works at this clinic on Mondays, Wednesdays and Fridays and that the script takes a date range in yyyy-mm-dd format and returns appointment times. And it will "know" all this without needing artificial intelligence on the scale of 2001's Hal or Star Wars's C-3PO. Instead these semantics were encoded into the Web page when the clinic's office manager (who never took Comp Sci 101) massaged it into shape using off-the-shelf software for writing Semantic Web pages along with resources listed on the Physical Therapy Association's site.
The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and "understand" the data that they merely display at present.
The essential property of the World Wide Web is its universality. The power of a hypertext link is that "anything can link to anything." Web technology, therefore, must not discriminate between the scribbled draft and the polished performance, between commercial and academic information, or among cultures, languages, media and so on. Information varies along many axes. One of these is the difference between information produced primarily for human consumption and that produced mainly for machines. At one end of the scale we have everything from the five-second TV commercial to poetry. At the other end we have databases, programs and sensor output. To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this.
Like the Internet, the Semantic Web will be as decentralized as possible. Such Web-like systems generate a lot of excitement at every level, from major corporation to individual user, and provide benefits that are hard or impossible to predict in advance. Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all of its interconnections, ushering in the infamous message "Error 404: Not Found" but allowing unchecked exponential growth.
Knowledge Representation
For the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.
[Illustration: Web Searches Today, by Miguel Salmeron]
Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable.
Moreover, these systems usually carefully limit the questions that can be asked so that the computer can answer reliably, or answer at all. The problem is reminiscent of Gödel's theorem from mathematics: any system that is complex enough to be useful also encompasses unanswerable questions, much like sophisticated versions of the basic paradox "This sentence is false." To avoid such problems, traditional knowledge-representation systems generally each had their own narrow and idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not.
Semantic Web researchers, in contrast, accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired. This philosophy is similar to that of the conventional Web: early in the Web's development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right. But the expressive power of the system made vast amounts of information available, and search engines (which would have seemed quite impractical a decade ago) now produce remarkably complete indices of a lot of the material out there. The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web.
Adding logic to the Web (the means to use rules to make inferences, choose courses of action and answer questions) is the task before the Semantic Web community at the moment. A mixture of mathematical and engineering decisions complicates this task. The logic must be powerful enough to describe complex properties of objects but not so powerful that agents can be tricked by being asked to consider a paradox. Fortunately, a large majority of the information we want to express is along the lines of "a hex-head bolt is a type of machine bolt," which is readily written in existing languages with a little extra vocabulary.
Two important technologies for developing the Semantic Web are already in place: eXtensible Markup Language (XML) and the Resource Description Framework (RDF). XML lets everyone create their own tags: hidden labels such as <zip code> or <alma mater> that annotate Web pages or sections of text on a page. Scripts, or programs, can make use of these tags in sophisticated ways, but the script writer has to know what the page writer uses each tag for. In short, XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.
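The point above can be seen in a few lines of code. This is a hedged sketch with invented tag names (zipcode, almamater): a program can navigate the structure easily, but nothing in the XML itself tells it what the tags mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tags add structure, but nothing tells a
# program what <zipcode> or <almamater> actually mean.
doc = """
<person>
  <name>James Hendler</name>
  <zipcode>20742</zipcode>
  <almamater>Brown University</almamater>
</person>
"""

root = ET.fromstring(doc)
# A script can use the tags only if its author already knows what
# the page author intended each one to signify.
print(root.find("zipcode").text)    # -> 20742
print(root.find("almamater").text)  # -> Brown University
```

If another page uses <postcode> instead of <zipcode>, this script silently finds nothing; the structure is machine-readable, but the meaning is not.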
--------------------------------------------------------------------------------
The Semantic Web will enable machines to COMPREHEND semantic documents and data, not human speech and writings.
--------------------------------------------------------------------------------
Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as "is a sister of," "is the author of") with certain values (another person, another Web page). This structure turns out to be a natural way to describe the vast majority of the data processed by machines. Subject and object are each identified by a Universal Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI.) The verbs are also identified by URIs, which enables anyone to define a new concept, a new verb, just by defining a URI for it somewhere on the Web.
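The triple model described above can be sketched in plain Python. All of the URIs here are made up for illustration; real RDF tooling would be used in practice, but the shape of the data is the same: (subject, predicate, object), with each resource named by a URI.

```python
# A minimal sketch of RDF's triple model: every statement is a
# (subject, predicate, object) tuple, and each term that names a
# resource is a URI. All URIs below are invented examples.
triples = [
    ("http://example.org/people#lucy",
     "http://example.org/terms#isSisterOf",
     "http://example.org/people#pete"),
    ("http://example.org/people#pete",
     "http://example.org/terms#isAuthorOf",
     "http://example.org/docs/schedule"),
]

# Anyone can coin a new "verb" simply by minting a URI for it;
# queries then filter on that URI rather than on ambiguous words.
authored = [(s, o) for s, p, o in triples
            if p == "http://example.org/terms#isAuthorOf"]
print(authored)
```

Because the predicate is a URI rather than the bare word "author," two documents that use the same URI are unambiguously talking about the same relation.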
Human language thrives when using the same term to mean somewhat different things, but automation does not. Imagine that I hire a clown messenger service to deliver balloons to my customers on their birthdays. Unfortunately, the service transfers the addresses from my database to its database, not knowing that the "addresses" in mine are where bills are sent and that many of them are post office boxes. My hired clowns end up entertaining a number of postal workers (not necessarily a bad thing but certainly not the intended effect). Using a different URI for each specific concept solves that problem. An address that is a mailing address can be distinguished from one that is a street address, and both can be distinguished from an address that is a speech.
The triples of RDF form webs of information about related things. Because RDF uses URIs to encode this information in a document, the URIs ensure that concepts are not just words in a document but are tied to a unique definition that everyone can find on the Web. For example, imagine that we have access to a variety of databases with information about people, including their addresses. If we want to find people living in a specific zip code, we need to know which fields in each database represent names and which represent zip codes. RDF can specify that "(field 5 in database A) (is a field of type) (zip code)," using URIs rather than phrases for each term.
Ontologies
Of course, this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as zip code. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters.
A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies. In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories. Artificial-intelligence and Web researchers have co-opted the term for their own jargon, and for them an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
The taxonomy defines classes of objects and relations among them. For example, an address may be defined as a type of location, and city codes may be defined to apply only to locations, and so on. Classes, subclasses and relations among entities are a very powerful tool for Web use. We can express a large number of relations among entities by assigning properties to classes and allowing subclasses to inherit such properties. If city codes must be of type city and cities generally have Web sites, we can discuss the Web site associated with a city code even if no database links a city code directly to a Web site.
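The inheritance idea above can be sketched with a toy class hierarchy. The class and property names here are invented for illustration: a property asserted once on "Location" automatically applies to every subclass, which is how a taxonomy lets us state facts economically.

```python
# A toy taxonomy with invented names: each class maps to its parent
# (None marks the root), and properties are asserted on classes.
taxonomy = {"Address": "Location", "City": "Location", "Location": None}
properties = {"Location": {"hasCoordinates"}, "City": {"hasWebSite"}}

def inherited_properties(cls):
    """Collect properties from a class and all of its ancestors."""
    props = set()
    while cls is not None:
        props |= properties.get(cls, set())
        cls = taxonomy.get(cls)
    return props

# "City" inherits hasCoordinates from "Location" in addition to its own.
print(sorted(inherited_properties("City")))
```

Stating "Location has coordinates" once is enough: any program walking the taxonomy can conclude that addresses and cities have coordinates too.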
Inference rules in ontologies supply further power. An ontology may express the rule "If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code." A program could then readily deduce, for instance, that a Cornell University address, being in Ithaca, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards. The computer doesn't truly "understand" any of this information, but it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user.
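The rule quoted above is simple enough to write down directly. This is a sketch with fabricated identifiers, not a real reasoner: the inference chains two lookups, address to city code, city code to state code.

```python
# The article's rule, with invented data: if an address uses a city
# code, and that city code is associated with a state code, then the
# address has the associated state code.
city_to_state = {"ithaca": "NY"}
address_city = {"cornell-address": "ithaca"}

def infer_state(address):
    """Apply the rule: address -> city code -> state code."""
    city = address_city.get(address)
    return city_to_state.get(city)

print(infer_state("cornell-address"))  # -> NY
```

The program has no notion of what a state *is*; it is simply chaining assertions, which is exactly the kind of mechanical manipulation the paragraph describes.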
With ontology pages on the Web, solutions to terminology (and other) problems begin to emerge. The meaning of terms or XML codes used on a Web page can be defined by pointers from the page to an ontology. Of course, the same problems as before now arise if I point to an ontology that defines addresses as containing a zip code and you point to one that uses postal code. This kind of confusion can be resolved if ontologies (or other Web services) provide equivalence relations: one or both of our ontologies may contain the information that my zip code is equivalent to your postal code.
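An equivalence relation of this kind can be sketched as a lookup table. The ontology URIs below are hypothetical: before comparing fields from two databases, each term is mapped to a canonical form through the declared equivalences.

```python
# A hypothetical equivalence declaration between two ontologies'
# terms (URIs invented for illustration), as one ontology might
# assert that "zip code" and "postal code" name the same concept.
equivalences = {
    "http://ontA.example/terms#zipCode":
        "http://ontB.example/terms#postalCode",
}

def canonical(term):
    """Map a term to its declared equivalent, if any."""
    return equivalences.get(term, term)

a = "http://ontA.example/terms#zipCode"
b = "http://ontB.example/terms#postalCode"
print(canonical(a) == canonical(b))  # the two fields can now be merged
```

A program that canonicalizes terms this way can combine the two databases without either of them changing its vocabulary.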
Our scheme for sending in the clowns to entertain my customers is partially solved when the two databases point to different definitions of address. The program, using distinct URIs for different concepts of address, will not confuse them and in fact will need to discover that the concepts are related at all. The program could then use a service that takes a list of postal addresses (defined in the first ontology) and converts it into a list of physical addresses (the second ontology) by recognizing and removing post office boxes and other unsuitable addresses. The structure and semantics provided by ontologies make it easier for an entrepreneur to provide such a service and can make its use completely transparent.
Ontologies can enhance the functioning of the Web in many ways. They can be used in a simple fashion to improve the accuracy of Web searches: the search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords. More advanced applications will use ontologies to relate the information on a page to the associated knowledge structures and inference rules. An example of a page marked up for such use is online at http://www.cs.umd.edu/~hendler. If you send your Web browser to that page, you will see the normal Web page entitled "Dr. James A. Hendler." As a human, you can readily find the link to a short biographical note and read there that Hendler received his Ph.D. from Brown University. A computer program trying to find such information, however, would have to be very complex to guess that this information might be in a biography and to understand the English language used there.
For computers, the page is linked to an ontology page that defines information about computer science departments. For instance, professors work at universities and they generally have doctorates. Further markup on the page (not displayed by the typical Web browser) uses the ontology's concepts to specify that Hendler received his Ph.D. from the entity described at the URI http://www.brown.edu, the Web page for Brown. Computers can also find that Hendler is a member of a particular research project, has a particular e-mail address, and so on. All that information is readily processed by a computer and could be used to answer queries (such as where Dr. Hendler received his degree) that currently would require a human to sift through the content of various pages turned up by a search engine.
In addition, this markup makes it much easier to develop programs that can tackle complicated questions whose answers do not reside on a single Web page. Suppose you wish to find the Ms. Cook you met at a trade conference last year. You don't remember her first name, but you remember that she worked for one of your clients and that her son was a student at your alma mater. An intelligent search program can sift through all the pages of people whose name is "Cook" (sidestepping all the pages relating to cooks, cooking, the Cook Islands and so forth), find the ones that mention working for a company that's on your list of clients and follow links to Web pages of their children to track down if any are in school at the right place.
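The Ms. Cook search above is a multi-hop query over linked data. A toy version with fabricated records: filter people named Cook, keep those employed by a client, then follow the child link to check the school.

```python
# Fabricated linked records standing in for marked-up Web pages.
people = [
    {"name": "Susan Cook", "employer": "Acme Corp", "child": "p1"},
    {"name": "Bob Cook", "employer": "Other Inc", "child": "p2"},
]
children = {"p1": {"school": "State University"},
            "p2": {"school": "Elsewhere"}}
clients = {"Acme Corp"}

# Hop 1: employer is on the client list.
# Hop 2: follow the child link and check the school.
matches = [p["name"] for p in people
           if p["employer"] in clients
           and children[p["child"]]["school"] == "State University"]
print(matches)  # -> ['Susan Cook']
```

No single record answers the question; the answer emerges only by following typed links across several sources, which is what the markup makes mechanical.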
Agents
Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source. You want to be quite sure that a statement sent to your accounting program that you owe money to an online retailer is not a forgery generated by the computer-savvy teenager next door. Agents should be skeptical of assertions that they read on the Semantic Web until they have checked the sources of information. (We wish more people would learn to do this on the Web as it is!)
Many automated Web-based services already exist without semantics, but other programs such as agents have no way to locate one that will perform a specific function. This process, called service discovery, can happen only when there is a common language to describe a service in a way that lets other agents "understand" both the function offered and how to take advantage of it. Services and agents can advertise their function by, for example, depositing such descriptions in directories analogous to the Yellow Pages.
Some low-level service-discovery schemes are currently available, such as Microsoft's Universal Plug and Play, which focuses on connecting different types of devices, and Sun Microsystems's Jini, which aims to connect services. These initiatives, however, attack the problem at a structural or syntactic level and rely heavily on standardization of a predetermined set of functionality descriptions. Standardization can only go so far, because we can't anticipate all possible future needs.
--------------------------------------------------------------------------------
Properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.
--------------------------------------------------------------------------------
The Semantic Web, in contrast, is more flexible. The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion. Agents can even "bootstrap" new reasoning capabilities when they discover new ontologies. Semantics also makes it easier to take advantage of a service that only partially matches a request.
A typical process will involve the creation of a "value chain" in which subassemblies of information are passed from one agent to another, each one "adding value," to construct the final product requested by the end user. Make no mistake: to create complicated value chains automatically on demand, some agents will exploit artificial-intelligence technologies in addition to the Semantic Web. But the Semantic Web will provide the foundations and the framework to make such technologies more feasible.
Putting all these features together results in the abilities exhibited by Pete's and Lucy's agents in the scenario that opened this article. Their agents would have delegated the task in piecemeal fashion to other services and agents discovered through service advertisements. For example, they could have used a trusted service to take a list of providers and determine which of them are in-plan for a specified insurance plan and course of treatment. The list of providers would have been supplied by another search service, et cetera. These activities formed chains in which a large amount of data distributed across the Web (and almost worthless in that form) was progressively reduced to the small amount of data of high value to Pete and Lucy: a plan of appointments to fit their schedules and other requirements.
In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs. Such devices can advertise their functionality (what they can do and how they are controlled) much like software agents. Being much more flexible than low-level schemes such as Universal Plug and Play, such a semantic approach opens up a world of exciting possibilities.
For instance, what today is called home automation requires careful configuration for appliances to work together. Semantic descriptions of device capabilities and functionality will let us achieve such automation with minimal human intervention. A trivial example occurs when Pete answers his phone and the stereo sound is turned down. Instead of having to program each specific appliance, he could program such a function once and for all to cover every local device that advertises having a volume control: the TV, the DVD player and even the media players on the laptop that he brought home from work this one evening.
The first concrete steps have already been taken in this area, with work on developing a standard for describing functional capabilities of devices (such as screen sizes) and user preferences. Built on RDF, this standard is called Composite Capability/Preference Profile (CC/PP). Initially it will let cell phones and other nonstandard Web clients describe their characteristics so that Web content can be tailored for them on the fly. Later, when we add the full versatility of languages for handling ontologies and logic, devices could automatically seek out and employ services and other devices for added information or functionality. It is not hard to imagine your Web-enabled microwave oven consulting the frozen-food manufacturer's Web site for optimal cooking parameters.
Evolution of Knowledge
[Illustration: Elaborate, Precise Searches, by Miguel Salmeron]
The Semantic Web is not "merely" the tool for conducting individual tasks that we have discussed so far. In addition, if properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.
Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community. A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication. The world works across the spectrum between these extremes, with a tendency to start small (from the personal idea) and move toward a wider understanding over time.
An essential process is the joining together of subcultures when a wider common language is needed. Often two groups independently develop very similar concepts, and describing the relation between them brings great benefits. Like a Finnish-English dictionary, or a weights-and-measures conversion table, the relations allow communication and collaboration even when the commonality of concept has not (yet) led to a commonality of terms.
The Semantic Web, in naming every concept simply by a URI, lets anyone express new concepts that they invent with minimal effort. Its unifying logical language will enable these concepts to be progressively linked into a universal Web. This structure will open up the knowledge and workings of humankind to meaningful analysis by software agents, providing a new class of tools by which we can live, work and learn together.
--------------------------------------------------------------------------------
Further Information:
Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor.
Tim Berners-Lee, with Mark Fischetti. Harper San Francisco, 1999.
An enhanced version of this article is on the Scientific American Web site, with additional material and links.
World Wide Web Consortium (W3C): www.w3.org/
W3C Semantic Web Activity: www.w3.org/2001/sw/
An introduction to ontologies: www.SemanticWeb.org/knowmarkup.html
Simple HTML Ontology Extensions Frequently Asked Questions (SHOE FAQ): www.cs.umd.edu/projects/plus/SHOE/faq.html
DARPA Agent Markup Language (DAML) home page: www.daml.org/
© 1996-2006 Scientific American, Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.