JULY 2004
THE LINGUIST'S SEARCH ENGINE:
GETTING STARTED GUIDE
Philip Resnik and Aaron Elkiss
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742-3275
{resnik,aelkiss}@umd.edu
Abstract
The World Wide Web can be viewed as a naturally occurring resource
that embodies the rich and dynamic nature of language, a data
repository of unparalleled size and diversity.  However, current Web
search methods are oriented more toward shallow information retrieval
techniques than toward the more sophisticated needs of linguists.
Using the Web in linguistic research is not easy.
It will, however, be getting easier.  This report introduces the
Linguist's Search Engine, a new linguist-friendly tool that makes it
possible to retrieve naturally occurring sentences from the World Wide
Web on the basis of lexical content and syntactic structure.  Its aim
is to help linguists of all stripes in conducting more thoroughly
empirical exploration of evidence, with particular attention to
variability and the role of context.
Keywords: Search engines, linguistics, parsing, corpora.
This research sponsored by the National Science Foundation under ITR IIS0113641.
The Linguist's Search Engine: Getting Started Guide
Philip Resnik1,2 and Aaron Elkiss2
1Department of Linguistics and
2Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
{resnik,aelkiss}@umd.edu
Introduction
A highly influential (some would say dominant) tradition in modern linguistics is built on the use of linguists' introspective judgments on sentences they have created.  The judgment of a sentence as grammatical or ungrammatical, the presentation of a minimal pair, whether or not a particular structure is felicitous given an intended interpretation: these are very often the working materials of the linguist, the data that help to confirm or disconfirm hypotheses and lead to the acceptance, refinement, or rejection of theories.
Although naturally occurring sentences are currently accorded less emphasis by many linguists, the use of text corpora has a tradition in the greater linguistic enterprise (e.g., Oostdijk and de Haan, 1994).  And with the emergence of the World Wide Web, we have before us a naturally occurring resource that embodies the rich and dynamic nature of language, a data repository of unparalleled size and diversity.  Unfortunately, current Web search methods are oriented more toward shallow information retrieval techniques than toward the more sophisticated needs of linguists.  Using the Web in linguistic research is not easy.
The tool introduced in this getting-started guide is designed to make it easier.  The Linguist's Search Engine (LSE) is a new linguist-friendly facility that makes it possible to retrieve naturally occurring sentences from the World Wide Web on the basis of lexical content and syntactic structure.  With the Linguist's Search Engine, it will be easier to take advantage of a huge body of naturally occurring evidence, in effect treating the Web as a searchable linguistically annotated corpus.
Why should this matter?  As Sapir (1921) points out, "All grammars leak."  Abney (1996) elaborates: "[A]ttempting to eliminate unwanted readings . . . is like squeezing a balloon: every dispreference that is turned into an absolute constraint to eliminate undesired structures has the unfortunate side effect of eliminating the desired structure for some other sentence."  Moreover, Chomsky (1972) remarks that "crucial evidence comes from marginal constructions; for the tests of analyses often come from pushing the syntax to its limits, seeing how constructions fare at the margins of acceptability."  It is not surprising, therefore, that judgments on crucial evidence may differ among individuals; as linguists we have all shared the experience of the student in the syntax talk who hears the speaker declare a crucial example ungrammatical, and whispers to his friend, "Does that sound OK to you?"  The fact is, language is variable (again, Sapir, 1921); yet in the effort to make the study of language manageable, a dominant methodological choice has been to place variability and context outside the scope of investigation.
While there are certainly arguments to be made for focusing theory development on accounting for observed generalizations, rather than trying to account for individual sentences (perforce including exceptions to generalizations) as data, an alternative to narrowing the scope of investigation is to make it easier to investigate a wider scope in interesting ways. A central goal of our work, therefore, is to help theory development to be informed by a more thoroughly empirical exploration of real-world observable evidence, an approach that explicitly acknowledges and explores the roles of variability and context, using naturally occurring examples in concert with constructed data and introspective judgments.  In short, to make it easier for more linguists to do the things that some linguists already do with corpora.
Now, as noted above, using corpora in linguistics is not new, and certainly there are quite a few resources available to the determinedly corpus-minded linguist (and corpus-minded linguists using them).  These include large data gathering and dissemination efforts (such as the British and American National Corpora, the Linguistic Data Consortium's Gigaword corpora, CHILDES, and many others), important and highly productive efforts to annotate naturally occurring language in linguistically relevant ways (from the Brown Corpus through the Penn Treebank and more recent annotation efforts such as PropBank and FrameNet), and tools designed to permit searches on linguistic criteria (ranging from concordancing tools such as Wordsmith, Scott 1999, to tree-based searches such as tgrep, and beyond to grammatical search facilities such as Gsearch, Corley et al. 2001).  When it comes to exploiting linguistically rich annotations in large corpora for linguistic research, however, Manning (2003) describes the situation aptly, commenting, "it remains fair to say that these tools have not yet made the transition to the Ordinary Working Linguist without considerable computer skills."
Getting Started with the LSE
The LSE is designed to be a tool for the Ordinary Working Linguist without considerable computer skills.  As such, it was designed with the following criteria in mind:
Must minimize learning/ramp-up time
Must have a linguist-friendly "look and feel"
Must permit real-time interaction
Must permit large-scale searches
Must allow search using linguistic criteria
Must be reliable
Must evolve with real use
The design and implementation of the LSE, guided by these desiderata, is a subject for another document.  The subject of this document is the first criterion.   Since the LSE is a tool designed for hands-on exploration, we introduce it not by providing a detailed reference manual, but by providing a walk-through of some hands-on exploration.   This is organized as a series of steps for the user to try out himself or herself: what to type, or click, or open, or close, accompanied by screen shots showing and explaining what will happen as a result.
Two words of caution.  First, the LSE is a work in progress, and as such, parts of it are likely to evolve rapidly; indeed, feedback from real users trying it out should play a critical role in its further development.  This means that before too long, the screen shots or directions in this guide may be out of date.  If the interface is well enough designed, a user starting with this guide should still be able to explore the LSE's various features, even if the screen details or the exact operations have changed somewhat.  But the reader should be aware of the potential discrepancies.
Second, no tool can substitute for a researcher's judgment.   The LSE will, one hopes, make it easier to work with large quantities of naturally occurring data in ways that some linguists will care about.   But one must be aware of all the customary cautions that come to mind when working with naturally occurring data, or with any search engine, for that matter.  Questions that must be asked include things like:  Is the source of this example a native speaker of English?  Am I looking at written language or transcribed speech?  Are the data I'm looking at providing an adequate (or adequately balanced, if that matters) sample of the language with respect to the phenomena I'm investigating?   Is any particular "hit" in a search really an example of the phenomenon I'm looking for, or might it be a false positive?
Rather than ending with caution, though, let me end this introduction with encouragement. The LSE is a Field of Dreams endeavor, built on faith that "if you build it, they will come."   We've built it, or at least a first version of it.  Will it turn out to be a useful tool for studying language?  That's a question for the readers of this document: the community of users who will, we hope, find ways to employ the LSE with insight and creativity.
Acknowledgments
It's traditional to put acknowledgments at the conclusion of a document, but it is to be hoped that momentarily the reader will be having too much fun with the LSE to pay attention to details placed at the end.
The LSE is part of a collaboration between the author and Christiane Fellbaum of Princeton University on using the Web as a source of empirical data for linguistic research, sponsored by NSF ITR IIS0113641; this collaboration also includes Mari Broman Olsen of Microsoft.
The primary implementor for the LSE is Aaron Elkiss, with contributions by Jesse Metcalf-Burton, Girish Joshi, Mohammed RafiKhan, Saurabh Khandelwal, and G. Craig Murray.   Critical tools underlying the LSE, without which this work would be unimaginable, include Adwait Ratnaparkhi's MXTERMINATOR and MXPOST, Eugene Charniak's stochastic parser, Dekang Lin's Minipar parser (searches not currently available), Douglas Rohde's tgrep2, and a host of publicly available tools for construction of Web applications.
The author of this guide appreciates the early efforts and comments of the students in his spring 2003 lexical semantics seminar, which provided early feedback on a rather more preliminary version of the LSE.   I am also grateful for the inspiration and lucid argumentation of empirically minded linguists Steve Abney and Chris Manning, for stimulating discussions with Bob Frank, Mark Johnson, and Paul Smolensky, and especially for the staggeringly important work of George Miller, Mitch Marcus, and Brewster Kahle (and their many collaborators) in producing WordNet, the Penn Treebank, and the Internet Archive.
I'm sure these acknowledgments are incomplete; apologies to anyone I've missed. Ditto for relevant bibliographic citations; all feedback is welcome.
First steps: Logging in and Query By Example
(For the impatient reader: focus on the instructions in bold face type.)
You access the LSE via your Web browser at http://lse.umiacs.umd.edu.  Although a number of browsers should work, at the moment Internet Explorer 6 and Mozilla 1.5 are most likely to work well.   Create an account using the Register link - fill out the form with your name, email address and desired username and password. After clicking Submit you should receive a confirmation page; an email will also be sent to the address you specified with an introductory message. You can then return to the LSE index page and click the Log In link. Use the username and password you specified when registering.
The first example we will work with is from the discussion of Pollard and Sag (1994) in Manning (2003).  The following introspective judgments are given for complements of the verb consider, illustrating the claim that it cannot take complements headed by as.
1(a) We consider Kim to be an acceptable candidate
(b) We consider Kim an acceptable candidate
(c) We consider Kim quite acceptable
(d) We consider Kim among the most acceptable candidates
(e) *We consider Kim as an acceptable candidate
(f) *We consider Kim as quite acceptable
(g) *We consider Kim as among the most acceptable candidates
(h) *We consider Kim as being among the most acceptable candidates
Do naturally occurring data support Pollard and Sag's judgment that 1(e) cannot be used to mean the same thing as 1(a)?
Once having logged in to the LSE, you will find yourself in the Query By Example (QBE) page.  This is designed to make it easy for a linguist to say "Find me more examples like this one" without having to know the syntactic details underlying the LSE's annotations.   The LSE currently uses a rather "vanilla" style of syntactic constituency annotation (of the Penn Treebank variety).
Type the sentence "We consider Kim as an acceptable candidate" into the Example Sentence space, and then click Parse.  After a moment, you should see a parse tree for the sentence show up in the Tree Editor space.
Right-click on the VP node in the parse tree.  This will bring up a menu of tree-editing operations.  Select Remove all but subtree.  You will see the tree display change so that only the VP subtree remains; we're interested in sentences containing this VP structure but we don't care about what's in the subject position, or whether or not it's a matrix sentence.
Right-click on the NNP above Kim to bring up the same menu.  This time, select Remove subtree.  This will leave the NP dominated by VP, removing the unnecessary detail below; we care that the VP have an NP argument, but not what that NP contains.
Right-click on the NP node with DT, JJ, and NN as children and bring up the menu again.  This time select Remove All Children.
At this point, your tree should look like the tree in the screen above.  You have specified that you want verb phrases headed by consider where the VP also dominates an NP and a PP headed by as.
Now click the Update Query button.  This automatically (re-)generates a query based on the tree structure you have specified.
The resulting query is displayed in the Query area.   This is just a textual representation of the tree displayed in the tree editor. Your query searches for sentences with parses that contain your query as a subtree.
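To make "contains your query as a subtree" concrete, here is a minimal Python sketch of one way such matching could work.  This is a hypothetical illustration only (with trees as nested tuples and a simple greedy match); the LSE's actual searching is built on tree-search tools such as tgrep2, mentioned in the acknowledgments.

```python
# Hypothetical sketch of subtree matching, not the LSE's implementation.
# Trees are nested tuples: (label, child1, child2, ...); leaves are strings.

def matches(tree, pat):
    """True if `pat` matches at the root of `tree`: same label, and each
    pattern child matches some tree child, in left-to-right order
    (extra tree children may intervene; a greedy scan keeps this simple)."""
    if isinstance(pat, str):
        return isinstance(tree, str) and tree == pat
    if isinstance(tree, str) or tree[0] != pat[0]:
        return False
    i = 1
    for pchild in pat[1:]:
        while i < len(tree) and not matches(tree[i], pchild):
            i += 1
        if i == len(tree):
            return False
        i += 1
    return True

def contains(tree, pat):
    """True if `pat` matches at any subtree of `tree`."""
    if matches(tree, pat):
        return True
    if isinstance(tree, str):
        return False
    return any(contains(child, pat) for child in tree[1:])

# A parse along the lines of the example sentence (tags are assumed),
# and the query built above: a VP headed by consider with NP and as-PP.
sent = ("S",
        ("NP", ("PRP", "We")),
        ("VP", ("VBP", "consider"),
               ("NP", ("NNP", "Kim")),
               ("PP", ("IN", "as"),
                      ("NP", ("DT", "an"), ("JJ", "acceptable"), ("NN", "candidate")))))
query = ("VP", ("VBP", "consider"), ("NP",), ("PP", ("IN", "as")))
print(contains(sent, query))  # True
```

Note how the empty-childed ("NP",) in the query matches the object NP regardless of its internal structure, just as in the tree editing steps above.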
Advanced users can edit the query here or in the screen that follows.  See the "Tips, Hints, and Advanced Features" section for a detailed example.
Click Search to move from Query by Example to the main search interface.
The Query Interface
The Example Sentence, Tree Editor, and Query inputs have been collapsed and the Query Options tab button has been expanded.  Let's look at this page from top to bottom.
The Load Query tab allows you to recall queries that you've saved using the Save Query tab.  This can be useful for modifying previous queries, or for trying out a query on a new source of sentences.  Leave this alone for the moment, since we want to execute the query just created via Query By Example.
In the currently selected tab are the display options for the results that have been returned at the bottom of the screen.  The results can be displayed in standard format or in Keyword in Context (KWIC) format.  With KWIC, a keyword or phrase you select will be highlighted in the sentences.  You can click the Offensive Content Filter check box to apply a simple keyword-based filter that will suppress URLs and sentences likely to be offensive.  Click Apply to make any changes take effect.
The Download Results button allows you to download the returned sentences to your computer.
The Webpage Source combo box allows you to choose what collection of sentences to look in.  The default, the LSE Web Collection, is currently a collection of about three million sentences collected from Web pages that are stored on the Internet Archive (www.archive.org).  This static resource is a useful starting point for exploration; a little later you'll be shown how to create for yourself new collections of sentences from the Web that are likely to be of interest to you.  Leave the source set to the LSE Web Collection for now.
Click on the Save Query tab.  In the Description box at the bottom, type "consider NP as NP" and then click Save Query.  This saves the query with a readable description to retrieve it by.  Click on Query Options and click on Search.
Scroll down to get the view below, showing the first six hits.
Looking at Results Returned by a Query
The screen above shows results of your query.  Notice that the "hits" are organized in standard search engine fashion, showing the number of matching sentences found, the URL of the page where each sentence was found, the sentence itself, navigation buttons to get to the next and previous twenty hits, etc.
With the exception of the first hit, most of these look like good counterexamples to the claim in (1e).
Click the Annotation link above hit number 4.  This will bring you to a screen like this one.
Notice that this shows the previous and following sentence context and the constituency parse for the sentence.  Scroll down to see a graphical representation of the parse tree; clicking on it, or on Use this sentence for Query by Example, will load the parse tree in the tree editor, where it is also easier to view. For now, close this window to go back to the list of hits.
Next, click on the Archived link.  This brings you to the Web page containing the sentence, as stored on the Internet Archive:
You can use your Web browser's "Find" function to find the sentence on the page.  You can go back and click Current to see the current version of the page, which may have changed (and therefore may or may not still contain the sentence).
Note that this page doesn't appear to have been written by a native English speaker.  So this might not be the best example to use to refute the claim in (1e). Returning to the results page, there are many other examples that seem to refute the claim; results 7 and 8 are from a work by Darwin, for example.
Another Query by Example
Let's try another Query by Example.  This time we'll look for instances of a construction (Goldberg, 1995), in this case sentences containing things like "the ADJer the NP the ADJer the NP".   Go back to the Query by Example page (you can click on it on the navigation bar at the top or bottom of most LSE pages) and type in, as the example sentence, "The bigger the house the higher the price" (without the double quotes).    Then click Parse.
As an exercise, use the tree editing functionality to modify the parse tree so it looks like the tree on the screen below.
Remember, you right click on nodes to do things with them.  You can also right click on the white space in the tree editor.  Notice that the right-click menu includes Undo, which will undo your last operation if you make a mistake.  You can also select Revert, or click the Cancel button at the bottom, to revert back to what the tree looked like before you started editing it.  If you use the Add Node option, you'll get a pop-up box in which to type the label of the new node you're adding.
When your tree looks like the tree above, click Update Query to re-generate the query pattern.  You'll have noticed that the parser really didn't know what to make of this construction.  But that doesn't stop you from being able to edit the structure to generalize it (even if you don't know that JJR is the Penn Treebank symbol for comparative adjective), and it doesn't matter whether or not you agree with the structure as long as the resulting pattern can do a reasonable job of locating sentences with the same structure.
Click Proceed to Search.
Then enter the description "The ADJer the NP the ADJer the NP" and click the Save Query button.  Finally, click Submit Query.
This returns 13 results, but suppose we want more, or suppose none had been found at all? Three million sentences may seem like a large number to search, but it's tiny relative to the size of the entire Web. It's not surprising that any given construction might not appear frequently or at all in this particular random sample.  What you really want to do is a Web-scale search, so that you can look for your structure on a non-random sample.
Building Your Own Collections
Let's use the LSE to do a large-scale Web search for instances of the "the ADJer the NP the ADJer the NP" construction.  To start, go to the My Collections page and click on Add New Collection Definition.
The Collection space allows you to give a descriptive name to this collection.   Type "comparatives" into the collection box as illustrated above.  In the Description area, type "The Xer the NP1 the Yer the NP2"; this is a short prose description of the collection of sentences you're building from the Web.
The Add New Search area is the heart of the collection building process.  The key idea is (a) to use the Altavista search engine to find pages that are likely to contain sentences of interest, and then (b) to automatically extract those sentences of interest into a searchable LSE collection.
The first step is done by entering a query into AltaVista Search Terms and using the Try Search on AltaVista button to search for pages that are likely to contain sentences of interest.  This can take a few iterations; for example, simply entering "bigger smaller longer poorer ... etc." as an Altavista query won't work; it results in pages that contain word lists, rather than pages where those words are used in sentences.
Here's an illustration of how to refine your Altavista query.  In the query box type "the bigger the".  Include the double quotes, which tells Altavista you're interested in these three words appearing next to each other.  You'll find that this gets you a lot of pages containing "the bigger the better", because it's such a common phrase.  You can tell Altavista to exclude pages containing that phrase by adding a query term with "AND NOT" in front of it. Type in this Altavista query:
"the bigger the" AND NOT "the bigger the better"
It says: get me pages containing "the bigger the" but not containing "the bigger the better".   Click the Try Query button and notice that the hits you get back are indeed pages containing the right sorts of phrases.
You've now told the LSE that it should automatically retrieve Web pages from Altavista using this query.  The Max number of documents to retrieve defaults to 10, though you can select other numbers for your purposes.
Since the Web pages you retrieve will undoubtedly contain many sentences you are not interested in (indeed, mostly such sentences), there needs to be some way to specify which sentences you are interested in.  The box underneath the Altavista search terms allows you to specify a word or words that must appear in a sentence in order for it to be interesting.  In the box saying I only want sentences that contain at least one of the following words, type "bigger" (without quotes).
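The selection step just described amounts to a simple keyword filter over the sentences extracted from each page.  A minimal Python sketch of the idea follows; the function name and tokenization here are hypothetical illustrations, not the LSE's code.

```python
# Hypothetical sketch of the "must contain at least one of these words"
# filter: keep only sentences containing at least one keyword.
import re

def interesting_sentences(sentences, keywords):
    """Return the sentences containing at least one of `keywords`
    (case-insensitive, whole-word comparison)."""
    kws = {k.lower() for k in keywords}
    keep = []
    for s in sentences:
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", s)}
        if words & kws:
            keep.append(s)
    return keep

page = ["The bigger the house, the higher the price.",
        "Prices vary by region.",
        "A bigger engine costs more."]
print(interesting_sentences(page, ["bigger"]))
# keeps the first and third sentences
```

A whole-word comparison like this is what keeps a search for "bigger" from also admitting sentences whose only match is inside another word.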
Now click Save Changes.  Then click on My Collections and expand the comparatives collection.  You'll see that your collection now has a Search 1 with the parameters you've given it.
Add new searches to this collection description by clicking on edit and repeating the process above:
- Enter the Altavista search terms
- Verify that your Altavista search retrieves the right sorts of pages using Try This Query
- Enter the words that identify sentences of interest
- Choose the maximum number of documents to retrieve for this search
- Click the Save Changes button.
For example,
- Altavista search terms: "the wealthier the" (include quotes)
- I only want sentences ...: wealthier
- Maximum number of documents: 1000
- Click Save Changes
and then:
- Altavista search terms: "the poorer the" (include quotes)
- I only want sentences ...: poorer
- Maximum number of documents: 1000
- Click Save Changes
Once you have finished adding searches to your collection, click Start Annotating. This tells the LSE you are ready for it to begin building your collection.
Go back to the My Collections page.  Notice that this collection now appears on your list of collections.  In the lower right corner, the Status line shows the current status of a collection.  Possible values include awaiting Start Annotating command, queued (i.e. waiting until the LSE annotator is free to work on it), downloading, processing, annotating, and finished.   Once the building and annotating process has started, sentences that are found are annotated as quickly as the LSE can get to them, given its available resources.  Note that a collection is searchable as soon as it contains any annotated sentences, i.e. you don't have to wait for it to be complete.
At any point, you can click Edit for a collection; for example, you can go back there to delete the collection, or to tell the LSE to stop annotating if the build is still in progress but you've already found everything you wanted.   You can even add a new search to extend a collection that already exists.
Using Your Collections
The amount of time it takes to build a collection can vary; you can watch the My Collections list to see how things are progressing.  It will show you how many sentences have been found so far that meet your criteria, and it will also show you how many of those have been linguistically annotated and are therefore now searchable.
The LSE rotates its efforts among the requests of its various users, so your collection building request will not need to wait in line behind all the other requests in order for it to get started.  Currently, the LSE's scheduler places a high priority on quickly getting some sentences into each collection (the first thousand) so that you can very quickly start searching and discover changes you need to make.  (To conserve resources, please use Delete Collection for collections you've decided not to use, and use the Stop Annotating button for collections once they've grown as large as you need them.)  After the first thousand sentences, you may notice that your collection builds up more slowly if other users are also building collections at the same time.  The scheduler also keeps track of which collections have not received any attention for a while, to make sure that each one gets its fair share.
Let's return to the search for "The ADJer the NP the ADJer the NP" constructions, using the collection you have built.  (Remember, you can do this even before the collection is complete.)
Go back to the Query Options tab.  Use the pull-down for Select a Source to pick My Collections: comparatives.
Now use the Select a Saved Query pull-down at the top of the screen to pick the query you saved before: "The ADJer the NP the ADJer the NP".  Notice that the LSE automatically fills in the query pattern for you.
Click Submit Query.  Depending on how far your collection building has gotten, the results should look something like this:
Congratulations!  You have just searched the entire Web (or at least the portion indexed by Altavista) using a structural search, and found some examples of the structure you were looking for.
Tips, Hints, and Advanced Features
The examples above have exercised all of the LSE's basic functionality as of this writing.  Here are a few things that may help make it more useful, based on our experience so far:
- Navigation bar.  The navigation bar at the top and bottom of most screens makes it easy to jump back and forth between Query by Example, Query, and My Collections.
- Search this Collection shortcut.  When you're in My Collections, either in the collections list or in the detailed view of a particular collection, you can click Search this Collection to go to a version of the Query page where the collection information has already been filled in.
- Tree editing hints.  Unless you are particularly interested in your structure's occurring at the matrix level, the usual first step will be to right-click on the deepest relevant node and select Remove all but subtree.  If you're looking at a verb-centered construction and you don't need a matrix sentence (and the sentential subject doesn't matter), it's usually better to keep just the VP rather than the whole S dominating it, since Treebank-style parses will occasionally use adjoined structures (VP dominating VP).  On the same note, we recommend being more general rather than more specific where possible; for example, unless you specifically need a particular NP-internal structure, we recommend keeping just the NP (as was done in the earlier examples) rather than, say, using a specification that requires a determiner.  It's always easier to go from more general to more specific once you've seen what the data look like.
- Excluding structure.  There are a few interesting things you can do to the automatically generated queries that aren't reflected by the tree editor. Foremost among these is negation.  The LSE's current Query By Example does not provide a way to say that a part of a structure should be absent rather than present; for example, the tree editor does not allow you to say that an NP should not contain an adjective, or that a VP should not have a PP as one of its children.  One way to get this behavior is to type in an example sentence that includes the structure you don't want, generate the query automatically, and then modify it manually to negate the relevant piece of structure.  For example, suppose you want cognate object constructions for the verb live where the direct object does not have an adjectival modifier ("lived a/the/his/her life" but not "lived a quiet life").
- In Query by Example, type "He lived a quiet life", click Parse, and edit the tree to keep just the VP.  Use Remove subtree to delete the DT (determiner) node, but keep the JJ (adjective) subtree.  Click Update Query.
- In the query, scroll right, if necessary, so you can see the part of the pattern that specifies the object noun phrase:
(NP (JJ quiet) (NN ...))
This says that we want an NP that dominates a JJ node (which itself dominates a node labeled quiet), and that also dominates a subtree whose root node is labeled NN. Adding a node labeled "!" specifies that the subtree under the "!" node should NOT be contained in your results; it changes "dominates" to "does not dominate". So if you changed the expression this way
(NP (JJ (! quiet)) (NN ...))
then you have modified your structure to specify an NP that must contain an adjective (JJ), but you've said that that adjective cannot be the word quiet.  And, in fact, you could say
(NP (JJ (! quiet|peaceful|good)) (NN ...))
in order to exclude the adjectives quiet, peaceful, and good (the vertical bar means "or").
This, however, is not quite what we wanted; we wanted to exclude all adjectives.  The way to do this is to change the specification so that the negation applies to the whole JJ (adjective) and doesn't care about what's underneath it:
(NP (! JJ) (NN ...))
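The negation logic above can also be sketched in ordinary code.  The following is a minimal illustration (not the LSE's implementation, and all names are hypothetical): parse trees are represented as nested lists whose first element is the node label, and we check the constraint expressed by (NP (! JJ) (NN ...)), i.e. an object NP that dominates an NN but no JJ.

```python
def dominates(tree, label):
    """True if any descendant of `tree` (excluding the root) bears `label`."""
    if isinstance(tree, str):  # a leaf (a word), no descendants
        return False
    return any(
        (not isinstance(child, str) and child[0] == label)
        or dominates(child, label)
        for child in tree[1:]
    )

def matches_bare_cognate_object(vp):
    """VP whose object NP dominates an NN but no JJ: (NP (! JJ) (NN ...))."""
    if isinstance(vp, str) or vp[0] != "VP":
        return False
    nps = [c for c in vp[1:] if not isinstance(c, str) and c[0] == "NP"]
    return any(dominates(np, "NN") and not dominates(np, "JJ") for np in nps)

# "lived a quiet life" -- the NP contains a JJ, so the pattern rejects it
quiet = ["VP", ["VBD", "lived"],
         ["NP", ["DT", "a"], ["JJ", "quiet"], ["NN", "life"]]]
# "lived a life" -- no adjectival modifier, so the pattern accepts it
bare = ["VP", ["VBD", "lived"],
        ["NP", ["DT", "a"], ["NN", "life"]]]

print(matches_bare_cognate_object(quiet))  # False
print(matches_bare_cognate_object(bare))   # True
```

The tgrep2 engine handles negation natively, of course; this sketch is only meant to make the "dominates / does not dominate" distinction concrete.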
Once you've edited the query, you can Proceed to Search, save the query, etc., as usual.  (Note that you can edit the query on the query page, as well.)  If you execute this query in the LSE Web collection of sentences, you'll get sentences like "You might get hurt, but it's the only way to live life completely", etc.
Exercise: If you wanted to specify a live-life cognate object construction with no post-verbal adverb, i.e. excluding the above sentence, what query would you come up with?  See footnote for one answer.
If the LSE gets stuck.  If the LSE gets into a strange state that you can't get out of, the first thing to try is using your browser to force a reload of the page (in most browsers, hold Shift and click the reload button).  The second thing to try is navigating off the page and then navigating back to it, again perhaps reloading it when you get there.  The third thing to try is quitting your browser entirely, then starting it up again and returning to the LSE.  As with all things computational, save frequently (e.g. using the Save Query button) if there's something that's important.
Logging out.  There is currently no functionality for logging out.  You can just quit your browser.
Use the LSE discussion group.  There is a forum for LSE users available at http://lse.umiacs.umd.edu/forum.  Join the forum, help each other out, and above all please give us feedback on ways to improve the LSE and which features are most important to add next.
Have fun, do good work, and keep us posted!  The future of the LSE depends, in part, on whether or not it turns out to support good linguistics research.  We would very much like to keep track of presentations, papers, articles, and projects where the LSE has played a role.
Appendix: Citing Data Found Using the LSE
In presentations and publications using Web data, we strongly recommend careful documentation of the sources of those data, not only as good research practice, but to bolster the credibility of the data: anyone who doubts a claim ("Are you sure that sentence came from a page where the person really knew English?") can go to the data and decide for himself or herself.
The Internet Archive collection makes this particularly easy: for sentences found in this collection, we recommend providing the Internet Archive's URL for the page, which includes the page's original URL plus a timestamp identifying the date the page was crawled.
It's worth noting that, unlike the collection of Internet Archive sentences, Altavista collection sentences are taken from current pages on the Web, which might change or cease to exist at any time.  This is undesirable in terms of having persistent data that anyone can return to, but at a minimum, the APA style guide recommends that "a reference of an Internet source should provide a document title or description, a date (either the date of publication or update or the date of retrieval), and an address (in Internet terms, a uniform resource locator, or URL)" (http://www.apastyle.org/elecgeneral.html, retrieved 4 October 2003).
For pages in Altavista-based collections, the LSE will help you find a more permanent citation by making it easy to locate stored snapshots of a page on the Internet Archive.  If you click the Archived link below a hit, for a sentence that came from an Altavista-based collection, the LSE will look on the Internet Archive and show you its list of snapshots for that page.
One of these snapshots may be a permanently archived version of the page that contains the sentence you're looking at.  In our opinion, it is worth looking for the Internet Archive version of any data that you consider important.
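To make the citation format above concrete: the Wayback Machine's public URL scheme combines a crawl timestamp (YYYYMMDDhhmmss) with the page's original URL.  The helper below is a hypothetical sketch of that composition for building citations; it is not an LSE function, and the example URL is invented.

```python
def archive_url(original_url: str, timestamp: str) -> str:
    """Compose a Wayback Machine citation URL from a crawl timestamp
    (YYYYMMDDhhmmss) and the page's original URL."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

# A page crawled on 4 October 2003 would be cited as:
print(archive_url("http://example.com/page.html", "20031004000000"))
# https://web.archive.org/web/20031004000000/http://example.com/page.html
```

Because the timestamp pins down a specific snapshot, such a URL satisfies the APA recommendation quoted above (title/description, date, and address) in a single persistent address.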
Bibliography
Steven Abney, "Statistical Methods and Linguistics," in J. Klavans and P. Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language, Cambridge, MA: MIT Press, pp. 1-26, 1996.
American National Corpus, http://americannationalcorpus.org/, as of 9 November 2003.
British National Corpus, http://www.natcorp.ox.ac.uk/, as of 9 November 2003.
Child Language Data Exchange System (CHILDES), http://childes.psy.cmu.edu/, as of 9 November 2003.
Corley, S., Corley, M., Keller, F., Crocker, M., & Trewin, S., "Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System," Computers and the Humanities, 35, 81-94, 2001.
FrameNet, http://www.icsi.berkeley.edu/~framenet/, as of 9 November 2003.
Francis, W. N. and H. Kučera, Computational Analysis of Present-Day American English, Brown University Press, Providence, RI, 1967.
Goldberg, Adele E. Constructions: A Construction Grammar Approach to Argument Structure, University of Chicago Press, 1995.
Levin, Beth, English Verb Classes And Alternations: A Preliminary Investigation,  Chicago:  University of Chicago Press, 1993.
Linguistic Data Consortium (LDC), http://www.ldc.upenn.edu/, as of 9 November 2003.
Manning, Christopher D., "Probabilistic Syntax," in Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, pp. 289-341. Cambridge, MA: MIT Press, 2003.
Oostdijk, N. & P. de Haan (eds.). Corpus-based research into language. Amsterdam: Rodopi.  1994.
Penn Treebank, http://www.cis.upenn.edu/~treebank/home.html, as of 9 November 2003.
Pollard, C. and I. A. Sag, Head-Driven Phrase Structure Grammar.  Chicago: University of Chicago Press, 1994.
PropBank, http://www.cis.upenn.edu/~ace/, as of 9 November 2003.
Rohde, D., Tgrep2, http://tedlab.mit.edu/~dr/Tgrep2/, 2001, page as of 9 November 2003.
Sapir, Edward. Language: An Introduction to the Study of Speech. New York: Harcourt, Brace, 1921; Bartleby.com, 2000. www.bartleby.com/186/.
Scott, M., Wordsmith Tools version 3, Oxford: Oxford  University Press. ISBN 0-19-459289-8, 1999.
One can go further, to a more thoroughly probabilistic view of grammar, as suggested by Abney (1996), Manning (2003), and others.  I am sympathetic to that viewpoint, and I like the way Chris Manning (2003) puts it: "To go out on a limb for a moment, let me state my view: generative grammar has produced many explanatory hypotheses of considerable depth, but is increasingly failing because its hypotheses are disconnected from verifiable linguistic data. . . . I would join Weinreich, Labov, and Herzog (1968, 99) in hoping that 'a model of language which accommodates the facts of variable usage . . . leads to more adequate descriptions of linguistic competence.'"
That said, I would emphasize that the LSE's main mission, to permit richer empirical investigation of naturally occurring language data, is at least compatible with linguists of all (well, most) stripes.
Also worthy of note: the Robustness Principle ("Be conservative in what you do, be liberal in what you accept from others," Jon Postel, RFC 793) and the Principle of Least Astonishment ("A program should always respond in the way that is least likely to astonish the user"; one Web source attributes this to Grady Booch, Software Engineering with Ada, 2nd Ed., Benjamin Cummings, Menlo Park, CA, 1987, p. 59).
The query language tgrep2 is a variation of Rich Pito's original tgrep, distributed with the Penn Treebank.  The tgrep family of tools lets you specify tree-based patterns to match in a parsed corpus (Rohde, 2001; http://tedlab.mit.edu/~dr/Tgrep2/).
Note for advanced users: these tree search expressions are tgrep2 patterns.  Advanced users can go directly to this page and type in arbitrary tgrep2 queries, rather than having Query By Example generate a valid pattern automatically.  Also, the Add a Subquery button allows advanced users to specify secondary filtering criteria, e.g. additional tgrep2 patterns that must match.  Sentences must match all subqueries to be returned, i.e. the subqueries are combined via Boolean AND.
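The subquery semantics just described, in which a sentence is returned only if every pattern matches, can be sketched as follows.  The predicates here are purely illustrative stand-ins for compiled tgrep2 patterns, not the LSE's internals.

```python
def matches_all(sentence, predicates):
    """A sentence is a hit only if it satisfies every subquery (Boolean AND)."""
    return all(pred(sentence) for pred in predicates)

# Hypothetical subqueries, each playing the role of one tgrep2 pattern:
subqueries = [
    lambda s: "live" in s,           # main query: a form of "live"
    lambda s: "life" in s,           # subquery 1: also mentions "life"
    lambda s: "quietly" not in s,    # subquery 2: excludes "quietly"
]

candidates = [
    "He lived a good life.",
    "He lived quietly all his life.",
    "Life goes on.",
]

hits = [s for s in candidates if matches_all(s, subqueries)]
print(hits)  # ['He lived a good life.']
```

The key point is that adding a subquery can only narrow the result set, never widen it, which is why subqueries are useful as secondary filters.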
The Offensive Content Filter is based on a simple word-list approach; imagine George Carlin's list of "seven words you can't say on TV" expanded a great deal based on the sorts of things likely to show up on Web pornography sites.  Please be aware that the filter is not perfect.
For internal bookkeeping, collection names are always prefixed by the user's login name.
Using Query by Example with the sentence "It's the only way to live life completely" and editing the tree and the pattern as recommended, you can get to the expression (VP < (/^(VB|VBD|VBG|VBN|VBP|VBZ)$/ < /^(lived|lived|lives|living|live)$/)  < (NP < (/^(NN|NNS)$/ < /^(lives|lives's|life's|life)$/) )  !< ADVP).  Crucially, notice the exclamation point near the end of the expression, which says that the VP should not dominate an ADVP.
This will very shortly be added to the information available via a sentence's Annotation link.