The MetaQuerier Project at UIUC

MetaQuerier: Exploring and Integrating the Deep Web
This research aims at enabling effective access to structured information sources on the Internet. Over the past few years, the Web has deepened dramatically: a significant and increasing amount of information is hidden on the "deep" Web, behind the query interfaces of searchable databases. There are numerous such autonomous and heterogeneous sources, each with a different schema and native query constraints. Because current crawlers cannot effectively query databases, such data is invisible to traditional search engines and thus remains largely hidden from users.
We propose to build a metaquery system to help users find and query online databases effectively and uniformly. Our efforts aim at opening up the deep Web to users by building a MetaQuerier; see the architecture below. On this wild frontier of the deep Web, the MetaQuerier will address the challenges of both exploration and integration. Our goal is thus twofold. First, to make the deep Web systematically accessible: the MetaExplorer will discover sources on the deep Web to build a searchable repository, helping users find sources useful for their information needs. Second, to make the deep Web uniformly usable: the MetaIntegrator will help users interact with online databases to ask queries.
Projects
First, the MetaExplorer project focuses on the discovery, modeling, and structuring of databases on the Web, to build a searchable source repository. Essentially, this MetaExplorer project will develop a "search engine" of Web databases: It will develop crawlers for efficiently discovering databases on the Internet, design models for representing these databases, develop wrappers for automatically extracting their model parameters (e.g., schema details on their query interfaces), and structure and index a searchable repository of Web sources.
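As a rough illustration of what "extracting schema details on query interfaces" can mean at its simplest, the sketch below reads the named input fields out of an HTML form using Python's standard html.parser module. It is not the MetaExplorer's extraction method; the class name FormSchemaParser, the function extract_schema, and the sample form are all hypothetical, and real query interfaces need far more than tag inspection (labels, grouping, constraints).

```python
from html.parser import HTMLParser


class FormSchemaParser(HTMLParser):
    """Collects the named query fields found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []  # list of (field_name, field_kind) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            # An <input> without a type attribute defaults to a text box.
            kind = attrs.get("type", "text") if tag == "input" else tag
            # Skip controls that carry no user-facing query attribute.
            if name and kind not in ("submit", "hidden", "button"):
                self.fields.append((name, kind))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_schema(html):
    """Return the (name, kind) pairs of the query fields in the given HTML."""
    parser = FormSchemaParser()
    parser.feed(html)
    return parser.fields


# Hypothetical query interface of an online book database.
SAMPLE_FORM = """
<form action="/search">
  Title: <input type="text" name="title">
  Author: <input name="author">
  Format: <select name="format"><option>Hardcover</option></select>
  <input type="hidden" name="session" value="x">
  <input type="submit" value="Go">
</form>
"""

if __name__ == "__main__":
    for name, kind in extract_schema(SAMPLE_FORM):
        print(name, kind)
```

The extracted pairs are the kind of record a source repository could index, so that sources can later be searched by the attributes they accept.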
Second, the MetaIntegrator project focuses on the integration issues of online sources, i.e., bringing sources coherently together for query answering. Specifically, we will investigate source selection, query mediation, and schema integration for building the MetaIntegrator. In studying large-scale integration, these thrusts will benefit from the source repository of the companion MetaExplorer. We will investigate the key enabling technology of dynamic ad-hoc information integration. In contrast to a traditional static system, our MetaIntegrator is dynamic (new sources may be added at any time as they are discovered) and essentially requires ad-hoc integration, which must dynamically select and bring together different sources to answer a query.
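The contrast between a static and a dynamic system can be made concrete with a toy sketch. Below, sources are chosen at query time from whatever the repository currently holds, so a source registered later is picked up by the next query without any reconfiguration. This is only an illustration under simplifying assumptions: the SourceRepository class, the source names, and the rule "a source qualifies if its schema covers every query attribute" are hypothetical and are not the project's source-selection algorithm.

```python
class SourceRepository:
    """A toy repository mapping each source name to its set of query attributes."""

    def __init__(self):
        self._sources = {}

    def add_source(self, name, attributes):
        # Sources can be registered at any time, e.g. as a crawler finds them.
        self._sources[name] = set(attributes)

    def select_sources(self, query):
        """Return, sorted by name, the sources whose schema covers the query."""
        needed = set(query)
        return sorted(
            name for name, attrs in self._sources.items() if needed <= attrs
        )


repo = SourceRepository()
repo.add_source("books-a", ["title", "author", "isbn"])
repo.add_source("flights-b", ["from", "to", "date"])

query = {"title": "database systems", "author": "ullman"}
before = repo.select_sources(query)

# A newly discovered source becomes eligible for the very next query.
repo.add_source("books-c", ["title", "author", "price"])
after = repo.select_sources(query)

if __name__ == "__main__":
    print(before)
    print(after)
```

Selecting per query rather than fixing the source set in advance is what makes the integration ad hoc; a realistic selector would also have to handle attribute-name mismatches between sources, which is where schema matching comes in.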
Given the pressing need for effective access to the deep Web, we believe the synergy between the exploration and integration focuses of the two sub-projects will together bring a more complete and timely solution for realizing our MetaQuerier goal.
Funding
We gratefully acknowledge our funding sources:
NSF CAREER Award 2002, IIS-0133199: for MetaExplorer
NSF ITR Award 2003, IIS-0313260: for MetaIntegrator
NSF REU/ITR Award 2004, IIS-0434721
Intel WIE Intel Scholars Grant 2004
NCSA (National Center for Supercomputing Applications) Faculty Fellows Award 2003
UIUC Faculty Startup Funds
People
Kevin Chen-Chuan Chang
Bin He
Chengkai Li
Zhen Zhang
Govind Kabra
Publications
Automatic Complex Schema Matching across Web Query Interfaces: A Correlation Mining Approach. B. He and K. C.-C. Chang. ACM Transactions on Database Systems (TODS), 31(1), March 2006. [PDF]
Accessing the Deep Web: A Survey. B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Communications of the ACM (CACM), To appear. [PDF]
Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the 31st Very Large Data Bases Conference (VLDB 2005), Trondheim, Norway, August 2005. [PDF]
Making Holistic Schema Matching Robust: An Ensemble Approach. B. He and K. C.-C. Chang. In Proceedings of the 2005 ACM SIGKDD Conference (KDD 2005) (Full Paper), Chicago, Illinois, August 2005. [PDF]
Query Routing: Finding Ways in the Maze of the Deep Web. G. Kabra, C. Li, and K. C.-C. Chang. In Proceedings of the ICDE International Workshop on Challenges in Web Information Retrieval and Integration (ICDE-WIRI 2005), Tokyo, Japan, April 2005. [PDF]
Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the Second Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, California, January 2005. [PDF]
Mining Semantics for Large Scale Integration on the Web: Evidences, Insights and Challenges. K. C.-C. Chang, B. He, and Z. Zhang. SIGKDD Explorations, 6(2):67-76, December 2004. Invited paper. [PDF]
A Holistic Paradigm for Large Scale Schema Matching. B. He and K. C.-C. Chang. SIGMOD Record, 33(4):20-25, December 2004. Invited paper. [PDF]
Organizing Structured Web Sources by Query Schemas: A Clustering Approach. B. He, T. Tao, and K. C.-C. Chang. In Proceedings of the 13th Conference on Information and Knowledge Management (CIKM 2004) (Full Paper), Washington D.C., November 2004. [PDF]
Structured Databases on the Web: Observations and Implications. K. C.-C. Chang, B. He, C. Li, M. Patel, and Z. Zhang. SIGMOD Record, 33(3):61-70, September 2004. [PDF]
MetaQuerier over the Deep Web: Shallow Integration across Holistic Sources. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb'04), Toronto, Canada, August 2004. [PDF]
On-the-fly Constraint Mapping across Web Query Interfaces. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb'04), Toronto, Canada, August 2004. [PDF]
Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach. B. He, K. C.-C. Chang, and J. Han. In Proceedings of the 2004 ACM SIGKDD Conference (KDD 2004) (Full Paper), Seattle, Washington, August 2004. [PDF]
Mining Complex Matchings across Web Query Interfaces. B. He, K. C.-C. Chang, and J. Han. In Proceedings of the 9th ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (SIGMOD-DMKD'04) (Full Paper), Paris, France, June 2004. [PDF]
Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the 2004 ACM SIGMOD Conference (SIGMOD 2004), Paris, France, June 2004. [PDF]
Clustering Structured Web Sources: A Schema-based, Model-Differentiation Approach. B. He, T. Tao, and K. C.-C. Chang. In Proceedings of the EDBT Workshop on Clustering Information over the Web (EDBT-ClustWeb'04), Crete, Greece, March 2004. An expanded version of this paper, invited to be a part of the Current Trends in Database Technology volume, is published in the Springer-Verlag Lecture Notes in Computer Science Series Vol. 3268. [PDF]
Statistical Schema Matching across Web Query Interfaces. B. He and K. C.-C. Chang. In Proceedings of the 2003 ACM SIGMOD Conference (SIGMOD 2003), San Diego, California, June 2003. [PDF]
Approximate Query Translation Across Heterogeneous Information Sources. K. C.-C. Chang and H. Garcia-Molina. In Proceedings of the 26th VLDB Conference (VLDB 2000), pages 566-577, Cairo, Egypt, September 2000. [Extended Version]
Technical Reports
A Structure-Driven Yield-Aware Web Form Crawler: Building a Database of Online Databases. B. He, C. Li, D. Killian, M. Patel, Y. Tseng, and K. C.-C. Chang. UIUCDCS-R-2006-2752, Department of Computer Science, UIUC, July 2006. [PDF]
Tutorials
Accessing the Web: From Search to Integration. K. C.-C. Chang and J. Cho. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006. Tutorial description. [PDF] [Part II:Web Integration;Bibliography]
Demos
Online Demo: Query capability extraction for understanding Web query interfaces
MetaQuerier: Querying Structured Web Sources On-the-fly. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 2005 ACM SIGMOD Conference (SIGMOD 2005), System Demonstration, Baltimore, Maryland, June 2005. [PDF]
MetaQuerier: Querying Structured Web Sources On-the-fly. B. He, Z. Zhang, and K. C.-C. Chang. In Second Midwest Database Research Symposium, Chicago, Illinois, April 2005.
Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), System Demonstration, Tokyo, Japan, April 2005. [PDF]
Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In NSF Information and Data Management (IDM) Workshop 2004, Boston, Massachusetts, October 2004.
Knocking the Door to the Deep Web: Integrating Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 2004 ACM SIGMOD Conference (SIGMOD 2004), System Demonstration, Paris, France, June 2004. [PDF]
Toward a MetaQuerier for the Deep Web: Integrating Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In First Midwest Database Research Symposium, Chicago, Illinois, April 2004.
Knocking the Doors to the Deep Web: Understanding Web Query Interfaces. Z. Zhang, B. He, and K. C.-C. Chang. In NSF Information and Data Management (IDM) Workshop 2003, Seattle, Washington, September 2003.
Datasets
The UIUC Web Integration Repository