A List of Classic Papers on Natural Language Processing


Classical Paper List on Machine Learning and Natural Language Processing

Maintained by Zhiyuan Liu (liuliudong AT gmail.com)

Hidden Markov Models

Rabiner, L. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. (Proceedings of the IEEE 1989)
Freitag and McCallum, 2000, Information Extraction with HMM Structures Learned by Stochastic Optimization, (AAAI'00)

Maximum Entropy

A. Ratnaparkhi. A Maximum Entropy Model for POS Tagging. (1994)
A. Berger, S. Della Pietra, and V. Della Pietra. A maximum entropy approach to natural language processing. (CL'1996)
A. Ratnaparkhi. Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, University of Pennsylvania, 1998.
Hai Leong Chieu, 2002. A Maximum Entropy Approach to Information Extraction from Semi-Structured and Free Text, (AAAI'02)

MEMM

McCallum et al., 2000, Maximum Entropy Markov Models for Information Extraction and Segmentation, (ICML'00)
Punyakanok and Roth, 2001, The Use of Classifiers in Sequential Inference. (NIPS'01)

Perceptron

Collins, 2002, Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. (EMNLP'02)
Y. Li, K. Bontcheva, and H. Cunningham. Using Uneven-Margins SVM and Perceptron for Information Extraction. (CoNLL'05)

SVM

Z. Zhang. Weakly-Supervised Relation Classification for Information Extraction (CIKM'04)
H. Han et al. Automatic Document Metadata Extraction using Support Vector Machines (JCDL'03)
Aidan Finn and Nicholas Kushmerick. Multi-level Boundary Classification for Information Extraction (ECML'2004)
Yves Grandvalet, Johnny Mariéthoz, and Samy Bengio. A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification. (NIPS'05)

CRFs

J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. (ICML'01)
Hanna Wallach. Efficient Training of Conditional Random Fields. MS Thesis 2002
Taskar, B., Abbeel, P., and Koller, D. Discriminative probabilistic models for relational data. (UAI'02)
Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT/NAACL 2003)
B. Taskar, C. Guestrin, and D. Koller. Max-Margin Markov Networks. (NIPS'2003)
S. Sarawagi and W. W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction (NIPS'04)
Brian Roark et al. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm (ACL'2004)
H. M. Wallach. Conditional Random Fields: An Introduction (2004)
Kristjansson, T.; Culotta, A.; Viola, P.; and McCallum, A. Interactive Information Extraction with Constrained Conditional Random Fields. (AAAI'2004)
John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel Conditional Random Fields: Representation and Clique Selection. (ICML'2004)

Topic Models

Thomas Hofmann. Probabilistic Latent Semantic Indexing. (SIGIR'1999).
David Blei, et al. Latent Dirichlet allocation. (JMLR'2003).
Thomas L. Griffiths, Mark Steyvers. Finding Scientific Topics. (PNAS'2004).

POS Tagging

J. Kupiec. Robust part-of-speech tagging using a hidden Markov model. (Computer Speech and Language'1992)
Hinrich Schütze and Yoram Singer. Part-of-Speech Tagging using a Variable Memory Markov Model. (ACL'1994)
Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. (EMNLP'1996)

Noun Phrase Extraction

E. Xun, C. Huang, and M. Zhou. A Unified Statistical Model for the Identification of English baseNP. (ACL'00)

Named Entity Recognition

Andrew McCallum and Wei Li. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons. (CoNLL'2003)
Moshe Fresko et al. A Hybrid Approach to NER by MEMM and Manual Rules. (CIKM'2005)

Chinese Word Segmentation

Fuchun Peng et al. Chinese Segmentation and New Word Detection using Conditional Random Fields, COLING 2004.

Document Data Extraction

Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. (ICML'2000).
David Pinto, Andrew McCallum, et al. Table Extraction Using Conditional Random Fields. (SIGIR'2003)
Fuchun Peng and Andrew McCallum. Accurate Information Extraction from Research Papers using Conditional Random Fields. (HLT-NAACL'2004)
V. Carvalho, W. Cohen. Learning to Extract Signature and Reply Lines from Email. In Proc. of Conference on Email and Spam (CEAS'04) 2004.
Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang, Email Data Cleaning, SIGKDD'05
P. Viola, and M. Narasimhan. Learning to Extract Information from Semi-structured Text using a Discriminative Context Free Grammar. (SIGIR'05)
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Li Teng, and Qinghua Zheng, Automatic Extraction of Titles from General Documents using Machine Learning, Information Processing and Management, 2006

Web Data Extraction

Ariadna Quattoni, Michael Collins, and Trevor Darrell. Conditional Random Fields for Object Recognition. (NIPS'2004)
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Shuming Shi, Yunbo Cao, and Hang Li, Title Extraction from Bodies of HTML Documents and Its Application to Web Page Retrieval, (SIGIR'05)
Jun Zhu et al. Mutual Enhancement of Record Detection and Attribute Labeling in Web Data Extraction. (SIGKDD 2006)

Event Extraction

Kiyotaka Uchimoto, Qing Ma, Masaki Murata, Hiromi Ozaku, and Hitoshi Isahara. Named Entity Extraction Based on A Maximum Entropy Model and Transformation Rules. (ACL'2000)
GuoDong Zhou and Jian Su. Named Entity Recognition using an HMM-based Chunk Tagger (ACL'2002)
Hai Leong Chieu and Hwee Tou Ng. Named Entity Recognition: A Maximum Entropy Approach Using Global Information. (COLING'2002)
Wei Li and Andrew McCallum. Rapid development of Hindi named entity recognition using conditional random fields and feature induction. ACM Trans. Asian Lang. Inf. Process. 2003

Question Answering

Rohini K. Srihari and Wei Li. Information Extraction Supported Question Answering. (TREC'1999)
Eric Nyberg et al. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. (TREC'2003)

Natural Language Parsing

Leonid Peshkin and Avi Pfeffer. Bayesian Information Extraction Network. (IJCAI'2003)
Joon-Ho Lim et al. Semantic Role Labeling using Maximum Entropy Model. (CoNLL'2004)
Trevor Cohn et al. Semantic Role Labeling with Tree Conditional Random Fields. (CoNLL'2005)
Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. Joint Learning Improves Semantic Role Labeling. (ACL'2005)

Shallow Parsing

Ferran Pla, Antonio Molina, and Natividad Prieto. Improving text chunking by means of lexical-contextual information in statistical language models. (CoNLL'2000)
GuoDong Zhou, Jian Su, and TongGuan Tey. Hybrid text chunking. (CoNLL'2000)
Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT-NAACL'2003)

Acknowledgement

Dr. Hang Li, for the original paper list.
