Some Classic Papers on Natural Language Processing


Classical Paper List on Machine Learning and Natural Language Processing

Maintained by Zhiyuan Liu
liuliudong AT gmail.com

Hidden Markov Models

  • Rabiner, L. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. (Proceedings of the IEEE 1989)
  • Freitag and McCallum, 2000, Information Extraction with HMM Structures Learned by Stochastic Optimization, (AAAI'00)

Maximum Entropy

  • A. Ratnaparkhi. A Maximum Entropy Model for Part-of-Speech Tagging. (EMNLP'1996)
  • A. Berger, S. Della Pietra, and V. Della Pietra. A maximum entropy approach to natural language processing. (CL'1996)
  • A. Ratnaparkhi. Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, University of Pennsylvania, 1998.
  • Hai Leong Chieu, 2002. A Maximum Entropy Approach to Information Extraction from Semi-Structured and Free Text, (AAAI'02)

MEMM

  • McCallum et al., 2000, Maximum Entropy Markov Models for Information Extraction and Segmentation, (ICML'00)
  • Punyakanok and Roth, 2001, The Use of Classifiers in Sequential Inference. (NIPS'01)

Perceptron

  • Michael Collins, 2002, Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, (EMNLP'02)
  • Y. Li, K. Bontcheva, and H. Cunningham. Using Uneven-Margins SVM and Perceptron for Information Extraction. (CoNLL'05)

SVM

  • Z. Zhang. Weakly-Supervised Relation Classification for Information Extraction (CIKM'04)
  • H. Han et al. Automatic Document Metadata Extraction using Support Vector Machines (JCDL'03)
  • Aidan Finn and Nicholas Kushmerick. Multi-level Boundary Classification for Information Extraction (ECML'2004)
  • Yves Grandvalet, Johnny Mariéthoz, and Samy Bengio. A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification. (NIPS'05)

CRFs

  • J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. (ICML'01)
  • Hanna Wallach. Efficient Training of Conditional Random Fields. MS Thesis 2002
  • Taskar, B., Abbeel, P., and Koller, D. Discriminative probabilistic models for relational data. (UAI'02)
  • Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT/NAACL 2003)
  • B. Taskar, C. Guestrin, and D. Koller. Max-Margin Markov Networks. (NIPS'2003)
  • S. Sarawagi and W. W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction (NIPS'04)
  • Brian Roark et al. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm (ACL'2004)
  • H. M. Wallach. Conditional Random Fields: An Introduction (2004)
  • Kristjansson, T.; Culotta, A.; Viola, P.; and McCallum, A. Interactive Information Extraction with Constrained Conditional Random Fields. (AAAI'2004)
  • John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel Conditional Random Fields: Representation and Clique Selection. (ICML'2004)

Topic Models

  • Thomas Hofmann. Probabilistic Latent Semantic Indexing. (SIGIR'1999).
  • David Blei, et al. Latent Dirichlet allocation. (JMLR'2003).
  • Thomas L. Griffiths, Mark Steyvers. Finding Scientific Topics. (PNAS'2004).

POS Tagging

  • J. Kupiec. Robust part-of-speech tagging using a hidden Markov model. (Computer Speech and Language'1992)
  • Hinrich Schütze and Yoram Singer. Part-of-Speech Tagging using a Variable Memory Markov Model. (ACL'1994)
  • Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. (EMNLP'1996)

Noun Phrase Extraction

  • E. Xun, C. Huang, and M. Zhou. A Unified Statistical Model for the Identification of English baseNP. (ACL'00)

Named Entity Recognition

  • Andrew McCallum and Wei Li. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons. (CoNLL'2003)
  • Moshe Fresko et al. A Hybrid Approach to NER by MEMM and Manual Rules. (CIKM'2005)

Chinese Word Segmentation

  • Fuchun Peng et al. Chinese Segmentation and New Word Detection using Conditional Random Fields, COLING 2004.

Document Data Extraction

  • Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. (ICML'2000).
  • David Pinto, Andrew McCallum, et al. Table Extraction Using Conditional Random Fields. (SIGIR'2003)
  • Fuchun Peng and Andrew McCallum. Accurate Information Extraction from Research Papers using Conditional Random Fields. (HLT-NAACL'2004)
  • V. Carvalho, W. Cohen. Learning to Extract Signature and Reply Lines from Email. In Proc. of Conference on Email and Spam (CEAS'04) 2004.
  • Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang, Email Data Cleaning, SIGKDD'05
  • P. Viola, and M. Narasimhan. Learning to Extract Information from Semi-structured Text using a Discriminative Context Free Grammar. (SIGIR'05)
  • Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Li Teng, and Qinghua Zheng, Automatic Extraction of Titles from General Documents using Machine Learning, Information Processing and Management, 2006

Web Data Extraction

  • Ariadna Quattoni, Michael Collins, and Trevor Darrell. Conditional Random Fields for Object Recognition. (NIPS'2004)
  • Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Shuming Shi, Yunbo Cao, and Hang Li, Title Extraction from Bodies of HTML Documents and Its Application to Web Page Retrieval, (SIGIR'05)
  • Jun Zhu et al. Mutual Enhancement of Record Detection and Attribute Labeling in Web Data Extraction. (SIGKDD 2006)

Event Extraction

  • Kiyotaka Uchimoto, Qing Ma, Masaki Murata, Hiromi Ozaku, and Hitoshi Isahara. Named Entity Extraction Based on A Maximum Entropy Model and Transformation Rules. (ACL'2000)
  • GuoDong Zhou and Jian Su. Named Entity Recognition using an HMM-based Chunk Tagger (ACL'2002)
  • Hai Leong Chieu and Hwee Tou Ng. Named Entity Recognition: A Maximum Entropy Approach Using Global Information. (COLING'2002)
  • Wei Li and Andrew McCallum. Rapid development of Hindi named entity recognition using conditional random fields and feature induction. ACM Trans. Asian Lang. Inf. Process. 2003

Question Answering

  • Rohini K. Srihari and Wei Li. Information Extraction Supported Question Answering. (TREC'1999)
  • Eric Nyberg et al. The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning. (TREC'2003)

Natural Language Parsing

  • Leonid Peshkin and Avi Pfeffer. Bayesian Information Extraction Network. (IJCAI'2003)
  • Joon-Ho Lim et al. Semantic Role Labeling using Maximum Entropy Model. (CoNLL'2004)
  • Trevor Cohn et al. Semantic Role Labeling with Tree Conditional Random Fields. (CoNLL'2005)
  • Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. Joint Learning Improves Semantic Role Labeling. (ACL'2005)

Shallow parsing

  • Ferran Pla, Antonio Molina, and Natividad Prieto. Improving text chunking by means of lexical-contextual information in statistical language models. (CoNLL'2000)
  • GuoDong Zhou, Jian Su, and TongGuan Tey. Hybrid text chunking. (CoNLL'2000)
  • Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. (HLT-NAACL'2003)

Acknowledgement

  • Dr. Hang Li, for the original paper list.