Natural language processing (NLP) aims to enable computers to use human languages – so that people can, for example, interact with computers naturally, communicate with people who don't speak a common language, or access speech or text data at scales not otherwise possible. The NLP group at Notre Dame is interested in all aspects of NLP, with a focus on machine translation and connections with formal language theory.

The NLP group co-sponsors NL+, the Natural Language Processing Lunch Seminar.

Current Members

  • Ken Sible
    PhD student
  • Aarohi Srivastava
    PhD student
  • Colin McDonald
    Undergraduate student
  • Patrick Soga
    Undergraduate student

Former Members

Projects

Neural networks for machine translation. Models and algorithms for translation and language modeling using neural networks.
Expressivity of neural sequence models. Relating neural sequence models to automata, grammars, circuits, and logics.
Natural language (variety) processing. Collaboration with Antonis Anastasopoulos (GMU) and Yulia Tsvetkov (UW). Sponsored by NSF.
Language documentation with an AI helper. Collaboration with Antonis Anastasopoulos and Geraldine Walther (GMU). Sponsored by NSF.
Differentiable, probabilistic programming with recursive structured models. Collaboration with Chung-chieh Shan (IU). Sponsored by NSF.
NLP on medieval texts. Analysis of Latin texts and language modeling for OCR of Latin manuscripts. Collaborations with Walter Scheirer and Hildegund Müller. Sponsored by Notre Dame FRSP.

Recent Publications

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, and others. Beyond the Imitation Game: quantifying and extrapolating the capabilities of language models. 2022. arXiv:2206.04615. PDF BibTeX
David Chiang and Peter Cholak. Overcoming a theoretical limitation of self-attention. In Proc. ACL. 2022. PDF BibTeX
Brian DuSell and David Chiang. Learning hierarchical structures with differentiable nondeterministic stacks. In Proc. ICLR. 2022. PDF BibTeX
Samuel Grieggs, Bingyu Shen, Greta Rauch, Pei Li, Jiaqi Ma, David Chiang, Brian Price, and Walter Scheirer. Measuring human perception to improve handwritten document transcription. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021. doi:10.1109/TPAMI.2021.3092688. DOI BibTeX
Toan Q. Nguyen, Kenton Murray, and David Chiang. Data augmentation by concatenation for low-resource translation: a mystery and a solution. In Proc. Conference on Spoken Language Translation. 2021. PDF BibTeX
David Chiang and Colin McDonald. Syntax-based attention masking for neural machine translation. In Proc. NAACL Student Research Workshop. 2021. PDF BibTeX
David Chiang, Alexander M. Rush, and Boaz Barak. Named tensor notation. 2021. arXiv:2102.13196. PDF BibTeX
David Chiang and Chung-chieh Shan. Translating recursive probabilistic programs to factor graph grammars. 2020. Presented at PROBPROG 2020. PDF BibTeX
David Chiang and Darcey Riley. Factor graph grammars. In Proc. NeurIPS, 6648–6658. 2020. PDF BibTeX
Brian DuSell and David Chiang. Learning context-free languages with nondeterministic stack RNNs. In Proc. CoNLL, 507–519. 2020. PDF BibTeX
Julian Salazar, Davis Liang, Toan Q. Nguyen, and Katrin Kirchhoff. Masked language model scoring. In Proc. ACL, 2699–2712. 2020. doi:10.18653/v1/2020.acl-main.240. PDF BibTeX
Justin DeBenedetto and David Chiang. Representing unordered data using complex-weighted multiset automata. In Hal Daumé III and Aarti Singh, editors, Proc. ICML, volume 119 of Proceedings of Machine Learning Research, 2412–2420. 2020. PDF BibTeX
Arturo Argueta and David Chiang. Accelerating sparse matrix operations in neural networks on graphics processing units. In Proc. ACL, 6215–6224. 2019. PDF BibTeX
Antonios Anastasopoulos, Alison Lui, Toan Q. Nguyen, and David Chiang. Neural machine translation of text from non-native speakers. In Proc. NAACL: HLT, volume 1, 3070–3080. 2019. PDF BibTeX
Kenton Murray and David Chiang. Correcting length bias in neural machine translation. In Proc. WMT, 212–223. 2018. PDF BibTeX
Arturo Argueta and David Chiang. Composing finite state transducers on GPUs. In Proc. ACL, 2697–2705. 2018. PDF BibTeX
Justin DeBenedetto and David Chiang. Algorithms and training for weighted multiset automata and regular expressions. In Proc. Conference on Implementation and Applications of Automata, 146–158. 2018. PDF BibTeX
Antonios Anastasopoulos and David Chiang. Tied multitask learning for neural speech translation. In Proc. NAACL: HLT, volume 1, 82–91. 2018. PDF BibTeX
Toan Nguyen and David Chiang. Improving lexical choice in neural machine translation. In Proc. NAACL: HLT, volume 1, 334–343. 2018. PDF BibTeX
Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, and Jiajun Chen. Combining character and word information in neural machine translation using a multi-level attention. In Proc. NAACL: HLT, volume 1, 1284–1293. 2018. PDF BibTeX
Salvador Aguinaga, David Chiang, and Tim Weninger. Learning hyperedge replacement grammars for graph generation. IEEE Trans. Pattern Analysis and Machine Intelligence, 41(3):625–638, 2019. doi:10.1109/TPAMI.2018.2810877. PDF BibTeX
David Chiang, Frank Drewes, Daniel Gildea, Adam Lopez, and Giorgio Satta. Weighted DAG automata for semantic graphs. Computational Linguistics, 44(1):119–186, 2018. PDF BibTeX

All papers
