Natural language processing (NLP) aims to enable computers to use human languages, so that people can, for example, interact with computers naturally, communicate with people who don't share a common language, or access speech or text data at scales not otherwise possible. The NLP group at Notre Dame is interested in all aspects of NLP, with a focus on machine translation and connections with formal language theory.
The NLP group co-sponsors NL+, the Natural Language Processing Lunch Seminar.
Current Members
Former Members
- Brian DuSell (PhD 2023 → ETH Zürich)
- Colin McDonald (BA 2023 → CMU)
- Patrick Soga (BS 2022 → UVA)
- Xing Jie Zhong (MS 2021 → Google)
- Toan Q. Nguyen (PhD 2021 → Amazon → Zoom)
- Justin DeBenedetto (PhD 2021 → asst. prof. Villanova)
- Chan Hee (Luke) Song (BS 2020 → OSU)
- Kenton Murray (PhD 2020 → JHU)
- Antonios Anastasopoulos (PhD 2019 → postdoc CMU → asst. prof. GMU)
- Arturo Argueta (PhD 2019 → Apple)
- Tomer Levinboim (PhD 2017 → Google)
- Xiang Zhou (summer intern 2017 → UNC → Google DeepMind)
- Cindy Xinyi Wang (BS 2017 → PhD CMU → Google DeepMind)
- Ashish Vaswani (PhD 2014 at USC → USC ISI → Google Brain → Adept AI → Essential AI)
Projects
Expressivity of neural sequence models. Relating neural sequence models to automata, grammars, circuits, and logics. Collaboration with Peter Cholak and Anand Pillay. Sponsored by NSF.
Speech and language processing for dialects and language varieties. Collaboration with Antonis Anastasopoulos (GMU), Sachin Kumar (OSU), and Yulia Tsvetkov (UW).
Technologies for documentation of endangered languages. Collaboration with Antonis Anastasopoulos and Geraldine Walther (GMU).
Differentiable, probabilistic programming with recursive structured models. Collaboration with Chung-chieh Shan (IU).
NLP for ancient languages. Analysis of Latin texts, reconstructing proto-Italic from Latin, and modeling Biblical Hebrew. Collaborations with Brian Krostenko, Hildegund Müller, and David Smiley.
Recent Publications
Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, and Anthony W. Lin.
Length generalization bounds for transformers.
2026.
arXiv:2603.02238.
PDF
BibTeX
Chihiro Taguchi, Yukinori Takubo, and David Chiang.
Automatic speech recognition for documenting endangered languages: case study of Ikema Miyakoan.
In Proc. Language Resources and Evaluation Conference. 2026.
To appear.
PDF
BibTeX
Stephen Bothwell, Kaitlin Stephan, Hildegund Müller, and David Chiang.
From paginā to webpage: on developing and documenting a digitized Latin collection.
Journal of Open Humanities Data, 2026.
To appear.
BibTeX
Akriti Dhasmana, Aarohi Srivastava, and David Chiang.
Dialect matters: cross-lingual ASR transfer for low-resource Indic language varieties.
In Proc. Workshop on NLP for Similar Languages, Varieties and Dialects. 2026.
PDF
BibTeX
Andy Yang, Anej Svete, Jiaoda Li, Anthony Widjaja Lin, Jonathan Rawski, Ryan Cotterell, and David Chiang.
Probability distributions computed by autoregressive transformers.
In Proc. ICLR. 2026.
To appear.
PDF
BibTeX
Yotaro Kubo, Richard Sproat, Chihiro Taguchi, and Llion Jones.
Building tailored speech recognizers for Japanese speaking assessment.
2025.
arXiv:2509.20655.
PDF
BibTeX
Andy Yang, Christopher Watson, Anton Xue, Satwik Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, and David Chiang.
The transformer cookbook.
Transactions on Machine Learning Research, January 2026.
PDF
BibTeX
Katsumi Ibaraki and David Chiang.
Frustratingly easy data augmentation for low-resource ASR.
2025.
arXiv:2509.15373.
PDF
BibTeX
Chihiro Taguchi, Seng Mai, Keita Kurabe, Yusuke Sakai, Georgina Agyei, Soudabeh Eslami, and David Chiang.
Languages still left behind: toward a better multilingual machine translation benchmark.
In Proc. EMNLP, 20142–20154. 2025.
doi:10.18653/v1/2025.emnlp-main.1018.
PDF
BibTeX
Chihiro Taguchi, Seiji Maekawa, and Nikita Bhutani.
Efficient context selection for long-context QA: no tuning, no iteration, just adaptive-\(k\).
In Proc. EMNLP, 20116–20141. 2025.
doi:10.18653/v1/2025.emnlp-main.1017.
PDF
BibTeX
Andy Yang, Michaël Cadilhac, and David Chiang.
Knee-deep in C-RASP: a transformer depth hierarchy.
In Proc. NeurIPS 38. 2025.
To appear.
PDF
BibTeX
Andy Yang, Lena Strobl, David Chiang, and Dana Angluin.
Simulating hard attention using soft attention.
Transactions of the Association for Computational Linguistics, 14:147–166, 2026.
doi:10.1162/TACL.a.597.
DOI
BibTeX
Aarohi Srivastava and David Chiang.
We're calling an intervention: exploring fundamental hurdles in adapting language models to nonstandard text.
In Proc. Workshop on Noisy and User-Generated Text. 2025.
Best Paper Award.
PDF
BibTeX
David Chiang.
Transformers in uniform TC\(^0\).
Transactions on Machine Learning Research, January 2025.
PDF
BibTeX
Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, and Ashish Sabharwal.
Transformers as transducers.
Transactions of the Association for Computational Linguistics, 13:200–219, 2025.
doi:10.1162/tacl_a_00736.
DOI
BibTeX
Chihiro Taguchi and David Chiang.
Language complexity and speech recognition accuracy: orthographic complexity hurts, phonological complexity doesn't.
In Proc. ACL. 2024.
Outstanding Paper Award and Senior Area Chair Award.
PDF
BibTeX
Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang, Yulia Tsvetkov, and Antonios Anastasopoulos.
DIALECTBENCH: a NLP benchmark for dialects, varieties, and closely-related languages.
In Proc. ACL. 2024.
Social Impact Award.
PDF
BibTeX
David Chiang, Colin McDonald, and Chung-chieh Shan.
Exact recursive probabilistic programming.
PACMPL, 2023.
doi:10.1145/3586050.
PDF
BibTeX
Language and Computation at Notre Dame
People
- Meng Jiang: summarization and generation
- Toby Li: human-computer interaction
- Walter Scheirer: digital humanities and handwriting recognition
- John Lalor (ITAO): NLP and biomedical informatics
Courses
- CSE 40657/60657, Natural Language Processing, Prof. David Chiang
- CSE 40982, Interactive Dialogue Systems, Prof. Collin McMillan
- ITAO 40250, Unstructured Data Analytics, Prof. John Lalor
- AL 20301, Introduction to Linguistics, Prof. Hana Kang