Dayyán O'Brien

I am a PhD student at the CDT in Designing Responsible NLP at the University of Edinburgh, supported by G-Research NextGen. Under the supervision of Emily Allaway, my research focuses on improving the compositionality of large language models in conversation.

I will be in Suzhou, China, for EMNLP 2025. Please feel free to reach out if you would like to meet!

Email  /  Google Scholar  /  LinkedIn  /  CV

Publications

MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
Dayyán O'Brien, Pinzhen Chen, Barry Haddow, Emily Allaway
arXiv:2510.05962. Under review for ACL Rolling Review (ARR). 2025.
MathNLP Workshop, Empirical Methods in Natural Language Processing (EMNLP) (non-archival). 2025.
arXiv / pdf / bibtex
DocHPLT: A Massively Multilingual Document-Level Translation Dataset
Dayyán O'Brien, Bhavitvya Malik, Ona De Gibert Bonet, Pinzhen Chen, Barry Haddow, Jörg Tiedemann
Proceedings of the Conference on Machine Translation (WMT). 2025.
data / pivoted / arXiv / pdf / bibtex
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies
Laurie Burchell, Ona de Gibert, Nikolay Arefyev, Mikko Aulamo, Marta Bañón, Pinzhen Chen, Mariia Fedorova, Liane Guillou, Barry Haddow, Jan Hajič, Jindřich Helcl, Erik Henriksson, Mateusz Klimaszewski, Ville Komulainen, Andrey Kutuzov, Joona Kytöniemi, Veronika Laippala, Petter Mæhlum, Bhavitvya Malik, Farrokh Mehryary, Vladislav Mikhailov, Nikita Moghe, Amanda Myntti, Dayyán O'Brien, Stephan Oepen, Proyag Pal, Jousia Piha, Sampo Pyysalo, Gema Ramírez-Sánchez, David Samuel, Pavel Stepachev, Jörg Tiedemann, Dušan Variš, Tereza Vojtěchová, Jaume Zaragoza-Bernabeu
Proceedings of the Association for Computational Linguistics (ACL). 2025.
data / arXiv / pdf / bibtex
Mind the Gap: Diverse NMT Models for Resource-Constrained Environments
Ona De Gibert Bonet, Dayyán O'Brien, Dušan Variš, Jörg Tiedemann
Proceedings of the Nordic Conference on Computational Linguistics (NoDaLiDa). 2025.
pdf / bibtex
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji, Zihao Li, Indraneil Paul, Jaakko Paavola, Peiqin Lin, Pinzhen Chen, Dayyán O'Brien, Hengyu Luo, Hinrich Schütze, Jörg Tiedemann, Barry Haddow
arXiv:2409.17892. Under review for the Journal of Data-centric Machine Learning Research (DMLR). 2024.
models / data / arXiv / pdf / bibtex
Prompting Numerical Commonsense Reasoning across Languages
Dayyán O'Brien
Outstanding Honours Thesis, School of Informatics, University of Edinburgh. 2024.
pdf / bibtex
Numerical Commonsense Reasoning across Languages
Dayyán O'Brien
Outstanding Honours Thesis, School of Informatics, University of Edinburgh. 2023.
pdf / bibtex

Previous background

I received an integrated Master’s in Informatics from the University of Edinburgh, where I worked as an undergraduate researcher with Prof. Mirella Lapata on multilingual commonsense reasoning. I later worked as a Research Assistant at the University of Edinburgh with Pinzhen Chen and Barry Haddow as a member of the High Performance Language Technologies (HPLT) project, where I worked on multilingual model development and on mining parallel data for multilingual translation.

Personal

I am originally from Ireland and moved to Edinburgh six years ago. I enjoy American history (especially works by Robert Caro), film, and hiking, and most of all spending time with my wife. I am also a member of the Baháʼí Faith. As a reward for reading this far, here are some very rewarding links.

Website last updated on the 9th of October 2025.