Dayyán O’Brien
Hi there, I’m Dayyán 👋 I’m currently a PhD student at the University of Edinburgh in the UKRI AI Centre for Doctoral Training (CDT) for Responsible and Trustworthy in-the-world NLP, supervised by Dr. Emily Allaway. My research focuses on improving the compositionality of language models. I’m fully funded by the prestigious G-Research PhD Scholarship for outstanding academic merit.
Previously, I completed a Master of Informatics (First Class Honours) at the University of Edinburgh from 2019 to 2024. During my studies, I worked as a research assistant with Prof. Mirella Lapata from 2022 to 2024, where I curated and evaluated mNumersense, a dataset of 36k+ Arabic, Chinese, and Russian numeric commonsense sentences. This work led to my Outstanding Honours Project on “Numerical Commonsense Reasoning across Languages” in both 2023 and 2024.
After graduating, I continued my research with the StatMT group at Edinburgh. I worked as a Junior Research Assistant with Pinzhen Chen on EMMA-500, a 7B-parameter LLM supporting over 500 languages, and later as a Research Assistant with Barry Haddow, co-developing the HPLT 2.0 bitexting pipeline for mining parallel data.
My research interests span conversational & compositional NLP, numerical reasoning, and developing more robust language models. I’ve contributed to several high-impact publications and open-source projects, including leading the development of the DocHPLT corpus and contributing to EMMA-500.