Publications
"*" denotes equal contribution
In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
ICLR 2026
PDF
Code/Data
Long
The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
arXiv
PDF
Code/Data
Long
🏆 HALoGEN: Fantastic LLM Hallucinations and Where To Find Them
ACL Outstanding Paper Award/TrustNLP Workshop Best Paper Award
ACL 2025
PDF
Code/Data
Website
Long
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
COLM 2025
PDF
Code/Data
Long
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
Nominated for Outstanding Paper Award
NAACL 2025
PDF
Code/Data
Long
📰 Press:
TechCrunch
Mint
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
NeurIPS 2025
PDF
Code/Data
Long
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
NAACL 2025
PDF
Code/Data
Short
⭐ WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Spotlight
ICLR 2025
PDF
Code/Data
Long
The Art of Saying No: Contextual Noncompliance in Language Models
NeurIPS 2024 Datasets and Benchmarks
PDF
Code/Data
Long
🏆 Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
MASC-SLL 2024 Best Paper Award
ACL 2024
PDF
Code/Data
Long
🏆 OLMo: Accelerating the Science of Language Models
ACL Best Theme Paper Award/GeekWire Innovation of the Year Award
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
PDF
Code/Data
Long
🏆 Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
ACL Best Resource Paper Award
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
PDF
Code/Data
Long
Agent Lumos: Unified and Modular Training for Open-Source Language Agents
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
PDF
Code/Data
Long
Website
📰 Press:
MarkTechPost
MacGyver: Are Large Language Models Creative Problem Solvers?
Nominated for Outstanding Paper Award
NAACL 2024
PDF
Code/Data
Long
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
arXiv
PDF
Code/Data
Long
The Generative AI Paradox: "What It Can Create, It May Not Understand"
2024 International Conference on Learning Representations (ICLR 2024)
PDF
Long
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
2024 International Conference on Learning Representations (ICLR 2024)
PDF
Code/Data
Long
Website
Understanding How to Inform Blind and Low-Vision Users about Data Privacy through Privacy Question Answering Assistants
USENIX Security 2024
PDF
Long
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
EMNLP 2023
PDF
Code/Data
Long
🏆 CondaQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
SoCal NLP Symposium Best Paper Award
EMNLP 2022
PDF
Code/Data
Long
Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions
arXiv
PDF
A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus
LREC 2022
PDF
Code/Data
Long
Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
EACL 2021
PDF
Code/Data
Long
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)
PDF
Code/Data
Long
Website
On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT
*SEM 2020
PDF
Code/Data
Long
EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
CoNLL 2019
PDF
Code/Data
Long
Question Answering for Privacy Policies: Combining Computational and Legal Perspectives
EMNLP 2019
PDF
Code/Data
Long
Evaluating How Global Privacy Principles Answer Consumers' Questions About Mobile App Privacy
PLSC 2019
Challenges in Automated Question Answering for Privacy Policies
AAAI Spring Symposium Series, 2019
PDF
Long
🏆 Stress Test Evaluation for Natural Language Inference
Area Chair Favorite Paper Prize
COLING 2018
PDF
Code/Data
Long
Website
Slides
Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations
ACL 2017 Workshop
PDF
Short