Department Seminar Series

What LLMs Reveal and What They Believe

10th December 2025, 13:00, ELEC201, 2nd Floor Lecture Theatre, EEE
Youcheng Sun
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi

Abstract

How do outputs leak inputs, and how does RAG get misled? Modern language models do not just generate text; under common settings they can also reveal it. The talk first explains how exposing model outputs enables exact reconstruction of the original input. This can aid debugging, for example by helping identify hidden backdoor triggers, yet it can also recover sensitive personal information (such as passwords and ID numbers) using only what the model returns. Turning to the input side, the talk then examines what models “believe” in retrieval‑augmented generation (RAG): how a single adversarially phrased document can hijack a pipeline, and how a fast graph‑based reranker restores consensus by rewarding mutually consistent sources and down‑weighting query‑echo outliers. Taken together, the talk aims to enable more informed discussions about when and how to trust LLMs.