About Me

I am currently a second-year Master’s student in Computer Science at Northeastern University, focusing on Retrieval-Augmented Generation. Google Scholar:

I am currently a research intern at NEUIR Lab, advised by Associate Professor Zhenghao Liu, and at THUNLP, supervised by Yukun Yan.

🔎 Research Interests

My research focuses on agentic systems that integrate external knowledge, efficient reasoning, and scalable infrastructure. It spans three directions:

How can models utilize external knowledge: Retrieval-Augmented Generation (RankCoT, KAIR, Cost-Aware), Deep Research (EigentSearch-Q+), AI Memory.
How can models reason efficiently: Reasoning Compression (Mixed-Policy Distillation), Multi-Modal Reasoning.
Agent Infrastructure: UltraRAG 2.0.

👀 I am looking for a Ph.D. position starting in Fall 2027 and would love to explore potential collaborations. Let’s connect!

🔥 News

2026.04: 🎉 Our paper EigentSearch-Q+ has been accepted to ACM CAIS 2026 Demos!
2025.08: 🎉 We released UltraRAG 2.0, a low-code framework for building complex RAG systems!
2025.05: 🎉 Our paper RankCoT has been accepted to ACL 2025!

📝 Publications

* indicates equal contribution, and † indicates corresponding author.

arXiv Preprint

When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation

Mingyan Wu^*, Han Yang^*, Omer Ben-Porat, Yftah Ziser^†

📃Paper | 📄PDF

This work introduces cost-aware RAG, a setting where retrieved evidence is assigned access-cost tiers and systems must answer under an explicit evidence-access budget. We instantiate this setting by augmenting MS MARCO v2.1 with access-friction tiers and evaluate budgeted evidence selection across general-domain and domain-specific QA benchmarks.

arXiv Preprint

Reasoning Compression with Mixed-Policy Distillation

Han Yang^*, Mingyan Wu^*, Bailan He, Zeyu Cao, Sikuan Yan, Kevin Qinghong Lin, Zifeng Ding^*†

📃Paper | 📄PDF

This work proposes Mixed-Policy Distillation (MPD), a reasoning compression framework that transfers concise reasoning behavior from a larger teacher to a smaller student by distilling teacher-compressed student trajectories. This preserves student-policy exploration while injecting teacher-guided compression.

ACM CAIS 2026 Demos

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He^†

📃Paper | 📄PDF

This work introduces Q+, a set of query and evidence processing tools that make web search more deliberate by guiding query planning, monitoring search progress, and extracting evidence from long web snapshots.

ACL 2025

RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

Mingyan Wu, Zhenghao Liu^†, Yukun Yan^†, Xinze Li, Shi Yu, Zheni Zeng, Yu Gu, Ge Yu

📃Paper | 📄PDF

This work leverages the strengths of both ranking and summarization to effectively refine the knowledge from retrieval results, thereby aiding LLMs in generating more accurate responses.

arXiv Preprint

Finding What Matters: Anchoring Context Knowledge with Evolving Indices for Iterative Retrieval

Mingyan Wu^*, Zhenghao Liu^*†, Xinze Li, Yuqing Lan, Yukun Yan^†, Shuo Wang, Cheng Yang, Minghe Yu, Zheni Zeng, Maosong Sun

📃Paper | 📄PDF

This work proposes a Knowledge Anchoring framework for Iterative Retrieval that anchors key information within the retrieved knowledge, guiding LLMs to locate it during iterative retrieval and answer generation.

📖 Education

2024.09 - now, M.S. School of Computer Science and Engineering, Northeastern University
2020.09 - 2024.07, B.S. School of Computer Science, Yangtze University

💻 Internships

2024.04 - now, THUNLP, Tsinghua University, Beijing.
2023.10 - now, NEUIR Lab, Northeastern University, Shenyang.

🎼 Hobbies

Dancing 💃: I’ve been dancing for about nine years. My favorite styles are jazz and hip-hop, and I used to be a part-time dance teacher.
Guitar 🎸: I am a beginner guitarist and usually play folk guitar.
Swimming 🏊: I’ve recently got into swimming and would like to learn more about it.

If you share these interests, I would be glad to connect and grow together 📞.