
🛬 Attending
🇰🇷 ICML 2026, Seoul, KR
🥰 Thanks to grants from:
Open Philanthropy
Thinking Machines Lab
<aside>
😎 Google Scholar
</aside>
<aside>
🤠 ResearchGate
</aside>
<aside>
🧐 Academic CV
</aside>
Socials
<aside>
😝
LinkedIn
</aside>
<aside>
😁 Bluesky
</aside>
<aside>
😆 X/Twitter
</aside>
The rise of reasoning and agentic AI is a double-edged sword, which motivates me to study what could go wrong with reasoning agents (especially multi-agent systems that emulate human collaboration/competition) in two directions:
- Reasoning Done Right: How can we make agents smarter? My vertical focus is on improving capability evaluation to better inform post-training (reasoning-driven RL), drawing horizontally on actionable interpretability (e.g., model diffing) and robustness probes (e.g., longitudinal analysis as a probe for contamination). Ultimately, this line of work contributes to AI4Science, enabling frontier AI to better accelerate scientific discovery; I am particularly intrigued by AI applications in exoplanetary astrophysics, the phenomenology of high-energy particle physics, and organic synthetic chemistry.
- Reasoning For Good: How can we make smarter agents safer? Reasoning has also enabled novel threat models such as deception, scheming, and collusion. My vertical focus is on identifying the key triggers that suppress or encourage such misaligned agentic behavior and on probing how differently models react in realistic (quasi-deployment) vs. fictional (quasi-evaluation) scenarios. Eventually, I believe models need to learn safety constraints via consequence-aware reasoning (CoI, Chain-of-Implication), much as legal deterrence works on us humans.
Selected Work (Full List: GScholar)
Slide: What could go wrong with Reasoning Machines?
- Science of Evaluation
- I develop reasoning benchmarks grounded in research-level scientific publications (PioneerPhysics, CauSciBench, SeePhys) and in provably complex theory from mathematics/theoretical computer science (Lean+TCS).
- I study the cross-domain transferability of emerging reasoning capabilities within (M)LLMs and optimize data distributions for generalizable thinking.
- I probe patterns of contamination/memorization/steganography in legacy benchmarks and propose perturbation as a means to revive their utility (a toy sketch follows this list).
- Science of Post-Training
- Science of Alignment (see my talk with the AI Safety Directory): What are the key triggers for models to autonomously deceive, scheme, or persuade other agents into (covertly) pursuing misaligned objectives? Are models more inclined to do harm under more realistic or more fictional scenarios?
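Below is a minimal sketch of the benchmark-perturbation idea from the evaluation bullets above: rewrite the numbers in a legacy benchmark item, recompute the gold answer, and compare model accuracy on the original vs. perturbed items; a large gap points to memorization rather than reasoning. The helper names and the toy template are illustrative only, not part of any released pipeline.

```python
import random
import re

def perturb_numbers(question: str, rng: random.Random) -> tuple[str, list[int]]:
    """Replace every integer in a benchmark item with a fresh random value."""
    new_values = []
    def repl(match: re.Match) -> str:
        value = rng.randint(2, 99)
        new_values.append(value)
        return str(value)
    return re.sub(r"\d+", repl, question), new_values

# Toy item whose gold answer can be recomputed after perturbation.
question = "Alice has 12 apples and buys 7 more. How many apples does she have?"
gold_fn = lambda a, b: a + b  # gold-answer function for this template

rng = random.Random(0)
perturbed, values = perturb_numbers(question, rng)
new_gold = gold_fn(*values)

print(perturbed)             # perturbed question to send to the model under test
print("new gold:", new_gold)

# In a real study: score the model on the original and the perturbed split;
# a sharp accuracy drop on the perturbed split is evidence of contamination.
```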
Useful:
The following represents only my personal opinions:
- I recently quit Cursor and pivoted back to VS Code + Kilo
- ai2-asta for literature review: no hallucinations, thanks to its Semantic Scholar backend
- uv (written in Rust) for package management
Science of Evaluation
Science of Post-Training
- Tinker: I’m a beta user of the Tinker API by Thinking Machines Lab
- slime: RL scaling with Megatron-LM for training and SGLang for inference
- Unsloth: post-training, preferably on a single GPU (a minimal sketch follows this list)
- OpenSloth supports multi-GPU, but multi-node support is unclear
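As a companion to the Unsloth bullet, here is a minimal single-GPU sketch of loading a 4-bit base model and attaching LoRA adapters; the checkpoint name and LoRA hyperparameters are placeholders, and the actual training loop (e.g., a TRL SFTTrainer) is left out.

```python
from unsloth import FastLanguageModel

# Placeholder checkpoint; any Unsloth-compatible model works.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization keeps the base model on one GPU
)

# Attach LoRA adapters; only these low-rank matrices are updated during training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# From here, pass (model, tokenizer) to a standard TRL trainer;
# see the official Unsloth notebooks for full post-training recipes.
```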
Education
<aside>
🐔 PhD
Advisor:
</aside>
<aside>
🐥 MSc. Interdisciplinary Science ETH (CS and Physics)
ETH Zurich, Switzerland (2024-2025)
Thesis Advisor: Prof. Zhijing Jin, Prof. Bernhard Schölkopf
</aside>
<aside>
🐣 BSc. Interdisciplinary Science ETH (CS, Physics and Chemistry)
ETH Zurich, Switzerland (2023-2025)
Thesis Advisor: Prof. Mrinmaya Sachan
</aside>