Ruiyang Zhang

I am a second-year Ph.D. student at the University of Macau, working with Prof. Zhedong Zheng. Previously, I obtained my bachelor's degree in computer science from Fudan University in 2021.

I am interested in computer vision and related topics. I believe the ultimate goal of computer vision is to make machines see like humans. Currently, my research focuses on enhancing the reasoning and tool-use capabilities of multimodal agents through reinforcement learning.

Email / CV / Bio / Scholar / Github

News

🔥 2026.3: Check out our latest work: VSearcher, a multi-turn, tool-using multimodal search agent trained via reinforcement learning in real-world web environments!
🔥 2026.1: Our work SketchThinker-R1 is accepted by ICLR'26!
🔥 2026.1: Check out our work: SketchThinker-R1, incentivizing sketch-style reasoning in large multimodal models to improve reasoning efficiency!
🔥 2025.6: Our work UA3D is accepted by ICCV'25!
🔥 2025.3: Check out our work: Uncertainty-o, unveiling uncertainty in Large Multimodal Models (LMMs) in a model-agnostic manner, supporting both Large Comprehension Models and Large Generation Models!
🔥 2024.12: Check out our work: VL-Uncertainty, leveraging MLLM uncertainty for hallucination detection!
🔥 2024.12: We release the first curated list of research on MLLM uncertainty: Awesome-MLLM-Uncertainty!
🔥 2024.7: Our work LiSe is accepted by ECCV'24!
2023.6: Left Meituan to pursue a path in academia.

Research

	VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning Ruiyang Zhang, Qianguo Sun, Chao Song, Yiyan Qi, Zhedong Zheng Preprint, 2026 [Paper] / [Code] (1) We build the RL infrastructure required to develop multi-turn, tool-using multimodal search agents, including tool implementation, a caching mechanism, and RL framework adaptation. (2) We build a long-horizon, multi-turn multimodal search agent in real-world web environments via reinforcement learning, and propose a systematic post-training framework for multimodal search agents consisting of Iterative Injection-based Data Synthesis, Rejection Sampling Fine-tuning, and RL.
	SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models Ruiyang Zhang, Dongzhan Zhou, Zhedong Zheng ICLR, 2026 [Paper] / [Code] We propose a three-stage framework to incentivize sketch-style reasoning in large multimodal models, including Sketch-Mode Cold Start, SketchJudge Reward Model training, and Sketch-Thinking Reinforcement Learning.
	Uncertainty-o: One Model-agnostic Framework for Unveiling Epistemic Uncertainty in Large Multimodal Models Ruiyang Zhang, Hu Zhang, Fei Hao, Zhedong Zheng Preprint, 2025 [Website] / [Code] We propose a unified framework for uncertainty estimation of Large Multimodal Models, and harness uncertainty for hallucination detection, hallucination mitigation, and uncertainty-aware CoT.
	VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation Ruiyang Zhang, Hu Zhang, Zhedong Zheng Preprint, 2024 [Website] / [Paper] / [Code] We leverage semantically equivalent prompt perturbation for refined LVLM uncertainty estimation, thereby facilitating more accurate hallucination detection.
	Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng ICCV, 2025 [Paper] / [Code] We harness uncertainty learning to mitigate the negative influence of inaccurate pseudo labels in unsupervised 3D object detection. Specifically, we estimate coordinate-level uncertainty and then use the learned uncertainty to regularize the self-learning process.
	Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng ECCV, 2024 [Paper] / [Code] / [Blog] We enhance unsupervised 3D object detection via LiDAR-2D fusion to improve the detection of distant and small objects.

Internship

	Research Intern @ IDEA 2025.10 - Now Shenzhen, China Topics: Developing long-horizon multimodal search agents via reinforcement learning (RL) Project: VSearcher
	Research Intern @ Shanghai AI Lab 2025.5 - 2025.9 Shanghai, China Topics: Leveraging RL to incentivize sketch-style reasoning in large multimodal models Project: SketchThinker-R1

Work Experience

2021.07 - 2023.06: Backend Development Engineer, Meituan, Shanghai

Education

2024.09 - Now: Ph.D. student in Faculty of Science and Technology, University of Macau
2017.09 - 2021.06: Undergraduate student in School of Computer Science, Fudan University

Academic Service

Conference Reviewer: ICLR 2025, CVPR 2025
Journal Reviewer: TOMM

Teaching

University of Macau CISC3021: Multimedia Forensics and Security (TA)

Honors and Awards

2019: Second Prize of China Undergraduate Mathematical Contest in Modeling (CUMCM)
2018&2019: Third Class Scholarship for Outstanding Students, Fudan University
2016: Provincial First Prize of Chinese Mathematical Olympiad (CMO)
2014&2015: First Prize of National Olympiad in Informatics in Provinces (NOIP)

Interests

I enjoy playing table tennis 🏓 and was a member of my primary school's team hhh
I like walking around parks 🏞️

Template from Jon Barron's website.