Ruiyang Zhang

I am a second-year Ph.D. student at the University of Macau, working with Prof. Zhedong Zheng. Previously, I obtained my bachelor's degree in computer science from Fudan University in 2021.

I am interested in computer vision and related topics. I believe the ultimate goal of computer vision is to make machines see like humans. Currently, my research focuses on enhancing the reasoning and tool-use capabilities of multimodal agents through reinforcement learning.

Email  /  CV  /  Bio  /  Scholar  /  Github

News

  • 🔥 2026.3: Check out our latest work: VSearcher, training a multi-turn, tool-using multimodal search agent in a real-world web environment via reinforcement learning!
  • 🔥 2026.1: Our work SketchThinker-R1 has been accepted to ICLR'26!
  • 🔥 2026.1: Check out our work: SketchThinker-R1, incentivizing sketch-style reasoning in large multimodal models to improve reasoning efficiency!
  • 🔥 2025.6: Our work UA3D has been accepted to ICCV'25!
  • 🔥 2025.3: Check out our work: Uncertainty-o, unveiling uncertainty in Large Multimodal Models (LMMs) in a model-agnostic manner, supporting both Large Comprehension Models and Large Generation Models!
  • 🔥 2024.12: Check out our VL-Uncertainty, leveraging MLLM uncertainty for hallucination detection!
  • 🔥 2024.12: We released the first curated list of research on MLLM uncertainty: Awesome-MLLM-Uncertainty!
  • 🔥 2024.7: Our work LiSe has been accepted to ECCV'24!
  • 2023.6: Left Meituan to pursue a path into academia.

Research

VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning
Ruiyang Zhang, Qianguo Sun, Chao Song, Yiyan Qi, Zhedong Zheng
Preprint, 2026
[Paper] / [Code]

(1) We build the RL infrastructure required to develop a multi-turn, tool-using multimodal search agent, including tool implementation, a caching mechanism, and RL framework adaptation. (2) We build a long-horizon, multi-turn multimodal search agent in a real-world web environment via reinforcement learning, and propose a systematic post-training framework for multimodal search agents, including Iterative Injection-based Data Synthesis, Rejection Sampling Fine-tuning, and RL.

SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
Ruiyang Zhang*, Dongzhan Zhou*, Zhedong Zheng
ICLR, 2026
[Paper] / [Code]

We propose a three-stage framework to incentivize sketch-style reasoning in large multimodal models, including Sketch-Mode Cold Start, SketchJudge Reward Model training, and Sketch-Thinking Reinforcement Learning.

Uncertainty-o: One Model-agnostic Framework for Unveiling Epistemic Uncertainty in Large Multimodal Models
Ruiyang Zhang, Hu Zhang, Fei Hao, Zhedong Zheng
Preprint, 2025
[Website] / [Code]

We propose a unified framework for uncertainty estimation of Large Multimodal Models, and harness uncertainty for hallucination detection, hallucination mitigation, and uncertainty-aware CoT.

VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation
Ruiyang Zhang, Hu Zhang, Zhedong Zheng
Preprint, 2024
[Website] / [Paper] / [Code]

We leverage semantically equivalent prompt perturbation for refined LVLM uncertainty estimation, thereby facilitating more accurate hallucination detection.

Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection
Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng
ICCV, 2025
[Paper] / [Code]

We harness uncertainty learning to mitigate the negative influence of inaccurate pseudo labels in unsupervised 3D object detection. Specifically, we estimate coordinate-level uncertainty and then use the learned uncertainty to regularize the self-learning process.

Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng
ECCV, 2024
[Paper] / [Code] / [Blog]

We enhance unsupervised 3D object detection via LiDAR and 2D fusion to improve the detection of distant and small objects.

Internship

Research Intern @ IDEA
2025.10 - Now
Shenzhen, China
Topics: Developing a long-horizon multimodal search agent via reinforcement learning (RL)
Project: VSearcher
Research Intern @ Shanghai AI Lab
2025.5 - 2025.9
Shanghai, China
Topics: Leveraging RL to incentivize sketch-style reasoning in large multimodal models
Project: SketchThinker-R1

Work Experience

  • 2021.07 - 2023.06: Backend Development Engineer, Meituan, Shanghai

Education

  • 2024.09 - Now: Ph.D. student in the Faculty of Science and Technology, University of Macau
  • 2017.09 - 2021.06: Undergraduate student in the School of Computer Science, Fudan University

Academic Service

  • Conference Reviewer: ICLR 2025, CVPR 2025
  • Journal Reviewer: TOMM

Teaching

  • University of Macau CISC3021: Multimedia Forensics and Security (TA)

Honors and Awards

  • 2019: Second Prize of China Undergraduate Mathematical Contest in Modeling (CUMCM)
  • 2018&2019: Third Class Scholarship for Outstanding Students, Fudan University
  • 2016: Provincial First Prize of Chinese Mathematical Olympiad (CMO)
  • 2014&2015: First Prize of National Olympiad in Informatics in Provinces (NOIP)

Interests

  • I enjoy playing table tennis 🏓; I was a member of the school team in primary school, haha
  • I like walking around parks 🏞️

Template from Jon Barron's website.