Hao Wang

About

I am a PhD candidate supervised by Lei Chen. My research interests include agentic LLMs, multimodal knowledge base retrieval and question answering, as well as finance-related RAG and time series forecasting. Our financial-AI demo has won multiple awards.

First-Author Research

ICML 2026

Beyond Single-View Indexing: Structure-Aware Multi-View Retrieval for Knowledge-Based VQA

Proposed a coarse-grained retrieval framework for knowledge-based visual question answering, using manifold diffusion and graph-similarity-based deduplication to leverage complementary information across knowledge base modalities and improve retrieval accuracy.
Established the ideal upper bound of single-vector retrieval in multimodal settings; experiments show our method reaches 80–90% of this bound. Compared to SOTA single-vector coarse-grained retrieval, we improve recall by over 10% while adding less than 1% retrieval time.

arXiv Preprint

Momentum-integrated Multi-task Stock Recommendation with Converge-based Optimization

Addressed short-term noise and overfitting in stock prediction with a multi-task training framework using momentum labels and overfitting-aware gradient allocation.
On A-share and US stock markets, the proposed method improves RNN prediction performance and significantly mitigates overfitting during training.

Under Review

Work I — Systematic Study of Semantic Uncertainty and Hallucination Detection

Proposed a systematic framework for semantic uncertainty methods in LLM hallucination detection, covering 19 datasets, 17 semantic uncertainty computation methods, and 13 LLMs used for generation and hallucination detection.
Found that current semantic uncertainty methods still struggle on complex semantic tasks such as summarization and translation. Explored blind spots in black-box LLM hallucination detection and constructed blind-spot datasets. Discovered a scaling law in LLM hallucination via uncertainty quantification: stronger models exhibit greater overconfidence.

Under Review

Work II — RAG Uncertainty Measurement and Context Refinement

Addressed the reliance of RAG frameworks on LLMs alone to judge context relevance by proposing an uncertainty measure for RAG based on perturbed data and semantic matrices, together with a two-stage context refinement framework.
Experiments show strong correlation between the proposed uncertainty measure and RAG answer accuracy; the refinement method significantly improves RAG context quality.

Under Review

Work III — Financial Investment RAG Framework

LLMs show conservative biases on finance and investment questions, limiting actionable advice for investors. Proposed a RAG framework built on a knowledge graph and step-wise reasoning, which uses hybrid time-semantic weighted sampling and decouples the final recommendation from the LLM to improve stock prediction performance.

Education

PhD · Data Science and Analytics Mar 2023 – May 2026

The Hong Kong University of Science and Technology (Guangzhou) · Information Hub

Expected graduation: 2027

M.S. · Electrical and Computer Engineering Sep 2019 – Jan 2021

Georgia Institute of Technology · School of Electrical and Computer Engineering

GPA 4.0 / 4.0

B.Eng. · Automation Sep 2015 – Jul 2019

Tianjin University · School of Electrical and Information Engineering

GPA 3.58 / 4.0

Research Experience

Research Assistant Jun 2021 – Jun 2022

The Hong Kong University of Science and Technology · Big Data Institute

Supervised by Lei Chen, Head and Chair Professor, Department of Computer Science and Engineering, HKUST

Conducted data analysis, graph database construction, and implementation of AI-related algorithms.
Led the engineering components of collaborative projects between the Big Data Institute and Kaisa, PetroChina, and Zhipu AI.

Research Assistant Jun 2020 – Feb 2021

Georgia Institute of Technology · Social and Language Technologies (SALT) Lab

Supervised by Diyi Yang, Assistant Professor, Department of Computer Science, Stanford University

Conducted conversational data analysis and implemented algorithms related to computational psycholinguistics.
Handled data analysis for experiments and for the 7Cups collaborative project.

Research Assistant May 2018 – May 2019

Tianjin University · Institute of Robotics and Autonomous Systems

Supervised by Ming Zeng, Associate Professor and Deputy Director, Institute of Robotics and Autonomous Systems, Tianjin University

Collected and annotated image data, trained image classification models, and developed the software components of an embodied intelligent waste-sorting system.
The automated waste-sorting system is now deployed for public interaction at the Tianjin University History Museum.

Internship

Teaching Assistant May 2020 – Aug 2020

Intelligence Racing (now Hitch Open), United States

Supervised by Allen Yang, Executive Director, Department of Electrical Engineering and Computer Sciences, UC Berkeley

Provided algorithm training for US high school teams competing in autonomous FE racing.

Awards During PhD

Mar 2026 Geneva International Invention Exhibition — Gold Medal with Jury Congratulations
Project: No-code interactive automated quantitative trading system

Dec 2025 Shenzhen International Fintech Competition · 1st Place, AI Track (Team prize: ¥100,000)
Project: No-code interactive automated quantitative trading system

Jul 2023 Hong Kong Fintech Olympiad · Silver Award (Team prize: HK$50,000)