Hai Zhang  
zhanghai12138@tongji.edu.cn ·  (+86) xxx-xx-xx-xxx ·  betray12138 (the same as zhihu)  
PERSONAL STATEMENT  
A dedicated and capable researcher in reinforcement learning (RL), specializing in model-based RL,  
offline RL, and context-based meta-RL, having studied more than 100 papers in related fields (see details on  
Zhihu). A confident and cooperative team member, skilled at communication and organization.  
A pragmatic and skilled back-end development engineer before moving into RL, proficient in C++,  
Python, Go, and other common techniques.  
Education  
TONGJI UNIVERSITY, SHANGHAI, CHINA  
2022.09 – present  
MASTER (admitted without examination), COMPUTER SCIENCE AND TECHNOLOGY  
GPA above 85  
TONGJI UNIVERSITY, SHANGHAI, CHINA  
2018.09 – 2022.06  
BACHELOR, COMPUTER SCIENCE AND TECHNOLOGY  
GPA above 92, overall ranking in the top 5.12%  
Research experience  
* means co-first author, † means corresponding author  
ZHEJIANG LAB, HANGZHOU, ZHEJIANG, CHINA  
RESEARCH INTERN supervised by Lanqing Li  
2023.07 – 2024.01  
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning.  
Lanqing Li*, Hai Zhang*, Xinyu Zhang, Shatong Zhu, Junqiao Zhao†, and Pheng-Ann Heng. Under  
review at ICML 2024  
TIEV LAB, SHANGHAI, CHINA  
2022.09 – present  
How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization. Hai Zhang,  
Hang Yu, Junqiao Zhao†, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, and Chen Ye. In NeurIPS  
2023  
Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery. Xiao Zhang, Hai Zhang,  
Hongtu Zhou, Chang Huang, Di Zhang, Chen Ye†, and Junqiao Zhao†. In IEEE Robotics and Automation  
Letters, 2023 (with presentation at ICRA 2024)  
Projects  
Distributed Complete Vehicle Cloudization  
2022.05 – 2023.01  
Invention Patent (Submitted, Patent Number: 202310899331.2)  
Affiliated with the National Key R&D Program  
Core Technique: distributed framework architecture, k8s deployment  
NIO, SHANGHAI, CHINA  
BACKEND DEVELOPMENT ENGINEER  
2021.10 – 2022.03  
LIFECYCLE MANAGEMENT SYSTEM (MongoDB replica set, MongoDB atomic operations, Nginx reverse  
proxy, Docker deployment, Redis distributed lock)  
EDGE-SIDE DATA UPLOAD AND DISPLAY (Protobuf, WebSocket, TTL index, split-table storage)  
DATA SYNCHRONIZATION SERVICE (Kafka consumer group, full and incremental synchronization)  
5 TB-SCALE DATA MIGRATION AND BACKUP (MongoDB sharded cluster, shard key setting, index  
setting, MongoDB cluster construction)  
Unknown Environment Exploration and Application Device Based on Deep Reinforcement  
Learning  
2020.05 – 2021.03  
Innovation and Entrepreneurship Program for Shanghai University Students.  
Responsible for implementing end-to-end vehicle driving in the CARLA simulator using representation  
learning combined with reinforcement learning algorithms such as SAC, TD3, and PPO.  
Responsible for training vision models that work with ROS to perform object detection.  
Competitions  
RLChina Intelligent Agent Challenge Nonin Spring Season Curling Challenge  
Second Place in the finals, Sixth Place by total score  
Responsible for optimizing PPO and a rule-based agent to complete curling strikes in a POMDP  
environment  
WAIC: Meta-verse Lights Up Autonomous Driving, AI Simulation Driving Competition  
Second Place (sole recipient), with a prize of 40,000 RMB  
Responsible for perceptual information processing and code implementation of the decision state machine  
Intel Cup National College Students Embedded System Invitational Competition: All Sight  
Security Eye - Intelligent Suspect Detection System  
National Second Prize (Team Leader)  
Responsible for the overall project architecture as well as the full-view image stitching  
Others  
Good at badminton, hiking, and photography; enthusiastic about travelling  
Outstanding Undergraduate Thesis  
Scholarship for outstanding students for three consecutive years