I am currently a fourth-year Ph.D. student in the Computer Science and Engineering (CSE) department at the University of California, Santa Cruz, advised by Professor Xin Eric Wang. I earned my Bachelor's degree in Automation from Shandong University, followed by a Master's degree in Robotics from Johns Hopkins University. My research interests lie predominantly in Embodied AI, Computer Vision, and Natural Language Processing.
- We first build the MultipanelVQA benchmark to challenge Large Vision-Language Models on their ability to understand multipanel images, such as web screenshots and posters.
- We are now developing specialized AI agents that interact with all kinds of UIs, including web and mobile.
- We establish the R2H benchmark, featuring tasks that assess an agent's ability to guide users or another agent through unknown areas via dialogue.
- We propose two multimodal navigation-helper agents: a fine-tuned SeeRee model for multimodal response generation, and a large language model employed in a zero-shot manner. Both are analyzed via benchmarking and human evaluations.
- The challenge aims at advancing conversational AI. University teams are tasked with developing a "socialbot", an AI chatbot that can interact naturally and intelligently with humans on a variety of topics through Amazon's Alexa platform.
- I serve as the team leader of our Athena3 team.
- Our Athena team secured second place in the scientific innovation category of Alexa Prize SocialBot Grand Challenge 5.
Students from the ERIC Lab and the Natural Language and Dialogue Systems Lab are making their fifth appearance in the competition. The goal of the team is to leverage advanced algorithms and AI models to build a smart chatbot.
Location: Santa Cruz, California
Faculty advisor: Xin Wang
Team lead: Yue Fan
UC Santa Cruz is one of America's Public Ivy universities and a member of the prestigious Association of American Universities (AAU). The ERIC Lab is led by Prof. Xin Eric Wang and stands for Embodiment, Reasoning, Intelligence, and language Communication. The ERIC Lab’s research topics include natural language processing, computer vision, and machine learning, with an emphasis on building embodied AI agents that can communicate with humans in natural language to perform real-world multimodal tasks.
- Our AVDN project aims at building drones that understand and follow natural language commands, facilitating hands-free control and accessibility.
- We build the AVDN dataset of over 3k recorded dialogs and navigation trajectories, along with a drone simulator with a photorealistic environment.
- We successfully hosted the public AVDN Challenge at the ICCV 2023 CLVL workshop.
- Apply feature selection to find the dominant factors in disease progression.
- Design the extra-data-dimension heatmap toolkit for visualizing patient clusters.
- Use a Bayesian neural network to classify progressors with uncertainty estimates.
- Design a data auto-labeling method using inter-frame geometric consistency.
- Propose a DNN called UAVPatrolNet for road detection.
- Build a dataset for autonomous drone navigation.
Object Detection in Aerial Image
- I contributed to the teamwork by reproducing existing mature algorithms, e.g. RPN, Faster R-CNN.
- I conducted simulated experiments and adjusted the parameters to realize the optimal training effect; improved the object detection performance on aerial images.
After testing current state-of-the-art object detection networks, we chose the Faster R-CNN algorithm and made a few adjustments to adapt it to the VisDroneDet dataset. The dataset contains bounding boxes of widely varying sizes, which leads to a multi-scale object detection problem. To mitigate the impact of these rapid changes in bounding-box size, we added more large anchors to fit the larger objects and kept the small anchors unchanged for detecting tiny objects such as distant people and cars. Moreover, the VisDroneDet dataset has an unbalanced object distribution. When testing on the validation set, we found that classification performance for cars was much better than for other classes because cars appear more frequently. To alleviate this problem, we masked out some car bounding boxes by hand to pursue better classification performance.
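The anchor adjustment described above can be illustrated with a minimal sketch. It is not the configuration we actually used: the scale and ratio values below are assumptions chosen only for illustration, and `make_anchors` is a hypothetical helper rather than part of any detection library. It shows the idea of appending larger anchor scales while leaving the small ones untouched.

```python
import math

def make_anchors(scales, ratios):
    """Return (width, height) pairs for every scale/ratio combination.

    Each anchor has area scale**2 and aspect ratio height / width == ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors

# Assumed, illustrative values (not taken from the original experiments).
SMALL_SCALES = (32, 64, 128)      # kept unchanged for tiny objects
EXTRA_LARGE_SCALES = (256, 512)   # added to fit larger objects
RATIOS = (0.5, 1.0, 2.0)

baseline = make_anchors(SMALL_SCALES, RATIOS)
extended = make_anchors(SMALL_SCALES + EXTRA_LARGE_SCALES, RATIOS)

print(len(baseline), "baseline anchors;", len(extended), "after adding large scales")
```

Because the new scales are appended rather than substituted, the first anchors in `extended` are identical to `baseline`, so small-object coverage is unaffected.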
Control and Monitoring System of DJI Drones through PC
- Designed the control interface on PC with various functions such as “vehicle detection”.
- Developed a system to transmit data between the UAV and PC using Qt and the DJI SDK.
- Applied the system in city traffic to successfully improve the management efficiency.
Binocular distance measurement with one camera on a UAV.
Control of Carbon-free Car
- Developed a circuit board and selected the proper sensor by studying the control system of the carbon-free car.
- Conducted OOP of the machine by designing and applying the control algorithm.
- Awarded the First Prize in Engineering Training Integration Ability Competition of Shandong Province.
- Since the car is powered by the gravitational potential energy of a hammer block, it is called the Carbon-free Car.
Assembling and Programming Drones with ROS
- Assembled drones from scratch.
- Designed programs for an STM32 flight controller (Pixhawk) running the Robot Operating System (ROS).
- Won the municipal First Prize in China RoboWork Competition.