GSOC week0 - Homepage
GSOC 2021
I am very happy to have been accepted this year as a GSOC student in the Redhen Lab. Google Summer of Code is a global program focused on bringing more student developers into open source software development. Students work with an open-source organization on a 10-week programming project during their break from school. The list of organizations for 2021 can be found here.
Redhen Lab
The International Distributed Little Red Hen Lab™ is a global big data science laboratory and cooperative for research into multimodal communication. The main idea of the Redhen Lab is cooperation. Unlike a traditional university, where professors lead students in research, everyone in the Redhen Lab contributes to advancing research on multimodal communication. If you are interested in this community, the Redhen Lab home page can be found here.
Project Description
Gesture recognition has become popular in recent years because it can play an essential role in many fields, such as non-verbal communication, emotion analysis, and human-computer interaction. People use many hand gestures in daily life. The research task is to detect hand gestures in raw news videos, which are streams of RGB images. To fulfill this task, I propose a keypoints-based pose tracking system for tracking people and a Transformer-based gesture detector that works on keypoints. The overall structure is composed of a keypoints extractor, a person tracker, and a gesture detector, so the mission has three main parts. The first part is to track people over time. In the second part, for each person, we use their hand keypoint features over time to construct several keypoint sequences. The third part is to use these sequences to predict whether a gesture is present. I believe that both spatial and temporal information are important for gesture detection. That is why we use a Transformer, which can take into account the local shape of the hands within one frame and can also capture the global hand motion across frames. Because hand gestures in news videos do not have well-defined label classes, we start by only detecting the existence of a hand gesture. The approach can be extended to gesture classification later. The final evaluation will be done on Redhen’s “Newscape Dataset”.
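To make the first two parts more concrete, here is a minimal sketch of how people could be tracked across frames and how each person's keypoints could be cut into fixed-length sequences. It is only an illustration under my own assumptions, not the final implementation: the function names (`track_people`, `make_windows`), the greedy nearest-centroid matching, the `max_dist` threshold, and the toy coordinates are all hypothetical, and in the real pipeline the keypoints would come from a keypoints extractor rather than being written by hand.

```python
from math import hypot


def centroid(keypoints):
    """Mean (x, y) position of a list of (x, y) keypoints."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def track_people(frames, max_dist=50.0):
    """Greedy nearest-centroid tracking (a simplifying assumption).

    frames: list of frames; each frame is a list of people;
            each person is a list of (x, y) keypoints.
    Returns a dict: track_id -> list of (frame_index, keypoints).
    """
    tracks = {}
    last_pos = {}  # track_id -> centroid where the person was last seen
    next_id = 0
    for t, people in enumerate(frames):
        # Only tracks that existed before this frame can be matched.
        unmatched = set(last_pos)
        for kps in people:
            c = centroid(kps)
            best, best_d = None, max_dist
            for tid in sorted(unmatched):
                d = hypot(c[0] - last_pos[tid][0], c[1] - last_pos[tid][1])
                if d < best_d:
                    best, best_d = tid, d
            if best is None:
                # No close enough track: start a new one.
                best = next_id
                next_id += 1
                tracks[best] = []
            else:
                unmatched.discard(best)
            tracks[best].append((t, kps))
            last_pos[best] = c
    return tracks


def make_windows(track, window=3, stride=1):
    """Cut one person's track into fixed-length keypoint sequences."""
    seq = [kps for _, kps in track]
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


# Toy example: two people, three frames, detection order changes in frame 1.
frames = [
    [[(0, 0), (2, 2)], [(100, 100), (102, 102)]],
    [[(103, 103), (105, 105)], [(1, 1), (3, 3)]],
    [[(2, 2), (4, 4)], [(106, 106), (108, 108)]],
]
tracks = track_people(frames)
windows = make_windows(tracks[0], window=2)
```

Each element of `windows` is one keypoint sequence for a single person, which is the kind of input the gesture detector in the third part would receive. The sketch never retires tracks of people who leave the video, so a real tracker would need extra handling for that.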
Weekly Report
Community Bonding Period (From May 17, 2021 till June 7, 2021)
- Week 0 Homepage (May 17 ~ May 23, 2021)
- Week 1 Report: Setup Singularity on HPC (May 24 ~ May 30, 2021)
- Week 2 Report: Transfer Learning for DETR (May 31 ~ June 06, 2021)
Coding Period Before the First Evaluation (First evaluation time July 12 - 16, 2021)
- Week 3 Report (June 07 ~ June 13, 2021)
- Week 4 Report (June 14 ~ June 20, 2021)
- Week 5 Report (June 21 ~ June 27, 2021)
- Week 6 Report (June 28 ~ July 04, 2021)
- Week 7 Report (July 05 ~ July 11, 2021)
Coding Period Before the Final Evaluation (Final evaluation time August 16 - 23, 2021)
- Week 8 Report (July 12 ~ July 18, 2021)
- Week 9 Report (July 19 ~ July 25, 2021)
- Week 10 Report (July 26 ~ August 01, 2021)
- Week 11 Report (August 02 ~ August 08, 2021)
- Week 12 Report (August 09 ~ August 15, 2021)
Final result (August 31, 2021)
Related Papers
Computer Vision
[1]: Nicolas Carion, Francisco Massa et al. “End-to-End Object Detection with Transformers” ECCV 2020.
[2]: Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh. “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields” IEEE Transactions on Pattern Analysis and Machine Intelligence. Date of publication: 17 July 2019.
Dataset
[1]: Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick et al. “AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions” CVPR 2018.
Links for this project
All articles in this blog are licensed under CC BY-SA 4.0. Please credit the source when reprinting!