We use synthetic video and IMU data generated from the AMASS datasets (n = 500 subjects) to train deep learning models that can predict 3-D motion from noisy videos and/or uncalibrated IMUs.
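
As a rough sketch of how synthetic inertial data can be derived from mocap (the actual pipeline is in the repository linked below), body-frame angular velocity and specific force can be computed from segment orientations and positions by finite differencing, with additive white noise as a simple sensor model. All function names, noise levels, and frame conventions below are illustrative assumptions, not the published code.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # world-frame gravity (m/s^2)

def skew_to_vec(S):
    """Extract the angular-velocity vector from an (approximately) skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def synthesize_imu(R_wb, p_w, fs, gyro_noise_std=0.01, accel_noise_std=0.05):
    """Synthesize noisy gyroscope and accelerometer signals from mocap.

    R_wb : (T, 3, 3) body-to-world rotation matrices of one body segment
    p_w  : (T, 3) world positions of the sensor attachment point
    fs   : sampling frequency in Hz
    """
    dt = 1.0 / fs
    T = R_wb.shape[0]

    # Body-frame angular velocity: omega_hat = R^T * dR/dt (central differences).
    dR = (R_wb[2:] - R_wb[:-2]) / (2 * dt)
    omega = np.stack([skew_to_vec(R_wb[t + 1].T @ dR[t]) for t in range(T - 2)])

    # World-frame linear acceleration via second-order central differences.
    a_w = (p_w[2:] - 2 * p_w[1:-1] + p_w[:-2]) / dt**2

    # Accelerometer measures specific force in the body frame: R^T (a - g).
    f_b = np.einsum('tij,tj->ti', R_wb[1:-1].transpose(0, 2, 1), a_w - GRAVITY)

    # Additive Gaussian noise as a simple sensor-noise model (illustrative values).
    rng = np.random.default_rng(0)
    gyro = omega + rng.normal(0, gyro_noise_std, omega.shape)
    accel = f_b + rng.normal(0, accel_noise_std, f_b.shape)
    return gyro, accel
```

Noisy video inputs can be simulated in a similar spirit, e.g., by projecting 3-D joints to 2-D keypoints through virtual cameras and corrupting the detections; the exact scheme used here is described in the paper.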


Marker-based motion capture, considered the gold standard in human motion analysis, is expensive and requires trained personnel. Advances in inertial sensing and computer vision offer new opportunities to obtain research-grade assessments in clinics and natural environments. A challenge that discourages clinical adoption, however, is the need for careful sensor-to-body alignment, which slows the data collection process in clinics and is prone to errors when patients take the sensors home. We trained deep learning models that estimate human movement from noisy video data (VideoNet), inertial data (IMUNet), and a combination of the two (FusionNet), obviating the need for careful calibration.
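
The published architectures live in the repository below; as a minimal sketch of the fusion idea only, per-frame video and IMU features can be encoded separately, concatenated, and passed through a temporal model that regresses pose. Every layer size, input dimension, and name here is a hypothetical placeholder, not the actual VideoNet/IMUNet/FusionNet.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Minimal late-fusion sketch: noisy 2-D keypoints + raw IMU -> per-frame pose.

    Hypothetical sizes: 25 2-D keypoints, 8 IMUs x 6 channels (gyro + accel),
    and a 24-joint, 6-D rotation output. Not the published FusionNet.
    """

    def __init__(self, n_kpts=25, n_imus=8, n_joints=24, hidden=256):
        super().__init__()
        self.video_enc = nn.Linear(n_kpts * 2, hidden)   # noisy 2-D keypoints
        self.imu_enc = nn.Linear(n_imus * 6, hidden)     # uncalibrated IMU channels
        self.temporal = nn.GRU(2 * hidden, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_joints * 6)  # 6-D rotation per joint

    def forward(self, kpts, imu):
        # kpts: (B, T, n_kpts*2), imu: (B, T, n_imus*6)
        z = torch.cat([self.video_enc(kpts), self.imu_enc(imu)], dim=-1)
        h, _ = self.temporal(torch.relu(z))
        return self.head(h)  # (B, T, n_joints*6)

# Smoke test with random inputs.
model = FusionSketch()
pose = model(torch.randn(2, 100, 50), torch.randn(2, 100, 48))
print(pose.shape)  # torch.Size([2, 100, 144])
```

Dropping either encoder recovers a video-only or IMU-only variant, which mirrors how the three models relate conceptually.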

Data: We train the models on the public AMASS dataset (https://amass.is.tue.mpg.de). We will post the test data soon.

Code: https://github.com/CMU-MBL/FS_Video_IMU_Fusion

Citation: Shin, Soyong, Zhixiong Li, and Eni Halilaj. "Markerless Motion Tracking with Noisy Video and IMU Data." IEEE Transactions on Biomedical Engineering (2023).
