Training LSTM for different motions
Posted: Thu Sep 07, 2023 3:39 pm
Hi everyone,
I'm pretty new to the concept of using LSTM networks to augment the set of 3D keypoints. I'm trying to understand how I could train a new LSTM model for this purpose on a different set of motions that isn't currently supported in OpenCap (i.e., motions that the current iteration of OpenCap's LSTM network wasn't trained on).
I've looked at the marker-augmentation repo (https://github.com/antoinefalisse/marker-augmentation) and the opencap-core repo to understand exactly how to develop an LSTM model for this purpose, but it's still unclear to me how to format the marker data as input to the model.
I read in the OpenCap paper that "we split the data into non-overlapping time-sequences of 0.5s, and added Gaussian noise (standard deviation: 0.018 m) to each time frame of the video keypoint positions based on a range of previously reported keypoint errors. Finally, we standardized the data to have zero mean and unit standard deviation. We used the resulting time-sequences to train the networks."
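To make sure I'm interpreting that paragraph correctly, here's a rough sketch (Python/NumPy) of what I think the preprocessing might look like. The 60 Hz keypoint rate (so 0.5 s = 30 frames) and the per-feature statistics are assumptions on my part, not something I've confirmed in the repos:

```python
import numpy as np

def make_sequences(keypoints, seq_len=30, noise_std=0.018, rng=None):
    """keypoints: (n_frames, n_features) array of flattened 3D video keypoint
    positions for one trial. Assumes 60 Hz, so 0.5 s = 30 frames (my guess)."""
    rng = np.random.default_rng() if rng is None else rng
    n_seq = keypoints.shape[0] // seq_len
    # Split into non-overlapping 0.5 s windows.
    seqs = keypoints[: n_seq * seq_len].reshape(n_seq, seq_len, -1)
    # Add Gaussian noise (std 0.018 m) to every time frame of the keypoints.
    return seqs + rng.normal(0.0, noise_std, size=seqs.shape)

def standardize(all_seqs):
    """Zero mean / unit standard deviation per feature, computed across all
    time-sequences -- whether it's done this way or per sequence is exactly
    what I'm asking in question 4 below."""
    mean = all_seqs.mean(axis=(0, 1), keepdims=True)
    std = all_seqs.std(axis=(0, 1), keepdims=True)
    return (all_seqs - mean) / std
```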
My questions are:
1. Does one motion file (.trc) from a given subject correspond to a resultant 30-by-59 array (19 3D markers × 3 coordinates + height and weight = 59 features)? Does that make up one "time-sequence" (i.e., does each time-sequence correspond to an individual subject/motion)?
2. What is the structure of the feature/response arrays for the LSTM model? Is it a 30-by-59-by-(number of time-sequences) array? Do you have any programmatic examples of that structure? (I've sketched my current guess below.)
3. Is noise added to every time-sequence, and does the added noise have a standard deviation of 0.018 m in each case?
4. Is each time-sequence standardized to zero mean and unit standard deviation individually, or is the standardization performed across all time-sequences?
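Regarding question 2, here's a minimal sketch of how I'm currently imagining the feature array being built from a single .trc file; the layout, the (n_sequences, 30, 59) ordering, and the idea of repeating height and weight at every time step are all guesses on my part rather than anything I found in the code:

```python
import numpy as np

def build_feature_array(trc_markers, height, weight, seq_len=30):
    """trc_markers: (n_frames, 19 * 3) array of marker positions from one .trc
    file (my assumption about the layout). Returns an array of shape
    (n_sequences, seq_len, 59): 57 marker coordinates + height + weight."""
    n_seq = trc_markers.shape[0] // seq_len
    markers = trc_markers[: n_seq * seq_len].reshape(n_seq, seq_len, -1)
    # Repeat height and weight at every frame so each time step has 59 features.
    subject = np.tile([height, weight], (n_seq, seq_len, 1))
    return np.concatenate([markers, subject], axis=-1)

# Multiple trials/subjects would then be stacked along the first axis, giving
# (total_n_sequences, 30, 59) -- again, just my guess at the layout.
```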
Please let me know if you can provide any insight.
Many thanks,
Brody