Marcus Brown
- Apr 5, 2022
- 2 min read

Data hell

Updated: Apr 26, 2022

Data management in labs is crazy! Take some time to think about how you will name your files, and get buy-in from your lab. It will make a huge difference.

Data naming conventions in biomechanics are like the Wild West; no matter what lab I visit, it seems like they have come up with their own syntax to keep their data organized. To make matters worse, the data may not even be standardized at the lab level, as individual students seem to come up with their own protocols.

The real question is, who cares? The experimenter defines the file names they want, runs their pipelines on these files, likely contacts C-Motion support a couple times, then produces the plots they need to show their supervisor. Then they write their paper (or papers), submit and, probably, defend. The data is moved to some central location (or maybe not), to proverbially collect dust until it is purged by IT a decade later. So, what’s the problem here?

In my experience, this lack of organization leads to constant duplicate data collections. One would think that at this point, every mature lab that is more than 10 years old has already collected all the normal data it needs, and incoming students could easily match that data to theirs. In practice, this kind of reuse is so rare that in nearly every study I’ve been involved in, matched normal data is collected alongside the study data whenever the experimental design calls for it. But I digress.

With marker-based motion capture, renaming a few C3D files after the fact to satisfy the lab’s requirements may not be the end of the world, because data collection time is dominated by setup time. With markerless data, however, setup time is almost nothing, and renaming 800 files that follow different syntaxes from one particular day of data collection may take longer than the collections themselves. I admit to doing this, and in my particular case, re-collecting the data was easier than renaming.

What’s the solution? Like everything in biomechanics, the solution is preparation and standardization among students and the lab itself. When naming files, I include the subject ID, the study, the action being performed (like running or walking), the trial number, the date, and the camera ID if I’m collecting video data. Yes, it would be nice if all of this were embedded in the file itself rather than stored in the file name, but it’s a good start. From here, I duplicate this format in our database, which allows me to link to other metadata like subject age, gender, or injury history. I like to record this metadata again each time the subject comes in, because in my experience, we seem to recruit repeat subjects, and their metadata changes over time.
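To make this concrete, here is a minimal Python sketch of one way to build and parse file names from the fields listed above. The field order, the underscore separator, the `T003` trial format, the `YYYYMMDD` date format, and the names `TrialName` and `parse_filename` are all my assumptions for illustration, not a standard; your lab should agree on its own.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Assumed layout (hypothetical, pick your own and write it down):
#   <subject>_<study>_<action>_T<trial>_<YYYYMMDD>[_<camera>].<ext>
_FIELD = re.compile(r"[A-Za-z0-9]+")
_PATTERN = re.compile(
    r"^(?P<subject>[A-Za-z0-9]+)_(?P<study>[A-Za-z0-9]+)_(?P<action>[A-Za-z0-9]+)"
    r"_T(?P<trial>\d+)_(?P<date>\d{8})(?:_(?P<camera>[A-Za-z0-9]+))?"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


@dataclass(frozen=True)
class TrialName:
    subject_id: str
    study: str
    action: str
    trial: int
    collected: date
    camera_id: Optional[str] = None  # only set for video data

    def __post_init__(self) -> None:
        # Keep fields alphanumeric so the underscore stays an unambiguous separator.
        for value in (self.subject_id, self.study, self.action, self.camera_id):
            if value is not None and not _FIELD.fullmatch(value):
                raise ValueError(f"field must be alphanumeric: {value!r}")

    def to_filename(self, ext: str = "c3d") -> str:
        parts = [
            self.subject_id,
            self.study,
            self.action,
            f"T{self.trial:03d}",
            self.collected.strftime("%Y%m%d"),
        ]
        if self.camera_id is not None:
            parts.append(self.camera_id)
        return "_".join(parts) + "." + ext


def parse_filename(name: str) -> TrialName:
    """Recover the fields from a file name, or raise ValueError if it does not conform."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return TrialName(
        subject_id=m["subject"],
        study=m["study"],
        action=m["action"],
        trial=int(m["trial"]),
        collected=datetime.strptime(m["date"], "%Y%m%d").date(),
        camera_id=m["camera"],  # None when the camera part is absent
    )


name = TrialName("S012", "ACLR", "walking", 3, date(2022, 4, 5)).to_filename()
print(name)  # S012_ACLR_walking_T003_20220405.c3d
```

Because `parse_filename` rejects anything that does not match, running it over a folder at the end of a collection day is a cheap way to catch stray names before there are 800 of them. The parsed subject ID and date can also serve as the key that links each file to that visit’s metadata in the database.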

Buy-in from the students and the lab is so important, and getting these processes in place from the start may allow more publications on the same data, less data renaming, and generally less pain!
