Contents

Current State
Data Management Overview
PART I: Raw Data
PART II: Derivative Data
Sample Data

Current State

In vivo data is available at https://multisbeta.stanford.edu/.

The in vitro data management site is currently undergoing testing (https://mobilizealpha1.stanford.edu/).

Please see /Discussion for details on the current state of the data management specifications/infrastructure.

Data Management Overview

Target Outcome

1. Automatically push data from its origin to a central data analysis computer. After analysis, data will be manually pushed to the Stanford and MIDAS sites.

Local Data Management

Stages

Stage 1: Automatically push data from specified folders to an established server.
Stage 2: Push data from the server to a specified folder on the data analysis computer.
Stage 3: Upon completion of analysis, manually push the analyzed data to Stanford, to MIDAS, and back to the Clinic server.

A specified folder on each computer will be automatically pushed to the server.
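The automatic push in Stages 1 and 2 can be sketched as a one-way copy of new or changed files. This is a minimal sketch, assuming the server is reachable as a mounted folder and that a scheduler (cron or Windows Task Scheduler) calls it periodically; the function name and the size-plus-modification-time change test are illustrative choices, not the project's actual mechanism (a tool such as rsync would serve the same purpose).

```python
import shutil
from pathlib import Path


def push_new_files(source: Path, destination: Path) -> list[Path]:
    """Copy files under `source` that are missing or changed at `destination`.

    Relative paths are preserved, so the subject folder structure is
    replicated on the server side. A file counts as changed when its size
    or whole-second modification time differs from the existing copy.
    Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            if s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime):
                continue  # unchanged since the last push
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Because `shutil.copy2` carries the modification time over, a second run over an unchanged folder copies nothing, which keeps repeated scheduled pushes cheap.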

Folder Configuration

In Vivo

Raw Data Collection:

MULTIS001-1
- Ultrasound
- Data
- Configuration

Data Analysis folders appended:

File Association Folders
- DataOverview
- FileAssociation
- TimeSynchronization
Tissue Thickness Folders
- Tissue Thickness
  - UltrasoundManual
    - ThicknessPNG
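A layout check like the one below could flag in vivo subject folders that do not match the configuration above before they are pushed. It is a sketch under two assumptions: that subject folders follow the `MULTIS001-1` pattern (three-digit subject number, dash, trial number), and that the analysis folders are appended directly under the subject folder. The pattern, constants, and function name are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the MULTIS001-1 example above.
SUBJECT_PATTERN = re.compile(r"^MULTIS\d{3}-\d+$")

RAW_FOLDERS = ("Ultrasound", "Data", "Configuration")
ANALYSIS_FOLDERS = (
    "DataOverview",
    "FileAssociation",
    "TimeSynchronization",
    "Tissue Thickness/UltrasoundManual/ThicknessPNG",
)


def check_in_vivo_subject(subject_dir: Path, analyzed: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    if not SUBJECT_PATTERN.match(subject_dir.name):
        problems.append(f"unexpected folder name: {subject_dir.name}")
    # Raw data folders are always expected; analysis folders only after analysis.
    expected = RAW_FOLDERS + (ANALYSIS_FOLDERS if analyzed else ())
    for rel in expected:
        if not (subject_dir / rel).is_dir():
            problems.append(f"missing folder: {rel}")
    return problems
```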

In Vitro - Ultrasound

Raw Data Collection:

CMULTIS001-1
- Ultrasound
- Data
- Configuration
- MRI (segmentations will be added to this directory)
- CT (segmentations will be added to this directory)

Data Analysis folders appended:

File Association Folders
- DataOverview
- FileAssociation
- TimeSynchronization
Tissue Thickness Folders
- Tissue Thickness
  - UltrasoundManual
    - ThicknessPNG
  - CTManual_zSlice
    - ThicknessPNG
  - MRManual_zSlice
    - ThicknessPNG
Registration Folders
- Registration
  - Registration Instance (e.g., R01, R02, etc.)
    - MarkerSTLs
      - CT
      - MRI
    - QualityCheck
    - USPositions
      - CT
      - MRI
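Creating a new registration instance with the subfolders listed above can be automated so the numbering stays consistent. This sketch assumes two-digit zero padding (R01, R02, ...) based on the examples, and that instances live directly in the subject's `Registration` folder; the function and constant names are hypothetical.

```python
import re
from pathlib import Path

# Subfolders of each registration instance, following the layout above.
INSTANCE_SUBFOLDERS = (
    "MarkerSTLs/CT",
    "MarkerSTLs/MRI",
    "QualityCheck",
    "USPositions/CT",
    "USPositions/MRI",
)


def next_registration_instance(subject_dir: Path) -> Path:
    """Create the next Rxx folder (with its subfolders) and return its path."""
    registration = subject_dir / "Registration"
    registration.mkdir(parents=True, exist_ok=True)
    # Collect existing instance numbers from folders named R<digits>.
    numbers = [
        int(p.name[1:])
        for p in registration.iterdir()
        if p.is_dir() and re.fullmatch(r"R\d+", p.name)
    ]
    instance = registration / f"R{max(numbers, default=0) + 1:02d}"
    for sub in INSTANCE_SUBFOLDERS:
        (instance / sub).mkdir(parents=True)
    return instance
```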

In Vitro - Surgical Tools

** TBD (work in progress): Instrumented Surgical Tools

MULTIS-CIS-000-1
- Data
- Configuration
- Video
- Vic3D

PART I: Raw Data

Target Outcome

1. To build a web-based data management system for organizing and disseminating raw data.

Proposed web-based data management and databases

Use Cases

1. Upload and reorganization of raw MRI files (DICOM format) [project administrator actions]

Assume files are organized offline in a directory structure, which is to be uploaded and replicated in the web-based system. Implement the upload as drag-and-drop.
Metadata will be automatically derived from a wiki page or README by looking for certain keywords (e.g., gender, age). The user can update metadata after import (or enter metadata if none is automatically imported). Metadata assigned to a folder are associated with all of its child files. Metadata may also come from file or folder names; users could include a key for filenames as part of the README.
Once files are in the web-based system, they can be moved around to different folders or deleted. New files and folders of files can be added to the existing directory structure.
Folders can be added, deleted, or moved.
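The keyword-based metadata import and folder-to-child inheritance described above might look like the sketch below. It assumes README lines of the form `Gender: F` or `Age = 34`, and that metadata is stored per folder path with deeper folders overriding their parents; the keyword list, the line format, and the function names are all assumptions for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical keyword list; the page names gender and age as examples.
KEYWORDS = ("gender", "age")


def metadata_from_readme(text: str, keywords=KEYWORDS) -> dict[str, str]:
    """Pick out 'keyword: value' or 'keyword = value' lines, case-insensitively."""
    found = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z ]+?)\s*[:=]\s*(.+?)\s*$", line)
        if m and m.group(1).lower() in keywords:
            found[m.group(1).lower()] = m.group(2)
    return found


def effective_metadata(file_path: str, folder_metadata: dict[str, dict]) -> dict:
    """Merge metadata from every ancestor folder of a file; deeper folders win."""
    merged = {}
    ancestors = list(PurePosixPath(file_path).parents)[::-1]  # top-level first
    for folder in ancestors:
        merged.update(folder_metadata.get(str(folder), {}))
    return merged
```

Keeping metadata on folders and resolving it per file at query time means moving a file to another folder automatically changes the metadata it inherits, which matches the move/reorganize operations listed above.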

2. Dissemination of raw data [data user actions]

Example queries would require searches based on gender, age, and data type (in this case MRI data, although others may only be interested in mechanical data). Is this sufficient to start?
1. Gender
2. Age
3. Health condition (arthritic or not)
4. Data type, e.g., if you want to analyze tissue thickness across a population, then you want to find all datasets with MRI data.
5. Cadaver vs. human
6. Licensing restrictions
Should queries work across studies or within a study? In other words, will you be uploading all your data as one study, so that searches are done only within your study? Start with queries within a study. Later, when we have more studies, we can think about queries across studies.
Provide the ability to "select all" or to check boxes selecting which data from specific subjects to download. Download the data as a zip file containing the files in their directory structure.
Provenance information should be included. At a minimum, this should include the date (and, if available, the revision number) and the location from which the data were obtained. Include it in the download as a README file.
Licensing information also needs to be provided with the downloaded zip file.
When moving files from the storage stage to the dissemination stage, we may want to add license information to only a portion of the files.
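The dissemination use case above (query by metadata, select subjects, download a zip that keeps the directory structure and carries provenance and license files) can be sketched as follows. It assumes subject metadata is available as a dictionary keyed by subject ID and that exact-match filtering is enough to start; the README/LICENSE file names and their fields are assumptions, not a settled format.

```python
import io
import zipfile
from datetime import date
from typing import Optional


def query_subjects(subjects: dict, criteria: dict) -> list:
    """Return the IDs of subjects whose metadata match every criterion."""
    return sorted(
        subject_id
        for subject_id, meta in subjects.items()
        if all(meta.get(field) == value for field, value in criteria.items())
    )


def build_download(files: dict, source_url: str, revision: Optional[str],
                   license_text: str) -> bytes:
    """Zip the selected files, keeping their paths, plus a README and LICENSE."""
    # Provenance: date obtained, location, and revision number when available.
    readme = [f"Date obtained: {date.today().isoformat()}",
              f"Obtained from: {source_url}"]
    if revision:
        readme.append(f"Revision: {revision}")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, data in sorted(files.items()):
            archive.writestr(path, data)  # the path keeps the folder structure
        archive.writestr("README.txt", "\n".join(readme) + "\n")
        archive.writestr("LICENSE.txt", license_text)
    return buffer.getvalue()
```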

Minimum Required Functionality

Desired End-User Functionality

Desired System Specifications

1. Ability to handle large datasets. In this project, the expectation is that there will be about 1 GB of data per subject.

2. Reusing existing file-management functionality would be ideal, so that it would not need to be recreated.

Preliminary Work

Workflow

File Browser Software Analysis

Requirements

- Licensing: BSD, MIT, or an equivalent open-source license that allows commercialization

Candidates

jquery.fileTree (Demo: http://labs.abeautifulsite.net/archived/jquery-fileTree/demo Code: http://www.abeautifulsite.net/jquery-file-tree/)
- Pros: Simple integration, clean interface, responsive, extensible framework
- Cons: Few features, really old (8 years)
elFinder (Demo: http://hypweb.net/elFinder-nightly/demo/2.1 Code: https://github.com/Studio-42/elFinder)
- Pros: Complete look-and-feel, icon preview, responsive under heavy use
- Cons: Many features, and therefore complicated and difficult to extend
Fancy Tree (Demo: http://wwwendt.de/tech/fancytree/demo Code: https://github.com/mar10/fancytree)
- Pros: Bootstrap integration, lots of plugins
- Cons: Also many features, and therefore complicated and difficult to extend; however, it may have all we need
Girder (Sample collections: https://data.kitware.com/#collections Main Site: https://data.kitware.com/)

Feature Requests

May 6, 2016

Version control for data management (update 6/14/2016: implemented using git). Will include a feature to revert to other versions, along with all updates needed to support that. It will be important to update the associated metadata and to record in query output which version/version date was used to generate it.
If a folder doesn't have metadata, ask the user for a URL (update 6/14/2016: will support having this in a YAML file).
Provide feedback on metadata parsing, showing what was parsed successfully and requesting additional metadata from the user (update 6/14/2016: currently there are no required fields, but checks need to be done for structural problems, which should be flagged for the user either by marking up the YAML file or on a page for metadata parsing corrections).
Add a contextual menu item to bring up the metadata editing UI (update 6/14/2016: could a YAML editor suffice?).
Fix zip file decompression in the file manager (update 6/14/2016: still relevant?).
Command-line interface for data query and data import
Two projects with different query UIs for different purposes (two different views) sharing the same data store
Data provider selects which metadata fields to expose to the query UI for end users
Store download parameters and user ID in the database to create reports for usage analysis and for funding agencies
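The last request (storing download parameters and user IDs for usage reports), combined with the version-control note about recording which data version produced an output, could be backed by a small log table. This is a sketch using SQLite from the Python standard library; the table layout, the `data_version` column (for example, a git commit of the data store), and the monthly report are assumptions about what the reports would need.

```python
import json
import sqlite3


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS downloads (
            user_id       TEXT NOT NULL,
            params        TEXT NOT NULL,  -- query parameters as JSON
            data_version  TEXT,           -- e.g. git commit of the data store
            downloaded_at TEXT NOT NULL DEFAULT (datetime('now'))
        )""")
    return conn


def log_download(conn: sqlite3.Connection, user_id: str, params: dict,
                 data_version: str) -> None:
    """Record one download; sort_keys makes identical queries compare equal."""
    conn.execute(
        "INSERT INTO downloads (user_id, params, data_version) VALUES (?, ?, ?)",
        (user_id, json.dumps(params, sort_keys=True), data_version),
    )
    conn.commit()


def monthly_report(conn: sqlite3.Connection) -> list:
    """Downloads and distinct users per month, e.g. for funding-agency reports."""
    return conn.execute("""
        SELECT strftime('%Y-%m', downloaded_at) AS month,
               COUNT(*) AS downloads,
               COUNT(DISTINCT user_id) AS users
        FROM downloads GROUP BY month ORDER BY month""").fetchall()
```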

Outstanding Questions

1. Is the ability to set permissions at a study level sufficient? No; permissions are needed at a more granular level.

2. Should we create the ability to import metadata from a wiki page? Yes, eventually.

3. What fields do you think users would want to query on? See above

4. Will you be uploading all your data as one study, so that searches are done only within your study? Yes.

5. What provenance information is required? What is the best way to include it in the downloaded data: as an additional README, in the header of each file, or some other way? This decision will affect the handling of derivative data and how it is associated back to the parent data when uploaded. It is really tricky to incorporate the information in the header of every file; it is better to advise and strongly encourage people to keep that README with the data.

PART II: Derivative Data

Target Outcome

To build web-based databases for organizing and disseminating derivative data and associating it with the raw data it came from.

Use Cases

Minimum Required Functionality

Desired End-User Functionality

Desired System Specifications

Preliminary Work

See FEA-workflow.pdf, which provides a detailed workflow for finite element analysis in biomechanics. Various steps within the workflow indicate the need for a platform i) to process raw data into a form readily usable for modeling and simulation, and ii) to build relational databases to organize and disseminate derivative data. Also refer to FEA-anatomy.pdf.
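One way to associate derivative data with its raw parents in a relational database is a link table, since a single derivative (for example, a model built from both MRI and CT) can depend on several raw files. The sketch below uses SQLite; all table names, column names, and the `step` label are hypothetical and not taken from FEA-workflow.pdf.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_data (
    id      INTEGER PRIMARY KEY,
    subject TEXT NOT NULL,          -- e.g. CMULTIS001-1
    path    TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS derivative_data (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    step TEXT NOT NULL              -- hypothetical label for the workflow step
);
-- Many-to-many link: a derivative may come from several raw files.
CREATE TABLE IF NOT EXISTS derived_from (
    derivative_id INTEGER NOT NULL REFERENCES derivative_data(id),
    raw_id        INTEGER NOT NULL REFERENCES raw_data(id),
    PRIMARY KEY (derivative_id, raw_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def raw_parents(conn: sqlite3.Connection, derivative_path: str) -> list:
    """Return the paths of the raw files a derivative was generated from."""
    rows = conn.execute("""
        SELECT r.path FROM raw_data r
        JOIN derived_from df ON df.raw_id = r.id
        JOIN derivative_data d ON d.id = df.derivative_id
        WHERE d.path = ? ORDER BY r.path""", (derivative_path,))
    return [row[0] for row in rows]
```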

Sample Data

This zipped folder contains a subset of the in vivo data collected so far, to be used for final modifications to the data management interface.

MULTIS006-1: contains only raw data files
- Configuration folder (sensor and state configuration files and subject XML)
- Data folder (force data in .tdms format)
- Ultrasound folder (ultrasound images in DICOM format with .ima extension)
MULTIS007-1: contains data after file association
- Previous data plus:
  - AnalysisPNG folder (images showing selected ultrasound frames with the corresponding force data; accepted trials only)
  - FileAssociationPNG folder (images showing R-waves and adjusted R-waves used to time-synchronize the ultrasound and force data)
  - TimeSynchronization folder (individual XML files for each trial storing the deltaT information)
  - MULTIS007-1readme.txt (file association summary information)
  - TimeSynchronization.txt (time synchronization of manually matched trials)
MULTIS008-1: contains data after thickness analysis
- Previous data plus:
  - Analysis folder (individual thickness measurements for each trial, stored in XML files, plus a PNG folder)
    - PNG folder (an image of the first analyzed frame with boundaries marked, and an image showing the force vs. thickness relationship)
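Since each sample subject adds folders on top of the previous stage, the processing stage of a subject folder can be inferred from which folders are present. This sketch assumes folder presence is a reliable stage indicator and uses the folder names from the sample-data listing above (which differ from the newer folder configuration earlier on this page); the stage labels and function name are hypothetical.

```python
from pathlib import Path

# Stages in order; each stage requires its own folders plus all earlier ones.
STAGES = [
    ("raw", ["Configuration", "Data", "Ultrasound"]),
    ("file association", ["AnalysisPNG", "FileAssociationPNG", "TimeSynchronization"]),
    ("thickness analysis", ["Analysis"]),
]


def processing_stage(subject_dir: Path) -> str:
    """Return the last stage whose folders (and all earlier ones) exist."""
    reached = "incomplete"
    for stage, folders in STAGES:
        if not all((subject_dir / name).is_dir() for name in folders):
            break  # later stages build on this one, so stop here
        reached = stage
    return reached
```

For the sample data, this would classify MULTIS006-1, MULTIS007-1, and MULTIS008-1 as the raw, file association, and thickness analysis stages respectively, provided the folders are named as listed.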