Recurring Meeting of Cleveland Clinic - Stanford University
Date: June 28, 2016
Time: 1:00 PM EST
Means: BlueJeans (Web Conference)
Attendees:
- Ahmet Erdemir (Cleveland Clinic)
- Joy Ku (Stanford University)
- Mike Wong (Consultant)
Agenda:
- Demo new framework and get feedback (Mike).
- Outline plan for getting feedback on uploaded metadata (Mike & Joy).
- Additional feedback on data sharing based on Ahmet’s group’s testing (Ahmet).
- Plan for next step.
Immediate Action Items:
- Ahmet (Cleveland Clinic)
- Evaluate the functionality of the data management and data querying interfaces.
- Joy (Stanford University)
- Explore the possibility of integrating the data management and querying system into the SimTK infrastructure.
- Mike (Consultant)
- Complete implementation of version control for data management system.
- Implement a metadata extraction and querying interface specific to the MULTIS project, relying on the sample XML file.
Notes:
- Demo new framework and get feedback.
- Mike implemented version control for the data management system. He decided to use Git rather than Subversion based on Git's plans for versioning of binary files. Ahmet asked about access to old versions of data. Mike demonstrated that this can be done, albeit at a low level. Ahmet noted that this may be good enough, as he does not anticipate that users will need it.
- Mike also described the general strategy for the project towards indexing of files, e.g. for search and retrieval. This will allow flexibility for the system to be used for different projects. Mike also noted that indexing with versioning is enabled and he has been working on provenance.
- Mike is interested in developing templates that can be used for different projects. The data management system operates similarly to a file system, i.e. relying on hfs5 file system - disk image enriched with table format. He noted that metadata applies to all sibling files and child files.
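As an illustration of metadata applying to sibling and child files, the following is a minimal sketch and not the implemented system. It assumes metadata is attached to directories and that deeper levels override shallower ones; the function name `effective_metadata`, the dictionary layout, and the override rule are hypothetical.

```python
from pathlib import PurePosixPath


def effective_metadata(file_path, dir_metadata):
    """Collect the metadata that applies to one file.

    dir_metadata maps a directory path (as a string, "." for the root)
    to a dict of metadata attached at that directory. Metadata on a
    directory applies to every file in it (siblings) and to every file
    below it (children). Assumption: deeper directories override
    shallower ones when the same key appears twice.
    """
    parent = PurePosixPath(file_path).parent
    # Walk from the root down to the file's own directory.
    chain = [parent, *parent.parents][::-1]
    merged = {}
    for directory in chain:
        merged.update(dir_metadata.get(str(directory), {}))
    return merged


# Hypothetical layout, for illustration only.
example = {
    ".": {"project": "demo"},
    "specimen01": {"subject": "S01"},
    "specimen01/trial03": {"trial": 3},
}
print(effective_metadata("specimen01/trial03/data.bin", example))
```

Resolving metadata at lookup time, rather than copying it onto each file, keeps a single place to edit when a directory-level value changes.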
- Outline plan for getting feedback on uploaded metadata.
- Regarding quick access to metadata through the data browser, Mike will wait until the original developers incorporate customization. He informed Ahmet and Joy that there has been related work in this regard.
- Mike also implemented a strategy to annotate metadata, i.e. a text file (metadata.txt) that can point to where the metadata is. A parser of such a file can then let the system know where a specimen wiki page is or where to look for a text file in the upload. If the target is not found, the system may ask the user to include one. Ahmet noted that this feature would be useful for MULTIS to point to XML files that contain subject metadata and point to trial files. Ahmet provided a sample XML file to Mike and Joy.
- Joy asked about the difference between YAML and XML. Mike mentioned the utility of YAML for human interaction and human readability, and XML for configuration and machine readability.
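The metadata.txt pointer idea described above can be sketched as follows. The notes do not record the file's actual syntax, so this sketch assumes a simple `key: value` line format with `#` comments; the key names, the function names `parse_pointer_file` and `missing_targets`, and the example URL are all hypothetical.

```python
def parse_pointer_file(text):
    """Parse an assumed 'key: value' pointer file into a dict.

    Blank lines and lines starting with '#' are skipped. Only the first
    colon separates key from value, so URL values remain intact.
    """
    pointers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got: {line!r}")
        pointers[key.strip()] = value.strip()
    return pointers


def missing_targets(pointers, uploaded_files):
    """Return keys whose target is a file that is absent from the upload.

    Targets that look like web addresses (e.g. a wiki page) are not
    checked against the upload.
    """
    missing = []
    for key, target in pointers.items():
        is_url = target.startswith(("http://", "https://"))
        if not is_url and target not in uploaded_files:
            missing.append(key)
    return missing


# Hypothetical pointer file, for illustration only.
sample = """# where to find metadata for this upload
wiki: https://example.org/specimen
subject_xml: subject.xml
"""
found = parse_pointer_file(sample)
print(missing_targets(found, {"data.bin"}))
```

A non-empty result from `missing_targets` is the point at which the system could ask the uploader to include the missing file.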
- Additional feedback on data sharing based on Ahmet’s group’s testing.
- Ahmet's group has not performed any tests yet. Following the start of in vivo testing, he anticipates that there will be heavy use and evaluation of the data management system.
- Plan for next step.
- Ahmet mentioned that MULTIS and Open Knee(s) can be used as two projects with potentially different needs to evaluate the generality of data management and querying systems.
- For development of the query system, two strategies can be adopted. One is to use all project-specific metadata to generate a project-specific query interface, which can be customized by the project administrator. Another is to use a simplified querying interface that exposes metadata common to all projects, e.g. subject demographics. In the future, the latter may be interesting as it may provide a path to consolidating query interfaces for different projects that offer similar data. Yet, this is not necessarily the priority for the current projects. In either case, usability tests are warranted.
- Ahmet asked about backup of data. Currently the server is standalone without any backup; it only provides functionality. Joy needs to figure out what framework to implement. One option is to leverage SimTK, where redundancy at multiple points of failure prevents disruption of access. Also, duplicates of data are pushed to backup systems on and off campus, e.g. Amazon Cloud. Another option is to provide a separate, project-specific system. Either way, Ahmet asked for authentication and authorization to be shared with SimTK to facilitate system use.
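For the two query strategies discussed under the plan for the next step, a minimal sketch of how the exposed fields could be derived is given here. It assumes each project declares its metadata fields as a set of names; the project names, field names, and both function names are hypothetical and do not describe the actual MULTIS or Open Knee(s) metadata.

```python
def project_query_fields(schemas, project):
    """Strategy 1: expose every metadata field of one project."""
    return sorted(schemas[project])


def common_query_fields(schemas):
    """Strategy 2: expose only the fields shared by all projects."""
    if not schemas:
        return []
    shared = set.intersection(*(set(fields) for fields in schemas.values()))
    return sorted(shared)


# Hypothetical field declarations, for illustration only.
schemas = {
    "project_a": {"age", "sex", "probe_location"},
    "project_b": {"age", "sex", "joint_angle"},
}
print(project_query_fields(schemas, "project_a"))
print(common_query_fields(schemas))
```

The intersection shrinks as more projects are added, which is one reason the simplified interface would likely be limited to broad fields such as subject demographics.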