Recurring Meeting of Cleveland Clinic - Stanford University
Date: October 6, 2015
Time: 1:00 PM EST
Means: Conference Call
Attendees:
- Ahmet Erdemir (Cleveland Clinic)
- Joy Ku (Stanford University)
Agenda:
- Personnel support at Stanford University.
- Specifications for management and organization of raw data.
- Specifications for development and delivery of databases for derivative data and models.
- Other.
Immediate Action Items:
- Ahmet (Cleveland Clinic)
- Document the needs for the data management system and for the databases for raw data, derivative data, and models.
- Provide an illustration of the modeling & simulation workflow.
- Provide a roadmap for launching prototype data management systems and databases.
- Joy (Stanford University)
- Document available systems for data management and databases.
Notes:
- Personnel support at Stanford University.
- Joy will check with Scott Delp to evaluate how to effectively provide personnel support for the project. Henry, one of the web developers, is already occupied with other projects. Stanford University may tap into current consultants who support SimTk, or into part-time staff at San Francisco State University who are helping on data-related projects.
- Joy anticipates that code development will begin in early January and will likely take two months to launch a simple prototype. Personnel need to be recruited accordingly.
- Specifications for management and organization of raw data.
- Ahmet clarified what he meant by a data management system. Such infrastructure is intended for those who are collecting the data; it provides the framework to push the data to the servers and organize it through web-based interfaces. Essentially, the data management system will facilitate bringing the raw data into a coherent form for release. Joy was interested in getting a data set to gauge how much storage space and processing power are needed, i.e., the scope of the data and metadata.
- Specifications for development and delivery of databases for derivative data and models.
- Ahmet clarified what he meant by databases. Databases will essentially contain raw data (in a release form), derivative data (processed raw data that can be used as inputs to models), and models, all related to each other in a hierarchical manner. Joy recommended starting with a workflow for a specific case and then exploring its extensibility. This will allow focusing on specific data sets while keeping general-purpose abstraction in mind. Ahmet will provide an illustration of a modeling and simulation workflow. Ahmet also noted that moving from raw data to derivative data and to models may not necessarily be automated, and that tools should be in place for developers to associate information for representations at various levels.
- Other.
- Joy asked for details of a one-year timeline for delivery of the various components of data-related infrastructure for the project. Ahmet and Joy briefly discussed potential scheduling for specification development, prototyping, and testing: by January 1, 2016, completion of specifications and identification of infrastructure (e.g., software); by April 1, 2016, initial prototype(s); by July 1, 2016, completion of prototype evaluation; by September 1, 2016, beta release. Ahmet will provide a more detailed roadmap on the wiki.
- Joy mentioned the availability of a set of slides about the system they have been building based on Galaxy. Tangelo, DSpace, and a proprietary system were mentioned as web-based infrastructure to potentially build upon. Joy will provide a list of potential software systems that can be utilized for the web-based infrastructure.
- Ahmet and Joy also discussed the potential of acquiring DOIs for data (as Dryad does) and different business models to accomplish this.
- The upcoming upgrade of SimTk was also discussed. Per Joy, the new system will be more like a hub that integrates with other repositories and tools.
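Illustration (not discussed in the meeting): the hierarchy described in the notes on databases, in which raw data, derivative data, and models are related to each other, could be sketched roughly as below. This is only a minimal sketch under assumptions. All class, field, and record names (RawData, DerivativeData, Model, lineage, "raw_release_01", etc.) are hypothetical and do not come from any system mentioned above. Associations are made by hand, reflecting the note that moving between levels may not be automated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    # Hypothetical record for a model built from derivative data.
    name: str


@dataclass
class DerivativeData:
    # Hypothetical record for processed raw data usable as model input.
    name: str
    models: List[Model] = field(default_factory=list)


@dataclass
class RawData:
    # Hypothetical record for raw data in its release form.
    name: str
    derivatives: List[DerivativeData] = field(default_factory=list)


def lineage(raw: RawData) -> List[str]:
    """List every raw -> derivative -> model path as a string."""
    paths = []
    for derived in raw.derivatives:
        for model in derived.models:
            paths.append(f"{raw.name} -> {derived.name} -> {model.name}")
    return paths


# A developer associates the levels manually (placeholder names).
raw = RawData("raw_release_01")
derived = DerivativeData("processed_01")
raw.derivatives.append(derived)
derived.models.append(Model("model_01"))

for path in lineage(raw):
    print(path)
```

Starting from one specific case, as Joy recommended, would amount to populating a single such chain first and only then deciding which fields generalize.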