Review of summary document providing Committee overview

The Committee on Credible Practice of Modeling & Simulation in Healthcare aims to establish a task-oriented collaborative platform to outline good practice of simulation-based medicine.
Ahmet Erdemir
Posts: 77
Joined: Sun Sep 10, 2006 1:35 pm

Review of summary document providing Committee overview

Post by Ahmet Erdemir » Sun Apr 21, 2013 2:59 pm

Dear Committee Members, Advisory Council, and other interested parties,

We have recently completed a summary document providing our initial vision on what the Committee on CPMS in Healthcare is about and how it will operate. The document can be found here.

If you like, please provide your feedback here. As we move along and shape the Committee and the Advisory Council, we will likely incorporate your comments into this document.

Best,

Ahmet
Last edited by Ahmet Erdemir on Sun Oct 13, 2013 1:43 pm, edited 1 time in total.

C. Anthony Hunt
Posts: 23
Joined: Sun Apr 21, 2013 2:18 pm

Comment on Committee overview--Need: Clinical urgency

Post by C. Anthony Hunt » Sun Apr 21, 2013 3:45 pm

I have several comments that I hope will be useful. I'll take them one at a time, but not in the order of the slides, starting with Need: Clinical urgency.

"There is a pressing need to utilize computational modeling & simulation to support clinical research and decision making in healthcare."

There is a pressing need to use M&S? Why? Suggestion: state the problem and then claim that M&S is a solution. I understand that it is not easy to state the problem in just a few words. However, you/we can work on statements. The problem is the "need."

Until then, it may be more direct to devise a statement like, "M&S offers solutions to …"

What are the problems that you have identified?

"There is a gap in mechanisms or processes for translating computational models to the clinical practice."

I'm not sure I understand the issue. To what computational models do you refer? Where are they? Are you referring to models published in research journals, models of the type that we've seen described in posters at the annual MSM meeting?

I'll pause for feedback and answers.
Last edited by C. Anthony Hunt on Sun Apr 21, 2013 4:40 pm, edited 2 times in total.

Ahmet Erdemir
Posts: 77
Joined: Sun Sep 10, 2006 1:35 pm

Re: Comment on Committee overview

Post by Ahmet Erdemir » Sun Apr 21, 2013 4:24 pm

Hi Tony, thank you for your comments. I tried to respond to them below:
  • I agree that for the sake of condensing the information into a few words, we may have ended up being vague. As a group we should definitely clarify what "the pressing need" is and how modeling and simulation can offer a solution. From my perspective (orthopaedic biomechanics, musculoskeletal and tissue; implants, etc.), I feel that the pressing need emerges from two conditions: 1) individualized medicine - can modeling and simulation provide the means to increase the accuracy of how we deliver healthcare? Which intervention works for what type of patient? 2) expedited delivery of healthcare products - can modeling and simulation increase the efficiency with which we design implants and simplify the efforts for their regulation? Until we all iterate back and forth on the need and the premise, I believe a statement like the one you propose will be more appropriate:
    "Modeling and simulation offers the capabilities to potentially expedite and increase the efficiency of healthcare delivery by supporting clinical research and decision making"
  • About the gap, I think we are generalizing the issue without acknowledging that there are indeed models translated to clinical practice. On the other hand, acceptance of published research models (to be used in clinical care) and the rate at which research models move into the clinical realm seem to be lagging behind (at least in my discipline). I am just wondering if new mechanisms and processes are needed to address this essentially "valley of death" problem in translational research. For now, we may want to at least be specific and refer to the models as "research models".
Ahmet

Jacob Barhak
Posts: 64
Joined: Wed Apr 17, 2013 4:14 pm

Re: Review of summary document providing Committee overview

Post by Jacob Barhak » Mon Apr 22, 2013 2:09 pm

Hi All.

At your request, here are some suggestions to supplement the slides. These are raw ideas that may need refinement. Please feel free to adopt/disregard/change them, or keep them for possible future consideration. If all members add thoughts, this may get out of hand pretty quickly, so please be selective - you can always keep these ideas and texts for the future, and I will probably add more with time.


1. Under the Clinical urgency slide, please consider adding the following bullet:

Current computing technology can now replace many human tasks and decisions. It is important that the ability of computers is neither exaggerated nor diminished. It is important to gauge this transition of tasks from human to machine in a manner that will be most efficient while diminishing negative phenomena. Establishing the credibility level of models will help smooth this transition.


2. Please consider adding the following topic under the slide: Charge
Identify and promote innovative game-changing modeling technologies



2a. If you add this topic, you will need another slide with the above title and the following bullets:

Engage with modelers and accumulate technologies in a list

Identify technologies that are successful in one modeling field and check if those are applicable in other modeling fields.

Assess the possible benefits of each technology, on a scale from certain to highly speculative.

Disseminate the list of technologies and findings to the modelers and the modeling community.



3. Under the slide titled Propose guidelines and procedures for credible practice, please consider adding:
Endorse methods that directly tie claims to results



4. Under the slide titled Promote good practice:

Reward Self-Criticism: Suggest methods and promote environments that make it acceptable to admit failure, in order to speed up the development cycle.



Below you can find additional notes that expand on the above points in more detail.

I hope this opens possibilities.


Jacob


######### More Details ###########


The idea regarding the transition from human to machine is something I am seeing a lot lately - computers are much better at processing certain data and have access to much more data than humans. Today it is impossible for a human to read all the articles on Wikipedia, yet computers can easily store and process them. The same goes for medical articles: no doctor can read all of these, yet a computer can, and a computer model can help a doctor make a decision when human knowledge is insufficient. Marty Kohn was talking about this at the MSM meeting we attended. And recently there are voices that speak of this move in harsher terms, of computers replacing humans. In many fields humans still make decisions rather than machines, yet the process of tasks moving from humans to machines has already started and is gaining momentum. Therefore we have to be ready with tools and methods to gauge how well computer models perform tasks, to help the public and decision makers understand when it is OK for a computer to take over and when there is still a way to go before that happens.



In the Mount Hood challenge in Malmo, a similar subject was raised when discussing model credibility - at that time my position was that we have to see if the new modeling methods are better than what we already have today. That was a few years ago; my position has not changed. We need to gauge models and use competition to do so, and blind competition seems to be a good mechanism. If needed, competition between human and machine.

Biological systems are less clear-cut than mathematical systems, and people who work with them are used to their ambiguity. In many cases this ambiguity is carried forward by the specialist rather than resolved. On the other hand, it is also common for developers to overpromise and under-deliver. The combination of the two is very negative, since the biologists' broad tolerance for ambiguity can be used to accept the unrealized dreams of developers. Therefore, to keep overly enthusiastic developers at bay, it is important to have simple tools that check that whatever was promised by a system is actually implemented and works as promised.

Another issue I encounter a lot is the fact that humans are afraid of exposing their errors - for good reason - there are penalties. Nevertheless, the faster an error is caught, the faster it is corrected, and therefore the system improves. If the human environment is supportive of exposing errors and fixing them quickly, we will be in a better situation. The software world knows this, and there are paradigms such as Test Driven Development (TDD). I am not familiar with Test Driven Modeling - perhaps since it is harder to achieve - yet I know from experience that steps in this direction are possible. Perhaps imposing stricter quality regulations and allowing a grace period for correction without penalty would facilitate improved model credibility - this goes well with versioning. It may not be possible to resolve liability issues in a license disclaimer as is common in the software world, so there may be a need to be imaginative there.
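As an illustration of what a test-first step for a model could look like, here is a minimal sketch in Python; the decay model, reference value, and tolerances are made up for the example, not taken from any real validation case:

    import math
    import unittest

    def drug_concentration(dose_mg, time_h, half_life_h=4.0):
        # Stand-in model (simple exponential decay); a real model would replace this.
        return dose_mg * math.exp(-math.log(2.0) * time_h / half_life_h)

    class TestDrugConcentration(unittest.TestCase):
        # Tests written first: every new version of the model must keep passing them.

        def test_reproduces_reference_case(self):
            # Placeholder reference value standing in for a published validation case.
            self.assertAlmostEqual(drug_concentration(100.0, 4.0), 50.0, delta=1.0)

        def test_concentration_never_negative(self):
            for t in (0.0, 1.0, 12.0, 48.0):
                self.assertGreaterEqual(drug_concentration(100.0, t), 0.0)

    if __name__ == "__main__":
        unittest.main()

The point is only that the expectations are coded before (or alongside) the model, so each new version can be checked against them automatically.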


Below are some unstructured phrases you may find useful for some slides:


Innovation vs. traditional conservative approach
Blind testing
Competition
How to address all agencies in IMAG?
Establish importance of modeling and simulation
Credibility compared to human practice
High human responsiveness promotes efficiency since humans are typically the bottleneck

Martin Steele
Posts: 37
Joined: Tue Apr 23, 2013 9:52 am

Re: Review of summary document providing Committee overview

Post by Martin Steele » Tue Apr 23, 2013 1:50 pm

An easier to read version of these comments is in the attached Word file (Steele Comments on CPMS Summary - 04-23-2013.docx).

Comments by chart number:

Chart 2: The phrase "desired certainty level" in this definition seems vague. Last phrase: I'm not sure reproducibility is a requirement of credibility. FYI - the 7009 definition of "Credibility" is: the quality to elicit belief or trust in M&S results.

Charts 2-6: See the attached document "Exploring Definitions.docx".

Chart 5: Definition of "Simulation". This definition states that a simulation is merely the use of a model to provide results. While I've previously seen similar definitions, I find them limited. Models provide a representation of a system. Using a model provides results about a system. A simulation, on the other hand, uses models to provide a behavioral representation of a system. Therefore, a simulation provides a dynamic and behavioral representation of a system with complex and potentially probabilistic/stochastic elements and interactions.

Chart 7: "subject matter experts tend to have their own interpretation of credibility assessment" - this was happening in NASA, too, when development of NASA-STD-7009 started. To either identify or develop "good practices" was a prime motivation coming out of the CAIB.

Chart 8: This description of multi-scale analysis of "subsidiary models" may also be known as sub-models, linked models, coupled models, integrated models, surrogate models, and meta-models.

Chart 8 (comment only): One of the key points of the NASA Standard for Models & Simulations is the insistence on uncertainty analysis and reporting.

Chart 8: Add to the list at the bottom: Sensitivity Analysis (aka Results Robustness).

Chart 8: What is meant by "mark-up development"?

Chart 10 (comment only): Bullets 1, 2, & 4 were directives for developing the NASA Standard for Models & Simulations.

Chart 11: Also, add:
Abstraction - The process of selecting the essential aspects of a reference system to be represented in a model or simulation while ignoring those aspects that are not relevant to the purpose of the model or simulation (from NASA-STD-7009, adapted from Fidelity ISG Glossary, Vol. 3.0).
Assumption - Asserting information as a basis for reasoning about a system. In modeling and simulation, assumptions are taken to simplify or focus certain aspects of a model with respect to the RWS or presume distinct values for certain parameters in a model. Any modeling abstraction carries with it the assumption that it does not significantly affect the intended uses of the M&S (from NASA-HDBK-7009, draft).
Certification (as intended on Chart 24, last bullet)
Credibility (see the comment for Chart 2)
Intended Use:
Referent - Data, information, knowledge, or theory against which simulation results can be compared [this is the definition in NASA-STD-7009 as adapted from ASME V&V 10].
Robust (or Robustness) - tbd (associated with sensitivity: results that are robust are not sensitive to relatively slight variations in model parameters; conversely, results that are sensitive to variations in model parameters are not robust). A minimal sketch of such a sensitivity check follows these comments.

Chart 12: What is meant by "translational research directions"?

Chart 12: "model evaluation techniques" - the M&S Assessment Worksheet in the NASA Handbook for M&S (Draft) may be useful to consider.

Chart 12: What is meant by "novel translational workflows"?

Chart 21: Just to be complete, consider adding: scientists, physicists, physicians, pharmacists.
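The robustness note above can be illustrated with a minimal sketch; the spring model, parameter, and 5% perturbation below are hypothetical placeholders, not anything taken from the NASA standard:

    def model_output(stiffness_n_per_mm):
        # Placeholder model: displacement of a linear spring under a fixed load.
        load_n = 100.0
        return load_n / stiffness_n_per_mm

    def relative_output_change(nominal, rel_perturbation=0.05):
        # Perturb the parameter slightly up and down and report the worst-case
        # relative change in the result.
        base = model_output(nominal)
        low = model_output(nominal * (1.0 - rel_perturbation))
        high = model_output(nominal * (1.0 + rel_perturbation))
        return max(abs(low - base), abs(high - base)) / abs(base)

    if __name__ == "__main__":
        change = relative_output_change(nominal=2000.0)
        # Results could be called robust (with respect to this parameter) if a 5%
        # input variation produces only a comparably small change in the output.
        print("relative output change for a 5% stiffness variation: %.3f" % change)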
Attachments
Exploring Definitions.docx
I referenced this file in my posted comments.
(17.73 KiB) Downloaded 167 times
Steele Comments on CPMS Summary - 04-23-2013.docx
These are the comments I posted, but are easier to read
(18.29 KiB) Downloaded 140 times

Jacob Barhak
Posts: 64
Joined: Wed Apr 17, 2013 4:14 pm

Credibility and reproducibility

Post by Jacob Barhak » Tue Apr 23, 2013 2:55 pm

Hello Martin,

Your post should start a side discussion, so you will notice a new subject for this thread. Hopefully this will interest the committee.

Can you give an example where a computational system is credible yet not reproducible?

Does this example hold considering newer technology?

These days computational systems are becoming very complicated, so there is more room for errors to slip through.

On the other hand, memory and storage today are cheap and computing power is abundant. Moreover, a lot of open source code is available, and systems such as virtual machines can easily be used to capture the state of a machine. With open source code it is possible to reproduce results and allow full inspection of the system.

If a computing system cannot reproduce results while having such tools, it is probably not credible. Even computational systems that depend on randomness, such as Monte Carlo simulation, can be reproduced if the random seed/random state is recorded.
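As a small illustration (a minimal sketch only, with a toy pi estimate and made-up sample sizes, not tied to any particular healthcare model), recording the seed is enough to make a stochastic result exactly repeatable:

    import random

    def monte_carlo_pi(n_samples, seed):
        # Recording the seed makes this stochastic estimate exactly reproducible.
        rng = random.Random(seed)
        inside = 0
        for _ in range(n_samples):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                inside += 1
        return 4.0 * inside / n_samples

    if __name__ == "__main__":
        first_run = monte_carlo_pi(100000, seed=42)
        second_run = monte_carlo_pi(100000, seed=42)  # re-run with the recorded seed
        assert first_run == second_run                # bit-for-bit identical results
        print(first_run)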

Where older definitions of credibility for computational systems were oriented toward human belief, newer definitions of credibility should be aware of technological advances and include stricter credibility measures - these can now be supported.

Enforcing reproducibility is a great way to increase quality.

Jacob

mjsteele wrote: An easier to read version of these comments is in the attached Word file (Steele Comments on CPMS Summary - 04-23-2013.docx) [...]

Martin Steele
Posts: 37
Joined: Tue Apr 23, 2013 9:52 am

Re: Credibility and reproducibility

Post by Martin Steele » Wed Apr 24, 2013 7:53 am

I'm trying "Reply with Quote" and deleting the quoted part, so I Reply to Jacob's new Thread without redundancy, I hope!

Jacob – interesting thoughts.
You initially ask about a “computational system.” I think there has been a one-of-a-kind super computer that has accomplished some complex analyses and been deemed credible at some level. It depends on what you’re doing with it and how critical it is. It is hoped that a “computational analysis” (purposeful change in term, here) is reproducible, but must we require that it be reproduced before we deem it credible? I think the answer is “It depends.” That expectation should be defined.

Open source code and commercial off-the-shelf (COTS) software are both a blessing and a bane. How do you ensure that code is any good? If a revision comes out, what process do you go through to ensure it's still good before using it, especially for critical analyses? This is an important aspect of M&S Verification, which must include verification of the computational platform (hardware & software) … every time it changes.

For Monte-Carlo type simulations, you can also have reproducible results without matching random number generator seeds, if the data is properly (i.e., statistically) analyzed.
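As a rough illustration of that last point (a sketch with a toy estimator and arbitrary sample sizes, not any specific NASA procedure), two runs with different seeds can still agree within their combined statistical uncertainty:

    import math
    import random

    def monte_carlo_mean(n_samples, seed):
        # Estimate the mean of a standard normal variable (true value 0.0)
        # and the standard error of that estimate.
        rng = random.Random(seed)
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        mean = sum(samples) / n_samples
        variance = sum((s - mean) ** 2 for s in samples) / (n_samples - 1)
        return mean, math.sqrt(variance / n_samples)

    if __name__ == "__main__":
        mean_1, se_1 = monte_carlo_mean(50000, seed=1)
        mean_2, se_2 = monte_carlo_mean(50000, seed=2)
        # Different seeds give different numbers, but the two estimates should
        # agree within a few combined standard errors if analyzed statistically.
        combined_se = math.sqrt(se_1 ** 2 + se_2 ** 2)
        print("difference in standard-error units: %.2f"
              % (abs(mean_1 - mean_2) / combined_se))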

C. Anthony Hunt
Posts: 23
Joined: Sun Apr 21, 2013 2:18 pm

Regarding the issue of "intended uses"

Post by C. Anthony Hunt » Wed Apr 24, 2013 5:52 pm

The figure in this paper [http://onlinelibrary.wiley.com/doi/10.1 ... .1222/full] helps make clear the complicated variety of intended uses of different model types within Pharma R&D; all could be classified as research models.

1. Does the Committee need some way to shrink the modeling & model use case space on which it will focus? Are we focused on uses where simulation results are essential?

2. One-on-one discussions within the past year with individuals doing M&S work within big Pharma brought to light many model "uses" that are often ignored: …the _real_, primary reason the work was done. Examples follow. The M&S work was done (I'm paraphrasing)…
• to bolster my qualifications for a new position (or promotion)
• because [someone higher up or the "outside consultant"] insisted on it
• to enhance publishability
• because we knew that it would distract the FDA from these other issues
• to make a messy situation look better
• because it had proven effective as a means to "absorb" "what about X" questions during quarterly reviews
Etc.

If you or I are one step removed, such "uses" are invisible. If they were known, it would change my approach to those models.

Academic counterparts are easy to imagine.

Such hidden uses impact credibility. Do we ignore these issues or confront them?
Last edited by C. Anthony Hunt on Mon Jun 10, 2013 1:44 pm, edited 6 times in total.

Jacob Barhak
Posts: 64
Joined: Wed Apr 17, 2013 4:14 pm

Re: Regarding the issue of "intended uses"

Post by Jacob Barhak » Thu Apr 25, 2013 12:08 am

Hi Tony,

To your second point regarding "hidden uses": there will always be negative instances of use.

Imposing stricter quality standards and increasing competition will reduce the prevalence of negative phenomena.

The tools today allow increasing the level of scrutiny. If you give carrots to those who compete well under stricter demands, you may not need the stick.

I hope this makes sense.

Jacob

Jacob Barhak
Posts: 64
Joined: Wed Apr 17, 2013 4:14 pm

Re: Credibility and reproducibility

Post by Jacob Barhak » Thu Apr 25, 2013 12:48 am

Thanks Martin,

Your reply provides sufficient clarification to better explain your thoughts.
To keep things short, here are some characteristics that give credibility points:

- Reproducibility
- Publicly available Test Suite
- Documentation with examples
- Good service indicated by Responsiveness of developers
- Improvement with versions
- Open error reporting
- The system is blind tested
- The system is competitive compared to other systems
- Traceability of data to its source
- Open source

You are welcome to add your own to this list.

In my mind, the more of those characteristics a system has, the more credible it is.
Perhaps we should think in terms of a credibility score?
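To make the idea concrete, here is a rough sketch of how such a score could be tallied from a checklist; the items mirror the list above, and the equal weighting is only a placeholder for discussion:

    # Rough sketch of a checklist-based credibility score; equal weights are a
    # placeholder and could be replaced by weights the Committee agrees on.
    CRITERIA = [
        "reproducibility",
        "public test suite",
        "documentation with examples",
        "responsive developers",
        "improvement with versions",
        "open error reporting",
        "blind tested",
        "competitive with other systems",
        "data traceable to its source",
        "open source",
    ]

    def credibility_score(satisfied_criteria):
        # Fraction of criteria the system satisfies, between 0.0 and 1.0.
        met = sum(1 for criterion in CRITERIA if criterion in satisfied_criteria)
        return met / len(CRITERIA)

    if __name__ == "__main__":
        example_system = {"reproducibility", "public test suite", "open source"}
        print("credibility score: %.1f" % credibility_score(example_system))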

The long story is below if you are interested in more reading.

I hope this is still on topic.

Jacob

######## The Long story ########

First, Martin, your definition of "computational analysis" is more inclusive than our limited context of simulation and modeling in healthcare. And I do think it is reasonable to demand reproducibility to establish credibility.

In cases where it is more difficult to reproduce results, at least all the source code and intermediate data should be frozen to allow tracing results backwards.

Here is an example I had experience with: Monte Carlo simulations on a cluster of computers. Each time I launched the simulation, the results would be different, since each machine was running with a different random seed. To help reproducibility, there is a directory that holds all the random states and the source code that ran. This allows re-running the code with the stored seeds in the future to produce the same results. I used the source code traceability for debugging. It was useful in finding some issues that could not be explained by the normal statistical techniques typically used with Monte Carlo simulations. Therefore, the ability to trace results back to the original model is superior to statistical analysis alone. It is reasonable to demand this these days.
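A minimal sketch of that kind of bookkeeping could look like the following; the file names and directory layout are placeholders invented for the example, not the actual setup I used:

    import hashlib
    import json
    import os
    import random
    import time

    def record_run_provenance(run_dir, seed, source_file):
        # Freeze what is needed to re-run this simulation later: the random seed
        # and a hash of the source code that produced the results.
        os.makedirs(run_dir, exist_ok=True)
        with open(source_file, "rb") as source:
            code_hash = hashlib.sha256(source.read()).hexdigest()
        with open(os.path.join(run_dir, "provenance.json"), "w") as out:
            json.dump({"seed": seed,
                       "source_sha256": code_hash,
                       "recorded_at": time.strftime("%Y-%m-%d %H:%M:%S")},
                      out, indent=2)

    if __name__ == "__main__":
        seed = 12345
        random.seed(seed)
        record_run_provenance("runs/run_0001", seed, __file__)
        # A later run can read provenance.json, reset the seed, and confirm the
        # stored code hash still matches before claiming to reproduce the results.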

This brings me to the next point Martin mentioned - how do you know whether a piece of software is any good? If the software is supplied with the tools to test its integrity and with sufficient reproducible examples, then the customer/user at least has the capability to assess the credibility of the software tools.
And versioning is part of the game - a very important part. Each version of a model/software should be better than the previous one. I believe you will find that Tony Hunt supports this point - he refers to this as "model falsification" - Tony, please correct me if I am wrong. If there is a test suite attached to each version, you can make sure the tests pass, and a newer version should pass more tests than the previous one.

Note that versioning tools are available today and even this committee uses such a tool to hold its documents. Also, there are systems today that help cope with multiple versions of multiple dependent software tools.

So versioning should not be seen as a hazard to credibility. On the contrary, a system that has many versions and rapidly responds to demands and evolves/develops quickly should be given credibility points.
As for critical applications, the best stable version should be frozen until a better version can pass all the tests. Think of test driven development, where the tests are written first and the model/software must then satisfy them.

Many systems today such as operating systems are constantly updated with new versions/patches. This update is a sign of their credibility. If such a versioning and correction mechanism is not active, then there is a problem and a system should be doubted. It is very much like buying a car that no one can service.

Almost no software system today is perfect. Yet having a way to correct it is essential. The software should be as good as the demands, and in many cases in modeling the demands can be coded. The rest, such as graphical user interfaces, can still be tested by humans.

Martin is correct to cast some doubt on software systems. Even the same source code may not run the same in different environments. I can give all sorts of examples. Nevertheless, humans do trust some computer systems. The question is: what demands should we make of a system to tag it as credible?

Remember that the model/software just has to be better than we already have today - it is always possible to make a competition to test this.

The list above is what I would answer. You are welcome to edit this list from your own knowledge/experience.
