Ten simple rules of credible practice
Posted: Fri Aug 16, 2013 11:29 am
This thread is to discuss the proposed 'ten simple rules for credible practice'. See http://wiki.simtk.org/cpms/Ten_Simple_R ... e_Practice for more details.
A list of possible rules (copied from above link) is:
- Use version control
- Use credible solvers
- Explicitly list your limitations
- Define the context the model is intended to be used for
- Define your evaluation metrics in advance
- Use appropriate data (input, validation, verification)
- Attempt validation within context
- Attempt verification within context
- Attempt uncertainty (error) estimation
- Perform appropriate level of sensitivity analysis within context of use
- Disseminate whenever possible (source code, test suite, data, etc.)
- Report appropriately
- Use consistent terminology or define your terminology
- Get it reviewed by independent users/developers/members
- Learn from discipline-independent examples
- Be a discipline-independent/specific example
- Follow discipline-specific guidelines
- Conform to discipline-specific standards
- Document your code
- Develop with the end user in mind
- Provide user instructions whenever possible and applicable
- Practice what you preach
- Make sure your results are reproducible
- Provide examples of use
- Use data that can be traced back to its origin
- Use competition/alternative methods to compare results
Allow me to jump-start the discussion and suggest alternatives. It would benefit us all to conduct this discussion on the wiki and the forum; however, since some of you may not be comfortable with those, I decided to start by email. If you are comfortable with a discussion on the wiki/forum, please let us migrate there, and someone else should start it and suggest a direction.
As an attempt to start the thread, allow me to select the three most important elements from the list. I suggest you do the same and explain why; this should give us a good idea of what to discuss.
My selection of the three most important elements is:
1. Use competition/alternative methods to compare results
2. Use data that can be traced back to its origin
3. Use version control
I could add a few more rules from the list, and there are overlaps, but my selection is based on my experience: these points characterize my work, and I have found them important for keeping my development stable. My argument is that with these elements it is possible to gauge the success of your model, identify the parts that do not work well, and improve them in the next version. Following these three elements, it is possible to constantly improve a model.
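To make the argument concrete, here is a minimal sketch (in Python, purely illustrative) of how these elements might fit together in a single run: the exact inputs are fingerprinted so a result can be traced back to its origin, and the model's output is checked against an alternative method on the same problem. The toy integration problem and all names here are hypothetical stand-ins for a real model.

[code]
import hashlib
import json

def trapezoid(f, a, b, n):
    """'Our model': trapezoidal-rule estimate of the integral of f on [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def simpson(f, a, b, n):
    """Alternative method applied to the same problem (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

if __name__ == "__main__":
    # Traceability: fingerprint the exact inputs this run used, so the
    # result can be traced back to its origin (store the fingerprint
    # alongside the version-control revision of the code).
    inputs = {"integrand": "x**2", "a": 0.0, "b": 1.0, "n": 100}
    blob = json.dumps(inputs, sort_keys=True).encode()
    print("input fingerprint:", hashlib.sha256(blob).hexdigest()[:12])

    f = lambda x: x ** 2
    ours = trapezoid(f, inputs["a"], inputs["b"], inputs["n"])
    alt = simpson(f, inputs["a"], inputs["b"], inputs["n"])
    # Comparing against an alternative method gauges the model's accuracy
    # and points at what to improve in the next version.
    print("trapezoid:", ours, "simpson:", alt, "discrepancy:", abs(ours - alt))
[/code]

Kept under version control, the input fingerprint plus the code revision is enough to reproduce any past result and compare it with the current version, which is what makes the constant-improvement loop possible.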
I hope I was able to convey my argument well and that you will follow up with a discussion. We have about a week for this discussion before our meeting; let us make the best of it.