TODO list:

* Add an argument specifying the number of GPUs to use, or autodetect it?
  How can we determine how many GPUs each node has and how MPI processes are allocated to nodes?

* Can we remove the dependency on pyopenmm.py for simplicity?
  We can eventually bring it up to date, but right now it is out of date and a maintenance nightmare.
  Instead, we can create a simple utility routine to do system concatenation if needed, use the OpenMM app to set up
  subsystems separately, or insist that receptor and ligand subsystems are already set up correctly with LEaP.
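
Sketch for the GPU question above: one possible approach is to have every MPI rank report its
node name, then number the ranks on each node and assign GPU indices round-robin. The helper
below is hypothetical (the name assign_gpus and its arguments are not part of the existing
code). It takes the per-rank hostnames as a plain list so it can be tried without MPI. With
mpi4py, that list could probably be gathered with
MPI.COMM_WORLD.allgather(MPI.Get_processor_name()). The number of GPUs per node is assumed to
be supplied by the user; autodetecting it is not covered here.

```python
from collections import defaultdict


def assign_gpus(hostnames, gpus_per_node):
    """Map each MPI rank to a (hostname, local GPU index) pair.

    hostnames[rank] is the node name reported by that rank.
    Ranks sharing a node are numbered in rank order and assigned GPUs
    round-robin, so a node with more ranks than GPUs shares devices.
    """
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    next_local_rank = defaultdict(int)  # per-node counter of ranks seen so far
    assignment = []
    for host in hostnames:
        local_rank = next_local_rank[host]
        next_local_rank[host] += 1
        assignment.append((host, local_rank % gpus_per_node))
    return assignment


if __name__ == "__main__":
    # Hypothetical layout: three ranks on node01, two on node02 (assumption).
    hosts = ["node01", "node01", "node02", "node01", "node02"]
    for rank, (host, gpu) in enumerate(assign_gpus(hosts, gpus_per_node=2)):
        print(rank, host, gpu)
```

Each rank would then select the device with the returned index. This assumes all nodes have
the same number of GPUs, which may not hold on heterogeneous clusters.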