Correct. This is platform specific. You're using the CPU platform, which has an optimized implementation of CustomNonbondedForce that does much of the calculation in single precision. It doesn't include an optimized implementation of CustomHbondForce (only because writing one has never been a priority), so that force falls back to the reference implementation, which does everything in double precision. If you use the OpenCL or CUDA platform, both forces will be computed at the same precision level.

Are the energy differences in fact due to different precision levels?
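One rough way to see what single precision alone does to an energy is to sum the same set of pair terms twice: once rounding every intermediate result to 32-bit floats, and once in plain double precision. The sketch below is a toy for that purpose only. The 12-6 Lennard-Jones pair term, its sigma/epsilon values, and the random distances are all hypothetical, and it does not reproduce how any OpenMM platform actually evaluates or accumulates energies.

```python
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def pair_energy_f64(r: float, sigma: float, epsilon: float) -> float:
    """Hypothetical 12-6 Lennard-Jones pair energy, evaluated in double precision."""
    sr2 = (sigma / r) ** 2
    sr6 = sr2 * sr2 * sr2
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def pair_energy_f32(r: float, sigma: float, epsilon: float) -> float:
    """Same pair energy, rounding every intermediate result to single precision."""
    sr = f32(f32(sigma) / f32(r))
    sr2 = f32(sr * sr)
    sr6 = f32(f32(sr2 * sr2) * sr2)
    return f32(f32(4.0 * f32(epsilon)) * f32(f32(sr6 * sr6) - sr6))


def compare(n_pairs: int = 50_000, seed: int = 0,
            sigma: float = 0.3, epsilon: float = 0.5) -> tuple[float, float]:
    """Sum n_pairs toy pair energies in double and in single precision."""
    rng = random.Random(seed)
    # Hypothetical distances, all >= sigma, so every pair term is <= 0.
    distances = [rng.uniform(sigma, 4.0 * sigma) for _ in range(n_pairs)]
    e64 = 0.0
    e32 = 0.0
    for r in distances:
        e64 += pair_energy_f64(r, sigma, epsilon)
        # Single-precision path: also round the running total after each addition.
        e32 = f32(e32 + pair_energy_f32(r, sigma, epsilon))
    return e64, e32


if __name__ == "__main__":
    e64, e32 = compare()
    rel = abs(e32 - e64) / abs(e64)
    print(f"double: {e64:.8f}  single: {e32:.8f}  relative difference: {rel:.2e}")
```

The relative difference printed here gives a feel for how large pure rounding effects tend to be. If the discrepancy you're seeing is far larger than that, it would suggest looking for a cause other than precision.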
Peter