Operator-split physics¶
Overview¶
JCM evaluates physics exactly once per dynamics timestep dt, outside
the IMEX-RK stages, and applies the resulting tendency as a Lie split
ahead of the dynamics integral:
state ─┐
├── compute_physics_step ─► dyn_tendency, new_physics_state
│
▼
state + dt · dyn_tendency
│
▼
IMEX-RK SIL3 dynamics over dt ← primitive equations (+ optional nudging)
│ + upper sponge, hyperdiffusion filters
▼
filters (mean-Ps fix, level-dependent hyperdiffusion on
divergence / vorticity-tracers / temperature)
│
▼
state_next
The splitting error is O(dt) (Lie split (a), state → physics → dynamics → next). The dynamics IMEX-RK is the only step that advances
sim_time — dyn_tendency carries sim_time = 0 so the forward-Euler
add does not perturb it.
This mirrors operational GCM practice (ECHAM, CAM, IFS, E3SM): physics
runs once per dt as forcing to the dynamics, rather than at each RK
substage.
Why Lie and not Strang¶
The natural next step from Lie is a Strang split (state → ½dt physics → dynamics → ½dt physics → next). Two forms exist:
Single-evaluation (“symmetric Lie”): reuse the same
dyn_tendencyfor both half-steps. 1× physics cost. But for state-dependent physics (the normal case) this is not second-order accurate — the local truncation analysis shows it misses the(dt²/2)(PD + P²)term the exact propagator carries. GloballyO(dt), same as Lie.Classical Strang: re-evaluate physics on the post-dynamics state for the second half-step. 2× physics cost (still cheaper than the legacy 3× per-substage scheme). True
O(dt²)accuracy.
Neither is implemented today. Classical Strang is a worthwhile
follow-up if a coarser-dt regime exposes the splitting error; at the
current climate-rate dt = 12-30 min, the Santos 2021 analysis
(JAMES 13, e2020MS002359) finds coupling error dominates self-feedback
error and Lie/Strang are both adequate. The single-evaluation symmetric
form was attempted in this PR’s history but reverted after Codex review
correctly flagged the order issue (PR #481 review thread).
Key components¶
Model._get_op_split_step_fn (jcm/model.py)¶
Builds the per-dt step closure (state, physics_state) -> (state_next, physics_state_next). Internals:
Resolve the current step’s date and forcing slice from
state.sim_time.Call
compute_physics_stepfor(dyn_tendency, new_physics_state).Lie apply:
state + dt·tend → dynamics_step(full forward-Euler add then full-dtIMEX-RK).Run the post-step filters (conserve global-mean surface pressure, level-dependent hyperdiffusion on divergence / vorticity+tracers / temperature).
The step is a pure function of (state, physics_state). The
physics_state carry is a JAX pytree and is the only cross-step state
the integrator threads.
Model._get_dynamics_step_fn (jcm/model.py)¶
Builds the IMEX-RK SIL3 integrator over the dynamics composition. The
dynamics composition is the primitive equations plus, optionally, a
Newtonian nudging tendency. Sponge and hyperdiffusion stay inside the
IMEX-RK stage loop / step_with_filters path — they are stiff /
fast-linear couplings that benefit from intermediate-state evaluation
and would lose stability if migrated to op-split.
This is the function a future pysces backend (#388) would replace.
Its interface is state -> state' over dt with no physics knowledge.
compute_physics_step (jcm/physics_interface.py)¶
Converts the spectral dynamics State to a nodal PhysicsState, runs
verify_state (non-negativity clamp on tracers), calls
Physics.compute_tendencies(state, forcing, terrain, prev_physics_data=physics_state), applies verify_tendencies (caps
negative-going tracer tendencies at -state/dt), and converts the
result back to a dinosaur dynamics tendency.
Returns (dynamics_tendency, new_physics_state). The new carry is the
dict the physics call writes to (radiation cache, prior-step TKE,
cloud / aerosol / chemistry diagnostics, …) — exactly the input shape
of the next step.
_op_split_trajectory (jcm/model.py)¶
The trajectory builder. Takes the per-dt step function and threads
(state, physics_state) through a nested lax.scan:
Outer scan —
outer_stepssaved frames.Inner scan —
inner_stepsdtsteps between saves.
Two output modes:
Snapshot (
output_averages=False). The inner scan steps silently; the outer step saves the post-inner-scan(state_final, physics_state_final). The savedpredictions.physicsis the cross-step carry the integration actually consumed — radiation sub-cycle cache, TKE memory, etc. This is required for correctness: withradiation_interval = 7200 sthe dycore reuses cached fluxes on most outer steps, and recomputing physics from a freshly seeded carry at save time would silently report zero / IC radiation fields (the original #470 bug).Averaged (
output_averages=True). The inner scan accumulates a running mean of post-step dynamics states (state_next) and of the per-step physics-state dict. The outer step saves the time-mean. Summing post-step states gives bit-equivalence with the snapshot path’s end-of-step samples —mean(snapshots) == averaged(...)to numerical roundoff. The summands are cast tofloatbefore the running mean to avoid mid-scan dtype promotion (whichlax.scanrejects).
jax.checkpoint wraps each inner step, so backward passes
re-materialise the step rather than save it. Memory budget: physics
runs once per dt rather than three times, so the autodiff tape for
a checkpointed step is ~3× smaller than under the deprecated
in-stage scheme.
Cross-step carry persistence¶
Model holds two slots of cross-step state:
_final_modal_state— dinosaur spectral state at the end of the lastrun()/resume()call._final_physics_state— the cross-step physics carry at the end of the last call.
run() resets both via bootstrap_state (and rebuilds the physics
carry via _build_initial_physics_carry). resume() threads them
back in. The result: a run(5d) + resume(5d) chain is numerically
equivalent to a contiguous run(10d) — sub-cycled radiation and
prior-step TKE do not reset at the API seam. Regression covered by
test_op_split_carry_persists_across_resume.
run_from_state_with_carry exposes the carry seed and final carry
directly for callers that need explicit control.
Initial physics carry¶
Model._build_initial_physics_carry unions:
Physics.initial_carry_state(coords)— per-term deterministic seed. EachPhysicsTermthat has cross-step state overrides this with a sensible non-zero value (e.g.TTETKEVerticalDiffusionseeds TKE at the 0.01 m²/s² ECHAM floor; radiation slots useRadiationData.zerosfor fluxes — the first compute step writes real values before any consumer reads them).Physics.get_empty_data(coords)— a structural template obtained by probing the term loop with an isothermal-288 KPhysicsState, zero-filled. Used only to discover the dict structure thatcompute_tendenciesproduces, so thelax.scancarry pytree matches the post-step output on iteration 1 for any diagnostic keys whose owning term has not overriddeninitial_carry_state.
The isothermal probe (rather than a zero-state probe) avoids the
0/0 = NaN cascade that motivated the cache-cleanup in PR #469. The
result of get_empty_data is never used as live state — only as a
shape template.
Coupling within physics¶
ComposablePhysics is process-parallel: every term sees the same
input state, tendencies are summed. Order-independent
(A + B + C == B + A + C), which keeps replace() / remove() /
__add__() composition operators well-defined.
This differs from ECHAM6’s sequential coupling (each scheme sees the
state with prior schemes’ tendencies already applied via
tte += ...). Sequential coupling is more accurate for tightly-coupled
process pairs but introduces order-dependence. Process-parallel is
adequate at JCM’s climate-rate dt; sequential coupling is available
as a follow-up for terms that need it (cf. E3SM’s CLUBB+MG2 loop).
Filters and dycore composition¶
The post-step filter chain runs on (pre_step_state, post_dynamics_state)
in this order:
conserve_global_mean_surface_pressure— pins the 0,0,0 spectral mode oflog_surface_pressureto its pre-step value.diffuse_div— level-dependent hyperdiffusion on divergence.diffuse_vor_q— hyperdiffusion on vorticity and every tracer (specific humidity + microphysics tracers + GHG VMRs).diffuse_temp— hyperdiffusion on temperature variation.
The level-dependent scaling is precomputed once at Model.__init__
and inlined as a JIT constant.
A final verify_state clamp runs in _post_process at the
modal→nodal output boundary to mask any spectral Gibbs-ringing of
negative tracer tendencies before user-visible output.
Tests¶
Op-split coverage in jcm/model_test.py::TestOperatorSplitPhysics:
test_op_split_snapshot_speedy_finite/test_op_split_averaged_speedy_finite/test_op_split_averaged_echam_hybrid_finite— both modes, SPEEDY and ECHAM-hybrid; require finite atmospheric state.test_op_split_step_is_jax_pure— the step function traces cleanly underjitand round-trips the dynamics pytree structure.test_op_split_carry_threading—physics_statereturned by step N is the same pytree structure as the input to step N+1 (thelax.scancarry contract).test_op_split_carry_persists_across_resume—run(5d) + resume(5d)matchesrun(10d)to numerical roundoff (the P1 fix).test_op_split_run_resets_carry— repeatedrun()on the sameModeldoes not inherit stale carry.test_op_split_snapshot_physics_uses_integration_carry— the snapshotpredictions.physicsis the carry the integration consumed, not a freshly-seeded copy (the P2 fix).
The legacy in-stage path was removed in Phase 4 (TestLegacyPathRemoved);
no flag remains to switch back. The deleted symbols are
_step_tendencies, physics_forcing_eqn, DiagnosticsCollector,
averaged_trajectory_from_step, get_physical_tendencies,
_get_step_fn_factory, _get_integrate_fn, and the use_op_split
kwarg on run() / resume() / run_from_state().
Performance notes¶
Physics runs once per
dtrather than three times perdt(one per IMEX-RK explicit substage). For RRTMGP / TTE-TKE / Tiedtke-Nordeng this is roughly 3× the wall-time saving on the physics path.Backward passes pay
~3×less memory underjax.checkpointfor the physics path because each checkpoint re-traces a single physics call.Radiation sub-cycling honours one timeline (
radiation_should_computereadsmodel_stepfrom_dateandradiation_intervalfrom the term config), not three intermixed substage timelines. Same semantics as before, cleaner plumbing.
Appendix — ECHAM6 reference¶
The proposed JCM structure mirrors ECHAM6’s well-established
operator-split design. The relevant code chain in
echam6.3.0-ham2.3-moz1.0.r7492/src:
stepon.f90:156 integration_loop: DO ← main time loop
stepon.f90:271 CALL scan1 ← gridpoint phase
scan1.f90:499 CALL gpc(jrow)
gpc.f90:78 CALL physc(krow)
physc.f90:543 CALL cover ← cloud cover
physc.f90:566 CALL radiation ← full RT (sub-cycled)
physc.f90:678 CALL vdiff ← vertical diffusion
physc.f90:776 CALL radheat ← cached-flux heating
physc.f90:835 CALL gwspectrum ← Hines GW drag
physc.f90:877 CALL ssodrag ← orographic GW drag
physc.f90:987 CALL cucall ← Tiedtke convection
physc.f90:1067 CALL cloud ← stratiform + micro
stepon.f90:280 CALL sccd ← divergence semi-implicit
stepon.f90:286 CALL scctp ← T + Ps semi-implicit
stepon.f90:294 CALL uspnge ← upper sponge (in dynamics)
stepon.f90:309 CALL hdiff ← horizontal diffusion
stepon.f90:476 END DO integration_loop
Physics tendencies are accumulated into the gridpoint buffer arrays
declared at mo_scan_buffer.f90:36-47 (vol, vom, tte, qte,
xlte, xite, xtte) and consumed by sccd / scctp as the
explicit forcing in the semi-implicit Helmholtz solve. Physics is
called exactly once per dynamics timestep.
Two structural differences from JCM:
Dynamics integrator. ECHAM uses leapfrog + semi-implicit (spectral); JCM uses IMEX-RK SIL3 (also spectral, via dinosaur). The split point is the same — operator-split physics — but the dynamics integrator on each side differs. ECHAM’s split is forced by leapfrog’s single RHS evaluation per
dt; JCM’s SIL3 has substages and could in principle evaluate physics at each one, so for JCM operator-splitting is a real design choice. CAM (HOMME spectral element), E3SM (HOMME with sub-cycled advection), and IFS (semi-Lagrangian + semi-implicit) op-split despite having more than one dynamics evaluation per physicsdt— the same situation JCM-with-SIL3 is in.Within-physics coupling. ECHAM is sequential (each scheme reads the state with prior schemes’ tendencies already applied via the
tte += ...pattern on the shared accumulators inmo_scan_buffer.f90). JCM’sComposablePhysicsis process-parallel (every term reads the same input state, tendencies are summed). The latter preserves the composability story (the order-independence ofA + B + C) at the cost of a small accuracy penalty for tightly-coupled term pairs.
The cross-step physics state in ECHAM lives in module-level globals
(mo_radiation_forcing, …) — semantically the same as JCM’s
PhysicsCarryState pytree, just routed via Fortran’s module-level
mutability rather than as an explicit functional carry.