# Operator-split physics

## Overview

JCM evaluates physics exactly once per dynamics timestep `dt`, outside
the IMEX-RK stages, and applies the resulting tendency as a Lie split
ahead of the dynamics integral:

```text
state ─┐
       ├── compute_physics_step ─► dyn_tendency, new_physics_state
       │
       ▼
state + dt · dyn_tendency
       │
       ▼
   IMEX-RK SIL3 dynamics over dt   ← primitive equations (+ optional nudging)
       │                              + upper sponge, hyperdiffusion filters
       ▼
   filters (mean-Ps fix, level-dependent hyperdiffusion on
            divergence / vorticity-tracers / temperature)
       │
       ▼
state_next
```

The splitting error is `O(dt)` (Lie split (a), `state → physics →
dynamics → next`). The dynamics IMEX-RK is the only step that advances
`sim_time` — `dyn_tendency` carries `sim_time = 0` so the forward-Euler
add does not perturb it.

This mirrors operational GCM practice (ECHAM, CAM, IFS, E3SM): physics
runs once per `dt` as forcing to the dynamics, rather than at each RK
substage.

### Why Lie and not Strang

The natural next step from Lie is a Strang split (`state → ½dt
physics → dynamics → ½dt physics → next`). Two forms exist:

- **Single-evaluation** ("symmetric Lie"): reuse the same
  `dyn_tendency` for both half-steps. 1× physics cost. But for
  state-dependent physics (the normal case) this is *not* second-order
  accurate — the local truncation analysis shows it misses the
  `(dt²/2)(PD + P²)` term the exact propagator carries. Globally `O(dt)`,
  same as Lie.
- **Classical Strang**: re-evaluate physics on the post-dynamics state
  for the second half-step. 2× physics cost (still cheaper than the
  legacy 3× per-substage scheme). True `O(dt²)` accuracy.

Neither is implemented today. Classical Strang is a worthwhile
follow-up if a coarser-`dt` regime exposes the splitting error; at the
current climate-rate `dt = 12-30 min`, the Santos 2021 analysis
(JAMES 13, e2020MS002359) finds coupling error dominates self-feedback
error and Lie/Strang are both adequate. The single-evaluation symmetric
form was attempted in this PR's history but reverted after Codex review
correctly flagged the order issue (PR #481 review thread).

## Key components

### `Model._get_op_split_step_fn` (`jcm/model.py`)

Builds the per-`dt` step closure `(state, physics_state) -> (state_next,
physics_state_next)`. Internals:

1. Resolve the current step's date and forcing slice from `state.sim_time`.
2. Call `compute_physics_step` for `(dyn_tendency, new_physics_state)`.
3. Lie apply: `state + dt·tend → dynamics_step` (full forward-Euler add
   then full-`dt` IMEX-RK).
4. Run the post-step filters (conserve global-mean surface pressure,
   level-dependent hyperdiffusion on divergence / vorticity+tracers /
   temperature).

The step is a pure function of `(state, physics_state)`. The
`physics_state` carry is a JAX pytree and is the only cross-step state
the integrator threads.

### `Model._get_dynamics_step_fn` (`jcm/model.py`)

Builds the IMEX-RK SIL3 integrator over the dynamics composition. The
dynamics composition is the primitive equations plus, optionally, a
Newtonian nudging tendency. Sponge and hyperdiffusion stay inside the
IMEX-RK stage loop / `step_with_filters` path — they are stiff /
fast-linear couplings that benefit from intermediate-state evaluation
and would lose stability if migrated to op-split.

This is the function a future pysces backend (#388) would replace.
Its interface is `state -> state'` over `dt` with no physics knowledge.

### `compute_physics_step` (`jcm/physics_interface.py`)

Converts the spectral dynamics `State` to a nodal `PhysicsState`, runs
`verify_state` (non-negativity clamp on tracers), calls
`Physics.compute_tendencies(state, forcing, terrain,
prev_physics_data=physics_state)`, applies `verify_tendencies` (caps
negative-going tracer tendencies at `-state/dt`), and converts the
result back to a dinosaur dynamics tendency.

Returns `(dynamics_tendency, new_physics_state)`. The new carry is the
dict the physics call writes to (radiation cache, prior-step TKE,
cloud / aerosol / chemistry diagnostics, …) — exactly the input shape
of the next step.

### `_op_split_trajectory` (`jcm/model.py`)

The trajectory builder. Takes the per-`dt` step function and threads
`(state, physics_state)` through a nested `lax.scan`:

- **Outer scan** — `outer_steps` saved frames.
- **Inner scan** — `inner_steps` `dt` steps between saves.

Two output modes:

- **Snapshot** (`output_averages=False`). The inner scan steps
  silently; the outer step saves the post-inner-scan `(state_final,
  physics_state_final)`. The saved `predictions.physics` is the
  cross-step carry the integration *actually consumed* — radiation
  sub-cycle cache, TKE memory, etc. This is required for correctness:
  with `radiation_interval = 7200 s` the dycore reuses cached fluxes
  on most outer steps, and recomputing physics from a freshly seeded
  carry at save time would silently report zero / IC radiation fields
  (the original #470 bug).
- **Averaged** (`output_averages=True`). The inner scan accumulates a
  running mean of post-step dynamics states (`state_next`) and of the
  per-step physics-state dict. The outer step saves the time-mean.
  Summing *post-step* states gives bit-equivalence with the snapshot
  path's end-of-step samples — `mean(snapshots) == averaged(...)` to
  numerical roundoff. The summands are cast to `float` before the
  running mean to avoid mid-scan dtype promotion (which `lax.scan`
  rejects).

`jax.checkpoint` wraps each inner step, so backward passes
re-materialise the step rather than save it. Memory budget: physics
runs once per `dt` rather than three times, so the autodiff tape for
a checkpointed step is `~3×` smaller than under the deprecated
in-stage scheme.

### Cross-step carry persistence

`Model` holds two slots of cross-step state:

- `_final_modal_state` — dinosaur spectral state at the end of the
  last `run()` / `resume()` call.
- `_final_physics_state` — the cross-step physics carry at the end of
  the last call.

`run()` resets both via `bootstrap_state` (and rebuilds the physics
carry via `_build_initial_physics_carry`). `resume()` threads them
back in. The result: a `run(5d)` + `resume(5d)` chain is numerically
equivalent to a contiguous `run(10d)` — sub-cycled radiation and
prior-step TKE do not reset at the API seam. Regression covered by
`test_op_split_carry_persists_across_resume`.

`run_from_state_with_carry` exposes the carry seed and final carry
directly for callers that need explicit control.

### Initial physics carry

`Model._build_initial_physics_carry` unions:

- `Physics.initial_carry_state(coords)` — per-term deterministic seed.
  Each `PhysicsTerm` that has cross-step state overrides this with a
  sensible non-zero value (e.g. `TTETKEVerticalDiffusion` seeds TKE at
  the 0.01 m²/s² ECHAM floor; radiation slots use `RadiationData.zeros`
  for fluxes — the first compute step writes real values before any
  consumer reads them).
- `Physics.get_empty_data(coords)` — a *structural template* obtained
  by probing the term loop with an isothermal-288 K `PhysicsState`,
  zero-filled. Used only to discover the dict structure that
  `compute_tendencies` produces, so the `lax.scan` carry pytree
  matches the post-step output on iteration 1 for any diagnostic keys
  whose owning term has not overridden `initial_carry_state`.

The isothermal probe (rather than a zero-state probe) avoids the
`0/0 = NaN` cascade that motivated the cache-cleanup in PR #469. The
result of `get_empty_data` is never used as live state — only as a
shape template.

### Coupling within physics

`ComposablePhysics` is **process-parallel**: every term sees the same
input `state`, tendencies are summed. Order-independent
(`A + B + C == B + A + C`), which keeps `replace()` / `remove()` /
`__add__()` composition operators well-defined.

This differs from ECHAM6's sequential coupling (each scheme sees the
state with prior schemes' tendencies already applied via
`tte += ...`). Sequential coupling is more accurate for tightly-coupled
process pairs but introduces order-dependence. Process-parallel is
adequate at JCM's climate-rate `dt`; sequential coupling is available
as a follow-up for terms that need it (cf. E3SM's CLUBB+MG2 loop).

### Filters and dycore composition

The post-step filter chain runs on `(pre_step_state, post_dynamics_state)`
in this order:

1. `conserve_global_mean_surface_pressure` — pins the 0,0,0 spectral
   mode of `log_surface_pressure` to its pre-step value.
2. `diffuse_div` — level-dependent hyperdiffusion on divergence.
3. `diffuse_vor_q` — hyperdiffusion on vorticity and *every* tracer
   (specific humidity + microphysics tracers + GHG VMRs).
4. `diffuse_temp` — hyperdiffusion on temperature variation.

The level-dependent scaling is precomputed once at `Model.__init__`
and inlined as a JIT constant.

A final `verify_state` clamp runs in `_post_process` at the
modal→nodal output boundary to mask any spectral Gibbs-ringing of
negative tracer tendencies before user-visible output.

## Tests

Op-split coverage in `jcm/model_test.py::TestOperatorSplitPhysics`:

- `test_op_split_snapshot_speedy_finite` /
  `test_op_split_averaged_speedy_finite` /
  `test_op_split_averaged_echam_hybrid_finite` — both modes, SPEEDY
  and ECHAM-hybrid; require finite atmospheric state.
- `test_op_split_step_is_jax_pure` — the step function traces cleanly
  under `jit` and round-trips the dynamics pytree structure.
- `test_op_split_carry_threading` — `physics_state` returned by step
  N is the same pytree structure as the input to step N+1 (the
  `lax.scan` carry contract).
- `test_op_split_carry_persists_across_resume` — `run(5d) +
  resume(5d)` matches `run(10d)` to numerical roundoff (the P1 fix).
- `test_op_split_run_resets_carry` — repeated `run()` on the same
  `Model` does not inherit stale carry.
- `test_op_split_snapshot_physics_uses_integration_carry` — the
  snapshot `predictions.physics` is the carry the integration
  consumed, not a freshly-seeded copy (the P2 fix).

The legacy in-stage path was removed in Phase 4 (`TestLegacyPathRemoved`);
no flag remains to switch back. The deleted symbols are
`_step_tendencies`, `physics_forcing_eqn`, `DiagnosticsCollector`,
`averaged_trajectory_from_step`, `get_physical_tendencies`,
`_get_step_fn_factory`, `_get_integrate_fn`, and the `use_op_split`
kwarg on `run()` / `resume()` / `run_from_state()`.

## Performance notes

- Physics runs once per `dt` rather than three times per `dt` (one per
  IMEX-RK explicit substage). For RRTMGP / TTE-TKE / Tiedtke-Nordeng
  this is roughly 3× the wall-time saving on the physics path.
- Backward passes pay `~3×` less memory under `jax.checkpoint` for the
  physics path because each checkpoint re-traces a single physics call.
- Radiation sub-cycling honours one timeline (`radiation_should_compute`
  reads `model_step` from `_date` and `radiation_interval` from the
  term config), not three intermixed substage timelines. Same
  semantics as before, cleaner plumbing.

## Appendix — ECHAM6 reference

The proposed JCM structure mirrors ECHAM6's well-established
operator-split design. The relevant code chain in
`echam6.3.0-ham2.3-moz1.0.r7492/src`:

```text
stepon.f90:156      integration_loop: DO                 ← main time loop
stepon.f90:271        CALL scan1                         ← gridpoint phase
  scan1.f90:499         CALL gpc(jrow)
    gpc.f90:78           CALL physc(krow)
      physc.f90:543       CALL cover               ← cloud cover
      physc.f90:566       CALL radiation           ← full RT (sub-cycled)
      physc.f90:678       CALL vdiff               ← vertical diffusion
      physc.f90:776       CALL radheat             ← cached-flux heating
      physc.f90:835       CALL gwspectrum          ← Hines GW drag
      physc.f90:877       CALL ssodrag             ← orographic GW drag
      physc.f90:987       CALL cucall              ← Tiedtke convection
      physc.f90:1067      CALL cloud               ← stratiform + micro
stepon.f90:280        CALL sccd                          ← divergence semi-implicit
stepon.f90:286        CALL scctp                         ← T + Ps semi-implicit
stepon.f90:294        CALL uspnge                        ← upper sponge (in dynamics)
stepon.f90:309        CALL hdiff                         ← horizontal diffusion
stepon.f90:476      END DO integration_loop
```

Physics tendencies are accumulated into the gridpoint buffer arrays
declared at `mo_scan_buffer.f90:36-47` (`vol`, `vom`, `tte`, `qte`,
`xlte`, `xite`, `xtte`) and consumed by `sccd` / `scctp` as the
explicit forcing in the semi-implicit Helmholtz solve. **Physics is
called exactly once per dynamics timestep.**

Two structural differences from JCM:

1. **Dynamics integrator.** ECHAM uses leapfrog + semi-implicit
   (spectral); JCM uses IMEX-RK SIL3 (also spectral, via dinosaur).
   The split point is the same — operator-split physics — but the
   dynamics integrator on each side differs. ECHAM's split is forced
   by leapfrog's single RHS evaluation per `dt`; JCM's SIL3 has
   substages and *could* in principle evaluate physics at each one,
   so for JCM operator-splitting is a real design choice. CAM (HOMME
   spectral element), E3SM (HOMME with sub-cycled advection), and IFS
   (semi-Lagrangian + semi-implicit) op-split despite having more
   than one dynamics evaluation per physics `dt` — the same situation
   JCM-with-SIL3 is in.

2. **Within-physics coupling.** ECHAM is **sequential** (each scheme
   reads the state with prior schemes' tendencies already applied via
   the `tte += ...` pattern on the shared accumulators in
   `mo_scan_buffer.f90`). JCM's `ComposablePhysics` is
   **process-parallel** (every term reads the same input state,
   tendencies are summed). The latter preserves the composability
   story (the order-independence of `A + B + C`) at the cost of a
   small accuracy penalty for tightly-coupled term pairs.

The cross-step physics state in ECHAM lives in module-level globals
(`mo_radiation_forcing`, …) — semantically the same as JCM's
`PhysicsCarryState` pytree, just routed via Fortran's module-level
mutability rather than as an explicit functional carry.