MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics

Abstract

To study the ability to infer physical dynamics from videos and extrapolate them forward in time, we assemble a dataset of 2D Material Point Method (MPM) physical simulations covering rich physical phenomena such as deformable objects, fluids, and emitters. We study code generation and video diffusion approaches on this dataset, identifying their strengths and weaknesses by varying the amount of physically relevant side information. The code generation model gives a working demonstration of automatic synthesis of MPM simulations. It struggles to infer physical parameters from visual input but, relative to video diffusion, produces physically plausible extrapolations forward in time. The video diffusion model identifies geometric properties from visual input more reliably but produces physically implausible extrapolations.

When does code generation excel?

A VLM-generated simulation program is executed by an explicit MPM solver, so its extrapolations inherit conservation of mass and object permanence by construction — simulated particles cannot spontaneously vanish or appear. Video diffusion has no such guarantee; we regularly see hallucinated motion, disappearing objects, abrupt colour shifts, and temporal drift that grows worse over long horizons. We observe this pattern in the evaluation metrics from our paper. Motion accuracy (W-MAE) measures whether a rollout has similar temporal activity to ground truth; object preservation (collapse score) tracks whether foreground regions remain visible; temporal stability (anomaly rate) flags abrupt frame-to-frame appearance jumps; color composition (CTV) compares foreground colour distributions; and shape overlap (mask IoU) measures spatial alignment of foreground regions.
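Of the metrics listed above, shape overlap is the most standard, so it makes a convenient concrete example. The sketch below computes intersection over union between two foreground masks and averages it over a rollout. It is an illustration only: the names `mask_iou` and `mean_mask_iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions, not the evaluation code from the paper.

```python
from typing import Sequence

# One frame: rows of foreground flags (hypothetical representation).
Mask = Sequence[Sequence[bool]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized foreground masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += bool(p and t)
            union += bool(p or t)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_mask_iou(pred_frames: Sequence[Mask], target_frames: Sequence[Mask]) -> float:
    """Average per-frame IoU over a rollout, pairing frames in order."""
    scores = [mask_iou(p, t) for p, t in zip(pred_frames, target_frames)]
    if not scores:
        raise ValueError("need at least one frame pair")
    return sum(scores) / len(scores)
```

In practice the masks would come from thresholding rendered frames and would be stored as arrays; plain lists keep the sketch dependency-free.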

Below we show selected examples where code generation outperforms video diffusion. Each card compares Ground Truth · Code Gen · Video Gen under the same input condition, with the relevant metric wins labelled underneath.

In each example, the Video Gen column begins with observation frames copied from ground truth, then switches to model-generated extrapolation. Once playback reaches predicted frames, look for a small orange Predicted pill in the top-right corner of that video, plus an orange inset border around the panel.

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

color composition · overall score · object preservation · temporal stability

All chips are evaluation wins. Click to highlight a subset for this demo; export saves your picks.

Notes

Success demo for the VLM (code generation).
The VLM also beats the VDM (video diffusion) on color composition (CTV), composite score, collapse score, and anomaly rate.
The VDM hallucinates the positions of the elastic objects.
In the no-positions setting (attempt 03), the VLM is still better than the VDM on W-MAE and especially on collapse score, but worse on mask IoU.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

object preservation · temporal stability

Notes

Metric differences: collapse score, anomaly rate, etc.
The VLM wins; the VDM hallucinates.

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

shape overlap · motion accuracy

Notes

Liquid scene: the VLM wins on mean mask IoU and W-MAE.
The VLM again preserves more high-frequency detail; in the VDM rollout the liquid tends to lose volume and the fans deform.

Ground Truth

Code Gen

Video Gen

No positions

Code Gen · wins on

temporal stability

Notes

The VLM wins on anomaly rate and color composition (CTV).
The VDM shows odd object deformation and motion that appears from nowhere.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

object preservation

Notes

The VLM wins on collapse score (maximal info, attempt 01). In the VDM rollout the green cube disappears.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

motion accuracy · color composition

Notes

The VLM wins on everything (color, W-MAE, etc.). The VDM blends all the colors together and does not produce precise dynamics.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

object preservation

Notes

The VLM wins on collapse score. In the VDM rollout the red objects disappear.

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

object preservation

Notes

The VLM wins on collapse score. The VDM removes one of the objects. No-materials setting, attempt 09.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

object preservation · color composition

Notes

The VLM is better on collapse score and color composition (CTV).

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

motion accuracy · object preservation

Notes

The VLM wins on W-MAE and on high-level detail; the VDM fails on object preservation, so the VLM wins there too.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

shape overlap · temporal stability

Notes

Success demo for the VLM; it gets quite close to ground truth.
The VDM hallucinates positions and shapes.

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

motion accuracy · shape overlap

Notes

The VLM wins on mean W-MAE.
Liquids are easier to get right because most of their dynamics are lower-dimensional.
However, the VDM tends to miss fine details and high-frequency dynamics.
The VLM gets these right.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

motion accuracy · color composition

Notes

Maximal info, attempt 05: the VLM rollout is basically perfect.
No materials, attempt 03: the VLM implemented sand incorrectly, and the VDM is better on W-MAE and color.

No positions, attempt 04: the VDM is better, and much better on mask IoU.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

temporal stability

Notes

The VLM is better than the VDM on anomaly rate: the VDM's rate is high and the VLM's is low.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

shape overlap · object preservation

Notes

The VLM wins on mask IoU; the VDM renders the material incorrectly.
The VLM also wins on collapse score.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

motion accuracy

Notes

The VLM wins on foreground mask coverage and amount of motion (mask IoU, W-MAE).

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

temporal stability

Notes

The VLM wins on color composition and anomaly rate (the VDM creates an extra object).

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

motion accuracy

Notes

The VDM wins on W-MAE in the no-positions setting.
The VLM wins on W-MAE in the maximal-info setting (attempt 09).

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

motion accuracy

Notes

The VLM wins on W-MAE; it captures the dynamics more accurately.

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

object preservation

Notes

The VLM is better than the VDM on W-MAE and collapse score. Need to check which VDM video version this is.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

shape overlap · temporal stability

Notes

The VLM is better on mask IoU and anomaly rate.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

shape overlap

Notes

The VLM has the better total score (metric comparison again).
The VDM appears uncertain and predicts dark dynamics.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

color composition · shape overlap

Notes

The VLM better preserves the growing volume of liquid from emitters, and again the high-frequency liquid details.
Metric-wise, the VLM is better on W-MAE, composite score, color composition (CTV), mask IoU, etc.

It is better in the maximal-info, no-positions, and no-materials settings.

However, the VLM can easily get the liquid color wrong, as in attempt 02; this is worth pointing out.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

color composition

Notes

The VLM is more accurate on color composition and does not mix colors as much; the VDM is worse.
The VLM is better on mean CTV.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

shape overlap

Notes

The VLM is better on overlap but uses the wrong material. The VDM does not follow the text instructions.

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

color composition

Notes

The VLM is better on color composition (CTV).

When does video diffusion excel?

Video diffusion predicts extrapolation frames directly in pixel space, which tends to preserve approximate object placement, spatial overlap, and colour composition more reliably than VLM-generated simulation programs. VLM code sometimes places objects at wrong initial positions or draws incorrect material boundaries, even when the overall dynamics look plausible.

Below we show selected examples where video diffusion outperforms code generation. Each card compares Ground Truth · Code Gen · Video Gen under the same input condition, with the relevant metric wins labelled underneath.

Ground Truth

Code Gen

Video Gen

No positions

Video Gen · wins on

color composition · shape overlap · overall score

Notes

The VDM wins in the no-positions setting (attempt 03), with metric wins on color composition (CTV) and mask IoU, among others.
In the no-materials setting, the VLM is better on W-MAE and mask IoU.
The VDM has an easier time getting the starting positions right.

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

color composition · shape overlap · motion accuracy

Notes

The VDM wins on color composition (CTV), mask IoU, anomaly rate, W-MAE, etc.
The VLM fails to implement the scene.
Metric comparison.

Ground Truth

Code Gen

Video Gen

No positions

Video Gen · wins on

color composition · shape overlap

Notes

The VDM wins on color composition, mask IoU, and W-MAE.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

motion accuracy · color composition · shape overlap

Notes

The VLM fails; the VDM wins by a wide margin.

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

motion accuracy · shape overlap · color composition

Notes

The VDM wins by a wide margin on everything; the VLM fails.

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

motion accuracy · shape overlap · color composition

Notes

The VDM is much better; the VLM fails to implement the scene.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

color composition · shape overlap

Notes

The VDM is much better on color composition (CTV), W-MAE, and overlap.

Ground Truth

Code Gen

Video Gen

Minimal info

Video Gen · wins on

motion accuracy · shape overlap · color composition

Notes

The VDM is much better; the VLM fails.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

color composition

Notes

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

motion accuracy · shape overlap

Notes

The VDM better captures the growing size of the object, which might be hard to implement.

It is also better on metrics such as W-MAE.

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

motion accuracy · color composition

Notes

Ground Truth

Code Gen

Video Gen

No positions

Video Gen · wins on

shape overlap

Notes

Ground Truth

Code Gen

Video Gen

No positions

Video Gen · wins on

motion accuracy · shape overlap

Notes

The VDM wins on amount of motion and overlap.
In the no-positions setting, the VLM does not predict the initial positions exactly.

Ground Truth

Code Gen

Video Gen

No positions

Video Gen · wins on

motion accuracy · shape overlap

Notes

The VDM is better on composite score, W-MAE, and object location; the VLM fails to put the objects in the right positions.

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

shape overlap

Notes

The VDM wins on mask IoU; the VLM fails to implement the liquid.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

shape overlap

Notes

The VDM wins on mask IoU; it renders the shapes correctly.

Ground Truth

Code Gen

Video Gen

No positions

Video Gen · wins on

shape overlap

Notes

The VDM wins on mask overlap in the no-positions setting.
In the maximal-info setting, the VLM wins on everything.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

motion accuracy · shape overlap

Notes

The VDM wins on color composition (CTV), W-MAE, and overlap.

Ground Truth

Code Gen

Video Gen

Minimal info

Video Gen · wins on

shape overlap · motion accuracy

Notes

With minimal info, the VLM does not get the positions right. The VDM wins on mask overlap and W-MAE.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

motion accuracy · shape overlap

Notes

The VDM wins on W-MAE in the no-positions setting.
The VLM wins on W-MAE in the maximal-info setting (attempt 09).

Ground Truth

Code Gen

Video Gen

No materials

Video Gen · wins on

motion accuracy

Notes

With maximal info (attempt 03), the VLM wins on color, W-MAE, and anomaly rate. In the no-materials setting, the VLM implements the liquid material incorrectly, or does not recognize it from the video, so the VDM is better.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

motion accuracy

Notes

The VDM wins on mask IoU and W-MAE.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

motion accuracy

Notes

The VDM wins on W-MAE; the VLM does not implement collisions.

Ground Truth

Code Gen

Video Gen

Maximal info

Video Gen · wins on

motion accuracy

Notes

The VDM wins on W-MAE.

Ground Truth

Code Gen

Video Gen

—

Video Gen · wins on

overall score

Notes

The VLM fails to implement anything; the VDM wins.

Ground Truth

Code Gen

Video Gen

—

Video Gen · wins on

overall score

Notes

The VLM fails to implement anything; the VDM wins.

Complementary strengths

These examples illustrate why the two approaches are complementary rather than competing: on the same scene, code generation and video diffusion often excel on different metric classes. A simple prefix-quality routing rule — use VLM when the generated prefix matches observed frames, otherwise video diffusion — consistently outperforms either model alone.
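The routing rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the rule as implemented in the paper: the function names, the use of a mean per-frame agreement score in [0, 1] as the prefix-quality measure, and the 0.8 threshold are all hypothetical. Only the code-generation prefix is scored, since the Video Gen prefix is copied from the observed frames.

```python
from typing import Sequence


def prefix_quality(prefix_scores: Sequence[float]) -> float:
    """Mean agreement between the generated prefix and the observed frames.

    Each score is assumed to lie in [0, 1] (for example a per-frame mask
    overlap). An empty prefix is treated as no evidence of a match.
    """
    if not prefix_scores:
        return 0.0
    return sum(prefix_scores) / len(prefix_scores)


def route(prefix_scores: Sequence[float], threshold: float = 0.8) -> str:
    """Pick a model for extrapolation from the code-generation prefix quality.

    The threshold value is a placeholder assumption, not a tuned number.
    """
    if prefix_quality(prefix_scores) >= threshold:
        return "code_gen"   # the simulation reproduces what was observed
    return "video_gen"      # fall back to pixel-space extrapolation
```

The design point is that the decision uses only the observed window, so no ground-truth future frames are needed at routing time.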

Below we show selected examples where each model wins on a different set of metrics. Each card compares Ground Truth · Code Gen · Video Gen under the same input condition.

Ground Truth

Code Gen

Video Gen

No positions

Code Gen · wins on

motion accuracy · object preservation

Video Gen · wins on

shape overlap

Notes

Ground Truth

Code Gen

Video Gen

Maximal info

Code Gen · wins on

temporal stability

Video Gen · wins on

shape overlap

Notes

The VDM is better on mask IoU, for example; the VLM struggles to reproduce the exact shape.
The VDM is worse on anomaly rate.

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

motion accuracy

Video Gen · wins on

color composition

Notes

The VDM wins on color composition (CTV), since its output is dimmer.
The VDM does produce some odd motion towards the end, which is worth showcasing.

Ground Truth

Code Gen

Video Gen

No positions

Code Gen · wins on

temporal stability

Video Gen · wins on

motion accuracy

Notes

No positions, attempt 01. The VDM wins on W-MAE. The VLM wins on anomaly rate (the VDM produces an extra ball).

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

shape overlap · motion accuracy

Video Gen · wins on

color composition

Notes

The VLM wins on overlap and W-MAE.
The VDM wins on color composition (CTV).

Ground Truth

Code Gen

Video Gen

No materials

Code Gen · wins on

motion accuracy

Video Gen · wins on

shape overlap

Notes

The VDM wins on overlap but loses on W-MAE (motion).

Ground Truth

Code Gen

Video Gen

No positions

Code Gen · wins on

object preservation · temporal stability

Video Gen · wins on

motion accuracy · color composition

Notes

The VDM wins on W-MAE and color composition (CTV).
The VLM wins on collapse score and anomaly rate.

The role of physical input information

VLM outputs change substantially with input quality; video diffusion outputs do not. When material labels or object positions are withheld, VLM exhibits corresponding mismatches — suggesting it genuinely uses structured scene information. Video diffusion rollouts remain nearly identical across input regimes, indicating reliance on short-term visual extrapolation rather than the provided specifications.

Below we show selected candidates side by side under all four input regimes. Maximal info gives the full scene specification; minimal info gives only the observation video; no materials reveals object positions but hides material labels; no positions reveals materials but hides where objects start. Comparing columns within a row shows how each model responds when specific physical information is present or missing.
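The four regimes can be summarized as a small lookup table of what side information accompanies the observation video. The sketch below is a simplification for illustration: the `REGIMES` table, the `build_side_info` helper, and the two-field scene dictionary are hypothetical, and the full scene specification in the maximal setting may contain more than materials and positions.

```python
from typing import Any, Dict

# Which pieces of side information each regime reveals (assumed field names).
# The observation video itself is provided in every regime.
REGIMES: Dict[str, Dict[str, bool]] = {
    "maximal":      {"materials": True,  "positions": True},
    "minimal":      {"materials": False, "positions": False},
    "no_materials": {"materials": False, "positions": True},
    "no_positions": {"materials": True,  "positions": False},
}


def build_side_info(scene: Dict[str, Any], regime: str) -> Dict[str, Any]:
    """Return only the scene fields that the given regime reveals."""
    reveal = REGIMES[regime]
    return {key: scene[key] for key, shown in reveal.items() if shown}


# Hypothetical two-object scene used only to show the filtering.
example_scene = {
    "materials": ["liquid", "elastic"],
    "positions": [(0.2, 0.8), (0.6, 0.4)],
}
no_materials_info = build_side_info(example_scene, "no_materials")
```

Here `no_materials_info` keeps the starting positions and drops the material labels, matching the description of that regime above.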