sim-benchmark.
A public benchmark for developing and evaluating LLM agents on real CAE and EDA simulation workflows.
Current public rows.
| Model | LTspice circuits (20 tasks) | OpenFOAM fluids (3 tasks) |
|---|---|---|
| Claude Opus 4.6 | 0.986 | 1.000 |
| MiniMax-M2.5-highspeed | 0.936 | 0.408 |
| MiniMax-M2.7 | 0.884 | 0.284 |
Initial task set: 20 LTspice circuit tasks and 3 OpenFOAM fluid tasks. Scores range from 0 to 1; the produced files and run details are in the GitHub repo.
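One way to read a 0-to-1 score is as the fraction of per-task numeric checks the agent's reported values pass within tolerance. The sketch below is illustrative only: the function names, quantity keys, and 5% relative tolerance are assumptions, not the benchmark's actual scoring code.

```python
def within_tol(reported: float, expected: float, rel_tol: float = 0.05) -> bool:
    """True if the reported value is within rel_tol of the expected value."""
    return abs(reported - expected) <= rel_tol * abs(expected)

def task_score(reported: dict, expected: dict) -> float:
    """Fraction of expected quantities the agent matched within tolerance."""
    if not expected:
        return 0.0
    passed = sum(
        1 for key, exp in expected.items()
        if key in reported and within_tol(reported[key], exp)
    )
    return passed / len(expected)

# Hypothetical RC low-pass task expecting a -3 dB corner near 1.59 kHz
print(task_score({"f_3db_hz": 1.6e3}, {"f_3db_hz": 1.59e3}))  # 1.0
```

Averaging these per-task fractions across a suite would then yield leaderboard-style numbers like those in the table above.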
From prompt to trusted simulation evidence.
A passing agent has to do more than name the right physics: it has to operate the software, get a run to complete, and report numerical results that can be checked against the files the run produced.
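Checking reported numbers against produced files could look like the sketch below: pull `name=value` pairs out of a solver log so they can be compared with what the agent claimed. The line format, log contents, and function name are assumptions for illustration, not the benchmark's real verification contract.

```python
import re

# Matches lines like "f3db=1589.7" (name, then a signed float).
MEAS_RE = re.compile(r"^(\w+)\s*=\s*([-+0-9.eE]+)")

def parse_measurements(log_text: str) -> dict:
    """Collect name=value measurement lines from a run log."""
    results = {}
    for line in log_text.splitlines():
        m = MEAS_RE.match(line.strip())
        if m:
            results[m.group(1)] = float(m.group(2))
    return results

log = """\
f3db=1589.7
gain_db=-3.01
"""
print(parse_measurements(log))  # {'f3db': 1589.7, 'gain_db': -3.01}
```

The point of the contract is that every number in the agent's report must be recoverable this way from artifacts on disk, not just asserted in prose.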
The technical contract, task files, leaderboard artifacts, and reproduction guide are all public in the GitHub repo.
Two ways to use it.
The public benchmark is a starting point for both model builders and engineering teams. Labs can use it to develop industrial simulation capability; CAE leaders can use it to evaluate whether agents can actually run simulation work before committing to a model.