SnorkelSpatial
Large language models (LLMs) achieve strong results on complex reasoning problems across domains, from mathematical proofs and logical puzzles to graduate-level science and engineering questions. Their spatial reasoning capabilities are less well understood, even though such reasoning underlies many everyday and scientific tasks involving geometry, diagrams, and spatial relations. To better understand the spatial reasoning capabilities of LLMs, we design a simple spatial reasoning benchmark with a variety of problems set in a 2D grid world.
Leaderboard
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 | 99% |
| 2 | Grok 4 Fast Reasoning | 84.85% |
| 3 | o3 | 76.67% |
| 4 | gpt-5 | 73.94% |
| 5 | gpt-oss-120b | 52.73% |
| 6 | gpt-5-mini | 45.45% |
| 7 | Claude Opus 4.1 | 45.15% |
| 8 | Magistral Medium 1.2 | 44.24% |
| 9 | Claude Opus 4 | 40.3% |
| 10 | o3-mini | 37.88% |
| 11 | Claude Sonnet 4 | 33.33% |
| 12 | gpt-5-nano | 26.67% |
| 13 | Claude Sonnet 3.7 | 21.52% |
| 14 | Gemini 2.5 Flash | 18.79% |
| 15 | Llama 4 Scout | 15.45% |
| 16 | Gemini 2.5 Pro | 15.15% |
| 17 | gpt-5-chat | 14.85% |
| 18 | Mistral Large | 14.85% |
| 19 | o4 mini | 14.85% |
| 20 | GPT-4.1 | 14.55% |
| 21 | Llama 3.3 70B | 14.55% |
| 22 | Mistral Medium 3.1 | 14.55% |
| 23 | Nova Micro | 14.55% |
| 24 | Command R+ | 14.24% |
| 25 | Nova Premier | 14.24% |
| 26 | Qwen 3 235B | 13.94% |
| 27 | Codestral | 13.64% |
| 28 | Nova Lite | 13.33% |
| 29 | Grok 3 | 12.73% |
| 30 | Magistral Medium | 12.42% |
| 31 | Llama 4 Maverick | 12.12% |
| 32 | Nova Pro | 12.12% |
| 33 | Command R | 11.82% |
Types of actions
Movements
Rotations








Types of queries
Dataset
Sample task
# Initial States
Two particles P1, P2 on board B1 (5×5, tiles 1–25, zigzag pattern). Board at (2.5, 2.5), facing north.
P1 at (1.5, 2.5) facing south.
P2 at (4.5, 2.5) facing east.
# Actions
1. Rotate P2 by 90 degrees
2. Move P1 BACKWARD 1 unit
3. Rotate P2 by 0 degrees
4. Rotate board B1 by 180 degrees
# Question
What is the orientation of particle P1 after all the actions?
# Response format: JSON
{ "particle_P1_orientation": "..." }
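
To make the sample task concrete, here is a minimal Python sketch that replays its four actions and produces a response in the requested JSON shape. It is not the benchmark's reference implementation. The semantics below are assumptions, since the task text does not spell them out: x grows east and y grows north, positive rotations are clockwise, and rotating the board carries every particle with it (position and heading) about the board's center. The names `Particle`, `turn`, `move_backward`, `rotate_board`, and `run_sample` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

# Cardinal headings in clockwise order.
# Unit vectors assume x grows east and y grows north (assumption).
DIRECTIONS = ["north", "east", "south", "west"]
VECTORS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


@dataclass
class Particle:
    x: float
    y: float
    facing: str


def turn(facing: str, degrees: int) -> str:
    """Rotate a heading clockwise by a multiple of 90 degrees (assumed convention)."""
    steps = (degrees // 90) % 4
    return DIRECTIONS[(DIRECTIONS.index(facing) + steps) % 4]


def move_backward(p: Particle, units: float) -> None:
    """Step opposite to the current heading; the heading itself is unchanged."""
    dx, dy = VECTORS[p.facing]
    p.x -= dx * units
    p.y -= dy * units


def rotate_board(particles, center, degrees: int) -> None:
    """Assumed semantics: particles turn with the board about its center."""
    cx, cy = center
    for p in particles:
        for _ in range((degrees // 90) % 4):
            # One clockwise quarter turn of the offset: (dx, dy) -> (dy, -dx).
            dx, dy = p.x - cx, p.y - cy
            p.x, p.y = cx + dy, cy - dx
        p.facing = turn(p.facing, degrees)


def run_sample() -> dict:
    p1 = Particle(1.5, 2.5, "south")
    p2 = Particle(4.5, 2.5, "east")
    p2.facing = turn(p2.facing, 90)          # 1. Rotate P2 by 90 degrees
    move_backward(p1, 1)                     # 2. Move P1 BACKWARD 1 unit
    p2.facing = turn(p2.facing, 0)           # 3. Rotate P2 by 0 degrees
    rotate_board([p1, p2], (2.5, 2.5), 180)  # 4. Rotate board B1 by 180 degrees
    return {"particle_P1_orientation": p1.facing}


print(run_sample())
```

Under these assumptions the sketch reports `north` for P1: moving backward leaves the heading at south, and the 180-degree board rotation flips it. If board rotations do not carry particles with them, the heading would stay south, so the result depends on the benchmark's actual conventions.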

