TORA: Topological Representation Alignment for 3D Shape Assembly

Abstract

Flow-matching methods for 3D shape assembly learn point-wise velocity fields that transport parts toward assembled configurations, yet they receive no explicit guidance about which cross-part interactions should drive the motion. We introduce TORA, a topology-first representation alignment framework that distills relational structure from a frozen pretrained 3D encoder into the flow-matching backbone during training. We first realize this with a simple instantiation, token-wise cosine matching, which injects the teacher's learned geometric descriptors into the student. We then extend it with a Centered Kernel Alignment (CKA) loss that matches the similarity structure of the student and teacher representations for stronger topological alignment. Through systematic probing of diverse 3D encoders, we show that geometry- and contact-centric teacher properties, not semantic classification ability, govern alignment effectiveness, and that alignment is most beneficial at later transformer layers, where spatial structure naturally emerges. TORA introduces zero inference overhead while yielding two consistent benefits: faster convergence (up to 6.9×) with improved in-distribution accuracy, and greater robustness under domain shift. Experiments on five benchmarks spanning geometric, semantic, and inter-object assembly demonstrate state-of-the-art performance, with particularly pronounced gains in zero-shot transfer to unseen real-world and synthetic datasets.
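The two alignment objectives named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the linear variant of CKA (the page does not state the kernel), assumes features arrive as (n_tokens, dim) matrices, and assumes the student has already been projected to the teacher's width for the cosine loss. The function names are hypothetical.

```python
import numpy as np

def cosine_alignment_loss(student, teacher):
    """Token-wise cosine matching: mean of (1 - cos) over tokens.

    Assumes both inputs are (n_tokens, dim) with the same dim, e.g. after
    a hypothetical projection head on the student side.
    """
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + 1e-8)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

def linear_cka(student, teacher):
    """Linear CKA between (n_tokens, d_s) and (n_tokens, d_t) features.

    Compares the token-to-token similarity structure, so the two feature
    widths may differ.
    """
    x = student - student.mean(axis=0, keepdims=True)  # center each feature
    y = teacher - teacher.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y + 1e-12))

def cka_alignment_loss(student, teacher):
    """Assumed loss form: 1 - CKA, so perfect structural match gives 0."""
    return 1.0 - linear_cka(student, teacher)

# Synthetic demo: a rotated and rescaled copy of the teacher features.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(64, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
student_feats = 3.0 * teacher_feats @ rotation

print("cosine loss:", cosine_alignment_loss(student_feats, teacher_feats))
print("CKA loss:   ", cka_alignment_loss(student_feats, teacher_feats))
```

In this synthetic demo the student is an orthogonally rotated, rescaled copy of the teacher. Linear CKA is invariant to both transformations, so the CKA loss is close to zero, while the token-wise cosine loss stays large. That contrast is one way to see why matching similarity structure is a looser, more relational constraint than matching individual descriptors.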

6.9×

Faster Convergence

TORA reaches baseline peak performance up to 6.9× faster on Breaking Bad, significantly reducing training cost.

95.7%

Part Accuracy

State-of-the-art assembly accuracy on the Breaking Bad benchmark, improving over the RPF baseline (93.2%).

0

Inference Overhead

The teacher encoder is only used during training. At test time, TORA adds strictly zero computational overhead.

94.4%

Zero-shot Transfer

Strong generalization to unseen datasets (Breaking Bad Artifact), demonstrating robust domain transfer.

Consistent Gains Across Diverse Assembly Settings

TORA outperforms RPF in every setting—geometric, semantic, and inter-object. Higher Part Accuracy ↑ is better; lower Rotation Error ↓ and Translation Error ↓ are better.

Geometric assembly (Breaking Bad benchmark)

Metric                 RPF        Ours       Change
Part Accuracy ↑        93.2%      95.7%      +2.5%
Rotation Error ↓       16.0°      8.6°       46% ↓
Translation Error ↓    4.3 cm     2.1 cm     51% ↓

Many-part geometric (Breaking Bad with [21, 33] parts)

Metric                 RPF        Ours       Change
Part Accuracy ↑        62.1%      71.7%      +9.6%
Rotation Error ↓       77.3°      64.8°      16% ↓
Translation Error ↓    15.2 cm    12.5 cm    18% ↓

Semantic assembly (PartNet-Assembly benchmark)

Metric                 RPF        Ours       Change
Part Accuracy ↑        59.8%      69.1%      +9.3%
Rotation Error ↓       46.2°      40.8°      12% ↓
Translation Error ↓    21.5 cm    18.8 cm    13% ↓

Inter-object assembly (TwoByTwo benchmark)

Metric                 RPF        Ours       Change
Part Accuracy ↑        65.4%      71.5%      +6.1%
Rotation Error ↓       15.8°      10.0°      37% ↓
Translation Error ↓    11.9 cm    7.6 cm     36% ↓
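The change annotations in the results above are consistent with the following reading, which is an assumption rather than something the page states: Part Accuracy gains are absolute differences in percentage points, and error reductions are relative to the RPF value. A short sketch with hypothetical helper names:

```python
def absolute_gain(baseline, ours):
    """Difference in percentage points, rounded to one decimal."""
    return round(ours - baseline, 1)

def relative_reduction(baseline, ours):
    """Percent reduction relative to the baseline, rounded to an integer."""
    return round(100.0 * (baseline - ours) / baseline)

# Breaking Bad (geometric assembly) values from the results above.
print(absolute_gain(93.2, 95.7))      # Part Accuracy: 2.5
print(relative_reduction(16.0, 8.6))  # Rotation Error: 46
print(relative_reduction(4.3, 2.1))   # Translation Error: 51
```

Under this reading, the accuracy gains and the error reductions are on different scales and should not be compared directly across columns.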

[Figure: qualitative comparison. Columns: Input, RPF, Ours, GT (ground truth).]

Convergence Analysis

TORA consistently accelerates optimization, reaching the RPF baseline's peak performance up to 6.9× faster.


Part Accuracy (PA) vs. training epoch. Ours (CKA / Cos-dist / NT-Xent) are TORA variants; RPF is the baseline.
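One plausible way to read a speedup figure from such curves is sketched below. The definition is an assumption, not taken from the page: the epoch at which the baseline attains its peak, divided by the first epoch at which the other method reaches that same value. The curves in the example are invented toy numbers, not results from the paper.

```python
def epochs_to_reach(curve, target):
    """First 1-indexed epoch at which the curve is >= target, else None."""
    for epoch, value in enumerate(curve, start=1):
        if value >= target:
            return epoch
    return None

def convergence_speedup(baseline_curve, ours_curve):
    """Ratio of epochs needed to reach the baseline's peak (assumed definition)."""
    peak = max(baseline_curve)
    base_epoch = epochs_to_reach(baseline_curve, peak)
    ours_epoch = epochs_to_reach(ours_curve, peak)
    if ours_epoch is None:
        return None  # the other method never matches the baseline peak
    return base_epoch / ours_epoch

# Toy per-epoch accuracy curves, made up purely for illustration.
toy_baseline = [0.20, 0.40, 0.55, 0.65, 0.70, 0.72]
toy_ours = [0.35, 0.60, 0.73, 0.78, 0.80, 0.81]
print(convergence_speedup(toy_baseline, toy_ours))  # 2.0
```

With noisy curves the measured ratio depends on whether smoothing is applied before the threshold is checked, so the smoothing setting should be reported alongside any such number.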

Zero-shot Transfer to Unseen Domains

Without any fine-tuning, TORA generalizes to unseen synthetic and real-world fracture datasets, consistently outperforming RPF.
Higher Part Accuracy ↑ is better; lower Rotation Error ↓ and Translation Error ↓ are better.

Breaking Bad - Artifact (synthetic multi-part geometric assembly)

Metric                 RPF        Ours       Change
Part Accuracy ↑        88.3%      94.4%      +6.1%
Rotation Error ↓       20.9°      8.0°       62% ↓
Translation Error ↓    5.3 cm     2.1 cm     60% ↓

FRACTURA (real-world multi-part geometric assembly)

Metric                 RPF        Ours       Change
Part Accuracy ↑        68.1%      76.0%      +7.9%
Rotation Error ↓       50.1°      36.4°      27% ↓
Translation Error ↓    11.2 cm    7.7 cm     31% ↓

Fantastic Breaks (real-world 2-part geometric assembly)

Metric                 RPF        Ours       Change
Part Accuracy ↑        96.9%      97.2%      +0.3%
Rotation Error ↓       6.3°       3.5°       44% ↓
Translation Error ↓    1.5 cm     0.9 cm     40% ↓

[Figure: qualitative zero-shot comparison. Columns: Input, RPF, Ours, GT (ground truth).]

BibTeX

@article{lee2026tora,
  title     = {TORA: Topological Representation Alignment for 3D Shape Assembly},
  author    = {Lee, Nahyuk and Chen, Zhiang and Pollefeys, Marc and Hong, Sunghwan},
  journal   = {arXiv preprint arXiv:2604.04050},
  year      = {2026}
}
