TORA: Topological Representation Alignment for 3D Shape Assembly

ETH Zurich
*Equal contribution
TL;DR: A topology-first teacher–student framework that distills relational geometric structure from a frozen 3D encoder into a flow-matching model, achieving state-of-the-art 3D shape assembly with zero inference overhead.

Abstract

Flow-matching methods for 3D shape assembly learn point-wise velocity fields that transport parts toward assembled configurations, yet they receive no explicit guidance about which cross-part interactions should drive the motion. We introduce TORA, a topology-first representation alignment framework that distills relational structure from a frozen pretrained 3D encoder into the flow-matching backbone during training. We first realize this with a simple instantiation, token-wise cosine matching, which injects the teacher's learned geometric descriptors into the student. We then extend it with a Centered Kernel Alignment (CKA) loss that matches the similarity structure of the student and teacher representations for stronger topological alignment. Through systematic probing of diverse 3D encoders, we show that geometry- and contact-centric teacher properties, not semantic classification ability, govern alignment effectiveness, and that alignment is most beneficial at later transformer layers, where spatial structure naturally emerges. TORA adds zero inference overhead while yielding two consistent benefits: faster convergence (up to 5.9×) and improved in-distribution accuracy, along with greater robustness under domain shift. Experiments on five benchmarks spanning geometric, semantic, and inter-object assembly demonstrate state-of-the-art performance, with particularly pronounced gains in zero-shot transfer to unseen real-world and synthetic datasets.
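To make the two alignment objectives concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this page does not give the exact loss definitions, so the sketch assumes the standard linear-kernel form of CKA and a mean of (1 − cosine similarity) over tokens. The function names and the toy token lists are hypothetical, and a real training loop would use a tensor library with autograd.

```python
import math

def cosine_alignment_loss(student, teacher):
    """Mean (1 - cosine similarity) over corresponding tokens.
    Assumes both token lists share the same feature dimension."""
    total = 0.0
    for s, t in zip(student, teacher):
        dot = sum(a * b for a, b in zip(s, t))
        norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / (norm + 1e-12)
    return total / len(student)

def _centered_gram(feats):
    """Linear Gram matrix K = X X^T with rows and columns mean-centered."""
    n = len(feats)
    gram = [[sum(a * b for a, b in zip(fi, fj)) for fj in feats] for fi in feats]
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    all_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + all_mean
             for j in range(n)] for i in range(n)]

def _frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def linear_cka(student, teacher):
    """Linear CKA between two token sets (assumed standard formulation).
    Feature dimensions may differ; only the token count must match."""
    ks, kt = _centered_gram(student), _centered_gram(teacher)
    denom = math.sqrt(_frobenius_inner(ks, ks) * _frobenius_inner(kt, kt))
    return _frobenius_inner(ks, kt) / (denom + 1e-12)

def cka_loss(student, teacher):
    """Alignment loss that is zero when the similarity structures match."""
    return 1.0 - linear_cka(student, teacher)

# Toy tokens (hypothetical): the student is the teacher rotated by 90 degrees
# and scaled by 3, so pairwise similarity structure is preserved exactly.
teacher_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
student_tokens = [[-3.0 * y, 3.0 * x] for x, y in teacher_tokens]

print("cosine loss:", cosine_alignment_loss(student_tokens, teacher_tokens))
print("CKA loss:   ", cka_loss(student_tokens, teacher_tokens))
```

In this toy case the token-wise cosine loss is large because every student token is orthogonal to its teacher token, while the CKA loss is approximately zero because the relational structure among tokens is unchanged. That contrast is the sense in which a CKA-style objective targets similarity structure rather than individual descriptors.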

5.9× Faster Convergence: TORA reaches the baseline's peak performance up to 5.9× faster on Breaking Bad, significantly reducing training cost.
95.7% Part Accuracy: State-of-the-art assembly accuracy on the Breaking Bad benchmark, improving over the RPF baseline (93.2%).
0 Inference Overhead: The teacher encoder is used only during training. At test time, TORA adds zero computational overhead.
94.4% Zero-shot Transfer: Strong generalization to unseen datasets (Breaking Bad Artifact), demonstrating robust domain transfer.

Consistent Gains Across Diverse Assembly Settings

TORA outperforms RPF in every setting: geometric, semantic, and inter-object. Higher Part Accuracy (↑) is better; lower Rotation Error (↓) and Translation Error (↓) are better.

Setting / Metric            RPF        Ours       Change

Geometric assembly (Breaking Bad benchmark)
  Part Accuracy ↑           93.2%      95.7%      +2.5%
  Rotation Error ↓          16.0°      8.6°       46% ↓
  Translation Error ↓       4.3 cm     2.1 cm     51% ↓

Many-part geometric (Breaking Bad with [21, 33] parts)
  Part Accuracy ↑           62.1%      71.7%      +9.6%
  Rotation Error ↓          77.3°      64.8°      16% ↓
  Translation Error ↓       15.2 cm    12.5 cm    18% ↓

Semantic assembly (PartNet-Assembly benchmark)
  Part Accuracy ↑           59.8%      69.1%      +9.3%
  Rotation Error ↓          46.2°      40.8°      12% ↓
  Translation Error ↓       21.5 cm    18.8 cm    13% ↓

Inter-object assembly (TwoByTwo benchmark)
  Part Accuracy ↑           65.4%      71.5%      +6.1%
  Rotation Error ↓          15.8°      10.0°      37% ↓
  Translation Error ↓       11.9 cm    7.6 cm     36% ↓
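The change figures in the comparison above mix two conventions: Part Accuracy gains are absolute differences in percentage points, while the error reductions are relative to the RPF value. The short sketch below recomputes both from the reported numbers; the helper names are illustrative and not taken from the paper.

```python
def relative_reduction(baseline, ours):
    """Percent by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

def absolute_gain(baseline, ours):
    """Difference in percentage points (used for Part Accuracy)."""
    return ours - baseline

# (label, RPF value, TORA value), copied from the reported results.
error_rows = [
    ("Breaking Bad rotation (deg)", 16.0, 8.6),
    ("Breaking Bad translation (cm)", 4.3, 2.1),
    ("Many-part rotation (deg)", 77.3, 64.8),
    ("Many-part translation (cm)", 15.2, 12.5),
    ("PartNet-Assembly rotation (deg)", 46.2, 40.8),
    ("PartNet-Assembly translation (cm)", 21.5, 18.8),
    ("TwoByTwo rotation (deg)", 15.8, 10.0),
    ("TwoByTwo translation (cm)", 11.9, 7.6),
]
accuracy_rows = [
    ("Breaking Bad", 93.2, 95.7),
    ("Many-part", 62.1, 71.7),
    ("PartNet-Assembly", 59.8, 69.1),
    ("TwoByTwo", 65.4, 71.5),
]

for label, rpf, ours in error_rows:
    print(f"{label}: {relative_reduction(rpf, ours):.0f}% lower")
for label, rpf, ours in accuracy_rows:
    print(f"{label} part accuracy: +{absolute_gain(rpf, ours):.1f} points")
```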
[Figure panels: Input, RPF, Ours, GT (ground truth)]

Assembly Flow Visualization

[Interactive visualization: a slider controls the flow from noise (disassembled) to the assembled state.]
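The flow from noise to the assembled state can be illustrated with a small sketch. It assumes a straight-line (rectified-flow style) path between a noisy point set and the assembled one, which this page does not confirm, and it substitutes an oracle velocity for the learned network. All names and coordinates are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point set at time t on the straight path from x0 (noise) to x1 (assembled)."""
    return [[(1.0 - t) * a + t * b for a, b in zip(p0, p1)]
            for p0, p1 in zip(x0, x1)]

def target_velocity(x0, x1):
    """Constant per-point velocity along the straight path: x1 - x0."""
    return [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]

def euler_integrate(x0, velocity_fn, num_steps=10):
    """Transport points from t=0 to t=1 with fixed-step Euler updates."""
    x = [list(p) for p in x0]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)  # a trained model would be called here
        x = [[c + dt * vc for c, vc in zip(p, vp)] for p, vp in zip(x, v)]
    return x

# Toy 3D points (hypothetical): two points in a disassembled and an assembled pose.
noise = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]]
assembled = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]

# Oracle velocity standing in for the learned point-wise velocity field.
oracle = lambda x, t: target_velocity(noise, assembled)

halfway = interpolate(noise, assembled, 0.5)
final = euler_integrate(noise, oracle, num_steps=10)
print("t = 0.5:", halfway)
print("t = 1.0:", final)
```

With the oracle velocity, the integration ends at the assembled points up to floating-point error. In practice the velocity comes from the trained network, so the end state is only as accurate as the model.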

Convergence Analysis

TORA consistently accelerates optimization, reaching the RPF baseline's peak performance up to 5.9× faster.

5.9× on Breaking Bad: Fastest convergence speedup, reaching higher final accuracy.
3.3× on PartNet-Assembly: Consistent acceleration over cosine (2.2×) and NT-Xent (1.8×).
+6.1% on TwoByTwo: Most pronounced gains under distribution shift with improved final accuracy.
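One plausible way to compute a speedup of this kind is to compare the training step at which each run first reaches the baseline's peak validation score. The sketch below assumes that definition and uses made-up curves; the exact protocol behind the reported numbers is not stated on this page.

```python
def first_step_reaching(curve, threshold):
    """Earliest logged step whose score is >= threshold, or None if never reached.
    `curve` is a list of (step, score) pairs sorted by step."""
    for step, score in curve:
        if score >= threshold:
            return step
    return None

def convergence_speedup(baseline_curve, aligned_curve):
    """Ratio of steps needed to reach the baseline's peak score."""
    baseline_peak = max(score for _, score in baseline_curve)
    baseline_step = first_step_reaching(baseline_curve, baseline_peak)
    aligned_step = first_step_reaching(aligned_curve, baseline_peak)
    if aligned_step is None:
        return None  # the aligned run never matched the baseline peak
    return baseline_step / aligned_step

# Synthetic curves for illustration only; these are not results from the paper.
toy_baseline = [(1000, 0.40), (2000, 0.60), (3000, 0.75), (4000, 0.80), (5000, 0.79)]
toy_aligned = [(1000, 0.55), (2000, 0.81), (3000, 0.86), (4000, 0.88), (5000, 0.88)]

print("speedup:", convergence_speedup(toy_baseline, toy_aligned))
```

Because the comparison is made at logged steps, the resolution of the speedup estimate depends on how often the validation metric is recorded.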

What Makes a Good Teacher?

Geometry- and contact-centric properties predict assembly transfer, not semantic classification ability.

r = −0.04, Classification Accuracy ↔ Assembly: Near-zero correlation. Global semantic understanding does not predict assembly quality.
r = 0.94, Segmentation F1 ↔ Assembly: Strongest correlation. Mating-surface segmentation is highly predictive of downstream gains.

We adopt Uni3D as the default teacher, as it consistently achieves the strongest geometry/contact probe performance and yields the best downstream assembly accuracy across all alignment objectives.
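For readers who want to run a similar teacher analysis, the sketch below computes a correlation coefficient between a probe score and a downstream assembly score across several encoders. It assumes the reported r values are Pearson correlations, which this page does not state, and the numbers are placeholders rather than the paper's measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Placeholder scores for four hypothetical teacher encoders.
probe_scores = [1.0, 2.0, 3.0, 4.0]
assembly_tracks_probe = [10.0, 20.0, 30.0, 40.0]   # rises linearly with the probe
assembly_unrelated = [1.0, -1.0, -1.0, 1.0]        # no linear relation to the probe

print("predictive probe:  r =", pearson_r(probe_scores, assembly_tracks_probe))
print("unrelated probe:   r =", pearson_r(probe_scores, assembly_unrelated))
```

A probe whose score moves in step with assembly accuracy yields r near 1, while a probe that carries no linear signal yields r near 0, mirroring the contrast between the segmentation and classification probes described above.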

BibTeX

@inproceedings{anonymous2026tora,
  title     = {TORA: Topological Representation Alignment
               for 3D Shape Assembly},
  author    = {Lee, Nahyuk and Chen, Zhiang and Pollefeys, Marc and Hong, Sunghwan},
  year      = {2026}
}