Stage 1: Task and Scene Construction
Overview

Scene construction forms the foundation of sim-to-real transfer by establishing high-fidelity digital representations of physical workspaces. This process encompasses three critical components: asset acquisition and modeling, spatial relationship measurement, and simulation configuration with stochastic variations to enhance robustness.
- Asset Acquisition: Comprehensive identification and modeling of task-relevant physical elements including robotic systems, environmental fixtures, and manipulable objects. Assets are sourced through geometric primitives, established repositories (PartNet-Mobility, YCB Dataset), or custom CAD models tailored to specific operational requirements.
- Spatial Quantification: Precise measurement of component positions and orientations using metrology tools (laser rangefinders, calibration boards). Documentation encompasses static configurations (robot base frames, fixture locations), stochastic object placement distributions, and workspace boundaries to ensure geometric fidelity.
- Randomization Configuration: Implementation of measured spatial relationships within the simulation framework, incorporating controlled randomization to promote policy generalization.
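The measurement and randomization steps above can be sketched in plain Python. This is a minimal illustration, not the suite's implementation: the `PlacementRange` class, the `sample_pose` helper, the choice of a uniform distribution, and every numeric value are assumptions introduced here. It records a measured nominal pose together with randomization half-widths and draws object poses from the resulting box.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementRange:
    """Hypothetical record: a measured nominal pose plus randomization half-widths."""
    x: float     # nominal x in the robot base frame, metres
    y: float     # nominal y in the robot base frame, metres
    yaw: float   # nominal yaw about the vertical axis, radians
    dx: float    # half-width of the uniform range along x, metres
    dy: float    # half-width of the uniform range along y, metres
    dyaw: float  # half-width of the uniform range in yaw, radians


def sample_pose(r: PlacementRange, rng: random.Random) -> tuple[float, float, float]:
    """Draw one (x, y, yaw) pose uniformly from the box around the nominal pose."""
    return (
        rng.uniform(r.x - r.dx, r.x + r.dx),
        rng.uniform(r.y - r.dy, r.y + r.dy),
        rng.uniform(r.yaw - r.dyaw, r.yaw + r.dyaw),
    )


# Placeholder values for illustration only, not measurements from the real workspace.
bowl_range = PlacementRange(x=0.50, y=0.10, yaw=0.0, dx=0.05, dy=0.05, dyaw=math.pi / 6)

rng = random.Random(0)  # fixed seed so the draws are reproducible
poses = [sample_pose(bowl_range, rng) for _ in range(100)]
```

Keeping the nominal pose and the half-widths in one record makes it straightforward to compare the simulated placement distribution against what was measured on the physical setup.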
Task Example
We demonstrate the scene construction methodology through two representative manipulation scenarios: kitchen and canteen tasks.
Task Description
Kitchen task: Sequential manipulation: grasping a bowl, placing it in the microwave, and closing the microwave door.
Canteen task: Multi-object manipulation: grasping a fork, placing it in a designated area, then manipulating the plate.
Asset Collection
Kitchen task assets:
- UR5 manipulator with fixed base frame
- Work surface geometry
- Articulated microwave model (kinematic door joint)
- Bowl (rigid body with grasp affordances)
Canteen task assets:
- UR5 manipulator with fixed base frame
- Work surface geometry
- Deformable placement areas
- Fork (rigid body with grasp affordances)
- Plate (rigid body with grasp affordances)
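One way to keep the two asset lists above organized is a small manifest, sketched below. The `AssetKind` enum, the `Asset` record, the short asset names, and the helper functions are hypothetical and are not taken from the referenced repository. The sketch only encodes the categories listed above so that shared and graspable assets can be queried.

```python
from dataclasses import dataclass
from enum import Enum


class AssetKind(Enum):
    ROBOT = "robot"
    FIXTURE = "fixture"
    ARTICULATED = "articulated"
    RIGID_OBJECT = "rigid_object"
    DEFORMABLE = "deformable"


@dataclass(frozen=True)
class Asset:
    name: str        # hypothetical short identifier for the asset
    kind: AssetKind  # category taken from the asset lists above


KITCHEN_ASSETS = (
    Asset("ur5", AssetKind.ROBOT),
    Asset("work_surface", AssetKind.FIXTURE),
    Asset("microwave", AssetKind.ARTICULATED),
    Asset("bowl", AssetKind.RIGID_OBJECT),
)

CANTEEN_ASSETS = (
    Asset("ur5", AssetKind.ROBOT),
    Asset("work_surface", AssetKind.FIXTURE),
    Asset("placement_area", AssetKind.DEFORMABLE),
    Asset("fork", AssetKind.RIGID_OBJECT),
    Asset("plate", AssetKind.RIGID_OBJECT),
)


def shared_names(a: tuple[Asset, ...], b: tuple[Asset, ...]) -> list[str]:
    """Names of assets that appear in both task manifests, sorted alphabetically."""
    return sorted({x.name for x in a} & {y.name for y in b})


def graspable_names(assets: tuple[Asset, ...]) -> list[str]:
    """Names of the rigid objects, which the lists above mark as having grasp affordances."""
    return [x.name for x in assets if x.kind is AssetKind.RIGID_OBJECT]
```

Here `shared_names(KITCHEN_ASSETS, CANTEEN_ASSETS)` picks out the robot and work surface, which both tasks reuse, while `graspable_names` isolates the objects whose placement would typically be randomized.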
Position Measurement
Spatial configuration from physical workspace measurements:

Simulation Implementation
IsaacLab scene configuration based on empirical measurements:
Configuration Requirements:
- Consistent coordinate frame transformations
- Physically-plausible material properties
- Robust collision geometry and contact dynamics
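The first requirement, consistent coordinate frame transformations, can be illustrated with a planar sketch. It assumes that measurements are taken in a table-fixed frame and must be re-expressed in the robot base frame. The frame names, the `compose` and `invert` helpers, and all numeric values are illustrative assumptions, and yaw is not wrapped to a canonical interval.

```python
import math

Pose2D = tuple[float, float, float]  # (x in metres, y in metres, yaw in radians)


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Pose of frame C in frame A, given B in A (a) and C in B (b)."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)


def invert(a: Pose2D) -> Pose2D:
    """Pose of frame A in frame B, given B in A."""
    ax, ay, ath = a
    c, s = math.cos(ath), math.sin(ath)
    # The inverse translation is -R^T t, the inverse rotation is -yaw.
    return (-(c * ax + s * ay), s * ax - c * ay, -ath)


# Placeholder measurements in the table frame, for illustration only.
table_T_base = (0.0, -0.40, math.pi / 2)  # robot base pose measured on the table
table_T_bowl = (0.30, 0.20, 0.0)          # bowl pose measured on the table

# Re-express the bowl pose in the robot base frame.
base_T_bowl = compose(invert(table_T_base), table_T_bowl)

# Round trip: mapping back into the table frame should recover the measurement.
round_trip = compose(table_T_base, base_T_bowl)
```

A round-trip check like this is a cheap way to catch a swapped frame or a sign error before the poses are entered into the scene configuration.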
Code Reference
Implementation references:
Kitchen Task Scene Configuration:
- File path: https://github.com/ByteDance-Seed/manip-as-in-sim-suite/blob/main/wbcmimic/source/isaaclab_mimic/isaaclab_mimic/tasks/manager_based/ur5_sim/ur5_put_bowl_in_microwave_and_close.py
- Defines spatial configurations and physical properties for task-relevant assets
Canteen Task Scene Configuration:
- File path: https://github.com/ByteDance-Seed/manip-as-in-sim-suite/blob/main/wbcmimic/source/isaaclab_mimic/isaaclab_mimic/tasks/manager_based/ur5_sim/ur5_clean_plate.py
- Implements multi-object scene composition and interaction dynamics
Next Steps
Following scene construction, Stage 2 addresses camera calibration, which establishes the correspondence between simulated and physical visual observations and is a prerequisite for vision-based control policies.