LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement

Rutgers University
Under Review for IROS'24

System Run-through

Abstract

We present Language-Guided Monte-Carlo Tree Search (LGMCTS), a framework that uniquely combines language guidance with geometrically informed sampling distributions to rearrange objects according to geometric patterns dictated by natural language descriptions, creating feasible action plans that ensure executable semantic object rearrangement.

We present a comprehensive comparison with leading approaches that use language to generate goal rearrangements independently of actionable planning, including StructFormer, StructDiffusion, and Code as Policies. We also present a new benchmark, the Executable Language Guided Rearrangement (ELGR) Bench, containing tasks involving intricate geometry. With the ELGR Bench, we show the limitations of task and motion planning (TAMP) solutions that are purely based on Large Language Models (LLMs), such as Code as Policies and ProgPrompt, on such tasks.

Our findings advocate using LLMs to generate intermediary representations rather than direct action plans in geometrically complex rearrangement scenarios, aligning with perspectives from recent literature.

Language Guided Monte Carlo Tree Search (LGMCTS)

We introduce LGMCTS, a technique crafted for executable semantic object rearrangement. LGMCTS employs LLMs to process free-form language and treats object placements as probability distributions rather than fixed points. The problem is cast as sequential sampling, where the language instruction and the current scene jointly shape each object's pose distribution. This formulation accommodates distracting objects and ensures that plans satisfy the linguistic specification while remaining executable.
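To make the search concrete, the sketch below shows a generic Monte-Carlo tree search skeleton over sequential pose sampling. It is illustrative only, not the authors' implementation: the pose sampler, collision check, and reward are toy stand-ins, and the progressive-widening rule and all constants are assumptions.

# A generic MCTS skeleton over sequential pose sampling -- illustrative only,
# not the authors' implementation. Sampler, collision check, reward, and the
# widening rule are toy stand-ins / assumptions.
import math
import random

MIN_DIST = 0.1  # toy collision radius (assumption)

def sample_pose(placed):
    # Toy stand-in for sampling from the language-conditioned distribution.
    return (random.random(), random.random())

def feasible(pose, placed):
    # Toy free-space check: keep a minimum distance to already-placed poses.
    return all(math.dist(pose, p) >= MIN_DIST for p in placed)

class Node:
    def __init__(self, placed, remaining, parent=None):
        self.placed, self.remaining, self.parent = placed, remaining, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(n, c=1.4):
    if n.visits == 0:
        return float("inf")
    return n.value / n.visits + c * math.sqrt(math.log(n.parent.visits) / n.visits)

def rollout(placed, remaining):
    # Randomly place the rest; reward = fraction placed collision-free.
    placed, ok = list(placed), 0
    for _ in remaining:
        pose = sample_pose(placed)
        if feasible(pose, placed):
            placed.append(pose)
            ok += 1
    return ok / len(remaining) if remaining else 1.0

def mcts(objects, iters=200):
    root = Node([], list(objects))
    for _ in range(iters):
        node = root
        # Selection with progressive widening: descend by UCB until a node
        # is allowed to grow a new child (continuous action space).
        while node.remaining and node.children and len(node.children) ** 2 > node.visits:
            node = max(node.children, key=ucb)
        # Expansion: commit one more sampled pose if objects remain.
        if node.remaining:
            pose = sample_pose(node.placed)
            if feasible(pose, node.placed):
                child = Node(node.placed + [pose], node.remaining[1:], node)
                node.children.append(child)
                node = child
        # Rollout and backpropagation.
        reward = rollout(node.placed, node.remaining)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Extract the most-visited placement sequence.
    node, plan = root, []
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        plan.append(node.placed[-1])
    return plan

print(mcts(["block_a", "block_b", "block_c"]))

In the actual system, the sampler would draw from the language-conditioned pattern prior described below, and the reward would reflect both pattern satisfaction and executability.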

Key Contributions

1. A method that simultaneously tackles semantic rearrangement goal generation and action planning.
2. A solution promoting zero-shot multi-pattern composition in semantic object rearrangement.
3. The establishment of a new benchmark (ELGR) specifically for executable semantic object rearrangement, highlighting the drawbacks of existing LLM-driven solutions and other learning-based semantic rearrangement methods.

Distribution Generation

The figure shows the (x, y) prior for the 'line' pattern. The sequence from left to right corresponds to stages K = 0 to K = 3, with white stars indicating sampled poses. At K = 0, the pose is unconstrained. At K = 1, poses are sampled outside a specified circle. For K ≥ 2, all poses align with the line formed by the first two poses.


For each object, we compute a conditional distribution from two components: a pattern prior function and a boolean function indicating free space. The pattern prior function is retrieved from a database by matching Sentence-BERT embeddings of the language instruction against those of predefined functions, selecting the closest match. Within LGMCTS, patterns are described using parametric curves.
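As an illustration of this retrieval step, the sketch below matches an instruction to the closest pattern prior using Sentence-BERT embeddings via the sentence-transformers package; the model choice, pattern database, and its descriptions are hypothetical placeholders, not the paper's actual database.

# A sketch of the retrieval step, assuming the sentence-transformers package.
# The pattern database and its textual descriptions below are hypothetical.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

# Hypothetical database: pattern name -> description used for matching.
PATTERN_DB = {
    "line": "arrange the objects in a straight line",
    "circle": "arrange the objects in a circle",
    "rectangle": "arrange the objects in a rectangle",
}
keys = list(PATTERN_DB)
db_emb = model.encode([PATTERN_DB[k] for k in keys], convert_to_tensor=True)

def match_pattern(instruction: str) -> str:
    # Return the pattern whose description is closest in embedding space.
    query = model.encode(instruction, convert_to_tensor=True)
    scores = util.cos_sim(query, db_emb)[0]
    return keys[int(scores.argmax())]

print(match_pattern("line up all the bottles"))  # -> "line"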

Sequential Sampling Process

1. When no objects have been sampled, the first object can be placed anywhere.
2. Once one object is sampled, the next must maintain a minimum distance from it.
3. From the third object onward, poses are placed along the parametric curve, following its gradient angle with some variance (see the sketch after this list).
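A minimal sketch of these three stages for the line pattern follows; the exclusion radius, noise scale, and sampling ranges are assumptions for illustration, not values from the paper.

# Three-stage line-pattern sampling, mirroring the list above (illustrative;
# MIN_DIST, SIGMA, and the sampling ranges are assumptions).
import math
import random

MIN_DIST = 0.2  # exclusion radius around the first pose (assumption)
SIGMA = 0.02    # perpendicular noise off the line (assumption)

def sample_line_pose(placed):
    if len(placed) == 0:
        # Stage K = 0: first pose is unconstrained.
        return (random.random(), random.random())
    if len(placed) == 1:
        # Stage K = 1: resample until outside the exclusion circle.
        while True:
            p = (random.random(), random.random())
            if math.dist(p, placed[0]) >= MIN_DIST:
                return p
    # Stage K >= 2: the first two poses fix the line; slide along its
    # gradient angle and perturb perpendicular to it.
    (x0, y0), (x1, y1) = placed[0], placed[1]
    angle = math.atan2(y1 - y0, x1 - x0)
    t = random.uniform(0.0, 1.5)    # position along the line
    off = random.gauss(0.0, SIGMA)  # small perpendicular variance
    return (x0 + t * math.cos(angle) - off * math.sin(angle),
            y0 + t * math.sin(angle) + off * math.cos(angle))

poses = []
for _ in range(4):
    poses.append(sample_line_pose(poses))
print(poses)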

Patterns are categorized as ordered or unordered based on whether a specific sequence is required. The distribution generation process is illustrated with a line pattern in the accompanying figure above.


Results

Demonstrations using a real UR5 robot


The instructions for each of the five scenarios are:
(a) Arrange all blocks in a circle and place the white bottle in front of a block.
(b) Position all boxes in a rectangle and situate the white bottle to the left of a box.
(c) Line up the bottles and arrange the phones in a separate line.
(d) Align all yellow objects in a straight line.
(e) Organize all phones in a line.

Dotted lines indicate shape patterns, while red arrows denote spatial relations (e.g., left, right, front, back). These experiments validate LGMCTS's ability to interpret intricate linguistic commands and to handle challenging initial configurations and pattern combinations.


Qualitative Comparison


In a qualitative comparison, LGMCTS outperforms StructFormer and StructDiffusion by guaranteeing a collision-free goal arrangement.

BibTeX

@misc{chang2023lgmcts,
      title={LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement},
      author={Haonan Chang and Kai Gao and Kowndinya Boyalakuntla and Alex Lee and Baichuan Huang and Harish Udhaya Kumar and Jingjin Yu and Abdeslam Boularias},
      year={2023},
      eprint={2309.15821},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}