We present Language-Guided Monte-Carlo Tree Search (LGMCTS), a framework that uniquely combines language guidance with geometrically informed sampling distributions to rearrange objects according to geometric patterns dictated by natural language descriptions. LGMCTS produces feasible action plans that guarantee executable semantic object rearrangement.
We present a comprehensive comparison with leading approaches that use language to generate goal rearrangements independently of actionable planning, including StructFormer, StructDiffusion, and Code as Policies. We also introduce a new benchmark, the Executable Language-Guided Rearrangement (ELGR) Bench, containing tasks involving intricate geometry. On the ELGR Bench, we show the limitations of task and motion planning (TAMP) solutions that rely purely on Large Language Models (LLMs), such as Code as Policies and ProgPrompt.
Our findings advocate for using LLMs to generate intermediary representations rather than direct action planning in geometrically complex rearrangement scenarios, aligning with perspectives from recent literature.
We introduce the LGMCTS technique, designed for executable semantic object rearrangement. LGMCTS employs LLMs to process free-form language, treating object placements as probability distributions rather than fixed points. The problem is framed as sequential sampling, where the language instruction and the current scene jointly determine each object's pose distribution. This formulation tolerates distractor objects and ensures that plans satisfy the linguistic specification while remaining executable.
1. A method that simultaneously tackles semantic rearrangement goal generation and action planning.
2. A solution promoting zero-shot multi-pattern composition in semantic object rearrangement.
3. A new benchmark (ELGR) specifically for executable semantic object rearrangement that highlights the drawbacks of existing LLM-driven solutions and other learning-based semantic rearrangement methods.
This illustrates the (x, y) prior for the 'line' pattern. Sequence from left to right represents stages K = 0 to K = 3. White stars indicate sampled poses. At K = 0, poses can be placed freely. At K = 1, poses are sampled outside a specified circle. For K ≥ 2, all poses align with the line formed by the initial two poses.
For each object, we compute a conditional distribution using two components: a pattern prior function and a boolean function indicating free space.
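A minimal sketch of this sequential, conditioned sampling for the 'line' pattern described in the figure above. The prior function, the free-space threshold, and all parameter values here are illustrative assumptions, not the paper's implementation; the conditional distribution is realized by rejection sampling (draw from the pattern prior, accept only if the free-space indicator holds).

```python
import math
import random

def line_prior(k, anchors):
    """Hypothetical (x, y) prior for a 'line' pattern in a unit workspace.

    The first two objects (k < 2) are placed freely; every later object is
    drawn along the line defined by the first two sampled poses.
    """
    if k < 2:
        return (random.uniform(0, 1), random.uniform(0, 1))
    (x0, y0), (x1, y1) = anchors[0], anchors[1]
    t = random.uniform(0, 1)  # parameter along the anchor segment
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

def is_free(pose, placed, min_dist=0.1):
    """Boolean free-space indicator: reject poses too close to placed objects."""
    return all(math.dist(pose, p) >= min_dist for p in placed)

def sample_pose(k, placed, max_tries=1000):
    """Draw from (pattern prior x free-space indicator) by rejection sampling."""
    for _ in range(max_tries):
        pose = line_prior(k, placed)
        if is_free(pose, placed):
            return pose
    return None  # the conditional distribution may have no mass left
```

Because each accepted pose is appended to `placed`, every subsequent draw is conditioned on the poses already committed, which is exactly the sequential-sampling view described above.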
The pattern prior function is retrieved from a database by matching the Sentence-BERT embedding of the language instruction against the embeddings of predefined functions and selecting the closest one.
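The retrieval step is a nearest-neighbor lookup in embedding space. A small sketch, with hand-made toy vectors standing in for Sentence-BERT embeddings (in practice the vectors would come from a sentence-transformers model such as `all-MiniLM-L6-v2`); the database contents and function names are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_prior(instruction_emb, database):
    """Return the pattern whose stored embedding is closest to the
    instruction embedding under cosine similarity."""
    return max(database, key=lambda name: cosine(instruction_emb, database[name]))
```

Usage with toy 2-D embeddings: `retrieve_prior((0.9, 0.1), {"line": (1.0, 0.0), "circle": (0.0, 1.0)})` selects `"line"`, since that instruction vector points mostly along the "line" direction.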
Within the LGMCTS context, patterns are described using parametric curves.
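To make the parametric-curve view concrete, here is a hedged sketch of two such curves, each mapping a parameter t in [0, 1] to an (x, y) position; the specific parameterizations and the even-spacing helper are illustrative assumptions, not the paper's definitions.

```python
import math

def line_curve(t, start=(0.0, 0.0), end=(1.0, 0.0)):
    """Line pattern: linear interpolation between two endpoints."""
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))

def circle_curve(t, center=(0.5, 0.5), radius=0.3):
    """Circle pattern: t sweeps one full revolution (t=0 and t=1 coincide,
    so closed curves should be sampled on [0, 1) when spacing objects)."""
    theta = 2 * math.pi * t
    return (center[0] + radius * math.cos(theta),
            center[1] + radius * math.sin(theta))

def spaced_poses(curve, n):
    """Evenly spaced goal poses for n objects along an open parametric curve."""
    return [curve(i / max(n - 1, 1)) for i in range(n)]
```

Describing patterns this way lets one curve family cover any number of objects: the pattern fixes the shape, while the parameter spacing fixes where each object lands on it.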
The instructions for each of the five scenarios are:
(a) Arrange all blocks in a circle and place the white bottle in front of a block.
(b) Position all boxes in a rectangle and situate the white bottle to the left of a box.
(c) Line up the bottles and arrange the phones in a separate line.
(d) Align all yellow objects in a straight line.
(e) Organize all phones in a line.
Dotted lines suggest shape patterns, while red arrows denote spatial directions (e.g., left, right, front, back).
These experiments validate LGMCTS's capability to interpret intricate linguistic commands and manage challenging initial setups and pattern combinations.
In a qualitative comparison, LGMCTS outperforms StructFormer and StructDiffusion by guaranteeing a collision-free goal arrangement.
@misc{chang2023lgmcts,
title={LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement},
author={Haonan Chang and Kai Gao and Kowndinya Boyalakuntla and Alex Lee and Baichuan Huang and Harish Udhaya Kumar and Jingjin Yu and Abdeslam Boularias},
year={2023},
eprint={2309.15821},
archivePrefix={arXiv},
primaryClass={cs.RO}
}