Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Pan Lu¹, Baolin Peng², Hao Cheng², Michel Galley², Kai-Wei Chang¹, Ying Nian Wu¹, Song-Chun Zhu¹, Jianfeng Gao²
¹University of California, Los Angeles; ²Microsoft Research, Redmond

💥 Accepted to NeurIPS 2023.

💥 Best Weekly AI Paper (by AlphaSignal; ranked 1st of 1,682, top 0.06%).

Examples from our Chameleon with GPT-4 on ScienceQA, a multi-modal question answering benchmark in scientific domains.
Chameleon adapts to different queries by synthesizing programs that compose various tools, then executing those tools sequentially to obtain the final answer.

Chameleon is a plug-and-play compositional reasoning framework that augments large language models (LLMs) to address their inherent limitations, such as outdated information and a lack of precise reasoning. It integrates tools such as vision models, web search engines, Python functions, and rule-based modules so that responses are more accurate and up to date. With GPT-4 at its core, Chameleon shows substantial accuracy improvements on benchmark tasks.
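The plan-then-execute idea described above can be illustrated with a minimal sketch. It assumes a planner has already produced a program as a list of module names; the module names, the shared-cache convention, and the `execute_program` helper are all hypothetical stand-ins for illustration, not the project's actual API, and the LLM calls are replaced by placeholder strings.

```python
from typing import Callable, Dict, List

# Hypothetical convention: every module reads from and writes to a shared cache.
Cache = Dict[str, str]
Module = Callable[[Cache], Cache]


def knowledge_retrieval(cache: Cache) -> Cache:
    # Stand-in for an LLM call that retrieves background knowledge.
    cache["knowledge"] = "background knowledge for: " + cache["question"]
    return cache


def solution_generator(cache: Cache) -> Cache:
    # Stand-in for an LLM call that reasons over whatever context exists so far.
    context = cache.get("knowledge", "")
    cache["solution"] = f"reasoning over [{context}]"
    return cache


def answer_generator(cache: Cache) -> Cache:
    # Stand-in for a rule-based module that extracts the final answer.
    cache["answer"] = "answer derived from: " + cache.get("solution", "")
    return cache


# Illustrative registry; a real inventory would also hold vision and search tools.
MODULES: Dict[str, Module] = {
    "knowledge_retrieval": knowledge_retrieval,
    "solution_generator": solution_generator,
    "answer_generator": answer_generator,
}


def execute_program(program: List[str], question: str) -> Cache:
    """Run the modules named in `program` one after another on a shared cache."""
    cache: Cache = {"question": question}
    for name in program:
        if name not in MODULES:
            raise KeyError(f"unknown module in program: {name}")
        cache = MODULES[name](cache)
    return cache


if __name__ == "__main__":
    # A fixed program stands in for the output of the LLM planner.
    plan = ["knowledge_retrieval", "solution_generator", "answer_generator"]
    print(execute_program(plan, "Which material is harder?")["answer"])
```

Because each module only touches the shared cache, adding a new tool in this sketch means registering one more function; the executor loop itself does not change.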

Significant improvements are observed for Chameleon over both fine-tuned models and few-shot prompted GPT-4/ChatGPT (results shown for TabMWP and ScienceQA).

What Plans Does Chameleon Synthesize?

The GPT-4 planner is capable of making good decisions on how to sequence tools in a few-shot setup.
For ScienceQA, GPT-4 often relies on either the knowledge retriever or Bing search, but rarely both.
On TabMWP, two main modes are observed: going through the solution-executor module, or going through the program verifier and executor.

Transitions between modules in programs generated by Chameleon (GPT-4) on ScienceQA. START is the start symbol, END is a terminal symbol and the others are non-terminal symbols.

Transitions between modules in programs generated by Chameleon (GPT-4) on TabMWP. START is the start symbol, END is a terminal symbol and the others are non-terminal symbols.
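Transition statistics of this kind can be computed from a set of generated programs by padding each program with START and END and counting adjacent pairs. The sketch below shows one way to do that; the function names and the two example programs are made up for illustration and are not taken from the project's code or data.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def transition_counts(programs: List[List[str]]) -> Counter:
    """Count module-to-module transitions, with START and END added to each program."""
    counts: Counter = Counter()
    for program in programs:
        path = ["START", *program, "END"]
        # zip pairs every symbol with its successor in the path.
        counts.update(zip(path, path[1:]))
    return counts


def transition_probabilities(counts: Counter) -> Dict[Pair, float]:
    """Normalize counts so that outgoing transitions from each symbol sum to 1."""
    totals: Counter = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


if __name__ == "__main__":
    # Two invented programs, only to exercise the functions.
    programs = [
        ["knowledge_retrieval", "solution_generator", "answer_generator"],
        ["bing_search", "solution_generator", "answer_generator"],
    ]
    for pair, p in sorted(transition_probabilities(transition_counts(programs)).items()):
        print(pair, round(p, 2))
```

Normalizing per source symbol gives the edge weights of a transition graph, which makes patterns such as "knowledge retriever or Bing search, but rarely both" visible as low-probability edges between those two modules.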

Tools called in the generated programs from Chameleon (ChatGPT) and Chameleon (GPT-4) on ScienceQA

Tools called in the generated programs from Chameleon (ChatGPT) and Chameleon (GPT-4) on TabMWP

Our work was featured by WorldofAI.

Citation

If the paper inspires your work or you use the data in your research, please cite us:

@inproceedings{lu2023chameleon,
  title={Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models},
  author={Lu, Pan and Peng, Baolin and Cheng, Hao and Galley, Michel and Chang, Kai-Wei and Wu, Ying Nian and Zhu, Song-Chun and Gao, Jianfeng},
  booktitle={The 37th Conference on Neural Information Processing Systems (NeurIPS)},
  year={2023}
}

Release and License

The data is intended solely for research and non-commercial purposes. Its use is subject to the Terms of Use for data generated by OpenAI. If you discover any potential violations, please contact us. Additionally, the code is governed by the Apache License 2.0.

The Team

Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao

Acknowledgement

We would like to thank Chunyuan Li, Qiuyuan Huang, and other members of the Deep Learning group at Microsoft Research for their valuable discussions.