Sep 25, 2024 | “Chain of Thoughtlessness? An Analysis of CoT in Planning” has been accepted to the main track of NeurIPS 2024! And “LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench” has been accepted to the Open World Agents Workshop. |
Sep 22, 2024 | New preprint. We extend PlanBench to OpenAI’s o1-preview and o1-mini, and provide a preliminary analysis of the models’ capabilities: LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench. |
May 31, 2024 | New preprint analyzing how chain of thought approaches break down out-of-distribution: Chain of Thoughtlessness? An Analysis of CoT in Planning. |
May 01, 2024 | “LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks” has been accepted to ICML 2024 and awarded a spotlight distinction. |
Feb 29, 2024 | New preprint, extending our work from last year on the efficacy of LLM self-verification: On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks. |
Dec 10, 2023 | I’m at NeurIPS 2023 together with Karthik Valmeekam and Lin Guan from the Yochan Lab. We’ll be presenting five works covering various facets of LLM planning and reasoning abilities (or lack thereof). |
Nov 16, 2023 | “GPT-4 Doesn’t Know It’s Wrong: An Analysis of Iterative Prompting for Reasoning Problems” has been accepted to the NeurIPS 2023 Foundation Models for Decision Making workshop. |
Aug 01, 2023 | Started my Linguistics M.A. at Arizona State University. I also joined the Yochan Lab in ASU’s school of computing and AI. |