2024-09 preprints
New preprint. We extend PlanBench to OpenAI’s o1-preview and o1-mini, and provide a preliminary analysis of the models’ capabilities: LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench.