x66ccff / liveideabench
LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context
☆15 Updated 3 months ago
Alternatives and similar repositories for liveideabench
Users who are interested in liveideabench are comparing it to the repositories listed below.
- LLM for Scientific Research Survey ☆98 Updated 6 months ago
- A curated list of papers on LLMs and agents for scientific research and development ☆70 Updated 8 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆120 Updated 9 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆97 Updated 2 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 Updated 9 months ago
- Code/data for MARG (multi-agent review generation) ☆47 Updated 8 months ago
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 ☆64 Updated last week
- Process Reward Models That Think ☆47 Updated last month
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆104 Updated 2 months ago
- ☆66 Updated 4 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆80 Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆104 Updated 2 weeks ago
- Official Implementation of the Baby-AIGS system ☆23 Updated 8 months ago
- ☆23 Updated 7 months ago
- ☆59 Updated last month
- A collection of resources and papers on AI Scientist / Robot Scientist ☆86 Updated 2 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ☆100 Updated last month
- The repository for ACL 2024 paper "TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models" ☆31 Updated last year
- RL Scaling and Test-Time Scaling (ICML'25) ☆110 Updated 6 months ago
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆47 Updated last year
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning ☆52 Updated 2 months ago
- Evaluate the Quality of Critique ☆36 Updated last year
- The official repo for the code and data of paper SMART ☆31 Updated 5 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 9 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆57 Updated last year
- MPO: Boosting LLM Agents with Meta Plan Optimization ☆64 Updated 5 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆76 Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆127 Updated 4 months ago
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆122 Updated 5 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆61 Updated 8 months ago