allenai / codescientist
CodeScientist: An automated scientific discovery system for code-based experiments
☆67 · Updated this week
Alternatives and similar repositories for codescientist:
Users interested in codescientist are comparing it to the libraries listed below.
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆107 · Updated last month
- ⚖️ Awesome LLM Judges ⚖️ ☆87 · Updated last month
- An automated tool for discovering insights from research paper corpora ☆137 · Updated 9 months ago
- ☆61 · Updated last month
- ☆50 · Updated 4 months ago
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆140 · Updated 3 weeks ago
- ☆74 · Updated 5 months ago
- Train your own SOTA deductive reasoning model ☆81 · Updated 3 weeks ago
- The first dense retrieval model that can be prompted like an LM ☆68 · Updated 6 months ago
- AWM: Agent Workflow Memory ☆252 · Updated 2 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆48 · Updated last month
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆43 · Updated 5 months ago
- Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents (https://www.arxiv.org/pdf/2503.01935) ☆87 · Updated 2 weeks ago
- Verdict is a library for scaling judge-time compute. ☆190 · Updated 2 weeks ago
- ☆188 · Updated last month
- Official code of the paper "SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation" ☆105 · Updated 3 months ago
- ☆85 · Updated 6 months ago
- ☆107 · Updated 2 weeks ago
- Official Code Release for "Training a Generally Curious Agent" ☆19 · Updated 3 weeks ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆120 · Updated last month
- Mixing Language Models with Self-Verification and Meta-Verification ☆102 · Updated 3 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆56 · Updated 2 weeks ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆74 · Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 6 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆169 · Updated 2 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆63 · Updated 3 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆190 · Updated 8 months ago
- An agent benchmark with tasks in a simulated software company. ☆274 · Updated 2 weeks ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆70 · Updated last month
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 2 months ago