CosineAI / experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆15Updated 11 months ago
Alternatives and similar repositories for experiments
Users who are interested in experiments are comparing it to the repositories listed below.
- ☆18Updated 2 weeks ago
- Public repository containing METR's DVC pipeline for eval data analysis☆96Updated 4 months ago
- ☆24Updated 3 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset☆18Updated last year
- Everything for the Paper: 'Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing'☆17Updated last year
- OLMost every training recipe you need to perform data interventions with the OLMo family of models.☆43Updated this week
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆88Updated this week
- A framework that makes it effortless to convert any LLM into a reasoning agent like o1 or DeepSeek's r1☆21Updated last week
- A framework for few-shot evaluation of autoregressive language models.☆12Updated last month
- Interactive Textbook Demo☆45Updated last year
- LLM reads a paper and produces a working prototype☆57Updated 4 months ago
- ☆54Updated 9 months ago
- never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you…☆37Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" using PyTorch and Zeta☆13Updated 9 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆92Updated 7 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆73Updated 8 months ago
- Inference examples☆56Updated 6 months ago
- Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"☆47Updated 3 weeks ago
- CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval☆18Updated 2 months ago
- ☆55Updated 2 months ago
- ☆39Updated last year
- Automatic Prompt Optimization☆40Updated last year
- Train, tune, and infer Bamba model☆131Updated 2 months ago
- ☆23Updated 3 months ago
- ☆18Updated last year
- ☆20Updated 5 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆31Updated last week
- you.com's framework for evaluating deep research systems.☆29Updated 3 months ago
- Mixing Language Models with Self-Verification and Meta-Verification☆105Updated 8 months ago
- ☆34Updated 3 weeks ago