CosineAI / experiments
Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆15 Updated 7 months ago
Alternatives and similar repositories for experiments:
Users who are interested in experiments are comparing it to the libraries listed below.
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆16 Updated last year
- Transform unstructured documents into actionable, structured data with enterprise-grade precision and reliability, ready for large-scale … ☆19 Updated this week
- ☆14 Updated last month
- ☆18 Updated 6 months ago
- ☆14 Updated last year
- The world's first fully automated VC fund. ☆21 Updated this week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 5 months ago
- Uses a Gradio interface to stream coding-related responses from local and cloud-based large language models. Pulls context from GitHub Re… ☆21 Updated last month
- BH hackathon ☆14 Updated last year
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆25 Updated 10 months ago
- ☆11 Updated 9 months ago
- The original BabyAGI, updated with LiteLLM and no vector database reliance (csv instead) ☆21 Updated 6 months ago
- LlamaWorksDB is a Retrieval Augmented Generation (RAG) product designed to interact with the documentation of various products such as Ll… ☆16 Updated 11 months ago
- OpenPipe Reinforcement Learning Experiments ☆22 Updated last month
- ☆32 Updated last year
- ☆22 Updated 11 months ago
- Tools for merging pretrained large language models. ☆19 Updated 10 months ago
- AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an… ☆13 Updated this week
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆90 Updated 3 months ago
- Streamlit app for recommending eval functions using prompt diffs ☆27 Updated last year
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆22 Updated 3 weeks ago
- ☆19 Updated 8 months ago
- ☆38 Updated 9 months ago
- ☆24 Updated last year
- ☆16 Updated 11 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated 9 months ago
- Implementation ☆24 Updated last month
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 Updated 9 months ago
- ☆48 Updated last year
- ☆18 Updated last month