CosineAI / experiments
Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task.
☆15 · Updated 6 months ago
Alternatives and similar repositories for experiments:
Users who are interested in experiments are comparing it to the repositories listed below.
- Public repository containing METR's DVC pipeline for eval data analysis ☆33 · Updated this week
- ☆15 · Updated 6 months ago
- Uses a Gradio interface to stream coding-related responses from local and cloud-based large language models. Pulls context from GitHub Re… ☆20 · Updated 3 weeks ago
- A Python command-line tool to download and manage MLX AI models from Hugging Face. ☆17 · Updated 7 months ago
- ☆50 · Updated 4 months ago
- Streamlit app for recommending eval functions using prompt diffs ☆27 · Updated last year
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆15 · Updated last year
- Tools for merging pretrained large language models. ☆19 · Updated 9 months ago
- Tutorial for DSPy ☆23 · Updated 11 months ago
- AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an… ☆13 · Updated 2 weeks ago
- A repository of projects and datasets under active development by Alignment Lab AI ☆22 · Updated last year
- The world's first fully automated VC fund. ☆20 · Updated 2 weeks ago
- ☆26 · Updated 3 weeks ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and Hugging Face. ☆25 · Updated 9 months ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 · Updated 9 months ago
- ☆35 · Updated last week
- ☆50 · Updated 3 weeks ago
- Automatic Prompt Optimization ☆28 · Updated 10 months ago
- BH hackathon ☆14 · Updated 11 months ago
- ☆24 · Updated last year
- FinRAG: Financial Retrieval Augmented Generation ☆17 · Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- Inference examples ☆40 · Updated last month
- Interactive Textbook Demo ☆40 · Updated last year
- ☆20 · Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆56 · Updated 2 weeks ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated last month
- ☆29 · Updated last year
- Proceedings of Innovative Use of NLP for Building Educational Applications 2023: SIGHT: A Large Annotated Dataset on Student Insights Gat… ☆10 · Updated 8 months ago
- ☆41 · Updated 3 months ago