CosineAI / experiments
Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task.
☆12Updated 2 months ago
Related projects
Alternatives and complementary repositories for experiments
- ☆19Updated 3 months ago
- Official homepage for "Self-Harmonized Chain of Thought"☆83Updated 2 months ago
- ☆28Updated 8 months ago
- Fullstack chatbot application☆11Updated 3 months ago
- ☆37Updated this week
- An NVIDIA AI Workbench Example Project for Finetuning Llama 2☆27Updated 2 months ago
- Dynamic Metadata based RAG Framework☆71Updated 3 months ago
- The world's first fully automated VC fund.☆16Updated last week
- ☆40Updated last month
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals".☆65Updated 4 months ago
- ☆66Updated 2 months ago
- Voyage AI Official Python Library☆41Updated 2 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper.☆72Updated 2 months ago
- Inference examples☆20Updated 2 months ago
- ☆87Updated 10 months ago
- This repository contains a toy implementation of a basic RAQA system.☆20Updated 5 months ago
- ☆36Updated 3 months ago
- ☆48Updated last year
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci…☆23Updated last year
- ☆45Updated 7 months ago
- Evaluating LLMs with CommonGen-Lite☆85Updated 8 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆47Updated last month
- LlamaWorksDB is a Retrieval Augmented Generation (RAG) product designed to interact with the documentation of various products such as Ll…☆15Updated 6 months ago
- Automatic Evals for Instruction-Tuned Models☆65Updated this week
- A seamless matchmaking application that is programmed with Cohere Command R+, Stanford NLP DSPy framework, Weaviate Vector store and Crew…☆58Updated 7 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset☆13Updated 8 months ago
- ☆74Updated 3 weeks ago
- Github repo for storing LlamaDatasets☆30Updated 10 months ago
- Experimental Code for StructuredRAG: Structured Outputs in Retrieval-Augmented Generation☆94Updated this week