shreyashankar / spade-experiments
Experiments to assess SPADE on different LLM pipelines.
☆17 · Updated last year
Alternatives and similar repositories for spade-experiments
Users interested in spade-experiments are comparing it to the libraries listed below.
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆21 · Updated 2 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆42 · Updated last week
- [EMNLP 2024 Main] Virtual Personas for Language Models via an Anthology of Backstories ☆27 · Updated 6 months ago
- Finding semantically meaningful and accurate prompts. ☆46 · Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test generation ☆49 · Updated last month
- ☆26 · Updated 4 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 · Updated 4 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 · Updated last year
- CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. ☆48 · Updated 9 months ago
- ☆17 · Updated last week
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆43 · Updated last year
- Efficient and Scalable Estimation of Tool Representations in Vector Space ☆23 · Updated 9 months ago
- ☆29 · Updated last year
- ☆33 · Updated 3 months ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" in PyTorch and Zeta ☆13 · Updated 6 months ago
- Application code for a generative AI analytics platform. ☆26 · Updated 3 weeks ago
- AI Evaluation Platform ☆46 · Updated last week
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 · Updated 3 months ago
- ☆23 · Updated last year
- ☆75 · Updated 2 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆18 · Updated this week
- Streamlit app for recommending eval functions using prompt diffs ☆27 · Updated last year
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆13 · Updated 5 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆30 · Updated 8 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆12 · Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆36 · Updated 7 months ago
- ☆27 · Updated this week
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- ☆13 · Updated 3 weeks ago