GAIR-NLP/daVinci-Dev

[ICML 2026 Oral] Agent-native Mid-training for Software Engineering

☆74

Alternatives and similar repositories for daVinci-Dev

Users who are interested in daVinci-Dev are comparing it to the repositories listed below.

GAIR-NLP / Data-Darwinism
[ACL 2026] This is the repo of Data Darwinism.
☆26 • Apr 16, 2026 • Updated 3 months ago
GAIR-NLP / DataEvolve
☆31 • Mar 15, 2026 • Updated 4 months ago
RUCAIBox / SWE-Master
☆93 • Feb 28, 2026 • Updated 5 months ago
RUCAIBox / SWE-World
☆49 • Mar 6, 2026 • Updated 4 months ago
zhenglw02 / SWE-Hub
A toolkit for synthesizing high-quality code training data using LLM agents. It provides three independent pipelines, each producing a di…
☆17 • Mar 10, 2026 • Updated 4 months ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 • Aug 15, 2024 • Updated last year
GAIR-NLP / OpenSWE
☆199 • Mar 16, 2026 • Updated 4 months ago
GAIR-NLP / Med
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-…
☆23 • May 15, 2026 • Updated 2 months ago
alon-albalak / online-data-mixing
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14 • Jan 9, 2024 • Updated 2 years ago
GAIR-NLP / AgencyBench
[ACL 2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
☆90 • Jan 23, 2026 • Updated 6 months ago
LLM360 / MegaMath
[COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.
☆110 • Apr 4, 2025 • Updated last year
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 • Aug 8, 2025 • Updated 11 months ago
GAIR-NLP / LIMOPro
☆15 • May 27, 2025 • Updated last year
xszheng2020 / memorization
An Empirical Study of Memorization in NLP (ACL 2022)
☆13 • Jun 22, 2022 • Updated 4 years ago
kwaipilot / SWE-Compass
☆18 • Mar 28, 2026 • Updated 4 months ago
djm209 / HSTGODE
HSTGODE code
☆11 • Nov 26, 2023 • Updated 2 years ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 • Jul 23, 2025 • Updated last year
chenjianhuii / Mechanistic-Data-Attribution
☆16 • May 25, 2026 • Updated 2 months ago
GAIR-NLP / self-improvement-reversal
☆13 • Jul 14, 2024 • Updated 2 years ago
GAIR-NLP / daVinci-Agency
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
☆38 • Feb 4, 2026 • Updated 5 months ago
neulab / agent-data-protocol
☆187 • Jul 14, 2026 • Updated 2 weeks ago
DeepSoftwareAnalytics / Awesome-Issue-Resolution
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
☆85 • Apr 22, 2026 • Updated 3 months ago
neulab / SWE-Playground
Official Repository for "Training Versatile Coding Agents in Synthetic Environments"
☆22 • Jan 11, 2026 • Updated 6 months ago
ByteDance-Seed / DATAMASK
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
☆21 • Jan 4, 2026 • Updated 6 months ago
sjtu-sai-agents / Browse-Master
Official implementation of Browse-Master, a tool-augmented web-search agent.
☆36 • Aug 22, 2025 • Updated 11 months ago
AIGeeksGroup / MMA
MMA: Multimodal Memory Agent
☆23 • Mar 30, 2026 • Updated 3 months ago
Open-Galapagos / evolution-fine-tuning
Official code, models, and dataset for "Evolution Fine-Tuning (EFT): Learning to Discover Across 371 Optimization Tasks"
☆25 • Jun 30, 2026 • Updated 3 weeks ago
Timothyxxx / TestTimeTrainingPapers
☆59 • Apr 13, 2026 • Updated 3 months ago
GAIR-NLP / Safety-J
Safety-J: Evaluating Safety with Critique
☆16 • Jul 28, 2024 • Updated 2 years ago
OpenCoder-llm / opc_data_filtering
Heuristic filtering framework for RefineCode
☆87 • Mar 13, 2025 • Updated last year
chenxran / Orion
[NeurIPS 2021] Open Rule Induction
☆19 • May 22, 2022 • Updated 4 years ago
yaof20 / verl
verl: Volcano Engine Reinforcement Learning for LLMs
☆22 • Nov 6, 2025 • Updated 8 months ago
GAIR-NLP / daVinci-LLM
☆155 • Mar 31, 2026 • Updated 3 months ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 • Feb 22, 2024 • Updated 2 years ago
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆61 • May 20, 2024 • Updated 2 years ago
chenllliang / Gradient-Vaccine
(Unofficial) Implementation of ICLR 2021 paper "Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multil…
☆14 • Sep 14, 2022 • Updated 3 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 • Mar 6, 2025 • Updated last year
GAIR-NLP / thinking-with-generated-images
Doodling our way to AGI ✏️ 🖼️ 🧠
☆128 • May 29, 2025 • Updated last year
koalazf99 / Awesome-DataCentric-LLM
Trending projects & awesome papers about data-centric LLM studies.
☆40 • May 20, 2025 • Updated last year