A collection of datasets for machine learning for big code
☆65 · Oct 8, 2021 · Updated 4 years ago
Alternatives and similar repositories for ml4code-dataset
Users who are interested in ml4code-dataset are comparing it to the repositories listed below.
- A Code Efficiency Benchmark for Code Generation ☆14 · May 26, 2025 · Updated 11 months ago
- A redistributable subset of the ETH Py150 corpus [https://www.sri.inf.ethz.ch/py150], introduced in the ICML 2020 paper 'Learning and Eva… ☆32 · Aug 11, 2020 · Updated 5 years ago
- ☆22 · Nov 17, 2021 · Updated 4 years ago
- ☆16 · Nov 12, 2025 · Updated 6 months ago
- A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection ☆26 · Oct 8, 2024 · Updated last year
- [NeurIPS'24] SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning ☆28 · Nov 19, 2024 · Updated last year
- A dataset for natural language code search. ☆14 · Feb 13, 2020 · Updated 6 years ago
- ☆16 · Jun 18, 2024 · Updated last year
- Repository for the Adversarial ML on Code things ☆16 · Jun 25, 2020 · Updated 5 years ago
- Deadline countdowns for academic conferences relevant to the SSE chair. ☆13 · Feb 10, 2026 · Updated 3 months ago
- Replication package for "Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection", ICSE 2024. ☆76 · Sep 24, 2024 · Updated last year
- A tool for mining graph-based change patterns in Python code ☆21 · Dec 12, 2025 · Updated 5 months ago
- This is the tool released in ICSE 2024 paper "Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Er… ☆17 · Jun 5, 2023 · Updated 2 years ago
- tool of llm-based indirect-call analyzer ☆31 · Feb 18, 2025 · Updated last year
- ☆16 · Mar 22, 2024 · Updated 2 years ago
- Learning from what we know: How to perform vulnerability prediction using noisy historical data, Empirical Software Engineering (EMSE) ☆13 · Sep 20, 2023 · Updated 2 years ago
- PyTorch's implementation of the code2seq model. ☆62 · Jul 25, 2024 · Updated last year
- Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks ☆258 · Jan 19, 2024 · Updated 2 years ago
- A Python implementation of a language-agnostic Code Property Graph ☆19 · Jun 10, 2024 · Updated last year
- Library for preprocessing java source code into Augmented ASTs, as per the paper Open Vocabulary Learning on Source Code with a Graph-Str… ☆21 · Oct 22, 2018 · Updated 7 years ago
- ☆104 · Oct 25, 2024 · Updated last year
- Probing pre-trained source code models ☆15 · Apr 27, 2022 · Updated 4 years ago
- The replication package of <Sentiment Analysis for Software Engineering: How Far Can Pre-trained Transformer Models Go?>. Accepted by IC… ☆11 · Nov 29, 2023 · Updated 2 years ago
- ☆20 · Feb 20, 2017 · Updated 9 years ago
- Website for "A Survey of Machine Learning for Big Code and Naturalness" ☆294 · Feb 7, 2025 · Updated last year
- A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries ☆367 · Mar 25, 2021 · Updated 5 years ago
- ☆44 · Jun 24, 2025 · Updated 10 months ago
- 📱 RUNIC tamper detection demo - designed to serve as a parallel for understanding more complex tamper detection and integrity systems su… ☆16 · Apr 13, 2024 · Updated 2 years ago
- This is the tool released in the ASE'23 paper "Generative Type Inference for Python". ☆28 · Sep 12, 2023 · Updated 2 years ago
- ComPy-Learn is a framework for exploring program representations for ML4CODE tasks. ☆22 · Aug 7, 2023 · Updated 2 years ago
- Improving Code Readability Classification using Convolutional Neural Networks ☆10 · Apr 18, 2018 · Updated 8 years ago
- HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs ☆43 · Oct 18, 2022 · Updated 3 years ago
- CodeXGLUE ☆1,822 · Apr 23, 2024 · Updated 2 years ago
- ☆32 · Jan 14, 2025 · Updated last year
- Cottontail: A LLM-Driven Concolic Execution Engine (Accepted by IEEE S&P'26) ☆41 · Dec 4, 2025 · Updated 5 months ago
- This is the artifact for paper “Are Machine Learning Cloud APIs Used Correctly? (#421)” in ICSE2021 ☆16 · Feb 27, 2021 · Updated 5 years ago
- TDCleaner: A Tool for Detecting Obsolete TODO Comments in Software Repos ☆12 · Dec 9, 2021 · Updated 4 years ago
- This is the tool released in ICSE 2022 paper "Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python" ☆45 · Oct 19, 2023 · Updated 2 years ago
- ☆11 · May 3, 2019 · Updated 7 years ago