A collection of datasets for machine learning for big code
☆62 · Oct 8, 2021 · Updated 4 years ago
Alternatives and similar repositories for ml4code-dataset
Users who are interested in ml4code-dataset are comparing it to the repositories listed below
- A Code Efficiency Benchmark for Code Generation ☆13 · May 26, 2025 · Updated 9 months ago
- A redistributable subset of the ETH Py150 corpus [https://www.sri.inf.ethz.ch/py150], introduced in the ICML 2020 paper 'Learning and Eva… ☆32 · Aug 11, 2020 · Updated 5 years ago
- ☆15 · Nov 12, 2025 · Updated 4 months ago
- ☆22 · Nov 17, 2021 · Updated 4 years ago
- A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection ☆22 · Oct 8, 2024 · Updated last year
- [NeurIPS'24] SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning ☆27 · Nov 19, 2024 · Updated last year
- ☆15 · Jun 18, 2024 · Updated last year
- A dataset for natural language code search. ☆14 · Feb 13, 2020 · Updated 6 years ago
- AexPy /eikspai/ is Api EXplorer in PYthon for detecting API breaking changes in Python packages. ☆26 · Jun 10, 2024 · Updated last year
- JEMMA: An Extensible Java dataset for Many ML4Code Applications ☆19 · Dec 12, 2022 · Updated 3 years ago
- Repository for the Adversarial ML on Code things ☆16 · Jun 25, 2020 · Updated 5 years ago
- Deadline countdowns for academic conferences relevant to the SSE chair. ☆13 · Feb 10, 2026 · Updated last month
- Replication package for "Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection", ICSE 2024. ☆74 · Sep 24, 2024 · Updated last year
- A tool for mining graph-based change patterns in Python code ☆20 · Dec 12, 2025 · Updated 3 months ago
- This is the tool released in ICSE 2024 paper "Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Er… ☆17 · Jun 5, 2023 · Updated 2 years ago
- tool of llm-based indirect-call analyzer ☆30 · Feb 18, 2025 · Updated last year
- ☆16 · Mar 22, 2024 · Updated 2 years ago
- Learning from what we know: How to perform vulnerability prediction using noisy historical data, Empirical Software Engineering (EMSE) ☆14 · Sep 20, 2023 · Updated 2 years ago
- Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks ☆256 · Jan 19, 2024 · Updated 2 years ago
- Library for preprocessing java source code into Augmented ASTs, as per the paper Open Vocabulary Learning on Source Code with a Graph-Str… ☆21 · Oct 22, 2018 · Updated 7 years ago
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue… ☆17 · Mar 21, 2025 · Updated last year
- Probing pre-trained source code models ☆15 · Apr 27, 2022 · Updated 3 years ago
- Program analysis tools built on tree-sitter (https://github.com/tree-sitter/tree-sitter). ☆62 · Nov 24, 2025 · Updated 3 months ago
- ☆20 · Feb 20, 2017 · Updated 9 years ago
- The replication package of <Sentiment Analysis for Software Engineering: How Far Can Pre-trained Transformer Models Go?>. Accepted by IC… ☆11 · Nov 29, 2023 · Updated 2 years ago
- Website for "A Survey of Machine Learning for Big Code and Naturalness" ☆292 · Feb 7, 2025 · Updated last year
- A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries ☆357 · Mar 25, 2021 · Updated 4 years ago
- ☆44 · Jun 24, 2025 · Updated 8 months ago
- 📱 RUNIC tamper detection demo - designed to serve as a parallel for understanding more complex tamper detection and integrity systems su… ☆15 · Apr 13, 2024 · Updated last year
- This is the tool released in the ASE'23 paper "Generative Type Inference for Python". ☆28 · Sep 12, 2023 · Updated 2 years ago
- Improving Code Readability Classification using Convolutional Neural Networks ☆10 · Apr 18, 2018 · Updated 7 years ago
- ComPy-Learn is a framework for exploring program representations for ML4CODE tasks. ☆22 · Aug 7, 2023 · Updated 2 years ago
- HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs ☆43 · Oct 18, 2022 · Updated 3 years ago
- ☆32 · Jan 14, 2025 · Updated last year
- CodeXGLUE ☆1,808 · Apr 23, 2024 · Updated last year
- Cottontail: A LLM-Driven Concolic Execution Engine (Accepted by IEEE S&P'26) ☆37 · Dec 4, 2025 · Updated 3 months ago
- This is the artifact for paper “Are Machine Learning Cloud APIs Used Correctly? (#421)” in ICSE2021 ☆16 · Feb 27, 2021 · Updated 5 years ago
- TDCleaner: A Tool for Detecting Obsolete TODO Comments in Software Repos ☆12 · Dec 9, 2021 · Updated 4 years ago
- This is the tool released in ICSE 2022 paper "Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python" ☆45 · Oct 19, 2023 · Updated 2 years ago