A collection of datasets for machine learning for big code
☆62 Oct 8, 2021 Updated 4 years ago
Alternatives and similar repositories for ml4code-dataset
Users who are interested in ml4code-dataset are comparing it to the repositories listed below
- ☆22 Nov 17, 2021 Updated 4 years ago
- [NeurIPS'24] SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning ☆27 Nov 19, 2024 Updated last year
- A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection ☆22 Oct 8, 2024 Updated last year
- Deadline countdowns for academic conferences relevant to the SSE chair. ☆12 Feb 10, 2026 Updated 2 weeks ago
- TDCleaner: A Tool for Detecting Obsolete TODO Comments in Software Repos ☆12 Dec 9, 2021 Updated 4 years ago
- A redistributable subset of the ETH Py150 corpus [https://www.sri.inf.ethz.ch/py150], introduced in the ICML 2020 paper 'Learning and Eva… ☆32 Aug 11, 2020 Updated 5 years ago
- Learning from what we know: How to perform vulnerability prediction using noisy historical data, Empirical Software Engineering (EMSE) ☆14 Sep 20, 2023 Updated 2 years ago
- ☆15 Nov 12, 2025 Updated 3 months ago
- Code Snippet Recommendation from Stack Overflow Post ☆19 Jun 30, 2021 Updated 4 years ago
- A tool for mining graph-based change patterns in Python code ☆20 Dec 12, 2025 Updated 2 months ago
- Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks ☆256 Jan 19, 2024 Updated 2 years ago
- ☆44 Jun 24, 2025 Updated 8 months ago
- ☆103 Oct 25, 2024 Updated last year
- This is the artifact for paper “Are Machine Learning Cloud APIs Used Correctly? (#421)” in ICSE2021 ☆16 Feb 27, 2021 Updated 5 years ago
- AexPy /eikspai/ is Api EXplorer in PYthon for detecting API breaking changes in Python packages. ☆26 Jun 10, 2024 Updated last year
- A Python implementation of a language-agnostic Code Property Graph ☆19 Jun 10, 2024 Updated last year
- JEMMA: An Extensible Java dataset for Many ML4Code Applications ☆19 Dec 12, 2022 Updated 3 years ago
- Website for "A Survey of Machine Learning for Big Code and Naturalness" ☆291 Feb 7, 2025 Updated last year
- ☆20 Mar 21, 2024 Updated last year
- A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries ☆352 Mar 25, 2021 Updated 4 years ago
- Cottontail: A LLM-Driven Concolic Execution Engine (Accepted by IEEE S&P'26) ☆36 Dec 4, 2025 Updated 2 months ago
- Program analysis tools built on tree-sitter (https://github.com/tree-sitter/tree-sitter). ☆62 Nov 24, 2025 Updated 3 months ago
- ☆26 May 27, 2024 Updated last year
- ☆20 Oct 25, 2023 Updated 2 years ago
- ☆24 Jun 17, 2021 Updated 4 years ago
- Learning graph-based code representations for source-level functional similarity detection. ICSE'23 ☆59 Mar 27, 2023 Updated 2 years ago
- ☆223 Jul 25, 2024 Updated last year
- ☠️ Ground-truth dataset for vulnerability prediction (known research datasets and data sources included such as NVD, CVE Details and OSV)… ☆104 Sep 2, 2023 Updated 2 years ago
- ☆23 Nov 10, 2023 Updated 2 years ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆67 Aug 15, 2024 Updated last year
- PyTorch's implementation of the code2seq model. ☆62 Jul 25, 2024 Updated last year
- Code for EMNLP'21 paper "Types of Out-of-Distribution Texts and How to Detect Them" ☆25 Nov 13, 2021 Updated 4 years ago
- This is the tool released in the ASE'23 paper "Generative Type Inference for Python". ☆28 Sep 12, 2023 Updated 2 years ago
- Library for preprocessing java source code into Augmented ASTs, as per the paper Open Vocabulary Learning on Source Code with a Graph-Str… ☆21 Oct 22, 2018 Updated 7 years ago
- Code for the paper "A Structural Model for Contextual Code Changes" ☆32 Oct 25, 2023 Updated 2 years ago
- ☆33 Jan 14, 2025 Updated last year
- CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software ☆317 Jul 30, 2024 Updated last year
- R Ultimate 2023 - R for Data Science and Machine Learning, by Packt Publishing ☆15 Dec 15, 2025 Updated 2 months ago
- The most up-to-date list of Turkish ads to block ads on Turkish websites ☆20 Dec 12, 2025 Updated 2 months ago