A collection of datasets for machine learning for big code
☆65Oct 8, 2021Updated 4 years ago
Alternatives and similar repositories for ml4code-dataset
Users that are interested in ml4code-dataset are comparing it to the libraries listed below.
- A Code Efficiency Benchmark for Code Generation☆14May 26, 2025Updated 11 months ago
- A redistributable subset of the ETH Py150 corpus [https://www.sri.inf.ethz.ch/py150], introduced in the ICML 2020 paper 'Learning and Eva…☆32Aug 11, 2020Updated 5 years ago
- ☆22Nov 17, 2021Updated 4 years ago
- ☆16Nov 12, 2025Updated 5 months ago
- A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection☆24Oct 8, 2024Updated last year
- [NeurIPS'24] SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning☆28Nov 19, 2024Updated last year
- A dataset for natural language code search.☆14Feb 13, 2020Updated 6 years ago
- ☆16Jun 18, 2024Updated last year
- AexPy /eikspai/ is Api EXplorer in PYthon for detecting API breaking changes in Python packages.☆26Jun 10, 2024Updated last year
- JEMMA: An Extensible Java dataset for Many ML4Code Applications☆19Dec 12, 2022Updated 3 years ago
- Repository for the Adversarial ML on Code things☆16Jun 25, 2020Updated 5 years ago
- Deadline countdowns for academic conferences relevant to the SSE chair.☆13Feb 10, 2026Updated 2 months ago
- Replication package for "Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection", ICSE 2024.☆75Sep 24, 2024Updated last year
- This is the tool released in ICSE 2024 paper "Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Er…☆17Jun 5, 2023Updated 2 years ago
- Tool for LLM-based indirect-call analysis☆31Feb 18, 2025Updated last year
- ☆16Mar 22, 2024Updated 2 years ago
- PyTorch implementation of the code2seq model.☆62Jul 25, 2024Updated last year
- Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks☆258Jan 19, 2024Updated 2 years ago
- A Python implementation of a language-agnostic Code Property Graph☆19Jun 10, 2024Updated last year
- Library for preprocessing java source code into Augmented ASTs, as per the paper Open Vocabulary Learning on Source Code with a Graph-Str…☆21Oct 22, 2018Updated 7 years ago
- ☆104Oct 25, 2024Updated last year
- Probing pre-trained source code models☆15Apr 27, 2022Updated 4 years ago
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue…☆17Mar 21, 2025Updated last year
- The replication package of <Sentiment Analysis for Software Engineering: How Far Can Pre-trained Transformer Models Go?>. Accepted by IC…☆11Nov 29, 2023Updated 2 years ago
- ☆20Feb 20, 2017Updated 9 years ago
- Website for "A Survey of Machine Learning for Big Code and Naturalness"☆293Feb 7, 2025Updated last year
- A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries☆364Mar 25, 2021Updated 5 years ago
- ☆44Jun 24, 2025Updated 10 months ago
- 📱 RUNIC tamper detection demo - designed to serve as a parallel for understanding more complex tamper detection and integrity systems su…☆16Apr 13, 2024Updated 2 years ago
- This is the tool released in the ASE'23 paper "Generative Type Inference for Python".☆28Sep 12, 2023Updated 2 years ago
- ComPy-Learn is a framework for exploring program representations for ML4CODE tasks.☆22Aug 7, 2023Updated 2 years ago
- Improving Code Readability Classification using Convolutional Neural Networks☆10Apr 18, 2018Updated 8 years ago
- HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs☆43Oct 18, 2022Updated 3 years ago
- CodeXGLUE☆1,814Apr 23, 2024Updated 2 years ago
- ☆32Jan 14, 2025Updated last year
- This repository contains the code, the dataset and the experimental results related to the paper "Vulnerabilities in AI Code Generators: …☆14Aug 5, 2024Updated last year
- This is the artifact for paper “Are Machine Learning Cloud APIs Used Correctly? (#421)” in ICSE2021☆16Feb 27, 2021Updated 5 years ago
- TDCleaner: A Tool for Detecting Obsolete TODO Comments in Software Repos☆12Dec 9, 2021Updated 4 years ago
- This is the tool released in ICSE 2022 paper "Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python"☆45Oct 19, 2023Updated 2 years ago