A toolkit for pre-processing large source code corpora
☆45Sep 30, 2022Updated 3 years ago
Alternatives and similar repositories for codeprep
Users who are interested in codeprep compare it to the libraries listed below
- Contains the code for our ICSE 2020 paper: Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code and for its earlie…☆84Mar 24, 2023Updated 2 years ago
- A Python 3 module that provides functions for splitting identifiers found in source code files.☆48Jan 12, 2023Updated 3 years ago
- IST'21 & SANER'22: Semantic-Preserving Program Transformations☆31Oct 25, 2022Updated 3 years ago
- Learning to Update Natural Language Comments Based on Code Changes: Artifact☆33Oct 24, 2020Updated 5 years ago
- ☆24Jun 17, 2021Updated 4 years ago
- Probing pre-trained source code models☆15Apr 27, 2022Updated 3 years ago
- ☆51Dec 3, 2022Updated 3 years ago
- ☆16Oct 2, 2024Updated last year
- Tests on the target dataset using the CodeBert pre-trained model, either after fine-tuning or directly☆14Oct 19, 2021Updated 4 years ago
- Your library for dynamic language modeling☆67Oct 23, 2018Updated 7 years ago
- Implementation of 'Commit message generation for source code change'.☆25Oct 23, 2019Updated 6 years ago
- Adversarial Attack for Pre-trained Code Models☆10Jul 19, 2022Updated 3 years ago
- Code for "Generative Code Modeling with Graphs" (ICLR'19)☆172Dec 8, 2022Updated 3 years ago
- ☆15May 6, 2022Updated 3 years ago
- Website for "A Survey of Machine Learning for Big Code and Naturalness"☆292Feb 7, 2025Updated last year
- Code search model based on self-attention☆12Oct 16, 2020Updated 5 years ago
- Sequence-to-Sequence Learning for End-to-End Program Repair (IEEE TSE 2019). Open-science repo. http://arxiv.org/pdf/1901.01808☆86Jun 9, 2023Updated 2 years ago
- Library for preprocessing java source code into Augmented ASTs, as per the paper Open Vocabulary Learning on Source Code with a Graph-Str…☆21Oct 22, 2018Updated 7 years ago
- A Systematic Literature Review of Deep Learning in Software Engineering☆20Aug 28, 2024Updated last year
- Automatic Repair Framework that abstracts repair tools and bug benchmarks☆72May 1, 2023Updated 2 years ago
- Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.☆211Jul 13, 2020Updated 5 years ago
- ☆16Apr 26, 2021Updated 4 years ago
- The dataset for the variable-misuse task, used in the ICLR 2020 paper 'Global Relational Models of Source Code' [https://openreview.net/f…☆22Aug 19, 2020Updated 5 years ago
- Official implementation of our work, A Transformer-based Approach for Source Code Summarization [ACL 2020].☆195May 28, 2022Updated 3 years ago
- Lyra: A Benchmark for Turducken-Style Code Generation☆15Apr 22, 2022Updated 3 years ago
- Source Code for "A multi-modal transformer-based code summarization approach for smart contracts"☆27Mar 16, 2021Updated 5 years ago
- Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Sourc…☆66Dec 3, 2021Updated 4 years ago
- Probabilistic Type Inference using Graph Neural Networks☆50Dec 9, 2022Updated 3 years ago
- MODIT: On Multi-Modal Learning of Editing Source Code.☆20Apr 24, 2021Updated 4 years ago
- This repo will contain replication package for the paper "Feeding Trees to Transformers for Code Completion"☆99Jun 3, 2022Updated 3 years ago
- Information about the CodedotAI reading group sessions.☆12Aug 16, 2021Updated 4 years ago
- Contrastive Code Representation Learning: functionality-based JavaScript embeddings through self-supervised learning☆169Dec 26, 2021Updated 4 years ago
- ☆18Apr 14, 2021Updated 4 years ago
- DataSet and source code for PyART☆11Nov 27, 2022Updated 3 years ago
- ☆10Jan 8, 2015Updated 11 years ago
- Semantic Scaffolds for Pseudocode-to-Code Generation (accepted by ACL 2020)☆14Jun 7, 2021Updated 4 years ago
- ☆13Jul 6, 2023Updated 2 years ago
- The replication package of <Sentiment Analysis for Software Engineering: How Far Can Pre-trained Transformer Models Go?>. Accepted by IC…☆11Nov 29, 2023Updated 2 years ago
- A benchmark for evaluating embeddings of identifiers in source code.☆22Aug 23, 2021Updated 4 years ago