A toolkit for pre-processing large source code corpora
☆45Sep 30, 2022Updated 3 years ago
Alternatives and similar repositories for codeprep
Users who are interested in codeprep are comparing it to the libraries listed below
- Contains the code for our ICSE 2020 paper: Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code and for its earlie…☆84Mar 24, 2023Updated 2 years ago
- ☆51Dec 3, 2022Updated 3 years ago
- Learning to Update Natural Language Comments Based on Code Changes: Artifact☆33Oct 24, 2020Updated 5 years ago
- IST'21 & SANER'22: Semantic-Preserving Program Transformations☆31Oct 25, 2022Updated 3 years ago
- A Python 3 module that provides functions for splitting identifiers found in source code files.☆48Jan 12, 2023Updated 3 years ago
- ☆24Jun 17, 2021Updated 4 years ago
- Testing on the target dataset with the CodeBert pre-trained model, either after fine-tuning or directly☆14Oct 19, 2021Updated 4 years ago
- Implementation of 'Commit message generation for source code change'.☆25Oct 23, 2019Updated 6 years ago
- Probing pre-trained source code models☆15Apr 27, 2022Updated 3 years ago
- ☆10Jan 8, 2015Updated 11 years ago
- A Systematic Literature Review of Deep Learning in Software Engineering☆20Aug 28, 2024Updated last year
- Official implementation of our work, A Transformer-based Approach for Source Code Summarization [ACL 2020].☆195May 28, 2022Updated 3 years ago
- ☆16Apr 26, 2021Updated 4 years ago
- Your library for dynamic language modeling☆66Oct 23, 2018Updated 7 years ago
- ☆18Apr 14, 2021Updated 4 years ago
- MODIT: On Multi-Modal Learning of Editing Source Code.☆20Apr 24, 2021Updated 4 years ago
- Code for "Generative Code Modeling with Graphs" (ICLR'19)☆172Dec 8, 2022Updated 3 years ago
- Sequence-to-Sequence Learning for End-to-End Program Repair (IEEE TSE 2019). Open-science repo. http://arxiv.org/pdf/1901.01808☆86Jun 9, 2023Updated 2 years ago
- The dataset for the variable-misuse task, used in the ICLR 2020 paper 'Global Relational Models of Source Code' [https://openreview.net/f…☆22Aug 19, 2020Updated 5 years ago
- A benchmark for evaluating embeddings of identifiers in source code.☆22Aug 23, 2021Updated 4 years ago
- ☆11Jul 20, 2021Updated 4 years ago
- Website for "A Survey of Machine Learning for Big Code and Naturalness"☆291Feb 7, 2025Updated last year
- Contrastive Code Representation Learning: functionality-based JavaScript embeddings through self-supervised learning☆169Dec 26, 2021Updated 4 years ago
- Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Sourc…☆66Dec 3, 2021Updated 4 years ago
- ☆24Jan 19, 2022Updated 4 years ago
- Trained BERT and Word2Vec legal clause classifiers for SPACY using the Atticus Project's Open Source Contract Label Corpus☆13Jan 2, 2021Updated 5 years ago
- Bachelor's grad work on code autocompletion with rnn☆10May 19, 2019Updated 6 years ago
- ☆12Updated this week
- Adversarial Attack for Pre-trained Code Models☆10Jul 19, 2022Updated 3 years ago
- A Betty Blocks Component Set based on Material UI☆25Updated this week
- ☆10Aug 25, 2020Updated 5 years ago
- toolsuite for analyzing cpp-preprocessor-based software product lines☆12Jul 19, 2023Updated 2 years ago
- Semantic Scaffolds for Pseudocode-to-Code Generation (accepted by ACL 2020)☆14Jun 7, 2021Updated 4 years ago
- Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.☆211Jul 13, 2020Updated 5 years ago
- Evaluation of source authorship attribution tool☆23Jun 5, 2021Updated 4 years ago
- ☆48Nov 19, 2025Updated 3 months ago
- Code search model based on self-attention☆12Oct 16, 2020Updated 5 years ago
- Replication Code for "Self-Supervised Bug Detection and Repair" NeurIPS 2021☆112Aug 30, 2022Updated 3 years ago
- Program Transformation Tool for Java Methods☆11Sep 16, 2022Updated 3 years ago