giganticode/codeprep

A toolkit for pre-processing large source code corpora

☆45

Alternatives and similar repositories for codeprep

Users who are interested in codeprep are comparing it to the libraries listed below.

mast-group / OpenVocabCodeNLM
Contains the code for our ICSE 2020 paper: Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code and for its earlie…
☆84 · Mar 24, 2023 · Updated 3 years ago
casics / spiral
A Python 3 module that provides functions for splitting identifiers found in source code files.
☆48 · Jan 12, 2023 · Updated 3 years ago
mdrafiqulrabin / tnpa-generalizability
IST'21 & SANER'22: Semantic-Preserving Program Transformations
☆31 · Oct 25, 2022 · Updated 3 years ago
panthap2 / LearningToUpdateNLComments
Learning to Update Natural Language Comments Based on Code Changes: Artifact
☆33 · Oct 24, 2020 · Updated 5 years ago
sola-st / SemSeed
☆24 · Jun 17, 2021 · Updated 5 years ago
giganticode / probes
Probing pre-trained source code models
☆15 · Apr 27, 2022 · Updated 4 years ago
prohandler / GS-Bulk-Emails
Google App Scripts that sends a number of emails from the specific number and that tracks the open status of each email
☆17 · Dec 11, 2024 · Updated last year
CC2Vec / CC2Vec
☆51 · Dec 3, 2022 · Updated 3 years ago
eth-sri / learning-real-bug-detector
☆15 · Oct 2, 2024 · Updated last year
zfj1998 / CodeBert-Code2Text
Based on the CodeBert pre-trained model: tests on the target dataset either after fine-tuning or directly
☆14 · Oct 19, 2021 · Updated 4 years ago
SLP-team / SLP-Core
Your library for dynamic language modeling
☆69 · Oct 23, 2018 · Updated 7 years ago
SoftWiser-group / CoDiSum
Implementation of 'Commit message generation for source code change'.
☆25 · Oct 23, 2019 · Updated 6 years ago
ZZR0 / CodeAttack
Adversarial Attack for Pre-trained Code Models
☆10 · Jul 19, 2022 · Updated 4 years ago
microsoft / graph-based-code-modelling
Code for "Generative Code Modeling with Graphs" (ICLR'19)
☆172 · Dec 8, 2022 · Updated 3 years ago
ml4code / ml4code.github.io
Website for "A Survey of Machine Learning for Big Code and Naturalness"
☆295 · Feb 7, 2025 · Updated last year
TomasAndersonFang / SANCS
Code search model based on self-attention
☆12 · Oct 16, 2020 · Updated 5 years ago
ASSERT-KTH / sequencer
Sequence-to-Sequence Learning for End-to-End Program Repair (IEEE TSE 2019). Open-science repo. http://arxiv.org/pdf/1901.01808
☆87 · Jun 9, 2023 · Updated 3 years ago
mwcvitkovic / Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache--Code-Preprocessor
Library for preprocessing java source code into Augmented ASTs, as per the paper Open Vocabulary Learning on Source Code with a Graph-Str…
☆21 · Oct 22, 2018 · Updated 7 years ago
WM-SEMERU / dl4se
A Systematic Literature Review of Deep Learning in Software Engineering
☆20 · Aug 28, 2024 · Updated last year
program-repair / RepairThemAll
Automatic Repair Framework that abstracts repair tools and bug benchmarks
☆72 · May 1, 2023 · Updated 3 years ago
EdinburghNLP / code-docstring-corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
☆211 · Jul 13, 2020 · Updated 6 years ago
google-research-datasets / great
The dataset for the variable-misuse task, used in the ICLR 2020 paper 'Global Relational Models of Source Code' [https://openreview.net/f…
☆22 · Aug 19, 2020 · Updated 5 years ago
wasiahmad / NeuralCodeSum
Official implementation of our work, A Transformer-based Approach for Source Code Summarization [ACL 2020].
☆192 · May 28, 2022 · Updated 4 years ago
TruX-DTF / FL-VS-APR
☆16 · Apr 26, 2021 · Updated 5 years ago
MrVPlusOne / LambdaNet
Probabilistic Type Inference using Graph Neural Networks
☆49 · Dec 9, 2022 · Updated 3 years ago
LIANGQINGYUAN / Lyra
Lyra: A Benchmark for Turducken-Style Code Generation
☆15 · Apr 22, 2022 · Updated 4 years ago
yz1019117968 / ICPC-21-MMTrans
Source Code for "A multi-modal transformer-based code summarization approach for smart contracts"
☆27 · Mar 16, 2021 · Updated 5 years ago
Attn-to-FC / Attn-to-FC
☆18 · Apr 14, 2021 · Updated 5 years ago
bayesgroup / code_transformers
Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Sourc…
☆66 · Dec 3, 2021 · Updated 4 years ago
modit-team / MODIT
MODIT: On Multi-Modal Learning of Editing Source Code.
☆20 · Apr 24, 2021 · Updated 5 years ago
facebookresearch / code-prediction-transformer
This repo will contain replication package for the paper "Feeding Trees to Transformers for Code Completion"
☆100 · Jun 3, 2022 · Updated 4 years ago
parasj / contracode
Contrastive Code Representation Learning: functionality-based JavaScript embeddings through self-supervised learning
☆172 · Dec 26, 2021 · Updated 4 years ago
CodedotAl / reading-group
Information about the CodedotAI reading group sessions.
☆13 · Aug 16, 2021 · Updated 4 years ago
cmuvariability / PaperReadingGroup
☆10 · Jan 8, 2015 · Updated 11 years ago
PYART0 / PyART
DataSet and source code for PyART
☆11 · Nov 27, 2022 · Updated 3 years ago
ruiqi-zhong / SemanticScaffold
Semantic Scaffolds for Pseudocode-to-Code Generation (accepted by ACL 2020)
☆14 · Jun 7, 2021 · Updated 5 years ago
CGCL-codes / naturalcc
NaturalCC: An Open-Source Toolkit for Code Intelligence
☆317 · Jul 9, 2026 · Updated last week
saikat107 / Codit
☆13 · Jul 6, 2023 · Updated 3 years ago
sola-st / IdBench
A benchmark for evaluating embeddings of identifiers in source code.
☆23 · Aug 23, 2021 · Updated 4 years ago