BK-SCOSS / sctokenizer
A Source Code Tokenizer
☆13 Updated 8 months ago
Alternatives and similar repositories for sctokenizer
Users who are interested in sctokenizer are comparing it to the libraries listed below
- A Dataset of 600k Java Source Code Changes Categorized by Diff Size http://arxiv.org/pdf/2108.04631 ☆22 Updated last year
- A toolkit for pre-processing large source code corpora ☆47 Updated 2 years ago
- A large dataset of 4.2m Java source code and parallel data of their description from code search, and code summarization studies. ☆53 Updated 3 years ago
- A collection of recent papers, benchmarks and datasets of AI4Code domain. ☆58 Updated last year
- ESEC/FSE'21: Prediction-Preserving Program Simplification ☆10 Updated 2 years ago
- A Comparative Study of Various Code Embeddings in Software Semantic Matching ☆16 Updated 2 years ago
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code: Artifact ☆22 Updated this week
- Implementation of "Automatic Source Code Summarization with Extended Tree-LSTM" ☆36 Updated 2 years ago
- A dataset for natural language code search. ☆14 Updated 5 years ago
- ☆23 Updated 2 years ago
- Official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks". ☆12 Updated 2 months ago
- Models and datasets for annotated code search. ☆35 Updated 2 years ago
- A curated list of software engineering research, datasets, and tools. ☆32 Updated 2 years ago
- ☆13 Updated 2 years ago
- ☆29 Updated 4 years ago
- A set of tools to help work with "Big Code" ☆43 Updated 3 years ago
- Recent Advances in Programming Language Pre-Trained Models (PL-PTMs) ☆58 Updated 3 years ago
- Implementation of the paper "Language-agnostic representation learning of source code from structure and context". ☆169 Updated 3 years ago
- TDCleaner: A Tool for Detecting Obsolete TODO Comments in Software Repos ☆11 Updated 3 years ago
- Code implementation for CoTexT: Multi-task Learning with Code-Text Transformer ☆36 Updated 3 years ago
- Official implementation of our work, A Transformer-based Approach for Source Code Summarization [ACL 2020]. ☆193 Updated 3 years ago
- An implementation of "code2vec: Learning Distributed Representations of Code" ☆30 Updated last year
- This repo is the benchmark for source code summarization for the C language ☆26 Updated 4 years ago
- Replication Package for "Compressing Pre-trained Models of Code into 3 MB", ASE 2022 ☆30 Updated 9 months ago
- A large dataset of 4.2m Java source code and parallel data of their description from code search, and code summarization studies. ☆15 Updated 3 years ago
- Improving Machine Translation Systems via Isotopic Replacement ☆12 Updated 2 years ago
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning ☆39 Updated 2 years ago
- A PyTorch implementation of the code2seq model. ☆62 Updated last year
- A PyTorch implementation of `code2vec: Learning Distributed Representations of Code` (Alon et al., 2018) ☆37 Updated 6 years ago
- Code for the paper "A Structural Model for Contextual Code Changes" ☆32 Updated last year