BK-SCOSS / sctokenizer
A Source Code Tokenizer
☆14Updated last year
Alternatives and similar repositories for sctokenizer
Users that are interested in sctokenizer are comparing it to the libraries listed below
- A Dataset of 600k Java Source Code Changes Categorized by Diff Size http://arxiv.org/pdf/2108.04631☆23Updated last year
- A collection of recent papers, benchmarks and datasets of AI4Code domain.☆58Updated last year
- A curated list of software engineering research, datasets, and tools.☆33Updated 3 years ago
- Models and datasets for annotated code search.☆35Updated 2 years ago
- Learning to Update Natural Language Comments Based on Code Changes: Artifact☆33Updated 5 years ago
- A large dataset of 4.2m Java source code and parallel data of their description from code search, and code summarization studies.☆55Updated 3 years ago
- A dataset for natural language code search.☆14Updated 5 years ago
- Recent Advances in Programming Language Pre-Trained Models (PL-PTMs)☆59Updated 4 years ago
- Implementation of "Automatic Source Code Summarization with Extended Tree-LSTM"☆36Updated 3 years ago
- Code implementation for CoTexT: Multi-task Learning with Code-Text Transformer☆36Updated 4 years ago
- ☆23Updated 2 years ago
- A toolkit for pre-processing large source code corpora☆45Updated 3 years ago
- Semantic Code Search☆37Updated 2 years ago
- Official implementation of our work, A Transformer-based Approach for Source Code Summarization [ACL 2020].☆195Updated 3 years ago
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code: Artifact☆22Updated 6 months ago
- ☆38Updated 4 years ago
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning☆40Updated 3 years ago
- Implementation of 'Commit message generation for source code change'.☆25Updated 6 years ago
- Code for the paper "A Structural Model for Contextual Code Changes"☆32Updated 2 years ago
- Re-implementation of "CODE2SEQ: GENERATING SEQUENCES FROM STRUCTURED REPRESENTATIONS OF CODE"☆45Updated last year
- Reproduce the results of Tree-based Convolutional Neural Network (TBCNN)☆39Updated 2 years ago
- StaQC: a systematically mined dataset containing around 148K Python and 120K SQL domain question-code pairs, as described in "StaQC: A Sy…☆172Updated 4 years ago
- Code for "Deep Graph Matching and Searching for Semantic Code Retrieval"☆24Updated 4 years ago
- [UNMAINTAINED] A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN) for Graph Classification☆20Updated 6 years ago
- A Comparative Study of Various Code Embeddings in Software Semantic Matching☆18Updated 3 years ago
- Source Code for ACL-21 main conference paper "CoSQA: 20,000+ Web Queries for Code Search and Question Answering".☆46Updated 3 years ago
- ESEC/FSE'21: Prediction-Preserving Program Simplification☆10Updated 3 years ago
- TDCleaner: A Tool for Detecting Obsolete TODO Comments in Software Repos☆12Updated 4 years ago
- A benchmark for evaluating embeddings of identifiers in source code.☆22Updated 4 years ago
- ☆30Updated 5 years ago