Convert source code into numerical tokens
☆66 · Jul 27, 2023 · Updated 2 years ago
Alternatives and similar repositories for tokenizer
Users who are interested in tokenizer compare it to the libraries listed below
- Smelling smells using Deep Learning ☆47 · Mar 2, 2021 · Updated 5 years ago
- Repository of the paper 'CodeQueries: A Dataset of Semantic Queries over Code' published in ISEC 2024 ☆13 · Apr 21, 2024 · Updated last year
- Artifacts and other data for "Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces" ☆22 · Jun 5, 2020 · Updated 5 years ago
- ☆20 · Nov 6, 2019 · Updated 6 years ago
- C# Data Extraction for "Learning to Represent Edits" ☆27 · Nov 3, 2018 · Updated 7 years ago
- Improving Code Readability Classification using Convolutional Neural Networks ☆10 · Apr 18, 2018 · Updated 7 years ago
- ☆22 · Jun 3, 2019 · Updated 6 years ago
- An empirical study on patch correctness ☆15 · Nov 5, 2022 · Updated 3 years ago
- ☆14 · May 28, 2024 · Updated last year
- AST factorization: transformation AST of Kotlin source code to a vector ☆11 · Oct 17, 2019 · Updated 6 years ago
- OCaml library to transform an LLVM control flow graph into an SMT formula. ☆13 · Apr 20, 2018 · Updated 7 years ago
- Contains the code for our ICSE 2020 paper: Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code and for its earlie… ☆84 · Mar 24, 2023 · Updated 2 years ago
- Babelfish Python client ☆17 · Nov 6, 2019 · Updated 6 years ago
- ☆14 · May 27, 2022 · Updated 3 years ago
- Machine learning models for MLonCode trained using the source{d} stack ☆19 · Oct 30, 2019 · Updated 6 years ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- ProgQuery is a system to extract useful syntactic and semantic information from source code programs and store it in a graph database for… ☆17 · Jan 22, 2025 · Updated last year
- Hexagon processor module for IDA Pro disassembler ☆19 · Oct 11, 2022 · Updated 3 years ago
- DeepCS: Deep Code Search ☆283 · May 26, 2022 · Updated 3 years ago
- fastText pretrained models for semantic representations of source code in Java, Python, PHP, C, C++ and C#. ☆17 · Nov 11, 2020 · Updated 5 years ago
- Calculate the score of a repository based on best engineering practices. ☆114 · Sep 27, 2020 · Updated 5 years ago
- Your library for dynamic language modeling ☆66 · Oct 23, 2018 · Updated 7 years ago
- A Source Code Tokenizer ☆14 · Oct 30, 2024 · Updated last year
- CD4Py: Code De-Duplication for Python ☆23 · Dec 13, 2020 · Updated 5 years ago
- Process Orchestration Framework: A camunda 7 fork ☆21 · Updated this week
- Utilize the capability of GPT-4o Vision on the UHHGPT web portal ☆12 · Aug 26, 2024 · Updated last year
- [ICSE'18] Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code ☆22 · May 18, 2018 · Updated 7 years ago
- A benchmark for evaluating embeddings of identifiers in source code. ☆22 · Aug 23, 2021 · Updated 4 years ago
- ☆50 · Feb 12, 2020 · Updated 6 years ago
- Tool for analyzing git log messages and diffs. ☆22 · Jan 13, 2021 · Updated 5 years ago
- Manipulate C-family ASTs with Clang ☆68 · Oct 22, 2018 · Updated 7 years ago
- Vine: The BitBlaze Static Analysis Component ☆26 · Sep 27, 2014 · Updated 11 years ago
- CodRep 2019 edition. ☆20 · Nov 12, 2019 · Updated 6 years ago
- Unit testing for SQL queries ☆26 · Aug 16, 2024 · Updated last year
- A tool for mining commits from Git repositories and diffs to automatically extract code change pattern instances and features with ast a… ☆99 · Nov 13, 2024 · Updated last year
- Source code for the Naturalize project ☆56 · Sep 5, 2015 · Updated 10 years ago
- TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code" ☆1,142 · Sep 20, 2023 · Updated 2 years ago
- Neural Code Comprehension: A Learnable Representation of Code Semantics ☆216 · Nov 22, 2024 · Updated last year
- ☆24 · Jun 17, 2021 · Updated 4 years ago