smola / language-dataset
Dataset for programming language identification.
☆22 Updated 2 years ago
Alternatives and similar repositories for language-dataset:
Users who are interested in language-dataset are comparing it to the repositories listed below.
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Advanced similarity and duplicate source code at scale. ☆55 Updated 5 years ago
- My data is bigger than your data! ☆39 Updated 5 years ago
- Python library to share machine learning models easily and reliably. ☆18 Updated 5 years ago
- Machine learning models for MLonCode trained using the source{d} stack ☆19 Updated 5 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 8 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 2 years ago
- Burglary prediction for mortals ☆10 Updated 11 months ago
- Database smell detector ☆13 Updated 7 years ago
- Chorus, now for Elasticsearch! ☆16 Updated 10 months ago
- An experimental patchset management tool. ☆12 Updated 4 years ago
- Online service for analyzing research profiles of scientists and conferences ☆13 Updated 2 years ago
- This is an Object Oriented implementation of a Trie in python. The class contains setter and getter methods, and implements several usefu… ☆14 Updated 7 years ago
- Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search appl… ☆65 Updated 9 months ago
- source{d} MLonCode foundation - core algorithms and models. ☆14 Updated 5 years ago
- Programmatic Control Flow ☆12 Updated 7 years ago
- An Excel formula parser ☆12 Updated 6 years ago
- Lookout Style Analyzer: fixing code formatting and typos during code reviews ☆32 Updated 2 years ago
- Vecino is a command line application to discover Git repositories which are similar to the one that the user provides. ☆49 Updated 5 years ago
- Common Crawl Index Server ☆68 Updated last month
- Examples for my book "Power Java" ☆21 Updated 2 years ago
- Python Implementation of Super and Hyper Log Log Sketches ☆49 Updated 13 years ago
- Assessing Source Code Semantic Similarity with Unsupervised Learning ☆41 Updated 7 years ago
- Website for standardized execution and evaluation of algorithms on datasets. ☆36 Updated 5 years ago
- stav text annotation visualiser ☆34 Updated 13 years ago
- Java implementation of Lempel-Ziv Jaccard Distance ☆21 Updated 7 years ago
- Code in PL/SQL ☆9 Updated 3 years ago
- ☆20 Updated 8 years ago
- Last-seen sketch implementation in Go ☆16 Updated 4 years ago
- Provides precise code intelligence via LSIF and Language Servers, and fuzzy code intelligence using ctags and text search ☆27 Updated last year