smola / language-dataset
Dataset for programming language identification.
☆24 Updated 2 years ago
Alternatives and similar repositories for language-dataset
Users who are interested in language-dataset are comparing it to the libraries listed below.
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 3 years ago
- a contextual search engine for software packages built on import2vec embeddings (https://www.code-compass.com) ☆38 Updated 2 weeks ago
- This is an Object Oriented implementation of a Trie in python. The class contains setter and getter methods, and implements several usefu… ☆15 Updated 8 years ago
- sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees ☆142 Updated 6 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆12 Updated last year
- Indri search implementation on top of Lucene search engine ☆35 Updated last year
- Interactive SQL analytics in your browser! ☆22 Updated 8 years ago
- The LAW next generation crawler. ☆90 Updated 4 years ago
- A record and replay system for the browser (renamed Ringer) ☆30 Updated 8 years ago
- Deep learning spelling patterns with a recurrent neural network ☆12 Updated 8 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts ☆81 Updated 7 years ago
- Tools and other things for people who work on search relevance & information retrieval ☆88 Updated 2 years ago
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 Updated 7 years ago
- ☆22 Updated 6 years ago
- Lightning Fast Language Prediction 🚀 ☆167 Updated 5 months ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 9 years ago
- Common Crawl Index Server ☆71 Updated 11 months ago
- Experiments to help discussion on Wikipedia talk pages ☆68 Updated this week
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 Updated 8 years ago
- Fixes Java syntax errors with LSTM neural networks! [proof-of-concept] ☆18 Updated 4 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 4 years ago
- Say "ni" to data of any size ☆86 Updated 2 months ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆55 Updated 4 years ago
- source{d} MLonCode foundation - core algorithms and models. ☆14 Updated 6 years ago
- Java implementation of Lempel-Ziv Jaccard Distance ☆21 Updated 8 years ago
- Automatically check mismatch between code and comments using AI and ML ☆54 Updated 4 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆47 Updated 8 years ago
- This is a minimal acyclic finite-state automata algorithm in Java based on the paper, "Incremental Construction of Minimal Acyclic Finite… ☆20 Updated 12 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 8 years ago