smola / language-dataset
Dataset for programming language identification.
☆23 Updated 2 years ago
Alternatives and similar repositories for language-dataset
Users who are interested in language-dataset compare it to the repositories listed below.
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- a contextual search engine for software packages built on import2vec embeddings (https://www.code-compass.com) ☆38 Updated 5 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 4 years ago
- sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees ☆141 Updated 6 years ago
- source{d} MLonCode foundation - core algorithms and models. ☆14 Updated 5 years ago
- Machine Learning for Information Retrieval ☆86 Updated 4 months ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 Updated 6 years ago
- Text similarity based on Word2Vec vectors. ☆10 Updated 8 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 3 years ago
- 🐈 Code Annotation Tool ☆28 Updated 5 years ago
- Vecino is a command line application to discover Git repositories which are similar to the one that the user provides. ☆49 Updated 6 years ago
- Interactive SQL analytics in your browser! ☆22 Updated 7 years ago
- Advanced desktop search/corpus exploration prototype ☆21 Updated 4 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts ☆81 Updated 7 years ago
- Lightning Fast Language Prediction 🚀 ☆167 Updated last month
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 Updated 8 years ago
- This is an Object Oriented implementation of a Trie in python. The class contains setter and getter methods, and implements several usefu… ☆15 Updated 7 years ago
- The LAW next generation crawler. ☆89 Updated 3 years ago
- The first, open access evaluation dataset for methods to identify bias by word choice and labeling ☆25 Updated 2 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆133 Updated 6 months ago
- This module contains an implementation of the Nilsimsa locality-sensitive hashing algorithm in Java. ☆18 Updated 6 years ago
- Uses your app logs to visualize how the data moves between the code, database, HTTP services, message queue, external storages etc. ☆23 Updated last year
- Deep learning spelling patterns with a recurrent neural network ☆12 Updated 8 years ago
- Scripts as a service. Builds on systemd (for Linux) ☆21 Updated 2 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- Automatically check mismatch between code and comments using AI and ML ☆53 Updated 4 years ago
- stav text annotation visualiser ☆34 Updated 13 years ago
- An efficient and flexible token-based regular expression language and engine. ☆75 Updated 11 years ago
- ☆31 Updated 2 years ago