smola / language-dataset
Dataset for programming language identification.
☆24 Updated 2 years ago
Alternatives and similar repositories for language-dataset
Users who are interested in language-dataset are comparing it to the libraries listed below.
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- A contextual search engine for software packages built on import2vec embeddings (https://www.code-compass.com) ☆38 Updated 6 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 4 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆12 Updated last year
- source{d} MLonCode foundation - core algorithms and models. ☆14 Updated 6 years ago
- Fixes Java syntax errors with LSTM neural networks! [proof-of-concept] ☆18 Updated 4 years ago
- This is an Object Oriented implementation of a Trie in python. The class contains setter and getter methods, and implements several usefu… ☆15 Updated 7 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- Extract Data from Wikipedia Tables ☆34 Updated 8 years ago
- Code and data for the Walert large language model-based chatbot ☆12 Updated 3 months ago
- Tools and other things for people who work on search relevance & information retrieval ☆87 Updated 2 years ago
- Machine learning models for MLonCode trained using the source{d} stack ☆19 Updated 6 years ago
- sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees ☆141 Updated 6 years ago
- Common Crawl Index Server ☆71 Updated 8 months ago
- Advanced desktop search/corpus exploration prototype ☆21 Updated 4 years ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆32 Updated last year
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 Updated 7 years ago
- 🐈 Code Annotation Tool ☆28 Updated 6 years ago
- 🔍 Mirror of https://gerrit.wikimedia.org/g/mediawiki/extensions/CirrusSearch. See https://www.mediawiki.org/wiki/Developer_access for co… ☆43 Updated this week
- The LAW next generation crawler. ☆88 Updated 4 years ago
- MozoLM: A language model (LM) serving library ☆45 Updated 2 weeks ago
- ☆22 Updated 6 years ago
- ☆31 Updated 2 years ago
- Search relevance evaluation toolkit ☆74 Updated 3 years ago
- Fusion demo app searching open-source project data from the Apache Software Foundation ☆43 Updated 7 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 9 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆133 Updated last month
- Natural language detection, Java bindings for CLD2 ☆14 Updated last month
- Machine Learning for Information Retrieval ☆86 Updated 6 months ago
- Multilingual NLP annotation projection ☆52 Updated 3 years ago