smola / language-dataset
Dataset for programming language identification.
☆23 Updated 2 years ago
Alternatives and similar repositories for language-dataset
Users who are interested in language-dataset are comparing it to the libraries listed below.
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- a contextual search engine for software packages built on import2vec embeddings (https://www.code-compass.com) ☆38 Updated 5 years ago
- sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees ☆141 Updated 6 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 2 years ago
- Fixes Java syntax errors with LSTM neural networks! [proof-of-concept] ☆18 Updated 3 years ago
- source{d} MLonCode foundation - core algorithms and models. ☆14 Updated 5 years ago
- A record and replay system for the browser (renamed Ringer) ☆30 Updated 7 years ago
- Machine learning models for MLonCode trained using the source{d} stack ☆19 Updated 5 years ago
- Vecino is a command line application to discover Git repositories which are similar to the one that the user provides. ☆49 Updated 6 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆12 Updated last year
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- Assessing Source Code Semantic Similarity with Unsupervised Learning ☆41 Updated 7 years ago
- KDD Hands-On Tutorial (2018) ☆29 Updated 2 years ago
- Lightning Fast Language Prediction 🚀 ☆167 Updated last week
- ☆22 Updated 6 years ago
- Code for the paper "World of Bits: An Open-Domain Platform for Web-Based Agents" ☆30 Updated 6 years ago
- MozoLM: A language model (LM) serving library ☆45 Updated 3 weeks ago
- ☆31 Updated 2 years ago
- Common Crawl Index Server ☆70 Updated 6 months ago
- High-performance program to spell-check and auto-correct large documents ☆41 Updated 5 years ago
- The Data Linter identifies potential issues (lints) in your ML training data. ☆88 Updated 7 years ago
- Background materials for the article "Productivity Assessment of Neural Code Completion" ☆12 Updated 2 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆131 Updated 5 months ago
- My data is bigger than your data! ☆39 Updated 6 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 9 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 4 years ago
- Experiments to help discussion on Wikipedia talk pages ☆66 Updated last month
- Launch NMT tasks on the cloud ☆13 Updated 2 years ago
- Mixer provides the translator engine and API interface to access Data Commons graph ☆16 Updated last week
- Text similarity based on Word2Vec vectors. ☆10 Updated 8 years ago