smola / language-dataset
Dataset for programming language identification.
☆24 Updated 2 years ago
Alternatives and similar repositories for language-dataset
Users who are interested in language-dataset are comparing it to the libraries listed below.
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- A contextual search engine for software packages built on import2vec embeddings (https://www.code-compass.com) ☆38 Updated 6 years ago
- sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees ☆142 Updated 6 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 4 years ago
- A record and replay system for the browser (renamed Ringer) ☆30 Updated 8 years ago
- ☆10 Updated 5 years ago
- The LAW next generation crawler. ☆90 Updated 4 years ago
- source{d} MLonCode foundation - core algorithms and models. ☆14 Updated 6 years ago
- ☆31 Updated 2 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆12 Updated last year
- Vecino is a command line application to discover Git repositories which are similar to the one that the user provides. ☆49 Updated 6 years ago
- The GHtorrent project website ☆158 Updated last year
- Lightning Fast Language Prediction 🚀 ☆167 Updated 4 months ago
- ☆43 Updated last year
- Accelerated bulk diff on GPU ☆11 Updated 9 years ago
- Launch NMT tasks on the cloud ☆13 Updated 2 years ago
- ☆22 Updated 6 years ago
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 Updated 7 years ago
- Machine Learning for Information Retrieval ☆86 Updated 7 months ago
- ☆20 Updated 6 years ago
- Firefox add-on to send scientific articles to your e-reader directly from journal websites. ☆42 Updated 9 years ago
- NLP2Code: Code Snippet Content Assist via Natural Language Tasks ☆34 Updated 6 years ago
- Text similarity based on Word2Vec vectors. ☆10 Updated 8 years ago
- jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source co… ☆71 Updated 6 years ago
- Similarity algorithm (computes the similarity between two files as a 0 to 1 score) with linear complexity, based on context triggered pie… ☆34 Updated 8 years ago
- MozoLM: A language model (LM) serving library ☆47 Updated 2 weeks ago
- Automatically check mismatch between code and comments using AI and ML ☆54 Updated 4 years ago
- Source code for the Naturalize project ☆56 Updated 10 years ago
- Tree-based Autofolding Software Summarization Algorithm ☆43 Updated 9 years ago
- The Data Linter identifies potential issues (lints) in your ML training data. ☆88 Updated 8 years ago