smola / language-dataset
Dataset for programming language identification.
☆22Updated 2 years ago
Alternatives and similar repositories for language-dataset:
Users who are interested in language-dataset are comparing it to the libraries listed below
- Advanced similarity and duplicate source code at scale.☆54Updated 5 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆65Updated 8 years ago
- Elasticsearch-like search engine supporting real-time indexing and querying☆14Updated 7 years ago
- My data is bigger than your data!☆39Updated 5 years ago
- Recurrent neural network to split code snippets from text.☆12Updated 6 years ago
- bootstrap for my dev setup☆60Updated 5 years ago
- This project explores the use of Apache Gora as a query broker which can be used within a federated web search scenario.☆17Updated last year
- Database smell detector☆13Updated 7 years ago
- Python library to share machine learning models easily and reliably.☆18Updated 5 years ago
- Programmatic Control Flow☆12Updated 7 years ago
- Extract statistics from Wikipedia Dump files.☆26Updated 3 years ago
- Launch NMT tasks on the cloud☆13Updated last year
- Website for standardized execution and evaluation of algorithms on datasets.☆36Updated 5 years ago
- ☆98Updated 4 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts.☆52Updated 2 years ago
- Minutes from nteract monthly contributor meeting; reports and metrics☆9Updated 3 years ago
- Apache Spark under Docker☆9Updated 8 years ago
- Automated Measurement and Analysis of Open-Source Software☆13Updated 7 years ago
- ☆20Updated 8 years ago
- Samples of ML models learning from source code☆19Updated 2 years ago
- Simple ASCII Dashboard created with Drill & Node.js☆10Updated 9 years ago
- Burglary prediction for mortals☆10Updated 10 months ago
- Collaboration app for sharing and reviewing jupyter notebooks☆16Updated last year
- Cross-platform keyboard and mouse event capture tool.☆18Updated 4 years ago
- Generates Python dependency graph☆21Updated 6 years ago
- KeyTerms centralized terminology management tool☆13Updated 6 years ago
- Python functions for popular relevance metrics (ndcg, err, etc)☆16Updated last year
- A machine learning software for extracting information from scholarly documents☆23Updated 4 years ago
- Exploration Library in Java☆12Updated last year
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop☆38Updated 3 months ago