Reference implementations of data-intensive algorithms in MapReduce and Spark
☆82 · Sep 3, 2018 · Updated 7 years ago
Alternatives and similar repositories for bespin
Users that are interested in bespin are comparing it to the libraries listed below
- Common web archive utility code. ☆61 · Feb 6, 2026 · Updated 3 weeks ago
- ☆25 · Feb 20, 2026 · Updated last week
- ↕️ Intuitive axiomatic retrieval experimentation. ☆31 · Feb 9, 2026 · Updated 3 weeks ago
- A toolkit for simulating interactive information retrieval ☆21 · Sep 7, 2018 · Updated 7 years ago
- Predicting Political Instability and Social Conflicts Using Multimodal Data ☆10 · Jun 6, 2016 · Updated 9 years ago
- A Hadoop toolkit for web-scale information retrieval research ☆85 · Dec 12, 2014 · Updated 11 years ago
- CS 451/651 Data-Intensive Distributed Computing (Fall 2018) at the University of Waterloo ☆23 · Nov 29, 2018 · Updated 7 years ago
- Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019) ☆12 · May 25, 2019 · Updated 6 years ago
- Track app memory usage. ☆11 · Jan 13, 2015 · Updated 11 years ago
- Utility for cui2vec in Go ☆13 · Feb 25, 2023 · Updated 3 years ago
- Exploiting SNP correlations within Random Forest for Genome-Wide Association Studies ☆13 · Oct 20, 2014 · Updated 11 years ago
- data amusement on the microsoft academic graph ☆20 · Feb 7, 2017 · Updated 9 years ago
- Sample PySpark code for interacting with the Microsoft Academic Graph ☆22 · Mar 12, 2021 · Updated 4 years ago
- This toolkit provides an implementation of Modified Adsorption (MAD), a graph-based semi-supervised learning (SSL) algorithm. ☆24 · Jun 20, 2017 · Updated 8 years ago
- CUDA kernel and JNI code which is called by Apache Spark's MLlib. ☆19 · Jun 18, 2016 · Updated 9 years ago
- Minimalistic BM25 search engine in C/C++, Java, and nearly 20 other languages ☆22 · Jun 19, 2024 · Updated last year
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 · Sep 30, 2016 · Updated 9 years ago
- Interactive SQL analytics in your browser! ☆22 · Jan 31, 2018 · Updated 8 years ago
- NLP Utilities in Java ☆43 · Dec 14, 2022 · Updated 3 years ago
- Fusion for TREC run files with popular fusion techniques ☆21 · Aug 26, 2022 · Updated 3 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Jun 12, 2020 · Updated 5 years ago
- ☆50 · Sep 3, 2019 · Updated 6 years ago
- Algorithms to find Bertrand Nash equilibria in pricing games ☆27 · Nov 28, 2022 · Updated 3 years ago
- Open-Source Information Retrieval Reproducibility Challenge ☆51 · Jan 11, 2016 · Updated 10 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 · Feb 27, 2014 · Updated 12 years ago
- Java 8 Factorization Machines Library ☆28 · Feb 17, 2017 · Updated 9 years ago
- Data pipeline automation tool ☆27 · Jan 11, 2024 · Updated 2 years ago
- Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification. ☆34 · Jun 12, 2023 · Updated 2 years ago
- Simple Spark app that reads and writes Avro data ☆31 · Apr 13, 2015 · Updated 10 years ago
- ☆26 · Feb 28, 2015 · Updated 11 years ago
- A Python interface to PISA ☆37 · Sep 23, 2025 · Updated 5 months ago
- Learning the structure of graphical models from datasets with thousands of variables ☆35 · Jul 24, 2018 · Updated 7 years ago
- tensorflow deep RL for driving a rover around ☆64 · May 13, 2017 · Updated 8 years ago
- Civilian Topographic Map (CTM) product ☆16 · Feb 28, 2025 · Updated last year
- A large scale feature extraction tool for text-based machine learning ☆32 · Sep 6, 2022 · Updated 3 years ago
- Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers. ☆30 · Jul 17, 2015 · Updated 10 years ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. ☆153 · Dec 5, 2025 · Updated 2 months ago
- scikit-learn addon to operate on set/"group"-based features ☆41 · Aug 8, 2016 · Updated 9 years ago
- WebConf 2020 paper Leading Conversational Search by Suggesting Useful Questions ☆33 · May 4, 2020 · Updated 5 years ago