C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs
☆11 · Jan 13, 2023 · Updated 3 years ago
Alternatives and similar repositories for c4repset
Users that are interested in c4repset are comparing it to the libraries listed below
- website for MS Marco · ☆34 · Mar 26, 2025 · Updated 11 months ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning · ☆13 · Aug 17, 2023 · Updated 2 years ago
- Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory. · ☆11 · Updated this week
- Wikimedia Enterprise - client SDK in Python · ☆20 · Nov 11, 2025 · Updated 3 months ago
- ☆10 · Jul 6, 2023 · Updated 2 years ago
- Fake NEWS detector using LIAR dataset. · ☆11 · Aug 19, 2019 · Updated 6 years ago
- Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/ · ☆11 · Aug 9, 2024 · Updated last year
- Work done for "From Nand to Tetris: Building a Modern Computer from First Principles" · ☆11 · Jan 7, 2016 · Updated 10 years ago
- Security research organization dedicated to finding low-hanging, critical vulnerabilities. · ☆15 · May 12, 2022 · Updated 3 years ago
- Containerfile for the Vanilla OS Desktop+Nvidia image. · ☆16 · Feb 5, 2026 · Updated last month
- Code and data for the Walert large language model-based chatbot · ☆12 · Aug 14, 2025 · Updated 6 months ago
- LLM Assistant with Chat Integration · ☆13 · Sep 5, 2024 · Updated last year
- A remark plugin for making interactive markdown documents with Tangle. · ☆13 · Oct 25, 2021 · Updated 4 years ago
- IAI Style Guide · ☆10 · Jun 27, 2025 · Updated 8 months ago
- Calculate various properties of the Universe at a given time · ☆11 · Nov 16, 2025 · Updated 3 months ago
- A UI designer for constructing AI applications with OpenSearch · ☆16 · Feb 26, 2026 · Updated last week
- A monolithic index that supports worst-case optimal joins (WCOJ) by providing all collation orders in a single redundancy eliminating dat… · ☆16 · Sep 18, 2025 · Updated 5 months ago
- QALD-9-Plus Dataset for Knowledge Graph Question Answering · ☆12 · Aug 31, 2022 · Updated 3 years ago
- Highly concurrent and fast content processing for Mighty Inference Server · ☆10 · Feb 6, 2023 · Updated 3 years ago
- Rank-Biased Precision, Overlap, Recall, and Alignment · ☆12 · Feb 18, 2025 · Updated last year
- Arabic - English emotion lexicon · ☆12 · Apr 24, 2017 · Updated 8 years ago
- R library for common information retrieval metrics · ☆14 · Jun 5, 2023 · Updated 2 years ago
- Building applications with DeepSeek R1 model · ☆12 · Feb 15, 2025 · Updated last year
- GloSAT Historical Measurement Table Dataset · ☆11 · Dec 3, 2025 · Updated 3 months ago
- ☆10 · Jan 5, 2022 · Updated 4 years ago
- Converting the Enron email collection to mbox format · ☆11 · Dec 9, 2016 · Updated 9 years ago
- Simulated user for TREC 2016-2017 Dynamic Domain track · ☆10 · Dec 27, 2017 · Updated 8 years ago
- How to backdoor Diffie-Hellman, lessons learned from the Socat non-prime prime · ☆11 · Jun 29, 2021 · Updated 4 years ago
- Python wrapper around Yossi Rubner's Earth Mover's Distance implementation (http://ai.stanford.edu/~rubner/emd/default.htm) · ☆22 · Jul 9, 2015 · Updated 10 years ago
- Offline RandomAPI npm module · ☆12 · Apr 22, 2018 · Updated 7 years ago
- Dataset from Tip of the Tongue Known-Item Retrieval (2021) paper. · ☆12 · Nov 4, 2021 · Updated 4 years ago
- Evaluation of GPT-3 for clinical information extraction tasks. · ☆11 · Dec 13, 2022 · Updated 3 years ago
- ☆11 · May 6, 2025 · Updated 10 months ago
- Temporal and Causal Reasoning (dataset) · ☆10 · Apr 19, 2022 · Updated 3 years ago
- Prevent XSS attacks by sanitizing HTML (this is different from escaping!) · ☆22 · Oct 14, 2023 · Updated 2 years ago
- Elastic computing platform · ☆30 · Feb 28, 2026 · Updated last week
- Blazing fast signature detection · ☆11 · Sep 5, 2022 · Updated 3 years ago
- parse_mediawiki_dump clone · ☆11 · Mar 22, 2025 · Updated 11 months ago
- Fair Benchmarks · ☆10 · Mar 14, 2019 · Updated 6 years ago