☆25 Mar 20, 2024 Updated last year
Alternatives and similar repositories for download-from-common-crawl
Users that are interested in download-from-common-crawl are comparing it to the libraries listed below
- Applying Reinforcement Learning from Human Feedback to language models to teach them to write short story responses to writing prompts. ☆14 May 5, 2022 Updated 3 years ago
- A News Article Collection Library ☆22 Mar 31, 2023 Updated 2 years ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆32 Jan 23, 2025 Updated last year
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… ☆31 Jan 14, 2023 Updated 3 years ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning ☆13 Aug 17, 2023 Updated 2 years ago
- C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs ☆11 Jan 13, 2023 Updated 3 years ago
- Fake NEWS detector using LIAR dataset. ☆11 Aug 19, 2019 Updated 6 years ago
- Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory. ☆11 Updated this week
- Containerfile for the Vanilla OS Desktop+Nvidia image. ☆16 Mar 1, 2026 Updated last week
- Security research organization dedicated to finding low hanging, critical, vulnerabilities. ☆15 May 12, 2022 Updated 3 years ago
- scrape web content into readable markdown for llms and human readers ☆10 Feb 19, 2024 Updated 2 years ago
- ☆16 Updated this week
- private repository checkout action via github apps ☆11 Dec 28, 2022 Updated 3 years ago
- ☆11 Sep 27, 2024 Updated last year
- ☆10 Jul 6, 2023 Updated 2 years ago
- Wikimedia Enterprise - client SDK in Python ☆20 Nov 11, 2025 Updated 3 months ago
- Code and data for the Walert large language model-based chatbot ☆12 Aug 14, 2025 Updated 6 months ago
- IT'S TRIVIA! FOR DEVS! GO! ☆10 Aug 13, 2021 Updated 4 years ago
- A UI designer for constructing AI applications with OpenSearch ☆16 Feb 26, 2026 Updated last week
- Temporal and Causal Reasoning (dataset) ☆10 Apr 19, 2022 Updated 3 years ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆12 Aug 2, 2022 Updated 3 years ago
- Headless agent for test driven relevancy with Quepid.com ☆11 Mar 6, 2024 Updated 2 years ago
- Utility for generating html elements with tagged`template literal`. Only 649 bytes. ☆12 Sep 25, 2024 Updated last year
- Rank-Biased Precision, Overlap, Recall, and Alignment ☆12 Feb 18, 2025 Updated last year
- Run greatexpectations.io on ANY SQL Engine using REST API. Supported by FastAPI, Pydantic and SQLAlchemy as best data quality tool ☆14 Dec 12, 2025 Updated 2 months ago
- A docker-compose file to set up a full DDB Cluster ☆12 Sep 29, 2017 Updated 8 years ago
- prevent XSS attacks by sanitizing html (this is different than escaping!) ☆22 Oct 14, 2023 Updated 2 years ago
- ☆14 May 6, 2018 Updated 7 years ago
- Blazing fast signature detection ☆11 Sep 5, 2022 Updated 3 years ago
- Converting the Enron email collection to mbox format ☆11 Dec 9, 2016 Updated 9 years ago
- Python wrapper around Yossi Rubner's Earth Mover's Distance implementation (http://ai.stanford.edu/~rubner/emd/default.htm)