jackbandy / bookcorpus-datasheet
Documentation effort for the BookCorpus dataset
☆34Jun 2, 2021Updated 4 years ago
Alternatives and similar repositories for bookcorpus-datasheet
Users that are interested in bookcorpus-datasheet are comparing it to the libraries listed below
- An alternative approach for probabilistic topic modeling based on agglomerative clustering of topics (not documents)☆12Apr 14, 2021Updated 4 years ago
- Baseline models for the paper: "Modeling Naive Psychology of Characters in Simple Commonsense Stories" by Hannah Rashkin, Antoine Bosselu…☆16Feb 23, 2021Updated 4 years ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training☆19Oct 12, 2024Updated last year
- Random material having to do with Daniel Lemire's talks☆32Jan 5, 2026Updated last month
- All of the ImageNet meta-datasets and notebooks☆64Jul 21, 2024Updated last year
- Implementation for "Rational Recurrences", Peng et al., EMNLP 2018.☆28Jun 21, 2022Updated 3 years ago
- Data sets and ML models versioning example from DVC get started☆10Jun 4, 2024Updated last year
- ☆16Jul 7, 2025Updated 7 months ago
- Rank Aggregation Algorithms☆12Jul 22, 2014Updated 11 years ago
- Hierarchical Text Classifier of News Group Messages using Facebook's FastText☆10Jul 8, 2019Updated 6 years ago
- European Parliament website Python scraper☆12Oct 19, 2016Updated 9 years ago
- ☆11Updated this week
- Generalised UDRL☆37May 12, 2022Updated 3 years ago
- Code to reproduce experiments from the EMNLP 2015 paper about Rumour Stance Classification with Gaussian Processes.☆37May 23, 2016Updated 9 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch☆10Aug 7, 2024Updated last year
- A clone of OpenAI's Tokenizer page for HuggingFace Models☆46Nov 13, 2023Updated 2 years ago
- Basic template for using Flan-t5 on Banana's serverless GPU platform. Ready for 1-Click deploy☆11Jan 30, 2023Updated 3 years ago
- A framework for evaluating Machine Translation models.☆12May 26, 2025Updated 8 months ago
- ☆16Updated this week
- A Discord bot that answers questions about Replicate.☆16Jan 5, 2024Updated 2 years ago
- ☆11Aug 4, 2022Updated 3 years ago
- Trains small LMs. Designed for training on SimpleStories☆12Sep 15, 2025Updated 5 months ago
- A library for training crosscoders☆15May 28, 2025Updated 8 months ago
- ☆38Apr 17, 2024Updated last year
- scrape web content into readable markdown for llms and human readers☆10Feb 19, 2024Updated last year
- A tool for creating a repository of transcribed videos☆53Dec 3, 2023Updated 2 years ago
- ☆48Jan 21, 2024Updated 2 years ago
- SpeechYOLO Interspeech 2019☆46Aug 16, 2022Updated 3 years ago
- ☆23Feb 3, 2026Updated last week
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…☆10Oct 7, 2024Updated last year
- Nile plugin adding coverage reports for Cairo Smart Contracts (from Pytest test suite).☆11Nov 22, 2022Updated 3 years ago
- ☆15Oct 29, 2024Updated last year
- React component for interacting with the Ethereum Name Service.☆16Aug 27, 2023Updated 2 years ago
- Async web scraping framework on top of Rust. Works with Free-threaded Python (`PYTHON_GIL=0`).☆24Updated this week
- Ather Control Area Network (ACAN) is a utility for CAN communication.☆26Jun 2, 2025Updated 8 months ago
- Open evolution proposals for the Twitter API☆49Dec 10, 2022Updated 3 years ago
- ☆10Feb 22, 2022Updated 3 years ago
- Hardhat made easy with a flexible CLI to help run test, deploy and more.☆10Apr 10, 2024Updated last year
- Example of how to append data to a Haskell executable using sqlite☆10Mar 16, 2020Updated 5 years ago