Documentation effort for the BookCorpus dataset
☆34 · Jun 2, 2021 · Updated 4 years ago
Alternatives and similar repositories for bookcorpus-datasheet
Users who are interested in bookcorpus-datasheet compare it to the repositories listed below.
- Feature Decay Algorithms ☆11 · Mar 5, 2014 · Updated 12 years ago
- An alternative approach for probabilistic topic modeling based on agglomerative clustering of topics (not documents) ☆12 · Apr 14, 2021 · Updated 4 years ago
- Transformers at any scale ☆42 · Jan 18, 2024 · Updated 2 years ago
- Baseline models for the paper: "Modeling Naive Psychology of Characters in Simple Commonsense Stories" by Hannah Rashkin, Antoine Bosselu… ☆16 · Feb 23, 2021 · Updated 5 years ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training ☆19 · Oct 12, 2024 · Updated last year
- Random material having to do with Daniel Lemire's talks ☆32 · Feb 20, 2026 · Updated 2 weeks ago
- ☆20 · Aug 17, 2021 · Updated 4 years ago
- Repository for the paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" ☆27 · May 1, 2025 · Updated 10 months ago
- The repository contains code for Adaptive Data Optimization ☆32 · Dec 9, 2024 · Updated last year
- Data sets and ML models versioning example from DVC get started ☆10 · Jun 4, 2024 · Updated last year
- Implementation for "Rational Recurrences", Peng et al., EMNLP 2018. ☆28 · Jun 21, 2022 · Updated 3 years ago
- ☆32 · Dec 29, 2020 · Updated 5 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 · Jul 30, 2025 · Updated 7 months ago
- European Parliament website Python scraper ☆12 · Oct 19, 2016 · Updated 9 years ago
- ☆11 · Feb 25, 2026 · Updated last week
- Rank Aggregation Algorithms ☆12 · Jul 22, 2014 · Updated 11 years ago
- Code to reproduce experiments from the EMNLP 2015 paper about Rumour Stance Classification with Gaussian Processes. ☆37 · May 23, 2016 · Updated 9 years ago
- Generalised UDRL ☆37 · May 12, 2022 · Updated 3 years ago
- Command-line tool for building Gephi force-directed graph diagrams. ☆10 · Nov 10, 2017 · Updated 8 years ago
- A framework for evaluating Machine Translation models. ☆12 · May 26, 2025 · Updated 9 months ago
- Implementation of data dimensionality reduction algorithms SVD and CUR without using library functions. ☆10 · Jul 24, 2017 · Updated 8 years ago
- ☆11 · Oct 14, 2021 · Updated 4 years ago
- scrape web content into readable markdown for llms and human readers ☆10 · Feb 19, 2024 · Updated 2 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- Basic template for using Flan-t5 on Banana's serverless GPU platform. Ready for 1-Click deploy ☆11 · Jan 30, 2023 · Updated 3 years ago
- Trains small LMs. Designed for training on SimpleStories ☆12 · Sep 15, 2025 · Updated 5 months ago
- ☆16 · Feb 28, 2026 · Updated last week
- ☆11 · Aug 4, 2022 · Updated 3 years ago
- Data and Code for "The Values Encoded in Machine Learning Research" ☆45 · Jun 10, 2022 · Updated 3 years ago
- ☆48 · Jan 21, 2024 · Updated 2 years ago
- ☆12 · Apr 2, 2024 · Updated last year
- Artemis Academy capstone project ☆10 · Sep 10, 2022 · Updated 3 years ago
- Functional Linear Algebra with Block Matrices ☆11 · Feb 17, 2022 · Updated 4 years ago
- Collaborative web framework for analyzing text (e.g., tweets). Supports standard labeling and pairwise comparison. ☆14 · Sep 15, 2021 · Updated 4 years ago
- ☆10 · Feb 22, 2022 · Updated 4 years ago
- This project is part of the CS course 'Systems Engineering Meets Life Sciences II' at Goethe University Frankfurt. In this Computer Visio… ☆11 · Mar 15, 2021 · Updated 4 years ago
- Price options by fitting a Lévy distribution ☆10 · Jan 20, 2021 · Updated 5 years ago
- Python SDK for DeFi analysis using the Transpose SQL API ☆10 · Jul 17, 2023 · Updated 2 years ago
- Lars's datasets ☆12 · Jun 16, 2024 · Updated last year