thoppe / The-Pile-FreeLaw
Download, parse, and filter data from CourtListener, part of the Free Law Project. Data-ready for The-Pile.
☆11 Updated last year
Alternatives and similar repositories for The-Pile-FreeLaw:
Users who are interested in The-Pile-FreeLaw are comparing it to the libraries listed below.
- ☆17 Updated 9 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 5 months ago
- Tools for formatting large language model prompts. ☆13 Updated last year
- Python library to use Pleias-RAG models ☆27 Updated this week
- Lightweight tools for quick and easy LLM demos ☆26 Updated 7 months ago
- ☆41 Updated 2 months ago
- [COLM '24] Source-Aware Training Enables Knowledge Attribution in Language Models ☆17 Updated 3 weeks ago
- Documentation effort for the BookCorpus dataset ☆34 Updated 3 years ago
- ☆48 Updated 5 months ago
- A script for collecting the PubMed Central dataset in a language modelling friendly format. ☆24 Updated 4 years ago
- ☆15 Updated 2 weeks ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 5 months ago
- Efficiently computing & storing token n-grams from large corpora ☆23 Updated 6 months ago
- An attribution library for LLMs ☆38 Updated 7 months ago
- Python tools for processing the Stack Exchange data dumps into a text dataset for Language Models ☆81 Updated last year
- ☆20 Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆30 Updated 7 months ago
- Targeted Data Generation with Large Language Models ☆17 Updated 10 months ago
- ☆90 Updated 2 years ago
- ☆11 Updated last year
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆43 Updated 4 years ago
- 🚂 Fine-tune OpenAI models for text classification, question answering, and more ☆16 Updated last year
- Downloads 2020 English Wikipedia articles as plaintext ☆24 Updated 2 years ago
- ☆22 Updated 2 months ago
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆24 Updated last year
- Blenderbot ☆9 Updated 3 years ago
- ☆53 Updated 4 months ago
- ☆19 Updated 2 weeks ago
- ☆51 Updated 6 months ago
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 3 months ago