thoppe / The-Pile-FreeLaw
Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.
☆11 Updated last year
Alternatives and similar repositories for The-Pile-FreeLaw:
Users who are interested in The-Pile-FreeLaw are comparing it to the libraries listed below.
- ☆26 Updated last year
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆41 Updated 4 years ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆81 Updated last year
- ☆30 Updated 8 months ago
- Experimental sampler to make LLMs more creative ☆30 Updated last year
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆107 Updated 2 years ago
- Efficiently computing & storing token n-grams from large corpora ☆19 Updated 5 months ago
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 Updated last year
- Blenderbot ☆9 Updated 3 years ago
- Lightweight tools for quick and easy LLM demos ☆26 Updated 5 months ago
- ☆57 Updated 5 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆20 Updated this week
- ☆19 Updated 4 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 Updated 6 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated 11 months ago
- ☆17 Updated 8 months ago
- Tools for formatting large language model prompts. ☆12 Updated last year
- LLM finetuning ☆42 Updated last year
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L… ☆43 Updated last year
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆23 Updated 2 years ago
- ☆20 Updated last year
- ☆18 Updated 6 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆29 Updated 5 months ago
- [COLM '24] Source-Aware Training Enables Knowledge Attribution in Language Models ☆17 Updated 7 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆41 Updated 11 months ago
- 👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆23 Updated last year
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- ☆15 Updated last year
- ☆16 Updated 2 months ago