noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
☆25 · Updated 2 years ago
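The repository's one-line description states the end result; as a rough illustration of the general approach (a minimal sketch, not the repository's actual code), turning a Wikipedia XML-dump excerpt into plaintext might look like the following. The `DUMP_XML` sample and `strip_wikitext` helper are assumptions for demonstration:

```python
import re
import xml.etree.ElementTree as ET

# A tiny excerpt in the Wikipedia XML-dump format (illustrative only).
DUMP_XML = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <text>'''Example''' is an [[article]] about examples.</text>
    </revision>
  </page>
</mediawiki>"""

NS = {"mw": "http://www.mediawiki.org/xml/export-0.10/"}


def strip_wikitext(text: str) -> str:
    """Very rough wikitext-to-plaintext conversion (far from complete)."""
    # [[target|label]] or [[target]] -> label/target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", text)
    # drop bold/italic quote runs
    text = re.sub(r"'{2,}", "", text)
    return text


def extract_articles(xml_str: str) -> dict:
    """Map article titles to plaintext bodies from a dump-format string."""
    root = ET.fromstring(xml_str)
    articles = {}
    for page in root.findall("mw:page", NS):
        title = page.findtext("mw:title", namespaces=NS)
        body = page.findtext("mw:revision/mw:text", namespaces=NS) or ""
        articles[title] = strip_wikitext(body)
    return articles


print(extract_articles(DUMP_XML)["Example"])
# -> Example is an article about examples.
```

A real pipeline would stream a multi-gigabyte dump with `ET.iterparse` and use a full wikitext parser; the regexes above only handle the two simplest markup cases.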
Alternatives and similar repositories for wikipedia-downloader
Users interested in wikipedia-downloader are comparing it to the libraries listed below.
- ☆90 · Updated 2 years ago
- Pre-training code for the CrystalCoder 7B LLM ☆54 · Updated last year
- Code for removing benchmark data from your training data to combat data snooping. ☆25 · Updated 2 years ago
- Download, parse, and filter data from CourtListener, part of the Free Law Project; the output is ready for The Pile. ☆11 · Updated last year
- The data-processing pipeline for the Koala chatbot language model ☆117 · Updated 2 years ago
- Official repo for the NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions" ☆64 · Updated last year
- A library for squeakily cleaning and filtering language datasets. ☆47 · Updated last year
- Data-preparation code for the CrystalCoder 7B LLM ☆44 · Updated last year
- Repository for analysis and experiments in the BigCode project. ☆118 · Updated last year
- Python tools for processing the Stack Exchange data dumps into a text dataset for language models ☆82 · Updated last year
- An implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4" ☆43 · Updated 7 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches. ☆72 · Updated last year
- Demonstration that finetuning a RoPE model on sequences longer than those seen in pre-training extends the model's context limit ☆63 · Updated last year
- Script for processing OpenAI's PRM800K process-supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 · Updated 3 weeks ago
- A new metric for evaluating the faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- Distill ChatGPT's coding ability into a small (1B) model ☆29 · Updated last year
- Techniques used to run BLOOM in parallel at inference time ☆37 · Updated 2 years ago
- Experiments on speculative sampling with Llama models ☆126 · Updated last year
- ☆33 · Updated last year
- Repo hosting code and materials on speeding up LLM inference using token merging. ☆36 · Updated last year
- ☆14 · Updated last year
- An open-source ChatGPT-style UI for ToolLlama ☆28 · Updated last year
- Parameter-Efficient Sparsity Crafting: from dense to Mixture-of-Experts for instruction tuning on general tasks ☆31 · Updated 11 months ago
- Reward-model framework for LLM RLHF ☆61 · Updated last year
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆142 · Updated 2 months ago
- [ICLR 2025] Breaking the throughput-latency trade-off for long sequences with speculative decoding ☆116 · Updated 5 months ago
- Repository for CPU kernel generation for LLM inference ☆26 · Updated last year
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated last year