noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
☆22 · Updated last year
Alternatives and similar repositories for wikipedia-downloader:
Users who are interested in wikipedia-downloader are comparing it to the repositories listed below
- ☆87 · Updated 2 years ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆80 · Updated last year
- ☆77 · Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- ☆16 · Updated 6 months ago
- Demonstration that finetuning a RoPE model on larger sequences than the pre-trained model adapts the model's context limit ☆63 · Updated last year
- Script for downloading GitHub. ☆90 · Updated 6 months ago
- A library for squeakily cleaning and filtering language datasets. ☆45 · Updated last year
- Distill ChatGPT's coding ability into a small model (1B) ☆26 · Updated last year
- Codebase for the arxiver dataset ☆13 · Updated 2 months ago
- Efficiently computing & storing token n-grams from large corpora ☆17 · Updated 3 months ago
- Techniques used to run BLOOM at inference in parallel ☆37 · Updated 2 years ago
- ☆32 · Updated last year
- A collection of models built with ColossalAI ☆32 · Updated 2 years ago
- An Implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4" ☆44 · Updated 3 months ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆63 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆41 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆105 · Updated last month
- Open Implementations of LLM Analyses ☆98 · Updated 3 months ago
- Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI ☆58 · Updated last year
- The data processing pipeline for the Koala chatbot language model ☆117 · Updated last year
- GPT Demo with hybrid distributed training ☆10 · Updated 2 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 · Updated last year
- ☆12 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆84 · Updated 3 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆99 · Updated last year
- QuIP quantization ☆48 · Updated 10 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated last month
- Developing tools to automatically analyze datasets ☆74 · Updated 3 months ago
- Repository for analysis and experiments in the BigCode project. ☆117 · Updated 10 months ago