noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
☆25Updated 2 years ago
Alternatives and similar repositories for wikipedia-downloader
Users who are interested in wikipedia-downloader are comparing it to the repositories listed below
- ☆91Updated 3 years ago
- The data processing pipeline for the Koala chatbot language model☆118Updated 2 years ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models☆84Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated 2 years ago
- A library for squeakily cleaning and filtering language datasets.☆47Updated 2 years ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping.☆27Updated 2 years ago
- Pre-training code for CrystalCoder 7B LLM☆55Updated last year
- ☆157Updated 4 years ago
- Repository for analysis and experiments in the BigCode project.☆124Updated last year
- Safety Score for Pre-Trained Language Models☆96Updated last year
- ☆33Updated 2 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆67Updated 2 years ago
- ☆39Updated 2 years ago
- Open Implementations of LLM Analyses☆107Updated 11 months ago
- Script for downloading GitHub.☆97Updated last year
- OLMost every training recipe you need to perform data interventions with the OLMo family of models.☆48Updated this week
- ☆127Updated 2 years ago
- QLoRA with Enhanced Multi GPU Support☆37Updated 2 years ago
- An Implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4"☆43Updated 11 months ago
- ☆16Updated 5 months ago
- Reward Model framework for LLM RLHF☆61Updated 2 years ago
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.☆168Updated 2 months ago
- URL downloader supporting checkpointing and continuous checksumming.☆19Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."☆64Updated 2 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training☆44Updated 5 years ago
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI☆94Updated 2 years ago
- Small and Efficient Mathematical Reasoning LLMs☆72Updated last year
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed☆35Updated 2 years ago
- ☆79Updated last year
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit☆63Updated 2 years ago