noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
☆25 Updated 2 years ago
Alternatives and similar repositories for wikipedia-downloader
Users who are interested in wikipedia-downloader are comparing it to the libraries listed below
- Script for downloading GitHub. ☆95 Updated last year
- The data processing pipeline for the Koala chatbot language model ☆117 Updated 2 years ago
- ☆90 Updated 3 years ago
- A library for squeakily cleaning and filtering language datasets. ☆47 Updated 2 years ago
- ☆79 Updated last year
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. ☆168 Updated last month
- Repository for analysis and experiments in the BigCode project. ☆120 Updated last year
- ☆151 Updated 4 years ago
- Pre-training code for CrystalCoder 7B LLM ☆54 Updated last year
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆83 Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆65 Updated 2 years ago
- 🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0. ☆56 Updated 3 years ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 Updated 2 years ago
- ☆39 Updated 2 years ago
- Developing tools to automatically analyze datasets ☆74 Updated 8 months ago
- ☆84 Updated last year
- Open Implementations of LLM Analyses ☆105 Updated 9 months ago
- Techniques used to run BLOOM at inference in parallel ☆37 Updated 2 years ago
- Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF ☆9 Updated 2 years ago
- Multi-Domain Expert Learning ☆67 Updated last year
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed ☆35 Updated 2 years ago
- Reward Model framework for LLM RLHF ☆61 Updated 2 years ago
- ☆15 Updated 3 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- ☆171 Updated 5 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 Updated last year
- Safety Score for Pre-Trained Language Models ☆95 Updated last year
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆66 Updated 2 years ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 Updated last year
- ☆33 Updated 2 years ago