noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
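As context for what "as plaintext" involves: tools in this space typically fetch raw wikitext (from a dump or the API) and strip the markup. Below is a minimal, hypothetical sketch of the markup-stripping step only; the function name and regexes are illustrative assumptions, not this repository's actual code.

```python
import re

def strip_wikitext(wikitext: str) -> str:
    """Very rough wikitext-to-plaintext conversion (illustrative only)."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)                  # drop {{templates}}
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)   # unwrap [[links|labels]]
    text = re.sub(r"'{2,}", "", text)                               # remove bold/italic quote runs
    text = re.sub(r"={2,}\s*([^=]+?)\s*={2,}", r"\1", text)         # flatten == headings ==
    return re.sub(r"\n{3,}", "\n\n", text).strip()                  # collapse blank lines

print(strip_wikitext("'''Python''' is a [[programming language|language]]."))
# → Python is a language.
```

Real pipelines usually rely on a dedicated parser such as mwparserfromhell or WikiExtractor rather than regexes, since wikitext nests templates and tables in ways regexes cannot handle robustly.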
Alternatives and similar repositories for wikipedia-downloader
Users interested in wikipedia-downloader also compare it to the repositories listed below.
- Python tools for processing the Stack Exchange data dumps into a text dataset for language models.
- A library for squeakily cleaning and filtering language datasets.
- Repository for analysis and experiments in the BigCode project.
- The data processing pipeline for the Koala chatbot language model.
- Script for downloading GitHub.
- An implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4".
- A new metric for evaluating the faithfulness of text generated by LLMs.
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, et al.
- Distills ChatGPT's coding ability into a small (1B) model.
- Downloads, parses, and filters data from Court Listener, part of the FreeLaw projects; data-ready for The Pile.
- Small and efficient mathematical reasoning LLMs.
- YT_subtitles: extracts subtitles from YouTube videos to raw text for language model training.
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
- Code for removing benchmark data from your training data to help combat data snooping.
- Open Implementations of LLM Analyses.
- Pre-training code for the CrystalCoder 7B LLM.
- Official repo for the NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."
- We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts…
- Developing tools to automatically analyze datasets.
- Evaluation suite for large-scale language models.
- Code accompanying the paper "Pretraining Language Models with Human Preferences".
- Multi-Domain Expert Learning.