noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
☆25 (updated 2 years ago)
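As a rough illustration of what this kind of tool does, here is a minimal Python sketch that streams a Wikipedia pages-articles dump and strips the wiki markup down to plaintext. This is not the repository's actual implementation; the dump URL is a placeholder shard, and the use of `requests` and `mwparserfromhell` is an assumption.

```python
# Minimal sketch, not the repo's actual code: stream a Wikipedia
# pages-articles dump and strip wiki markup to plaintext.
# Assumes `requests` and `mwparserfromhell` are installed; the dump
# URL below is a placeholder for whichever dump shard you want.
import bz2
import xml.etree.ElementTree as ET

import mwparserfromhell
import requests

DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
            "enwiki-latest-pages-articles1.xml-p1p41242.bz2")  # placeholder shard

def iter_plaintext_articles(url):
    """Yield (title, plaintext) pairs from a .bz2 MediaWiki XML dump."""
    resp = requests.get(url, stream=True)
    resp.raise_for_status()
    stream = bz2.BZ2File(resp.raw)  # decompress while downloading
    title = None
    for _, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text" and elem.text:
            # strip_code() drops templates, links, and formatting markup
            yield title, mwparserfromhell.parse(elem.text).strip_code()
            elem.clear()  # free the parsed element to keep memory bounded

if __name__ == "__main__":
    for title, text in iter_plaintext_articles(DUMP_URL):
        print(title, len(text))
        break  # demo: stop after the first article
```

Streaming with `iterparse` keeps memory flat even though a full dump decompresses to tens of gigabytes.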
Alternatives and similar repositories for wikipedia-downloader
Users interested in wikipedia-downloader are comparing it to the libraries listed below.
- ☆92 (updated 3 years ago)
- Python tools for processing the Stack Exchange data dumps into a text dataset for language models ☆85 (updated 2 years ago)
- Script for downloading GitHub repositories ☆97 (updated last year)
- Repository for analysis and experiments in the BigCode project ☆128 (updated last year)
- The data processing pipeline for the Koala chatbot language model ☆118 (updated 2 years ago)
- Pre-training code for the CrystalCoder 7B LLM ☆55 (updated last year)
- A library for squeakily cleaning and filtering language datasets ☆49 (updated 2 years ago)
- An implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4" ☆44 (updated last year)
- DeepSpeed, a deep learning optimization library that makes distributed training easy, efficient, and effective ☆171 (updated 2 months ago)
- ☆39 (updated 3 years ago)
- Open implementations of LLM analyses ☆108 (updated last year)
- ☆128 (updated 2 years ago)
- ☆17 (updated 8 months ago)
- Official repo for the NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions" ☆66 (updated 2 years ago)
- Safety Score for Pre-Trained Language Models ☆95 (updated 2 years ago)
- Everything for the paper "Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing" ☆19 (updated 2 years ago)
- URL downloader supporting checkpointing and continuous checksumming ☆19 (updated 2 years ago)
- ☆32 (updated 2 years ago)
- A new metric for evaluating the faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 (updated 2 years ago)
- Small and Efficient Mathematical Reasoning LLMs ☆72 (updated last year)
- ☆85 (updated 2 years ago)
- Zeus LLM Trainer, a rewrite of Stanford Alpaca aiming to be the trainer for all large language models ☆70 (updated 2 years ago)
- Demonstration that fine-tuning a RoPE model on sequences longer than its pre-training length extends the model's context limit ☆63 (updated 2 years ago)
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆32 (updated last year)
- Data preparation code for the Amber 7B LLM ☆93 (updated last year)
- Experiments on speculative sampling with Llama models ☆127 (updated 2 years ago)
- ToK (Tree of Knowledge) for large language models: a novel dataset that inspires knowledge symbolic correlation in simple inpu… ☆54 (updated 2 years ago)
- Developing tools to automatically analyze datasets ☆75 (updated last year)
- Code for cleaning benchmark data out of your training data to help combat data snooping ☆27 (updated 2 years ago)
- A GPT-based generative LM for combined text and math formulas, leveraging tree-based formula encoding. Published as "Tree-Based Represent… ☆41 (updated 2 years ago)