noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
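As context for what "as plaintext" involves: tools in this space typically fetch raw wikitext (from a dump or the API) and strip the markup. Below is a minimal, hypothetical sketch of the markup-stripping step only; the function name and regexes are illustrative assumptions, not this repository's actual code.

```python
import re

def strip_wikitext(wikitext: str) -> str:
    """Very rough wikitext-to-plaintext conversion (illustrative only)."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)                  # drop {{templates}}
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)   # unwrap [[links|labels]]
    text = re.sub(r"'{2,}", "", text)                               # remove bold/italic quote runs
    text = re.sub(r"={2,}\s*([^=]+?)\s*={2,}", r"\1", text)         # flatten == headings ==
    return re.sub(r"\n{3,}", "\n\n", text).strip()                  # collapse blank lines

print(strip_wikitext("'''Python''' is a [[programming language|language]]."))
# → Python is a language.
```

Real pipelines usually rely on a dedicated parser such as mwparserfromhell or WikiExtractor rather than regexes, since wikitext nests templates and tables in ways regexes cannot handle robustly.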
Alternatives and similar repositories for wikipedia-downloader
Users interested in wikipedia-downloader also compare it to the repositories listed below.
- Python tools for processing the Stack Exchange data dumps into a text dataset for language models.
- A library for squeakily cleaning and filtering language datasets.
- Repository for analysis and experiments in the BigCode project.
- The data processing pipeline for the Koala chatbot language model.
- Script for downloading GitHub.
- An implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4".
- A new metric for evaluating the faithfulness of text generated by LLMs.
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, et al.
- Distills ChatGPT's coding ability into a small (1B) model.
- Downloads, parses, and filters data from Court Listener, part of the FreeLaw projects; data-ready for The Pile.
- Small and efficient mathematical reasoning LLMs.
- YT_subtitles: extracts subtitles from YouTube videos to raw text for language model training.
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
- Code for removing benchmark data from your training data to help combat data snooping.
- Open Implementations of LLM Analyses.
- Pre-training code for the CrystalCoder 7B LLM.
- Official repo for the NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."
- We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts…
- Developing tools to automatically analyze datasets.
- Evaluation suite for large-scale language models.
- Code accompanying the paper "Pretraining Language Models with Human Preferences".
- Multi-Domain Expert Learning.