EleutherAI / best-download
URL downloader supporting checkpointing and continuous checksumming.
☆19 Updated 11 months ago
Related projects
Alternatives and complementary repositories for best-download
- One stop shop for all things carp ☆59 Updated 2 years ago
- A file utility for accessing both local and remote files through a unified interface. ☆36 Updated this week
- **ARCHIVED** Filesystem interface to 🤗 Hub ☆56 Updated last year
- A library for squeakily cleaning and filtering language datasets. ☆45 Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 2 years ago
- Efficiently computing & storing token n-grams from large corpora ☆15 Updated last month
- GPT-jax based on the official huggingface library ☆13 Updated 3 years ago
- Convenient Text-to-Text Training for Transformers ☆19 Updated 2 years ago
- Experiments with generating opensource language model assistants ☆97 Updated last year
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset. ☆31 Updated last year
- Implementation of stop sequencer for Huggingface Transformers ☆15 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 Updated last year
- Anh - LAION's multilingual assistant datasets and models ☆27 Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆92 Updated last year
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆76 Updated 11 months ago
- Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE ☆18 Updated 3 years ago
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆60 Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 Updated last year
- Embedding Recycling for Language models ☆38 Updated last year
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 Updated 6 months ago
- Scripts supporting the development and serving the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search ☆10 Updated last year