rom1504 / any2dataset

Turn any collection of files into a dataset

☆45

Alternatives and similar repositories for any2dataset

Users who are interested in any2dataset are comparing it to the libraries listed below.

LAION-AI / interesting-text-datasets
☆43 Updated 2 years ago
CarperAI / squeakily
A library for squeakily cleaning and filtering language datasets.
☆47 Updated 2 years ago
data2ml / all-clip
Load any clip model with a standardized interface
☆21 Updated last year
huggingface / fuego
[WIP] A 🔥 interface for running code in the cloud
☆85 Updated 2 years ago
lucidrains / holodeck-pytorch
Implementation of a holodeck, written in Pytorch
☆18 Updated last year
CERC-AAI / Robin
☆63 Updated 10 months ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆68 Updated last year
CarperAI / treasure_trove
☆22 Updated last year
crypdick / timm-lr-scheduler-explorer
A dashboard for exploring timm learning rate schedulers
☆19 Updated 8 months ago
TheoCoombes / crawlingathome
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
☆32 Updated 2 years ago
borisdayma / sora-mini
☆17 Updated last year
huggingface / hffs
**ARCHIVED** Filesystem interface to 🤗 Hub
☆58 Updated 2 years ago
EleutherAI / magiCARP
One stop shop for all things carp
☆59 Updated 2 years ago
nateraw / huggingface-sync-action
GitHub action that'll sync files from a GitHub Repo with the Hugging Face Hub 🤗
☆76 Updated 9 months ago
kyegomez / EXA-1
An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!
☆40 Updated last year
huggingface / disaggregators
🤗 Disaggregators: Curated data labelers for in-depth analysis.
☆66 Updated 2 years ago
modal-labs / ci-on-modal
A sample pattern for running CI tests on Modal
☆18 Updated 3 months ago
kyegomez / MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
☆24 Updated 2 weeks ago
kyegomez / SelfExtend
Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta
☆13 Updated 8 months ago
EleutherAI / training-jacobian
☆23 Updated 7 months ago
CarperAI / decontamination
This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
☆25 Updated 2 years ago
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76 Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 Updated 10 months ago
jquesnelle / ctranslate2-rs
Rust bindings for CTranslate2
☆14 Updated 2 years ago
facebookresearch / MultiModalExplorer
Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll…
☆27 Updated last year
nateraw / huggingface-datasets-converter
Scripts to convert datasets from various sources to Hugging Face Datasets.
☆57 Updated 2 years ago
joey00072 / TinyLora
Low-Rank Adaptation of Large Language Models clean implementation
☆8 Updated 2 years ago
facebookresearch / DIG-In
This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.
☆20 Updated last year
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆14 Updated last year
sytelus / pcprep
Various handy scripts to quickly setup new Linux and Windows sandboxes, containers and WSL.
☆40 Updated 3 months ago