LAION-AI/Big-Interleaved-Dataset

☆59

Alternatives and similar repositories for Big-Interleaved-Dataset

Users who are interested in Big-Interleaved-Dataset are comparing it to the repositories listed below.

TheoCoombes / crawlingathome
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
☆33 · Mar 21, 2023 · Updated 3 years ago

pbaylies / clustering-laion400m
Script and models for clustering LAION-400m CLIP embeddings.
☆26 · Jan 10, 2022 · Updated 4 years ago

LAION-AI / OCR-ensemble
☆42 · Jun 15, 2023 · Updated 3 years ago

WaihinWong / Papers
The list of some conference papers.
☆11 · Apr 19, 2019 · Updated 7 years ago

Picsart-AI-Research / Social-Reward
[ICLR 2024 Spotlight] Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Communi…
☆12 · Mar 29, 2024 · Updated 2 years ago

zhjohnchan / bert-clip-synesthesia
[Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.
☆14 · Jun 7, 2023 · Updated 3 years ago

LAION-AI / laion-dedup
☆18 · Nov 7, 2022 · Updated 3 years ago

wyndwarrior / autoregressive-bbox
☆17 · Oct 18, 2022 · Updated 3 years ago

rl-lang-grounding / rl-lang-ground
Tensorflow code for WACV 2019 paper "Attention Based Natural Language Grounding by Navigating Virtual Environment" - https://arxiv.org/ab…
☆17 · Nov 7, 2018 · Updated 7 years ago

kakaobrain / coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
☆1,256 · Nov 30, 2022 · Updated 3 years ago

simonw / webvid-datasette
A Datasette instance for searching WebVid-10M
☆15 · Sep 30, 2022 · Updated 3 years ago

LAION-AI / laion50BU
Un-*** 50 billions multimodality dataset
☆24 · Sep 14, 2022 · Updated 3 years ago

xmrec / xmrec.github.io
☆23 · Dec 16, 2022 · Updated 3 years ago

IDEA-Research / hana
Implementation and checkpoints of Imagen, Google's text-to-image synthesis neural network, in Pytorch
☆17 · Dec 22, 2022 · Updated 3 years ago

edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆44 · Jun 7, 2025 · Updated last year

wzp8023391 / Interactive-CD-tool
☆10 · Dec 12, 2023 · Updated 2 years ago

marvl-challenge / marvl-code
[EMNLP 2021] Code and data for our paper "Visually Grounded Reasoning across Languages and Cultures"
☆30 · Dec 30, 2021 · Updated 4 years ago

uiuctml / fair-classification
Post-processing for fair classification
☆16 · Jun 30, 2025 · Updated last year

BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆167 · Jun 20, 2024 · Updated 2 years ago

HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆45 · Jun 14, 2024 · Updated 2 years ago

HomoScriptor-Project / HomoScriptor
Fuel innovation and advance language models with HomoScriptor: A vibrant, community-driven dataset for fine-tuning large language models.
☆18 · Oct 14, 2023 · Updated 2 years ago

allenbai01 / transformers-as-statisticians
☆36 · Jul 5, 2023 · Updated 3 years ago

facebookresearch / grounding-inductive-biases
reproduces experiments from "Grounding inductive biases in natural images: invariance stems from variations in data"
☆17 · Sep 25, 2024 · Updated last year

tensorfork / OBST
Your fruity companion for transformers
☆14 · May 25, 2022 · Updated 4 years ago

w3c / mediacapture-handle
☆15 · Mar 6, 2025 · Updated last year

toxtli / daw
GridSound wants to be a free browser-based HTML5 DAW (Digital Audio Workstation) following the new Web Audio API. You can test the applic…
☆12 · Dec 9, 2018 · Updated 7 years ago

rom1504 / laion-prepro
Get hundred of million of image+url from the crawling at home dataset and preprocess them
☆222 · May 26, 2024 · Updated 2 years ago

AdamRain / YFCC15M_downloader
A subset of YFCC100M. Tools, checking scripts and links of web drive to download datasets(uncompressed).
☆19 · Nov 13, 2024 · Updated last year

daeh / computed-appraisals
Computed Appraisals Model. Code and data for the 2023 paper, "Emotion prediction as computation over a generative theory of mind"
☆13 · Jun 12, 2023 · Updated 3 years ago

eminorhan / humanlike-vits
ViT models pretrained with up to ~5k hours of human-like video data
☆14 · Aug 10, 2023 · Updated 2 years ago

zzxslp / XL-VLN
Dataset for Bilingual VLN
☆11 · Dec 5, 2020 · Updated 5 years ago

myrho / bright-db
Offline-first, decentralized graph database of collaborative Web apps
☆15 · May 12, 2017 · Updated 9 years ago

ksOAn6g5 / TaiSu
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)
☆192 · Nov 17, 2023 · Updated 2 years ago

beaver-lodge / manx
MLIR backend for Nx
☆14 · May 24, 2024 · Updated 2 years ago

caravelo / airports
Airports information per language
☆14 · Oct 17, 2015 · Updated 10 years ago

LAION-AI / Conditional-Pretraining-of-Large-Language-Models
☆37 · May 7, 2023 · Updated 3 years ago

toxtli / github-pages-cms
The easiest way to update static sites hosted on GitHub Pages with a visual editor
☆11 · Mar 28, 2018 · Updated 8 years ago

tair-opensource / AlibabaCloud.TairSDK
Based on StackExchange.Redis that operates Tair For Redis Modules.
☆11 · Feb 28, 2025 · Updated last year

allenai / gpv2-web10k
Download Web-10K data by querying Bing Image Search
☆10 · Feb 1, 2022 · Updated 4 years ago