saicoco / webdataset
A dataset for large-scale data loading in PyTorch
☆13Updated 3 years ago
Alternatives and similar repositories for webdataset
Users that are interested in webdataset are comparing it to the libraries listed below
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆31Updated 6 months ago
- ICCV2023-Diffusion-Papers☆108Updated last year
- The official repository for the RealSyn dataset☆34Updated last month
- FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing☆15Updated 2 months ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Updated last year
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L…☆53Updated last year
- ☆43Updated 2 years ago
- Official repo for 【FaceScore: Benchmarking and Enhancing Face Quality in Human Generation】☆69Updated 5 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆209Updated 2 months ago
- BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild☆30Updated last year
- ☆26Updated last month
- Large-scale image dataset visualization tool.☆119Updated last year
- ☆77Updated last year
- [ICLR 2023] Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models☆50Updated last year
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆71Updated last year
- Implementation and checkpoints of Imagen, Google's text-to-image synthesis neural network, in PyTorch☆17Updated 2 years ago
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated 2 years ago
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT☆136Updated last year
- An official pytorch implementation of "MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts"☆32Updated 6 months ago
- ☆149Updated 4 months ago
- ☆133Updated last year
- ☆87Updated 11 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆145Updated 6 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.☆122Updated this week
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆69Updated 3 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆107Updated last month
- Official repo for 【TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps】☆34Updated 5 months ago
- The HD-VG-130M Dataset☆117Updated last year
- ☆95Updated last year