☆121Jan 15, 2026Updated 4 months ago
Alternatives and similar repositories for laion5b-downloader
Users that are interested in laion5b-downloader are comparing it to the libraries listed below.
- AAAI 2024: Visual Instruction Generation and Correction☆97Feb 4, 2024Updated 2 years ago
- WanJuan 1.0 multimodal corpus☆574Oct 20, 2023Updated 2 years ago
- PICABench: How Far Are We from Physically Realistic Image Editing?☆38Nov 5, 2025Updated 7 months ago
- Open-source multimodal data annotation platform with AI auto-annotation support.☆1,583Updated this week
- Data annotation component library --provided as NPM packages☆152Jun 2, 2026Updated last week
- SDK of OpenDataLab - https://opendatalab.org.cn☆60Jul 31, 2025Updated 10 months ago
- WanJuan-CC is a high-quality dataset built on CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.☆14Apr 18, 2024Updated 2 years ago
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆302May 22, 2025Updated last year
- We propose a novel modular framework that learns to dynamically mix low-rank adapters (LoRAs) to improve visual analogy learning, enablin…☆74Apr 12, 2026Updated last month
- A Python package for interacting with the MinerU Vision-Language Model.☆128Updated this week
- The Official PyTorch Implementation of OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation☆34Jul 6, 2024Updated last year
- Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.☆4,424Oct 19, 2025Updated 7 months ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆72Feb 11, 2025Updated last year
- Framewise online action recognition using 4D data☆13Dec 3, 2019Updated 6 years ago
- A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"☆18Dec 22, 2021Updated 4 years ago
- ☆51Jan 27, 2026Updated 4 months ago
- A huge dataset for Document Visual Question Answering☆22Jul 29, 2024Updated last year
- Official repository of paper: "FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation"☆26Mar 2, 2023Updated 3 years ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI☆369Jul 24, 2025Updated 10 months ago
- A collection of resources and papers on diffusion models of video generation.☆10Feb 11, 2023Updated 3 years ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆19May 7, 2022Updated 4 years ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆192Nov 17, 2023Updated 2 years ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)☆85Nov 2, 2022Updated 3 years ago
- A collection of papers on machine learning and deep learning, covering theory and applications.☆11Jul 15, 2017Updated 8 years ago
- Given an input RGB image, we generate novel viewpoints that simulate a 3D interactive experience.☆23Apr 26, 2023Updated 3 years ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".☆472Jan 19, 2024Updated 2 years ago
- Chinese CLIP models with SOTA performance.☆62Aug 28, 2023Updated 2 years ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆289Jan 14, 2024Updated 2 years ago
- datasets resource☆144May 27, 2026Updated 2 weeks ago
- ☆58Feb 13, 2022Updated 4 years ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training☆141Dec 16, 2025Updated 5 months ago
- An unofficial implementation of DreamScene360.☆83Jun 13, 2024Updated last year
- Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter"☆14Jul 19, 2024Updated last year
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆423May 5, 2025Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆93Apr 30, 2024Updated 2 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆327Jan 20, 2025Updated last year
- [IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives☆35Nov 25, 2025Updated 6 months ago
- Implementation of "Realtime Multi Person Pose-Estimation" in PyTorch with data from AI Challenger☆13Nov 24, 2017Updated 8 years ago