[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43 · Updated Dec 25, 2024
Alternatives and similar repositories for COSA
Users interested in COSA are comparing it to the repositories listed below.
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset · ☆306 · Updated Dec 25, 2024
- ☆33 · Updated Nov 12, 2018
- [NeurIPS 2024] The official code of the paper "Automated Multi-level Preference for MLLMs" · ☆22 · Updated Sep 26, 2024
- ☆12 · Updated Nov 5, 2024
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P… · ☆25 · Updated Nov 28, 2023
- ☆18 · Updated Jul 8, 2025
- Official code for the paper "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter" · ☆16 · Updated Jun 20, 2023
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset · ☆298 · Updated Mar 14, 2024
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… · ☆14 · Updated Apr 9, 2025
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" · ☆19 · Updated Mar 10, 2025
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions · ☆360 · Updated Jan 14, 2025
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) · ☆42 · Updated Dec 16, 2025
- Controllable image captioning model with unsupervised modes · ☆21 · Updated Apr 14, 2023
- ☆19 · Updated Dec 22, 2022
- Implementations of Multi-Task and Meta-Learning baselines for the Metaworld benchmark · ☆33 · Updated Aug 20, 2025
- ☆28 · Updated Aug 13, 2025
- ☆24 · Updated May 8, 2024
- [ICCVW 2021] Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement · ☆20 · Updated Aug 18, 2021
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning · ☆19 · Updated Jun 7, 2024
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs · ☆54 · Updated Mar 9, 2025
- ☆44 · Updated Mar 8, 2021
- [ECCV'22 Poster] Explicit Image Caption Editing · ☆22 · Updated Nov 30, 2022
- ☆48 · Updated Sep 5, 2024
- Fun with wgpu: Simulating slime mold · ☆24 · Updated Aug 22, 2024
- Public repository for DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video. Code accompan… · ☆21 · Updated Apr 7, 2021
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on the TVCaption dataset · ☆90 · Updated Sep 6, 2023
- ☆26 · Updated Nov 7, 2023
- ☆27 · Updated Mar 21, 2024
- Repository for PTSN, an end-to-end image captioning method (ACM MM22) · ☆60 · Updated Dec 11, 2022
- ☆28 · Updated Feb 10, 2025
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval · ☆26 · Updated Jun 28, 2021
- ☆32 · Updated Feb 8, 2024
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) · ☆32 · Updated May 15, 2023
- Paper: https://arxiv.org/abs/2307.02469 · Page: https://lynx-llm.github.io/ · ☆270 · Updated Aug 9, 2023
- ☆29 · Updated Dec 16, 2025
- ☆34 · Updated Feb 6, 2026
- 3D Gaussian Splatting for underwater scene reconstruction via physics-based appearance-medium decoupling · ☆23 · Updated Feb 13, 2026
- OpenForensics Dataset · ☆28 · Updated Oct 21, 2022
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · ☆213 · Updated Jan 6, 2025