[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43Dec 25, 2024Updated last year
Alternatives and similar repositories for COSA
Users that are interested in COSA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"☆15Aug 9, 2023Updated 2 years ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆55Sep 4, 2023Updated 2 years ago
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset☆311Dec 25, 2024Updated last year
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…☆73Jun 3, 2024Updated last year
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆301Mar 14, 2024Updated 2 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- ☆12Nov 5, 2024Updated last year
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆26Nov 28, 2023Updated 2 years ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆22Sep 26, 2024Updated last year
- ☆33Nov 12, 2018Updated 7 years ago
- ☆19Dec 22, 2022Updated 3 years ago
- Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022)☆19Oct 15, 2022Updated 3 years ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆57Mar 9, 2025Updated last year
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…☆14Apr 9, 2025Updated last year
- ☆14Aug 13, 2021Updated 4 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"☆12Mar 1, 2025Updated last year
- Controllable mage captioning model with unsupervised modes☆21Apr 14, 2023Updated 3 years ago
- (CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.☆363Jan 14, 2025Updated last year
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.☆19Jun 7, 2024Updated last year
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset☆91Sep 6, 2023Updated 2 years ago
- Med-DANet Series (ECCV 2022 & WACV 2024)☆13Jan 2, 2024Updated 2 years ago
- ☆32Feb 17, 2026Updated 3 months ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- ☆16Jun 4, 2023Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆131Apr 4, 2025Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆27Nov 18, 2025Updated 6 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆17Jun 20, 2023Updated 2 years ago
- [ICCVW 2021] Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement☆20Aug 18, 2021Updated 4 years ago
- Public repository for DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video Code accompan…☆21Apr 7, 2021Updated 5 years ago
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning☆26Feb 1, 2024Updated 2 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆107Mar 14, 2024Updated 2 years ago
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022☆30Dec 1, 2022Updated 3 years ago
- Coursework for Mathematics for Machine Learning (70015) at Imperial College London☆10Nov 12, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- ☆18Jul 8, 2025Updated 10 months ago
- ☆259Dec 10, 2022Updated 3 years ago
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"☆16Apr 22, 2024Updated 2 years ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated last year
- [CVPR 2023] Official code for paper: Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detecti…☆31Jun 23, 2023Updated 2 years ago
- [ECCV 2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval☆79Nov 29, 2022Updated 3 years ago
- Official implementation for paper TEVAD: Improved video anomaly detection with captions☆40Apr 5, 2023Updated 3 years ago