[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43 · Dec 25, 2024 · Updated last year
Alternatives and similar repositories for COSA
Users that are interested in COSA are comparing it to the libraries listed below.
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 · Aug 9, 2023 · Updated 2 years ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆55 · Sep 4, 2023 · Updated 2 years ago
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset ☆311 · Dec 25, 2024 · Updated last year
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆16 · May 21, 2024 · Updated 2 years ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati… ☆73 · Jun 3, 2024 · Updated 2 years ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset ☆302 · Mar 14, 2024 · Updated 2 years ago
- ☆12 · Nov 5, 2024 · Updated last year
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs" ☆22 · Sep 26, 2024 · Updated last year
- ☆33 · Nov 12, 2018 · Updated 7 years ago
- ☆19 · Dec 22, 2022 · Updated 3 years ago
- Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022) ☆19 · Oct 15, 2022 · Updated 3 years ago
- Code for "APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training" ☆41 · Dec 23, 2025 · Updated 5 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆56 · Mar 9, 2025 · Updated last year
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Apr 9, 2025 · Updated last year
- ☆14 · Aug 13, 2021 · Updated 4 years ago
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs" ☆12 · Mar 1, 2025 · Updated last year
- MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering… ☆13 · Feb 18, 2023 · Updated 3 years ago
- Generating Video Caption Using LSTM ☆12 · May 29, 2023 · Updated 3 years ago
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆364 · Jan 14, 2025 · Updated last year
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ☆19 · Jun 7, 2024 · Updated 2 years ago
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset ☆91 · Sep 6, 2023 · Updated 2 years ago
- Med-DANet Series (ECCV 2022 & WACV 2024) ☆13 · Jan 2, 2024 · Updated 2 years ago
- ☆37 · Feb 17, 2026 · Updated 3 months ago
- ☆26 · Nov 7, 2023 · Updated 2 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing ☆22 · Nov 30, 2022 · Updated 3 years ago
- ☆16 · Jun 4, 2023 · Updated 3 years ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆131 · Apr 4, 2025 · Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. ☆28 · Nov 18, 2025 · Updated 6 months ago
- [ICCVW 2021] Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement ☆20 · Aug 18, 2021 · Updated 4 years ago
- OpenGL based 3D engine with a viewer API for pointcloud, surface meshes ☆14 · May 10, 2023 · Updated 3 years ago
- Public repository for DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video Code accompan… ☆21 · Apr 7, 2021 · Updated 5 years ago
- ☆24 · May 8, 2024 · Updated 2 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆107 · Mar 14, 2024 · Updated 2 years ago
- Coursework for Mathematics for Machine Learning (70015) at Imperial College London ☆10 · Nov 12, 2024 · Updated last year
- ☆18 · Jul 8, 2025 · Updated 11 months ago
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models" ☆16 · Apr 22, 2024 · Updated 2 years ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated last year
- [CVPR 2023] Official code for paper: Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detecti… ☆32 · Jun 23, 2023 · Updated 2 years ago
- [ECCV 2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval ☆79 · Nov 29, 2022 · Updated 3 years ago