CASIA-IVA-Lab/COSA

[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

☆43

Alternatives and similar repositories for COSA

Users interested in COSA are comparing it to the repositories listed below.

CASIA-IVA-Lab / OPT_Questioner
Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
☆15 · Aug 9, 2023 · Updated 2 years ago
CASIA-IVA-Lab / ChatBridge
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…
☆55 · Sep 4, 2023 · Updated 2 years ago
CASIA-IVA-Lab / VALOR
[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
☆311 · Dec 25, 2024 · Updated last year
CASIA-IVA-Lab / MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆73 · Jun 3, 2024 · Updated last year
CASIA-IVA-Lab / VAST
[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
☆301 · Mar 14, 2024 · Updated 2 years ago
a-antoniades / swe-search
☆12 · Nov 5, 2024 · Updated last year
CNVid / CNVid-3.5M
This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…
☆26 · Nov 28, 2023 · Updated 2 years ago
takomc / amp
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆22 · Sep 26, 2024 · Updated last year
yj-yu / lsmdc
☆33 · Nov 12, 2018 · Updated 7 years ago
WingsBrokenAngel / MSR-VTT-DataCleaning
☆19 · Dec 22, 2022 · Updated 3 years ago
xmu-xiaoma666 / SDATR
Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022)
☆19 · Oct 15, 2022 · Updated 3 years ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
aorwall / moatless-testbeds
Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…
☆14 · Apr 9, 2025 · Updated last year
irvingzhang0512 / open-images-downloader
☆14 · Aug 13, 2021 · Updated 4 years ago
ivattyue / Ada-K
Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"
☆12 · Mar 1, 2025 · Updated last year
bladewaltz1 / ModeCap
Controllable image captioning model with unsupervised modes
☆21 · Apr 14, 2023 · Updated 3 years ago
AILab-CVC / SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆363 · Jan 14, 2025 · Updated last year
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
☆19 · Jun 7, 2024 · Updated last year
jayleicn / TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
☆91 · Sep 6, 2023 · Updated 2 years ago
Rubics-Xuan / Med-DANet
Med-DANet Series (ECCV 2022 & WACV 2024)
☆13 · Jan 2, 2024 · Updated 2 years ago
ByteDance-Seed / Seed2.0
☆32 · Feb 17, 2026 · Updated 3 months ago
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆22 · Nov 30, 2022 · Updated 3 years ago
zhengsipeng / VRDFormer_VRD
☆16 · Jun 4, 2023 · Updated 2 years ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆131 · Apr 4, 2025 · Updated last year
CASIA-IVA-Lab / VRoPE
[EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.
☆27 · Nov 18, 2025 · Updated 6 months ago
TencentARC / TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
☆17 · Jun 20, 2023 · Updated 2 years ago
xrenaa / CS-DisMo
[ICCVW 2021] Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement
☆20 · Aug 18, 2021 · Updated 4 years ago
crodriguezo / DORi
Public repository for DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video Code accompan…
☆21 · Apr 7, 2021 · Updated 5 years ago
flageval-baai / CMMU
[IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
☆26 · Feb 1, 2024 · Updated 2 years ago
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆107 · Mar 14, 2024 · Updated 2 years ago
aimagelab / camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
☆30 · Dec 1, 2022 · Updated 3 years ago
matthewwicker / Kryptonite-N
Coursework for Mathematics for Machine Learning (70015) at Imperial College London
☆10 · Nov 12, 2024 · Updated last year
zwbx / Chain-of-Action
☆18 · Jul 8, 2025 · Updated 10 months ago
CryhanFang / CLIP2Video
☆259 · Dec 10, 2022 · Updated 3 years ago
CASIA-IVA-Lab / SC-Tune
Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"
☆16 · Apr 22, 2024 · Updated 2 years ago
tianyi-lab / R2-T2
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆19 · Mar 10, 2025 · Updated last year
ArielZc / CU-Net
[CVPR 2023] Official code for paper: Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detecti…
☆31 · Jun 23, 2023 · Updated 2 years ago
LiuRicky / ts2_net
[ECCV 2022] A PyTorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
☆79 · Nov 29, 2022 · Updated 3 years ago
coranholmes / TEVAD
Official implementation for paper TEVAD: Improved video anomaly detection with captions
☆40 · Apr 5, 2023 · Updated 3 years ago