jh-yi/Video-Panda

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]

☆81

Alternatives and similar repositories for Video-Panda

Users who are interested in Video-Panda are comparing it to the repositories listed below.

derkbreeze / LPT
Official implementation of the CVPR2022 paper "Learning of Global Objective for Network Flow in Multi-Object Tracking"
☆17 · Dec 30, 2025 · Updated 6 months ago
Luoadore / RACnet
[ICIP 2024] Rethinking temporal self-similarity for repetitive action counting
☆10 · Mar 10, 2025 · Updated last year
olga-zats / GTDA
[ECCV2024] Gated Temporal Action Anticipation for Stochastic Long-Term Anticipation
☆24 · May 29, 2025 · Updated last year
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
☆73 · Oct 17, 2025 · Updated 9 months ago
hananshafi / MedContext
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"
☆14 · Nov 1, 2024 · Updated last year
TalalWasim / Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]
☆126 · Jul 1, 2023 · Updated 3 years ago
HashmatShadab / HSAT
[MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
☆12 · Jun 17, 2025 · Updated last year
TalalWasim / Video-FocalNets
Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]
☆102 · Apr 30, 2024 · Updated 2 years ago
mahtabbigverdi / Aurora
☆12 · Dec 4, 2024 · Updated last year
jh-yi / DND-Diko-WWWR
☆14 · Aug 22, 2025 · Updated 11 months ago
princetonvisualai / merv
Unifying Specialized Visual Encoders for Video Language Models
☆25 · Nov 22, 2025 · Updated 8 months ago
olga-zats / DIFF_MANTA
[CVPR 2025] MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
☆27 · Jun 13, 2025 · Updated last year
Sueqk / LMM-VQA
LMM for VQA, TCSVT version
☆10 · Jul 19, 2024 · Updated 2 years ago
ChengHan111 / VPT-or-FT
Official PyTorch implementation of "Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?" (ICLR 2024)
☆13 · Mar 8, 2024 · Updated 2 years ago
olga-zats / goal_consistency
[ICIP2023] Code for the paper 'Action Anticipation with Goal Consistency'
☆12 · Apr 5, 2024 · Updated 2 years ago
fahadshamshad / deep-facial-privacy-prior
[ECCVW 2024 -- ORAL] Official repository of the paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors".
☆12 · Oct 11, 2024 · Updated last year
FreedomIntelligence / TRIM
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22 · Jan 11, 2026 · Updated 6 months ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated 2 years ago
HashmatShadab / Robustness-of-Volumetric-Medical-Segmentation-Models
[BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
☆15 · Nov 1, 2024 · Updated last year
tyshiwo1 / Awesome-Visual-Tokenizer
Awesome Visual Tokenizers/Autoencoders
☆20 · Nov 19, 2025 · Updated 8 months ago
ShahinaKK / LWI-VMS
Learnable Weight Initialization for Volumetric Medical Image Segmentation [Elsevier AIM2024]
☆22 · Oct 27, 2024 · Updated last year
mzeeshankaramat / SafeAgents
☆20 · Jun 4, 2026 · Updated last month
chs20 / fuselip
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17 · Sep 8, 2025 · Updated 10 months ago
jiaangli / VILA
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16 · Nov 22, 2024 · Updated last year
Muzammal-Naseer / DCViT-AT
Official repository for "Boosting Adversarial Transferability using Dynamic Cues" (ICLR 2023)
☆20 · Aug 24, 2023 · Updated 2 years ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆54 · Jun 12, 2025 · Updated last year
BonnBytes / PyTorch-FWD
[ICLR 2025] Fréchet Wavelet Distance: A metric to detect domain bias in generative models.
☆18 · Sep 2, 2025 · Updated 10 months ago
derkbreeze / AwesomeActionSegmentation
☆33 · Jun 19, 2026 · Updated last month
renjie-liang / HUAL
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
☆15 · Dec 12, 2023 · Updated 2 years ago
HuiGuanLab / RaTSG
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"
☆13 · Aug 22, 2025 · Updated 11 months ago
CVG-Bonn / EgoControl
[CVPR'26] EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses
☆24 · Jul 11, 2026 · Updated last week
PKU-YuanGroup / LLMBind
LLMBind: A Unified Modality-Task Integration Framework
☆19 · Jun 16, 2024 · Updated 2 years ago
GasolSun36 / MVP
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆24 · Sep 9, 2024 · Updated last year
HashmatShadab / MambaRobustness
[CVPRW 2025] Official repository of the paper titled "Towards Evaluating the Robustness of Visual State Space Models"
☆26 · Jun 8, 2025 · Updated last year
PKU-YuanGroup / GPT-as-Language-Tree
GPT as a Monte Carlo Language Tree: A Probabilistic Perspective
☆46 · Jan 18, 2025 · Updated last year
Yaxin9Luo / Gamma-MOD
[ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆45 · Oct 28, 2025 · Updated 8 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated 11 months ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
☆29 · Apr 8, 2025 · Updated last year
kyegomez / BRAVE-ViT-Swarm
Implementation of the paper: "BRAVE: Broadening the visual encoding of vision-language models"
☆26 · Jun 22, 2026 · Updated last month