KangarooGroup/Kangaroo

Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

☆67

Alternatives and similar repositories for Kangaroo

Users that are interested in Kangaroo are comparing it to the libraries listed below.

Tencent-QQMM / Video-CCAM
A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.
☆73 · Oct 14, 2024 · Updated last year
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year
Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated last year
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆89 · Jul 1, 2024 · Updated 2 years ago
DCDmllm / Momentor
☆81 · Nov 24, 2024 · Updated last year
qiujihao19 / Artemis
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27 · Apr 8, 2025 · Updated last year
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆107 · Nov 28, 2024 · Updated last year
Vision-CAIR / Infinibench
Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
☆20 · Nov 4, 2025 · Updated 8 months ago
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆130 · Dec 10, 2024 · Updated last year
ljang0 / videowebarena
☆14 · Dec 25, 2024 · Updated last year
OpenGVLab / VideoChat-Flash
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆526 · Updated this week
ruili33 / TPO
☆41 · Sep 9, 2025 · Updated 10 months ago
sudo-Boris / mr-Blip
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
☆95 · Mar 9, 2025 · Updated last year
bytedance / Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
☆178 · Jan 30, 2025 · Updated last year
zai-org / LVBench
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆144 · Jul 9, 2025 · Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆126 · Nov 25, 2024 · Updated last year
bigai-nlco / VideoLLaMB
[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆87 · Feb 27, 2025 · Updated last year
RUCAIBox / Event-Bench
Official code of *Towards Event-oriented Long Video Understanding*
☆12 · Jul 26, 2024 · Updated last year
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆787 · Dec 8, 2025 · Updated 7 months ago
RupertLuo / Valley
The official repository of "Video assistant towards large language model makes everything easy"
☆232 · Dec 24, 2024 · Updated last year
Jazzcharles / OVSegmentor
OVSegmentor, CVPR23
☆62 · Apr 22, 2024 · Updated 2 years ago
huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆24 · Jul 29, 2024 · Updated last year
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215 · Feb 27, 2024 · Updated 2 years ago
SaraGhazanfari / CoF
Chain-of-Frames [CVPR 2026]
☆40 · Jul 2, 2025 · Updated last year
longvideobench / LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133 · Jul 27, 2024 · Updated last year
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆547 · Aug 14, 2025 · Updated 11 months ago
ttengwang / Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
☆381 · Oct 9, 2025 · Updated 9 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated 11 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423 · Jul 6, 2026 · Updated 2 weeks ago
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆429 · May 8, 2025 · Updated last year
JUNJIE99 / MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
☆263 · Apr 13, 2026 · Updated 3 months ago
ziplab / LongVLM
☆108 · Jul 30, 2024 · Updated last year
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year
egoschema / EgoSchema
☆117 · Dec 30, 2024 · Updated last year
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆31 · Dec 23, 2024 · Updated last year
DAMO-NLP-SG / LongPO
[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
☆43 · Feb 27, 2025 · Updated last year
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20 · Feb 27, 2025 · Updated last year
cankocagil / TT-SRN
TT-SRN: Twin Transformers with Sinusoidal Representation Networks for Video Instance Segmentation
☆16 · Oct 8, 2021 · Updated 4 years ago