shilinyan99/CrossLMM

shilinyan99 / CrossLMM

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

☆25

Alternatives and similar repositories for CrossLMM

Users who are interested in CrossLMM compare it to the repositories listed below.

OpenGVLab / MUTR
「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation
☆85 • Jun 13, 2025 • Updated last year
shilinyan99 / PanoVOS
「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
☆21 • Jul 2, 2024 • Updated 2 years ago
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆50 • Jul 12, 2026 • Updated last week
CaraJ7 / DraCo
Official Repository for Paper: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
☆17 • Dec 7, 2025 • Updated 7 months ago
Accio-Lab / Metis
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
☆35 • Apr 10, 2026 • Updated 3 months ago
Accio-Lab / SwimBird
☆18 • Apr 9, 2026 • Updated 3 months ago
PinxueGuo / X-Prompt
☆17 • Oct 4, 2024 • Updated last year
allenai / container
☆57 • Oct 17, 2021 • Updated 4 years ago
ZrrSkywalker / MAVIS
[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆156 • Dec 5, 2024 • Updated last year
CaraJ7 / CoMat
[NeurIPS 2024] 💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
☆169 • Nov 18, 2024 • Updated last year
Guillem96 / data2vec-vision
PyTorch implementation of Data2Vec self-supervised approach for vision use cases.
☆18 • Oct 7, 2022 • Updated 3 years ago
KMnP / nn-revisit
Rethinking Nearest Neighbors for Visual Classification
☆32 • Dec 17, 2021 • Updated 4 years ago
TuringEyeTest / TuringEyeTest
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18 • Aug 11, 2025 • Updated 11 months ago
rockywind / ADD
☆11 • Nov 21, 2022 • Updated 3 years ago
gaopengcuhk / Container
Official Code Release for Container: Context Aggregation Network
☆46 • Oct 17, 2021 • Updated 4 years ago
L599wy / OneVOS
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
☆13 • Feb 27, 2025 • Updated last year
Mobiuslqm / Nabla-R2D3
[NeurIPS 2025 Official Codes] Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards
☆46 • Sep 23, 2025 • Updated 10 months ago
ASGMVLP / ASGMVLP_CODE
The repo of ASGMVLP
☆19 • Jan 16, 2026 • Updated 6 months ago
ZiyuGuo99 / MME-CoF
Are Video Models Ready as Zero-shot Reasoners?
☆87 • Nov 24, 2025 • Updated 8 months ago
xinyan-cxy / MINT-CoT
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆107 • Sep 19, 2025 • Updated 10 months ago
JinjieNi / Quokka
The official github repo for "Training Optimal Large Diffusion Language Models", the first-ever large-scale diffusion language models sca…
☆46 • Nov 6, 2025 • Updated 8 months ago
nex-agi / NexA4A
Nex Agent for Agent is a meta-agent system that automatically creates specialized AI agents based on natural language requirements.
☆29 • Nov 18, 2025 • Updated 8 months ago
sailxjx / ai-fairy-tales
AI-generated (dark) fairy tales
☆11 • Mar 2, 2023 • Updated 3 years ago
qjy981010 / cocoapi
COCO API Customized for OVIS evaluation
☆17 • Nov 8, 2021 • Updated 4 years ago
e-bug / fine-grained-evals
[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"
☆13 • Jun 11, 2023 • Updated 3 years ago
zhangmiaosen2000 / Phi-Ground
Home page for Microsoft Phi-Ground tech-report
☆22 • Sep 8, 2025 • Updated 10 months ago
penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆20 • May 22, 2025 • Updated last year
gogoduan / GoT-R1
[ICLR26] GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
☆106 • Jan 27, 2026 • Updated 5 months ago
MCG-NJU / CaReBench
A Fine-grained Benchmark for Video Captioning and Retrieval
☆30 • Jul 16, 2025 • Updated last year
yqx7150 / WACM
Wavelet Transform-assisted Adaptive Generative Modeling for Colorization
☆20 • Dec 26, 2022 • Updated 3 years ago
XianguiKang / AdvAD
AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks
☆20 • May 12, 2025 • Updated last year
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆56 • Mar 31, 2025 • Updated last year
ULMEvalKit / ULMEvalKit
ULMEvalKit: One-Stop Eval ToolKit for Image Generation
☆56 • Dec 17, 2025 • Updated 7 months ago
MME-Benchmarks / MME-CoT
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
☆136 • Aug 5, 2025 • Updated 11 months ago
nex-agi / NexGAP
Nex General Agentic Data Pipeline, an end-to-end pipeline for generating high-quality agentic training data.
☆36 • Nov 19, 2025 • Updated 8 months ago
Ivan-Tang-3D / ENEL
[ICLR 2026] The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs"
☆11 • Jan 26, 2026 • Updated 5 months ago
asalarpour / Point_GN
Official WACV 2025 code for Point-GN: A non-parametric, training-free method for 3D point cloud classification using Gaussian Positional …
☆15 • Jul 22, 2025 • Updated last year
Augusta-A / Awesome-EfficientVideo
☆12 • Sep 11, 2021 • Updated 4 years ago
CASIA-LMC-Lab / Obj2Seq
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)
☆85 • Nov 2, 2022 • Updated 3 years ago