[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆47 Jan 8, 2025 Updated last year
Alternatives and similar repositories for FlexAttention
Users that are interested in FlexAttention are comparing it to the libraries listed below.
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆45 Apr 18, 2025 Updated last year
- ☆11 Dec 20, 2024 Updated last year
- Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models ☆19 Jun 18, 2025 Updated 10 months ago
- [RSE25] Official implementation of the paper mKGR. ☆21 Jan 15, 2026 Updated 3 months ago
- Extending context length of visual language models ☆12 Dec 18, 2024 Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆33 Mar 26, 2025 Updated last year
- a py3 lib for NLP & image-caption metrics : BLEU METEOR CIDEr ROUGE SPICE WMD ☆14 Sep 13, 2022 Updated 3 years ago
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding ☆23 Feb 26, 2025 Updated last year
- [TPAMI2024] Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery ☆15 Mar 18, 2025 Updated last year
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https… ☆22 Jul 5, 2024 Updated last year
- Retrieval-augmented Image Captioning ☆13 Feb 16, 2023 Updated 3 years ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆167 Mar 8, 2026 Updated last month
- ☆21 Aug 8, 2024 Updated last year
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024) ☆11 May 26, 2024 Updated last year
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆164 Dec 26, 2024 Updated last year
- ☆24 Jul 8, 2023 Updated 2 years ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Dec 14, 2023 Updated 2 years ago
- CVPR25 ☆28 Jul 2, 2025 Updated 9 months ago
- Streaming Video Instruction Tuning ☆71 Feb 25, 2026 Updated 2 months ago
- KTCN: Enhancing Open-World Object Detection with Knowledge Transfer and Class-Awareness Neutralization (IJCAI 24) ☆12 Aug 13, 2024 Updated last year
- DOFA-CLIP: Multimodal Vision–Language Foundation Models for Earth Observation ☆39 Jul 30, 2025 Updated 9 months ago
- ☆14 Sep 6, 2024 Updated last year
- ☆23 Aug 20, 2024 Updated last year
- ☆23 Jan 24, 2024 Updated 2 years ago
- An up-to-date & curated list of awesome layout to image papers, methods & resources. ☆13 Jun 28, 2024 Updated last year
- [ECCV 2024 Workshop🎈] The first agriculture benchmark to evaluate MM-LLMs. ☆25 Jan 1, 2025 Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆13 Sep 30, 2023 Updated 2 years ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg… ☆49 Mar 2, 2026 Updated last month
- An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. Designe… ☆21 Jun 25, 2025 Updated 10 months ago
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆114 Mar 21, 2025 Updated last year
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement. ☆15 Mar 12, 2024 Updated 2 years ago
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs ☆419 Dec 20, 2025 Updated 4 months ago
- A collection of papers related to Geo-spatial Information Science in NeurIPS 2024. ☆56 Jan 5, 2025 Updated last year
- Masked Angle-Aware Autoencoder for Remote Sensing Images (ECCV 2024) ☆28 Nov 12, 2024 Updated last year
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆203 Jun 18, 2025 Updated 10 months ago
- ECCV24 "ReMamber: Referring Image Segmentation with Mamba Twister" official repository. ☆45 Jul 11, 2024 Updated last year
- A curated list of few-shot segmentation / few shot semantic segmentation / few shot image segmentation in remote sensing imagery. ☆29 Jun 25, 2024 Updated last year
- ☆11 Oct 2, 2024 Updated last year
- ☆29 Apr 23, 2025 Updated last year