[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆49 · Jan 8, 2025 · Updated last year
Alternatives and similar repositories for FlexAttention
Users that are interested in FlexAttention are comparing it to the libraries listed below.
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆45 · Apr 18, 2025 · Updated last year
- ☆11 · Dec 20, 2024 · Updated last year
- Dynamic, high-resolution poverty measurement in data-scarce environments ☆11 · Dec 8, 2024 · Updated last year
- Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models ☆19 · Jun 18, 2025 · Updated 11 months ago
- [RSE25] Official implementation of the paper mKGR. ☆22 · May 17, 2026 · Updated 3 weeks ago
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆33 · Mar 26, 2025 · Updated last year
- A Python 3 library for NLP and image-captioning metrics: BLEU, METEOR, CIDEr, ROUGE, SPICE, WMD ☆14 · Sep 13, 2022 · Updated 3 years ago
- [TPAMI2024] Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery ☆15 · Mar 18, 2025 · Updated last year
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding ☆22 · Feb 26, 2025 · Updated last year
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https… ☆21 · Jul 5, 2024 · Updated last year
- Retrieval-augmented Image Captioning ☆13 · Feb 16, 2023 · Updated 3 years ago
- ☆22 · Aug 8, 2024 · Updated last year
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024) ☆12 · May 26, 2024 · Updated 2 years ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆171 · Mar 8, 2026 · Updated 3 months ago
- ☆18 · Jul 16, 2019 · Updated 6 years ago
- ☆24 · Jul 8, 2023 · Updated 2 years ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- CVPR25 ☆28 · Jul 2, 2025 · Updated 11 months ago
- Streaming Video Instruction Tuning ☆75 · Feb 25, 2026 · Updated 3 months ago
- ☆33 · Apr 14, 2026 · Updated last month
- DOFA-CLIP: Multimodal Vision–Language Foundation Models for Earth Observation ☆41 · Jul 30, 2025 · Updated 10 months ago
- ☆23 · Aug 20, 2024 · Updated last year
- An up-to-date & curated list of awesome layout to image papers, methods & resources. ☆13 · Jun 28, 2024 · Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆13 · Sep 30, 2023 · Updated 2 years ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg… ☆49 · Mar 2, 2026 · Updated 3 months ago
- An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. Designe… ☆22 · Jun 25, 2025 · Updated 11 months ago
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆115 · Mar 21, 2025 · Updated last year
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement. ☆15 · Mar 12, 2024 · Updated 2 years ago
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs ☆423 · Dec 20, 2025 · Updated 5 months ago
- A collection of papers related to Geo-spatial Information Science in NeurIPS 2024. ☆56 · Jan 5, 2025 · Updated last year
- Masked Angle-Aware Autoencoder for Remote Sensing Images (ECCV 2024) ☆28 · Nov 12, 2024 · Updated last year
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆205 · Jun 18, 2025 · Updated 11 months ago
- ☆69 · Mar 22, 2026 · Updated 2 months ago
- ☆11 · Oct 2, 2024 · Updated last year
- ☆29 · Apr 23, 2025 · Updated last year
- The PyTorch implementation of AlignSeg. ☆21 · Feb 26, 2025 · Updated last year
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026) ☆22 · Mar 29, 2026 · Updated 2 months ago
- [CVPR 2025] Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation ☆36 · Jun 27, 2025 · Updated 11 months ago