[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆49Jan 8, 2025Updated last year
Alternatives and similar repositories for FlexAttention
Users who are interested in FlexAttention are comparing it to the libraries listed below.
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆56Apr 18, 2025Updated last year
- ☆11Dec 20, 2024Updated last year
- Dynamic, high-resolution poverty measurement in data-scarce environments☆11Dec 8, 2024Updated last year
- Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models☆20Jun 18, 2025Updated last year
- [RSE25] Official implementation of the paper mKGR.☆22May 17, 2026Updated last month
- Extending context length of visual language models☆12Dec 18, 2024Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆33Mar 26, 2025Updated last year
- a py3 lib for NLP & image-caption metrics : BLEU METEOR CIDEr ROUGE SPICE WMD☆14Sep 13, 2022Updated 3 years ago
- [TPAMI2024] Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery☆15Mar 18, 2025Updated last year
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding☆22Feb 26, 2025Updated last year
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…☆21Jul 5, 2024Updated last year
- Retrieval-augmented Image Captioning☆13Feb 16, 2023Updated 3 years ago
- ☆22Aug 8, 2024Updated last year
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024)☆12May 26, 2024Updated 2 years ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆163Dec 26, 2024Updated last year
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆172Mar 8, 2026Updated 3 months ago
- ☆18Jul 16, 2019Updated 6 years ago
- ☆24Jul 8, 2023Updated 2 years ago
- Streaming Video Instruction Tuning☆76Feb 25, 2026Updated 4 months ago
- ☆33Apr 14, 2026Updated 2 months ago
- DOFA-CLIP: Multimodal Vision–Language Foundation Models for Earth Observation☆42Jul 30, 2025Updated 11 months ago
- ☆23Jan 24, 2024Updated 2 years ago
- An up-to-date & curated list of awesome layout to image papers, methods & resources.☆13Jun 28, 2024Updated 2 years ago
- [ECCV 2024 Workshop🎈] The first agriculture benchmark to evaluate MM-LLMs.☆26Jan 1, 2025Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data☆13Sep 30, 2023Updated 2 years ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆48Mar 2, 2026Updated 3 months ago
- An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. Designe…☆24Jun 25, 2025Updated last year
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator☆115Mar 21, 2025Updated last year
- [ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.☆15Mar 12, 2024Updated 2 years ago
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs☆425Dec 20, 2025Updated 6 months ago
- A collection of papers related to Geo-spatial Information Science in NeurIPS 2024.☆56Jan 5, 2025Updated last year
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆205Jun 18, 2025Updated last year
- ☆69Jun 11, 2026Updated 2 weeks ago
- ECCV24 "ReMamber: Referring Image Segmentation with Mamba Twister" official repository.☆46Jul 11, 2024Updated last year
- A curated list of few-shot segmentation / few shot semantic segmentation / few shot image segmentation in remote sensing imagery.☆30Jun 25, 2024Updated 2 years ago
- ☆11Oct 2, 2024Updated last year
- The PyTorch implementation of AlignSeg.☆21Feb 26, 2025Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation☆86Sep 12, 2024Updated last year
- [CVPR 2025] Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation☆36Jun 27, 2025Updated last year