torphix / infini-attention
PyTorch implementation of https://arxiv.org/html/2404.07143v1
☆21Updated last year
Alternatives and similar repositories for infini-attention
Users who are interested in infini-attention are comparing it to the libraries listed below.
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆43Updated last year
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆118Updated 7 months ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner☆145Updated 2 weeks ago
- ☆75Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆128Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆138Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆126Updated 11 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆98Updated last year
- Open-Pandora: On-the-fly Control Video Generation☆35Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆22Updated 2 years ago
- mllm-npu: training multimodal large language models on Ascend NPUs☆95Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)☆147Updated last year
- ☆35Updated 11 months ago
- ☆29Updated last year
- Our 2nd-gen LMM☆34Updated last year
- The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.☆62Updated 5 months ago
- ☆84Updated 8 months ago
- ☆187Updated 10 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆212Updated 11 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆171Updated last year
- A family of highly capable yet efficient large multimodal models☆191Updated last year
- VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.☆237Updated 7 months ago
- Geometric-Mean Policy Optimization☆94Updated last month
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆53Updated last year
- Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.☆249Updated 2 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆68Updated 2 months ago
- ☆36Updated last year
- ☆176Updated last week
- ☆95Updated last year
- ☆99Updated 4 months ago