torphix / infini-attention
PyTorch implementation of Infini-attention (https://arxiv.org/html/2404.07143v1)
☆19 · Updated 10 months ago
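For context, the paper's core idea is a compressive memory that is read from and then updated once per segment, blended with ordinary local attention through a learned gate, so attention cost stays linear in sequence length. Below is a minimal single-head PyTorch sketch of that recurrence, assuming σ(x) = ELU(x) + 1 and the paper's simple additive memory update; the class name and tensor layout are illustrative and not taken from this repo.

```python
import torch
import torch.nn.functional as F


class InfiniAttentionSketch(torch.nn.Module):
    """Single-head sketch of the Infini-attention segment recurrence."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = torch.nn.Linear(dim, dim, bias=False)
        self.k_proj = torch.nn.Linear(dim, dim, bias=False)
        self.v_proj = torch.nn.Linear(dim, dim, bias=False)
        # Learned scalar gate blending memory retrieval with local attention.
        self.gate = torch.nn.Parameter(torch.zeros(1))

    def forward(self, x, memory=None, norm=None):
        # x: (batch, seg_len, dim) -- one segment of a long sequence.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        sigma_q, sigma_k = F.elu(q) + 1, F.elu(k) + 1  # sigma(.) = ELU + 1

        if memory is None:  # empty memory before the first segment
            memory = x.new_zeros(x.size(0), x.size(-1), x.size(-1))
            norm = x.new_zeros(x.size(0), x.size(-1), 1)

        # Read from the compressive memory of all previous segments.
        a_mem = (sigma_q @ memory) / (sigma_q @ norm + 1e-6)

        # Ordinary causal dot-product attention within the segment.
        a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)

        # Additive (linear-attention) memory update for later segments.
        memory = memory + sigma_k.transpose(1, 2) @ v
        norm = norm + sigma_k.sum(dim=1).unsqueeze(-1)

        beta = torch.sigmoid(self.gate)
        return beta * a_mem + (1 - beta) * a_local, memory, norm
```

Across segments, the returned `memory` and `norm` states are fed back into the next call, which is what bounds memory use regardless of total context length.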
Alternatives and similar repositories for infini-attention:
Users interested in infini-attention are comparing it to the repositories listed below.
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆128 · Updated 8 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆73 · Updated 3 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆120 · Updated last month
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆140 · Updated 4 months ago
- ☆89 · Updated 2 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆32 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆86 · Updated last month
- ☆45 · Updated 8 months ago
- ☆73 · Updated 11 months ago
- Reformatted Alignment ☆114 · Updated 4 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆241 · Updated 2 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆78 · Updated 3 months ago
- ☆17 · Updated last year
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆212 · Updated this week
- FuseAI Project ☆83 · Updated 3 weeks ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆125 · Updated 6 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated 11 months ago
- Mixture-of-Experts (MoE) Language Model ☆184 · Updated 5 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 7 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆114 · Updated 3 months ago
- ☆27 · Updated 5 months ago
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever ☆81 · Updated 2 weeks ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆204 · Updated 8 months ago
- ☆74 · Updated last month
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆260 · Updated 2 weeks ago
- A prototype repo for hybrid training combining pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆55 · Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆189 · Updated last month
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆47 · Updated last month