OpenSparseLLMs / LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
☆63 · Updated last month
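LLaMA-MoE v2 converts LLaMA's dense feed-forward layers into sparse Mixture-of-Experts layers and recovers quality with post-training. Below is a minimal sketch of the core idea, top-k token routing over expert FFNs; it is illustrative only, and all module names, shapes, and hyperparameters are assumptions rather than the repository's actual code:

```python
# Minimal top-k MoE FFN sketch (illustrative; not LLaMA-MoE-v2's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                  # which tokens routed to expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel():              # run only the tokens this expert owns
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

x = torch.randn(4, 512)
print(TopKMoE()(x).shape)  # torch.Size([4, 512])
```

Each token activates only k of the n_experts FFNs, so per-token compute stays roughly constant while total parameters grow with the expert count.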
Alternatives and similar repositories for LLaMA-MoE-v2:
Users interested in LLaMA-MoE-v2 are comparing it to the libraries listed below.
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆45 · Updated 2 weeks ago
- The official code repository for PRMBench. ☆45 · Updated this week
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference" ☆86 · Updated 2 months ago
- Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024) ☆19 · Updated 2 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆54 · Updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆58 · Updated last month
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379) ☆33 · Updated 9 months ago
- ✈️ Accelerating Vision Diffusion Transformers with Skip Branches. ☆58 · Updated 3 weeks ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts ☆35 · Updated 3 months ago
- A Survey on the Honesty of Large Language Models ☆51 · Updated last month
- Open-Pandora: On-the-fly Control Video Generation ☆31 · Updated last month
- SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆31 · Updated last month
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆28 · Updated 5 months ago
- DEEM: official implementation of "Diffusion Models Serve as the Eyes of Large Language Models for Image Perception" ☆17 · Updated last month
- [NeurIPS 2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆45 · Updated last month
- Repo for the EMNLP'24 paper "Dual-Space Knowledge Distillation for Large Language Models" ☆39 · Updated 2 months ago
- Code release for VTW (AAAI 2025) ☆27 · Updated last month
- Official implementation of the paper "Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models" ☆38 · Updated last month
- [ATTRIB @ NeurIPS 2024 Oral] When Attention Sink Emerges in Language Models: An Empirical View ☆37 · Updated 2 months ago
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models ☆92 · Updated 2 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration ☆30 · Updated 6 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆23 · Updated 3 months ago
- My commonly-used tools ☆47 · Updated this week
- Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024) ☆31 · Updated 6 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆34 · Updated 9 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆40 · Updated 2 months ago