lamm-mit / Cephalo-Phi-3-Vision-MoE

☆14

Alternatives and similar repositories for Cephalo-Phi-3-Vision-MoE

Users who are interested in Cephalo-Phi-3-Vision-MoE are comparing it to the repositories listed below.

fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96 Updated 9 months ago
ariG23498 / mmdp
☆29 Updated 3 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆45 Updated 10 months ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆82 Updated 2 months ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆133 Updated 4 months ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87 Updated last year
mistralai / mistral-evals
☆77 Updated last month
ByungKwanLee / TroL
[EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…
☆98 Updated last year
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆36 Updated this week
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆51 Updated last month
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆161 Updated 6 months ago
MILVLG / imp
A family of highly capable yet efficient large multimodal models
☆191 Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆90 Updated 4 months ago
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211 Updated 9 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆99 Updated last year
UbiquitousLearning / SLM_Survey
☆97 Updated last year
FasterDecoding / BitDelta
☆201 Updated 10 months ago
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆62 Updated 3 weeks ago
minyoungg / LTE
☆69 Updated last year
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆120 Updated 8 months ago
thepowerfuldeez / OLMo
My fork of Allen AI's OLMo for educational purposes.
☆30 Updated 10 months ago
RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆53 Updated 6 months ago
FudanNLPLAB / MouSi
☆74 Updated last year
SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆137 Updated last year
alperiox / Compact-Language-Models-via-Pruning-and-Knowledge-Distillation
Unofficial implementation of https://arxiv.org/pdf/2407.14679
☆49 Updated last year
DeepAuto-AI / hip-attention
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
☆148 Updated this week
NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆249 Updated last week
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185 Updated 8 months ago
devvrit / matformer
MatFormer repo
☆62 Updated 10 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78 Updated last year