lamm-mit / Cephalo-Phi-3-Vision-MoE
☆12 · Updated 11 months ago
Alternatives and similar repositories for Cephalo-Phi-3-Vision-MoE
Users interested in Cephalo-Phi-3-Vision-MoE are comparing it to the libraries listed below.
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆44 · Updated 9 months ago
- A repository for research on medium-sized language models. ☆76 · Updated last year
- DPO, but faster 🚀 ☆42 · Updated 6 months ago
- ☆17 · Updated last year
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 6 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆156 · Updated last month
- ☆33 · Updated last month
- Verifiers for LLM Reinforcement Learning ☆56 · Updated last month
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 6 months ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆69 · Updated 2 weeks ago
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated last week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- Linear Attention Sequence Parallelism (LASP) ☆83 · Updated last year
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability. ☆93 · Updated 5 months ago
- ☆50 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite. ☆33 · Updated last year
- ☆34 · Updated last month
- ☆20 · Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆39 · Updated last year
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆48 · Updated 10 months ago
- Pixel Parsing: a reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 10 months ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆27 · Updated 3 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated 10 months ago
- Parameter-efficient fine-tuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆58 · Updated 11 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆44 · Updated 2 months ago
- 👷 Build compute kernels ☆44 · Updated this week
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last month
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 8 months ago
- Working implementation of DeepSeek MLA ☆41 · Updated 4 months ago
- ☆24 · Updated 8 months ago