redotvideo / mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture
★897 · Updated 6 months ago
Related projects:
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ★778 · Updated last month
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ★1,353 · Updated 6 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ★1,309 · Updated 5 months ago
- ReFT: Representation Finetuning for Language Models ★1,076 · Updated 2 weeks ago
- Minimalistic large language model 3D-parallelism training ★1,116 · Updated this week
- ★1,167 · Updated last week
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ★1,354 · Updated last week
- YaRN: Efficient Context Window Extension of Large Language Models ★1,308 · Updated 5 months ago
- ★856 · Updated 9 months ago
- Reference implementation of Megalodon 7B model ★503 · Updated 5 months ago
- The repository for the code of the UltraFastBERT paper ★508 · Updated 5 months ago
- Code for Quiet-STaR ★478 · Updated last month
- LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processin… ★667 · Updated this week
- The official implementation of Self-Play Fine-Tuning (SPIN) ★958 · Updated 4 months ago
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ★623 · Updated last month
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ★530 · Updated 4 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ★659 · Updated last week
- Fine-tune mistral-7B on 3090s, a100s, h100s ★701 · Updated 11 months ago
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ★786 · Updated 5 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ★657 · Updated 5 months ago
- Open weights language model from Google DeepMind, based on Griffin. ★595 · Updated 2 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ★1,702 · Updated 8 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ★1,099 · Updated 7 months ago
- ★473 · Updated this week
- Convolutions for Sequence Modeling ★861 · Updated 3 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ★595 · Updated 3 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★1,408 · Updated this week
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ★1,022 · Updated 8 months ago
- A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance Multi-Modal Model. Powered by Zeta, the simplest… ★430 · Updated last week