redotvideo / mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture
932 stars, updated last year
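For orientation, the sketch below shows one way a Mamba-based chat checkpoint like this can be loaded and queried. It is a minimal, hypothetical example: the "havenhq/mamba-chat" Hub checkpoint name, the use of the mamba_ssm package's MambaLMHeadModel, the borrowed zephyr chat template, and the sampling parameters are all assumptions not stated on this page, and the details may differ from the repository's own chat script.

```python
# Hypothetical usage sketch (assumptions: the "havenhq/mamba-chat" checkpoint on the
# Hugging Face Hub, the mamba_ssm package, and a CUDA device for the fused Mamba kernels).
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("havenhq/mamba-chat")
tokenizer.pad_token = tokenizer.eos_token
# Borrow a chat template so apply_chat_template() can format the conversation (assumption).
tokenizer.chat_template = AutoTokenizer.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta"
).chat_template

# Load the state-space language model head in half precision on the GPU.
model = MambaLMHeadModel.from_pretrained(
    "havenhq/mamba-chat", device=device, dtype=torch.float16
)

messages = [{"role": "user", "content": "What is a state-space model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# mamba_ssm's generate() returns the full token sequence (prompt + completion).
out = model.generate(
    input_ids=input_ids,
    max_length=1024,
    temperature=0.9,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```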
Alternatives and similar repositories for mamba-chat
Users interested in mamba-chat are comparing it to the libraries listed below.
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (912 stars, updated 5 months ago)
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" (560 stars, updated 9 months ago)
- 866 stars, updated last year
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI (1,401 stars, updated last year)
- Fine-tune mistral-7B on 3090s, a100s, h100s (717 stars, updated last year)
- Inference code for Persimmon-8B (414 stars, updated 2 years ago)
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (661 stars, updated last year)
- 416 stars, updated last year
- YaRN: Efficient Context Window Extension of Large Language Models (1,613 stars, updated last year)
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction (388 stars, updated last year)
- Code for Quiet-STaR (740 stars, updated last year)
- Reference implementation of Megalodon 7B model (523 stars, updated 4 months ago)
- Code for fine-tuning Platypus fam LLMs using LoRA (628 stars, updated last year)
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… (291 stars, updated last year)
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models (1,605 stars, updated last year)
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining (720 stars, updated last year)
- The repository for the code of the UltraFastBERT paper (519 stars, updated last year)
- Open weights language model from Google DeepMind, based on Griffin. (652 stars, updated 4 months ago)
- 570 stars, updated last year
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (1,610 stars, updated 11 months ago)
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (1,282 stars, updated 6 months ago)
- A repository for research on medium sized language models. (509 stars, updated 3 months ago)
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch (652 stars, updated 9 months ago)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (425 stars, updated last year)
- 547 stars, updated 9 months ago
- A bagel, with everything. (325 stars, updated last year)
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 (1,130 stars, updated last year)
- Finetuning Large Language Models on One Consumer GPU in 2 Bits (731 stars, updated last year)
- Inference code for Mistral and Mixtral hacked up into original Llama implementation (370 stars, updated last year)
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. (747 stars, updated last year)