geronimi73 / mamba
☆31 · Updated last year
Alternatives and similar repositories for mamba
Users interested in mamba are comparing it to the libraries listed below.
- Code from our practical deep dive on using Mamba for information extraction ☆57 · Updated last year
- Collection of autoregressive model implementations ☆86 · Updated 6 months ago
- Implementation of the Mamba SSM with hf_integration ☆56 · Updated last year
- The Next Generation Multi-Modality Superintelligence ☆69 · Updated last year
- Small and Efficient Mathematical Reasoning LLMs ☆72 · Updated last year
- Spherically merge PyTorch/HF-format language models with minimal feature loss ☆140 · Updated 2 years ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all large language models ☆69 · Updated 2 years ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model, with or without FIM ☆60 · Updated last year
- ☆55 · Updated last year
- ☆86 · Updated last year
- Tune MPTs ☆84 · Updated 2 years ago
- An unofficial PyTorch implementation of "Efficient Infinite Context Transformers with Infini-attention" ☆54 · Updated last year
- An all-new language model that processes ultra-long sequences of 100,000+ ultra-fast ☆150 · Updated last year
- ☆78 · Updated last year
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts ☆225 · Updated 2 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite ☆35 · Updated last year
- ☆39 · Updated last year
- Set of scripts to fine-tune LLMs ☆38 · Updated last year
- A repository for research on medium-sized language models ☆78 · Updated last year
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆201 · Updated 2 years ago
- An open-source replication of the strawberry method that leverages Monte Carlo search with PPO and/or DPO ☆29 · Updated this week
- Plug-and-play implementation of "Textbooks Are All You Need", ready for training, inference, and dataset generation ☆73 · Updated 2 years ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last year
- Implementation of Mind Evolution ("Evolving Deeper LLM Thinking") from DeepMind ☆57 · Updated 5 months ago
- ☆63 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 · Updated last year
- Parameter-Efficient Sparsity Crafting: from dense to Mixture-of-Experts for instruction tuning on general tasks ☆31 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding ☆173 · Updated 10 months ago
- Evaluating LLMs with CommonGen-Lite ☆91 · Updated last year