geronimi73 / mamba
☆31 · Updated last year
Alternatives and similar repositories for mamba
Users interested in mamba are comparing it to the libraries listed below.
- This is the code that went into our practical deep dive on using Mamba for information extraction ☆57 · Updated last year
- Implementation of the Mamba SSM with hf_integration (a usage sketch follows this list). ☆56 · Updated last year
- Collection of autoregressive model implementations ☆85 · Updated 7 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆61 · Updated last year
- Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆126 · Updated last month
- ☆86 · Updated last year
- The Next Generation Multi-Modality Superintelligence ☆70 · Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆170 · Updated last year
- A repository for research on medium sized language models. ☆77 · Updated last year
- Token Omission Via Attention ☆128 · Updated last year
- Code repository for Black Mamba ☆260 · Updated last year
- ☆78 · Updated last year
- ☆55 · Updated last year
- Small and Efficient Mathematical Reasoning LLMs ☆72 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆162 · Updated 8 months ago
- A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of … ☆23 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆148 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Updated 2 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆70 · Updated 2 years ago
- Spherical Merge PyTorch/HF format Language Models with minimal feature loss. ☆141 · Updated 2 years ago
- An all-new language model that processes ultra-long sequences of 100,000+ tokens, ultra-fast ☆150 · Updated last year
- This is the official repository for Inheritune. ☆116 · Updated 10 months ago
- ☆63 · Updated last year
- Data preparation code for Amber 7B LLM ☆93 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ☆168 · Updated 10 months ago
- ☆39 · Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind ☆179 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated last year
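
For context on the hf_integration item above, here is a minimal sketch of what loading a Mamba checkpoint through the upstream Hugging Face integration looks like. The class and checkpoint names below come from the `transformers` library itself (v4.39+), not from any repository in this list.

```python
# Minimal sketch: running a Mamba checkpoint via the Hugging Face
# `transformers` integration (requires transformers >= 4.39).
# "state-spaces/mamba-130m-hf" is a public upstream checkpoint,
# not one of the repositories listed above.
from transformers import AutoTokenizer, MambaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

inputs = tokenizer("The Mamba architecture is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```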