geronimi73 / mamba
☆32 · Updated last year
Alternatives and similar repositories for mamba:
Users who are interested in mamba are comparing it to the repositories listed below:
- A repository for research on medium-sized language models. ☆76 · Updated 8 months ago
- Latent Large Language Models ☆17 · Updated 5 months ago
- Implementation of the Mamba SSM with hf_integration. ☆56 · Updated 5 months ago
- GoldFinch and other hybrid transformer components ☆43 · Updated 6 months ago
- An open source replication of the Strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆27 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 4 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated last month
- The Next Generation Multi-Modality Superintelligence ☆70 · Updated 4 months ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch ☆88 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 10 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆38 · Updated 8 months ago
- This is the code that went into our practical dive using Mamba for information extraction ☆51 · Updated last year
- Possibly useful materials for learning the RWKV language model. ☆24 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- ☆48 · Updated 2 months ago
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆16 · Updated 3 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- Training hybrid models for dummies. ☆18 · Updated 2 weeks ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 8 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆34 · Updated 9 months ago
- ☆43 · Updated 3 months ago
- RWKV model implementation ☆37 · Updated last year
- ☆62 · Updated 4 months ago
- This is the official repository for Inheritune. ☆109 · Updated 3 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated 2 months ago
- A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of … ☆21 · Updated 3 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆29 · Updated 4 months ago
- A simple reproducible template to implement AI research papers ☆22 · Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 · Updated last month