ddidacus / llama-titans
Adaptation of titans-pytorch to Llama models on Hugging Face (HF)
☆15 Updated 2 months ago
Alternatives and similar repositories for llama-titans
Users who are interested in llama-titans compare it to the libraries listed below.
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆50 Updated 5 months ago
- Parallelizing non-linear sequential models over the sequence length ☆51 Updated 3 months ago
- ☆25 Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆106 Updated 7 months ago
- Official repo of paper LM2 ☆39 Updated 2 months ago
- ☆53 Updated 7 months ago
- ☆23 Updated 7 months ago
- Unofficial implementation of Linear Recurrent Units, by DeepMind, in PyTorch ☆69 Updated 2 weeks ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 Updated 8 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆33 Updated 10 months ago
- ☆31 Updated 6 months ago
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers… ☆25 Updated 2 years ago
- Mixture of A Million Experts ☆44 Updated 9 months ago
- ☆78 Updated 8 months ago
- Here we will test various linear attention designs. ☆60 Updated last year
- ☆36 Updated last month
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆27 Updated last month
- Stick-breaking attention ☆53 Updated last month
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 Updated 2 months ago
- PyTorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5) ☆76 Updated last year
- Python implementation of the methods in Meulemans et al. 2020 - A Theoretical Framework For Target Propagation ☆32 Updated 6 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆161 Updated last month
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆41 Updated 3 months ago
- ☆52 Updated 11 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆86 Updated last year
- Code for ICML 2024 paper ☆23 Updated this week
- Implementations of various linear RNN layers using PyTorch and Triton ☆50 Updated last year
- ☆40 Updated 3 months ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆40 Updated last year
- Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆66 Updated 9 months ago