togethercomputer / stripedhyena
Repository for StripedHyena, a state-of-the-art beyond-Transformer architecture
☆299 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for stripedhyena
- Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena ☆602 · Updated 5 months ago
- Bi-Directional Equivariant Long-Range DNA Sequence Modeling ☆160 · Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders ☆104 · Updated this week
- (Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307… ☆51 · Updated last year
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆281 · Updated 6 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ☆163 · Updated 6 months ago
- Simplified Masked Diffusion Language Model ☆208 · Updated 2 weeks ago
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆52 · Updated last year
- Some preliminary explorations of Mamba's context scaling. ☆191 · Updated 9 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆538 · Updated 6 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆157 · Updated 11 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆478 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"☆183Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆214Updated this week
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI☆259Updated 2 weeks ago
- Multipack distributed sampler for fast padding-free training of LLMs☆178Updated 3 months ago
- An easy, reliable, fluid template for Python packages, complete with docs, testing suites, READMEs, GitHub workflows, linting and much muc… ☆147 · Updated 2 weeks ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆180 · Updated 5 months ago
- Implementation of Infini-Transformer in PyTorch ☆104 · Updated last month
- 🧬 Generative modeling of regulatory DNA sequences with diffusion probabilistic models 💨 ☆366 · Updated this week
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆371 · Updated 4 months ago
- Orthrus is a mature-RNA model for RNA property prediction. It uses a Mamba encoder backbone, a variant of state-space models specifical… ☆39 · Updated 3 weeks ago
- Annotated version of the Mamba paper ☆457 · Updated 8 months ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆71 · Updated last month
- Code repository for Black Mamba ☆231 · Updated 9 months ago
- PyTorch implementation of models from the Zamba2 series. ☆159 · Updated this week
- Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch ☆226 · Updated 2 months ago