Stability-AI / flash-attention
Fast and memory-efficient exact attention
☆11Updated 2 years ago
Alternatives and similar repositories for flash-attention
Users who are interested in flash-attention are comparing it to the libraries listed below.
- Consistency models trained on CIFAR-10, in JAX.☆154Updated 2 years ago
- O-GIA is an umbrella for research, infrastructure and projects ecosystem that should provide open source, reproducible datasets, models, …☆89Updated 2 years ago
- 🏥 Health monitor for a Petals swarm☆39Updated last year
- Can RL solve simple problems?☆54Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts☆119Updated last year
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…☆25Updated last month
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Updated last year
- Showcase of Azure OpenAI demos for partners☆23Updated 2 years ago
- GoldFinch and other hybrid transformer components☆12Updated last month
- ☆13Updated 2 years ago
- vLLM adapter for a TGIS-compatible gRPC server.☆44Updated this week
- Implementation of the Llama architecture with RLHF + Q-learning☆168Updated 9 months ago
- ☆17Updated last year
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)☆188Updated 3 years ago
- A synthetic story narration dataset to study small audio LMs.☆31Updated last year
- Code base for internal reward models and PPO training☆24Updated 2 years ago
- ☆18Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆18Updated 4 months ago
- Collection of autoregressive model implementations☆86Updated 7 months ago
- Simple large-scale training of stable diffusion with multi-node support.☆133Updated 2 years ago
- Cerule - A Tiny Mighty Vision Model☆67Updated 2 weeks ago
- JAX implementation of the Llama 2 model☆216Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention☆100Updated last year
- σ-GPT: A New Approach to Autoregressive Models☆69Updated last year
- Implementation of the proposed MaskBit from Bytedance AI☆82Updated last year
- Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it.☆91Updated 2 years ago
- ☆138Updated last year
- ☆277Updated this week
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆107Updated 8 months ago