☆22Sep 3, 2024Updated last year
Alternatives and similar repositories for flash-attn-101
Users that are interested in flash-attn-101 are comparing it to the libraries listed below.
- ☆16Jan 28, 2024Updated 2 years ago
- ☆13Dec 22, 2024Updated last year
- Enhancing Domain Adaptation through Prompt Gradient Alignment (NeurIPS 2024)☆16Jun 16, 2024Updated 2 years ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback☆96Aug 18, 2023Updated 2 years ago
- BFloat16 Fused Adam Operator for PyTorch☆19Nov 16, 2024Updated last year
- Pioneering in Vietnamese Multimodal Large Language Model☆53Jan 23, 2025Updated last year
- Official implementation of "From Implicit to Explicit Feedback: A deep neural network for modeling sequential behaviours and long-short t…☆19Oct 16, 2025Updated 8 months ago
- ArXiv daily dump and viewer using GitHub Actions - luvata.github.io/arxive☆14Updated this week
- [ICML 2026] Elastic Diffusion Transformer: Accelerating SOTA generation models (e.g., Qwen-Image, Hunyuan3d ) through adaptive computatio…☆44May 1, 2026Updated last month
- ☆11Nov 8, 2023Updated 2 years ago
- ☆40Dec 14, 2025Updated 6 months ago
- ☆10Sep 28, 2025Updated 8 months ago
- Official Pytorch Implementation of the paper: Wavelet Diffusion Models are fast and scalable Image Generators (CVPR'23)☆440Jul 23, 2024Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆111Jun 28, 2025Updated 11 months ago
- Mixture of A Million Experts☆56Jul 30, 2024Updated last year
- Companion code to the preprint: E Bıyık, K Wang, N Anari, D Sadigh, "Batch Active Learning using Determinantal Point Processes". arXiv pr…☆15Jul 25, 2024Updated last year
- NanoGPT (124M) quality in 2.67B tokens☆28Sep 17, 2025Updated 9 months ago
- [ICADL] Named entity recognition architecture combining contextual and global features☆13Dec 14, 2021Updated 4 years ago
- ☆82May 5, 2026Updated last month
- ☆10Jun 14, 2025Updated last year
- Fluent dreaming for language models☆13Jul 22, 2024Updated last year
- Stochastic Multiple Target Sampling Gradient Descent (NeurIPS 2022)☆13Sep 19, 2022Updated 3 years ago
- Code for XPERT algorithm from Personalized Retrieval over Millions of Items☆13Sep 14, 2023Updated 2 years ago
- Flash Attention in 300-500 lines of CUDA/C++☆37Aug 22, 2025Updated 9 months ago
- code and resources for our paper "Achieving Joint Training Accuracy in Continual Learning" in AAAI2025☆14Feb 25, 2025Updated last year
- ☆16Mar 14, 2020Updated 6 years ago
- Official Implementations "Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models" (ICLR2024)☆59Dec 3, 2024Updated last year
- Mixed precision training from scratch with Tensors and CUDA☆30May 14, 2024Updated 2 years ago
- Watch and book travel tours with Laravel and Vuejs☆12Jan 18, 2024Updated 2 years ago
- Transformers components but in Triton☆34May 9, 2025Updated last year
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos☆23May 7, 2026Updated last month
- ☆11Nov 21, 2022Updated 3 years ago
- Code for NeurIPS 2021 paper "Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning".☆16Oct 18, 2021Updated 4 years ago
- ☆25Aug 28, 2024Updated last year
- PyTorch Implementation of Image Generation with a Sphere Encoder☆45May 20, 2026Updated 3 weeks ago
- Using FlexAttention to compute attention with different masking patterns☆47Sep 22, 2024Updated last year
- [NeurIPS'23] Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization☆19Aug 4, 2024Updated last year
- Spark Java_Examples for all modules including GraphX☆19Dec 8, 2017Updated 8 years ago
- PhoGPT: Generative Pre-training for Vietnamese (2023)☆792Nov 12, 2024Updated last year