HuyNguyen-hust / flash-attn-101
☆21Updated 9 months ago
Alternatives and similar repositories for flash-attn-101
Users who are interested in flash-attn-101 are comparing it to the libraries listed below
- Pioneering in Vietnamese Multimodal Large Language Model☆47Updated 5 months ago
- ☆70Updated last year
- This is the official repository for Vista dataset - A Vietnamese multimodal dataset contains more than 700,000 samples of conversations a…☆26Updated last year
- Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation☆35Updated last year
- ☆216Updated 3 weeks ago
- PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)☆44Updated 3 weeks ago
- VNHSGE: Vietnamese High School Graduation Examination Dataset for Large Language Models☆27Updated last year
- [ICLR 2025] CAMEx: Curvature-Aware Merging of Experts☆20Updated 3 months ago
- ☆16Updated last year
- This is an open-source repository for constructing and researching fusion-style deep learning methods combined with pretrained vision mod…☆14Updated 5 months ago
- VIT inference in triton because, why not?☆29Updated last year
- LibMoE: A LIBRARY FOR COMPREHENSIVE BENCHMARKING MIXTURE OF EXPERTS IN LARGE LANGUAGE MODELS☆40Updated 2 weeks ago
- ☆14Updated 2 years ago
- Machine Reading Comprehension special for the Vietnamese language☆40Updated 3 years ago
- Vistral-V: Visual Instruction Tuning for Vistral - Vietnamese Large Vision-Language Model.☆22Updated 11 months ago
- Experiments related to LLMs for Vietnamese (inspired by the Physics of LLMs series)☆10Updated 8 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind☆127Updated 10 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".☆127Updated this week
- python scripts for crawling original image from Google Images☆22Updated 3 years ago
- ☆84Updated last month
- [ICLR 2024] Official implementation of Bellman Optimal Stepsize Straightening of Flow-Matching Models☆35Updated last year
- Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch☆88Updated 4 months ago
- Pre-training script for BART in JAX/Flax☆38Updated 2 years ago
- Work in progress.☆69Updated 3 weeks ago
- ☆18Updated 2 years ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"☆123Updated last year
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …☆74Updated last week
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…☆47Updated 2 months ago
- Bud500: A Comprehensive Vietnamese ASR Dataset☆66Updated last year
- The evaluation framework for training-free sparse attention in LLMs☆69Updated last week