From MHA, MQA, GQA to MLA by 苏剑林, with code
☆47 · Feb 19, 2025 · Updated last year
Alternatives and similar repositories for MLA_tutorial
Users that are interested in MLA_tutorial are comparing it to the libraries listed below.
- ☆21 · Aug 20, 2025 · Updated 7 months ago
- A Python wrapper for the ROUGE summarization evaluation package ☆14 · Aug 9, 2017 · Updated 8 years ago
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception" ☆30 · Mar 25, 2026 · Updated 2 weeks ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark ☆11 · Mar 27, 2025 · Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… ☆23 · Oct 1, 2025 · Updated 6 months ago
- ☆15 · Oct 21, 2025 · Updated 5 months ago
- Implement Flash Attention using Cute. ☆105 · Dec 17, 2024 · Updated last year
- DsNet: A Novel Hybrid Architecture of Convolution and Transformer for Real-time Weld Seam Image Segmentation ☆13 · Sep 1, 2024 · Updated last year
- NetLogo models developed in the book "Agent-Based Evolutionary Game Dynamics" ☆10 · Feb 19, 2026 · Updated last month
- Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-Ranking ☆25 · Apr 4, 2025 · Updated last year
- ☆31 · Aug 25, 2023 · Updated 2 years ago
- Implement custom operators in PyTorch with cuda/c++ ☆77 · Jan 1, 2023 · Updated 3 years ago
- The official code for NAACL 2024 paper: $E^5$: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, … ☆15 · Jun 23, 2024 · Updated last year
- Chinese Pharmacopoeia RAG project ☆10 · Oct 26, 2024 · Updated last year
- ☆10 · Jan 4, 2017 · Updated 9 years ago
- Code for the paper "Cottention: Linear Transformers With Cosine Attention" ☆20 · Nov 15, 2025 · Updated 4 months ago
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 8 months ago
- ☆56 · Jan 5, 2026 · Updated 3 months ago
- Modeling methods of System Dynamics – Supply Chain Simulation using the Anylogic software ☆10 · Jan 8, 2026 · Updated 3 months ago
- ASR project with pytorch-lightning ☆20 · Mar 21, 2025 · Updated last year
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json. ☆25 · Jan 4, 2026 · Updated 3 months ago
- Implementation of Stochastic Gaussian Process Motion Planning algorithm, IROS 2022. ☆20 · Oct 30, 2023 · Updated 2 years ago
- Topic taxonomy completion with hierarchical discovery of novel topic clusters ☆24 · Mar 7, 2022 · Updated 4 years ago
- CenterPoint model trained with MMDetection3d on custom dataset, and deployed with TensorRT ☆35 · Mar 15, 2023 · Updated 3 years ago
- Cross Visual Prompt Tuning [ICCV 2025] ☆13 · Aug 3, 2025 · Updated 8 months ago
- ☆36 · Aug 25, 2023 · Updated 2 years ago
- ☆99 · Feb 11, 2026 · Updated 2 months ago
- Learning various topics by following Tensorrt_pro ☆39 · Nov 25, 2022 · Updated 3 years ago
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"! ☆11 · Oct 16, 2024 · Updated last year
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks ☆10 · Nov 27, 2024 · Updated last year
- A tool to visualize convolutional layer activations on an input image. ☆17 · Oct 23, 2019 · Updated 6 years ago
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models. ☆14 · Dec 16, 2024 · Updated last year
- [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models ☆16 · Jun 18, 2025 · Updated 9 months ago
- Official repository for the paper "Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules" (… ☆23 · Jun 11, 2025 · Updated 10 months ago
- ☆16 · May 11, 2017 · Updated 8 years ago
- ☆13 · May 12, 2025 · Updated 10 months ago
- DSR ☆15 · Apr 25, 2018 · Updated 7 years ago
- ☆43 · Apr 9, 2024 · Updated 2 years ago
- ☆19 · Mar 19, 2026 · Updated 3 weeks ago