Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆40Jan 4, 2024Updated 2 years ago
Alternatives and similar repositories for FastLLM
Users that are interested in FastLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos.☆33Apr 18, 2026Updated last week
- Encourage Medical LLM to engage in deep thinking similar to DeepSeek-R1.☆26Apr 24, 2025Updated last year
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts☆52Nov 20, 2024Updated last year
- ☆80Jun 5, 2024Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- 模型压缩的小白入门教程☆22Jul 7, 2024Updated last year
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆148Oct 10, 2025Updated 6 months ago
- Towards Systematic Measurement for Long Text Quality☆38Sep 5, 2024Updated last year
- The AI paper weekly report automation project in collaboration between Agent Universe and Kimi.☆23Aug 3, 2024Updated last year
- This tool(enhance_long) aims to enhance the LlaMa2 long context extrapolation capability in the lowest-cost approach, preferably without …☆45Nov 30, 2023Updated 2 years ago
- See https://github.com/cuda-mode/triton-index/ instead!☆11May 8, 2024Updated last year
- 飞桨模型加密库☆10Nov 13, 2021Updated 4 years ago
- Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"☆102Jul 9, 2024Updated last year
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ☆11May 24, 2024Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆74May 7, 2025Updated 11 months ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- A simple Python tool to measure the performance of ONNX models.☆27Sep 15, 2024Updated last year
- distill large scale web page text☆12Jul 29, 2023Updated 2 years ago
- DeepTrace: A lightweight, scalable real-time diagnostic and analysis tool for distributed training tasks.☆18Nov 4, 2025Updated 5 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆65Oct 25, 2024Updated last year
- Agently Stage - Efficient Convenient Asynchronous & Multithreaded Programming☆13Apr 2, 2025Updated last year
- ☆30Aug 21, 2025Updated 8 months ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- 无人机编队重构☆12Jul 28, 2018Updated 7 years ago
- Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources☆12Apr 12, 2018Updated 8 years ago
- Android runtime permissions manager☆13Jul 8, 2019Updated 6 years ago
- papers of llm compression☆13Mar 6, 2024Updated 2 years ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 9 months ago
- DSTC9 Submission☆16Apr 12, 2021Updated 5 years ago
- ☆16Mar 1, 2025Updated last year
- Using self-play to augment multi-turn text-to-SQL datasets☆11Oct 20, 2022Updated 3 years ago
- Code and data for the paper "Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems".☆14Aug 16, 2022Updated 3 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- 百川Dynamic NTK-ALiBi的代码实现:无需微调即可推理更长文本☆49Aug 27, 2023Updated 2 years ago
- [Findings of ACL 2023] Communication Efficient Federated Learning for Multilingual Machine Translation with Adapter☆12Sep 4, 2023Updated 2 years ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models☆54Apr 22, 2026Updated last week
- A list of advice on doing research that is useful for me :)☆13Aug 17, 2019Updated 6 years ago
- ICLR 2025☆31May 21, 2025Updated 11 months ago
- Simple Conversational Data Augmentation for Semi-supervised Abstractive Conversation Summarization☆10Mar 7, 2022Updated 4 years ago
- ☆74Apr 2, 2024Updated 2 years ago