Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]
☆40Jan 4, 2024Updated 2 years ago
Alternatives and similar repositories for FastLLM
Users who are interested in FastLLM are comparing it to the repositories listed below:
- MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos.☆30Jul 9, 2025Updated 8 months ago
- Reproduction of the complete process of DeepSeek-R1 on small-scale models, including Pre-training, SFT, and RL.☆29Mar 11, 2025Updated last year
- ☆11May 9, 2022Updated 3 years ago
- Encourage Medical LLM to engage in deep thinking similar to DeepSeek-R1.☆26Apr 24, 2025Updated 10 months ago
- [ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging☆39Jun 4, 2025Updated 9 months ago
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts☆52Nov 20, 2024Updated last year
- ☆29Sep 17, 2024Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing …☆49Sep 18, 2024Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- A beginner's introductory tutorial on model compression☆22Jul 7, 2024Updated last year
- A Hitchhiker's Guide to the Galaxy bot built on PaddlePaddle and the wechaty framework☆17Aug 3, 2021Updated 4 years ago
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆150Oct 10, 2025Updated 5 months ago
- Towards Systematic Measurement for Long Text Quality☆37Sep 5, 2024Updated last year
- This tool (enhance_long) aims to enhance LLaMA 2's long-context extrapolation capability at the lowest cost, preferably without …☆45Nov 30, 2023Updated 2 years ago
- See https://github.com/cuda-mode/triton-index/ instead!☆11May 8, 2024Updated last year
- Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"☆102Jul 9, 2024Updated last year
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆69May 7, 2025Updated 10 months ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- Code for the EMNLP paper "Improving Detection and Categorization of Task-relevant Utterances through Integration of Discourse Structure a…☆12Nov 23, 2022Updated 3 years ago
- A desktop pet program that now seems to have evolved into a desktop sticky-notes app. See the develop-todolist branch for the sticky-notes program.☆11Nov 17, 2024Updated last year
- distill large scale web page text☆12Jul 29, 2023Updated 2 years ago
- DeepTrace: A lightweight, scalable real-time diagnostic and analysis tool for distributed training tasks.☆18Nov 4, 2025Updated 4 months ago
- ☆12Mar 20, 2020Updated 5 years ago
- Agently Stage - Efficient Convenient Asynchronous & Multithreaded Programming☆13Apr 2, 2025Updated 11 months ago
- Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources☆12Apr 12, 2018Updated 7 years ago
- https://subversion.assembla.com/svn/buddy-profiles.honorbuddy/trunk/☆11Jan 17, 2015Updated 11 years ago
- Python scripts to download and filter daily flight dumps from ADS-B Exchange☆11Aug 25, 2017Updated 8 years ago
- A demo project demonstrating the performance improvement from a C++ extension wrapped with pybind11.☆10Nov 16, 2021Updated 4 years ago
- ☆11Nov 27, 2022Updated 3 years ago
- Code and data for COLING 2022 paper titled "Structural Bias For Aspect Sentiment Triplet Extraction"☆26May 28, 2023Updated 2 years ago
- pymur is a Python interface to The Lemur Toolkit.☆19Sep 17, 2018Updated 7 years ago
- Sequences from Adaptyv Bio’s EGFR Protein Design Competition☆15Aug 28, 2025Updated 6 months ago
- Using self-play to augment multi-turn text-to-SQL datasets☆11Oct 20, 2022Updated 3 years ago
- Learning aircraft operational factors to improve aircraft climb prediction: A large scale multi-airport study☆11Apr 7, 2020Updated 5 years ago
- Code implementation of Baichuan Dynamic NTK-ALiBi: inference on longer texts without fine-tuning☆49Aug 27, 2023Updated 2 years ago
- Code and data for the paper "Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems".☆14Aug 16, 2022Updated 3 years ago
- ☆13Mar 7, 2024Updated 2 years ago