To pioneer training long-context multi-modal transformer models
☆72Aug 8, 2025Updated 6 months ago
Alternatives and similar repositories for TeleTron
Users who are interested in TeleTron are comparing it to the libraries listed below.
- Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc.☆55Nov 11, 2025Updated 3 months ago
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221☆30Apr 22, 2025Updated 10 months ago
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 7 months ago
- Code for the paper "Interpreting and Improving Diffusion Models from an Optimization Perspective", appearing in ICML 2024☆14Sep 30, 2024Updated last year
- The official implementation of "Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers" (arXiv …☆50Jun 6, 2025Updated 8 months ago
- Ongoing research training transformer models at scale☆18Updated this week
- Unofficial implementation of Face0 with SDXL☆12Sep 1, 2023Updated 2 years ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆21Feb 9, 2026Updated 2 weeks ago
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 9 months ago
- ☆31Updated this week
- ☆14May 17, 2022Updated 3 years ago
- [NeurIPS' 24] The PyTorch implementation of our paper: "Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learnin…☆21Oct 10, 2024Updated last year
- A PyTorch implementation of EMANet based on ICCV 2019 paper "Expectation-Maximization Attention Networks for Semantic Segmentation"☆18Feb 21, 2020Updated 6 years ago
- An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control☆31Jan 13, 2026Updated last month
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.☆93Jan 16, 2026Updated last month
- Model code for inferencing T5☆66Mar 10, 2025Updated 11 months ago
- Code for full fine-tuning of the Mochi model with FSDP (and CP)☆30Jul 15, 2025Updated 7 months ago
- ☆65Apr 26, 2025Updated 10 months ago
- High performance inference engine for diffusion models☆105Sep 5, 2025Updated 5 months ago
- ☆28Nov 30, 2022Updated 3 years ago
- Dynamic resources changes for multi-dimensional parallelism training☆30Aug 22, 2025Updated 6 months ago
- [NeurIPS 2025] Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation☆34Oct 24, 2025Updated 4 months ago
- Explore how to get VQ-VAE models efficiently!☆68Jul 24, 2025Updated 7 months ago
- The repository for paper Unsupervised Volumetric Animation☆69Sep 22, 2023Updated 2 years ago
- Storage Performance Development Kit☆11Updated this week
- A LLaMA1/LLaMA2 Megatron implementation.☆28Dec 13, 2023Updated 2 years ago
- ☆23Feb 4, 2026Updated 3 weeks ago
- Code and dataset for "Detecting Human Artifacts from Text-to-Image Models"☆47Dec 26, 2024Updated last year
- Asynchronous pipeline parallel optimization☆19Feb 2, 2026Updated 3 weeks ago
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…☆58Oct 27, 2025Updated 4 months ago
- A collection of existing public 3D Cloth Data☆35Jul 5, 2022Updated 3 years ago
- Implementation of AnimateDiff.☆32Jul 14, 2023Updated 2 years ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers☆674Oct 25, 2024Updated last year
- ☆81Mar 2, 2025Updated 11 months ago
- ☆131Jun 24, 2025Updated 8 months ago
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive☆66Dec 11, 2025Updated 2 months ago
- [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation☆45Jul 1, 2025Updated 8 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Feb 12, 2025Updated last year
- The ASPLOS 2025 / EuroSys 2025 Contest Track☆40Aug 7, 2025Updated 6 months ago