OpenDocCN / python-code-anls
☆42 · Updated 9 months ago
Alternatives and similar repositories for python-code-anls
Users interested in python-code-anls are comparing it to the libraries listed below.
- PyTorch training code covering single-precision, half-precision, and mixed-precision training on a single GPU and multiple GPUs (DP / DDP), plus FSDP and DeepSpeed, with comparisons of training speed and GPU memory usage across methods ☆124 · Updated last year
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding ☆123 · Updated 2 months ago
- A DiT implementation in PyTorch, mainly for learning the DiT architecture. ☆81 · Updated last year
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆106 · Updated last year
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models) ☆182 · Updated 2 years ago
- A collection of multimodal (MM) + Chat resources ☆278 · Updated 3 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆169 · Updated last month
- Notes on multimodal knowledge for large language model (LLM) algorithm/application engineers ☆250 · Updated last year
- ☆76 · Updated 6 months ago
- [COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources ☆283 · Updated 2 months ago
- DeepSpeed Tutorial ☆102 · Updated last year
- [ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale ☆120 · Updated last year
- [ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-paramete…" ☆110 · Updated last year
- Margin-based Vision Transformer ☆55 · Updated last month
- [COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation ☆161 · Updated 4 months ago
- Reading papers together with Li Mu (李沐) ☆212 · Updated 5 months ago
- Building a VLM model starting from the basic modules. ☆18 · Updated last year
- LLaVA combined with the MAGVIT image tokenizer, training an MLLM without a vision encoder; unifies image understanding and generation. ☆37 · Updated last year
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆265 · Updated 3 weeks ago
- The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention" ☆224 · Updated last month
- The Next Step Forward in Multimodal LLM Alignment ☆186 · Updated 6 months ago
- Problems from the LeetCode Hot 100 list; a companion to Interview-code-practice-python and a useful aid for job hunting. ☆28 · Updated last year
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆143 · Updated 6 months ago
- Efficient Multimodal Large Language Models: A Survey ☆375 · Updated 6 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models ☆107 · Updated last month
- A new generation of CLIP with fine-grained discrimination capability (ICML 2025) ☆472 · Updated 3 weeks ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support. ☆141 · Updated 9 months ago
- Qwen2.5 0.5B GRPO ☆71 · Updated 9 months ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆208 · Updated 7 months ago
- Collection of image and video datasets for generative AI and multimodal visual AI ☆31 · Updated last year