hhnqqq / MyTransformers
A personal library for training Transformer models
⭐21 · Updated last week
Alternatives and similar repositories for MyTransformers
Users interested in MyTransformers are comparing it to the repositories listed below.
- [EMNLP 2024 Findings 🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…" ⭐97 · Updated 6 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ⭐69 · Updated 3 months ago
- A paper list on token merging, reducing, resampling, and dropping for MLLMs. ⭐57 · Updated 4 months ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…" ⭐40 · Updated 6 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…) ⭐72 · Updated this week
- Code release for VTW (AAAI 2025, Oral) ⭐43 · Updated 4 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe that massive values are concen… ⭐66 · Updated last week
- [ICML'25] Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ⭐112 · Updated 2 weeks ago
- A generalized framework for subspace tuning methods in parameter-efficient fine-tuning. ⭐141 · Updated 3 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey ⭐166 · Updated last week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ⭐105 · Updated 2 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ⭐39 · Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ⭐97 · Updated 3 months ago
- Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ⭐67 · Updated 2 months ago
- LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ⭐86 · Updated 6 months ago
- [ICLR 2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding ⭐84 · Updated 2 months ago
- ⭐83 · Updated last month
- qwen-nsa ⭐66 · Updated last month
- paper list, tutorial, and nano code snippet for Diffusion Large Language Models.β51Updated this week
- Collection of token-level model compression resources. ⭐98 · Updated this week
- ⭐46 · Updated last month
- The official code implementation of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ⭐40 · Updated last week
- ⭐131 · Updated 2 weeks ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster" ⭐76 · Updated 5 months ago
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs ⭐30 · Updated this week
- ⭐84 · Updated 2 months ago
- [arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ⭐47 · Updated 5 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ⭐59 · Updated 3 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ⭐130 · Updated last year
- [NeurIPS'24] An efficient and accurate memory-saving method for W4A4 large multimodal models. ⭐73 · Updated 5 months ago