hhnqqq / MyTransformers
A personal library for training Transformer models
☆16 Updated this week
Alternatives and similar repositories for MyTransformers:
Users who are interested in MyTransformers are comparing it to the repositories listed below.
- a brief repo about paper research ☆14 Updated 6 months ago
- ☆50 Updated last week
- A tiny paper rating web ☆36 Updated last week
- ☆92 Updated this week
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆296 Updated 3 weeks ago
- Official repository for VisionZip (CVPR 2025) ☆259 Updated last month
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆403 Updated 2 months ago
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first work to systematically explore R1 for video] ☆205 Updated this week
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆131 Updated last month
- ☆75 Updated 7 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆153 Updated 2 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆44 Updated 2 months ago
- ☆74 Updated last week
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆522 Updated this week
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆83 Updated 3 weeks ago
- ☆107 Updated last month
- [NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning ☆177 Updated 3 months ago
- This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta… ☆404 Updated last week
- A collection of vision foundation models unifying understanding and generation. ☆47 Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆54 Updated last week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆81 Updated 3 weeks ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆92 Updated 4 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆59 Updated 3 months ago
- [NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models. ☆67 Updated 2 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆121 Updated 10 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆140 Updated 3 weeks ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆169 Updated 6 months ago
- 📚 Collection of awesome generation acceleration resources. ☆182 Updated this week
- This is a repo to track the latest autoregressive visual generation papers. ☆178 Updated this week
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer ☆41 Updated 6 months ago