CR400AF-A / SparseMM
☆51 Updated last week
Alternatives and similar repositories for SparseMM
Users who are interested in SparseMM are comparing it to the libraries listed below.
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache ☆43 Updated 10 months ago
- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response ☆41 Updated 6 months ago
- [Arxiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey ☆137 Updated this week
- [AAAI 2025] Code for paper: Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation ☆3 Updated 5 months ago
- SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation ☆55 Updated 2 months ago
- Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs". ☆42 Updated last week
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models ☆93 Updated last year
- Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation. ☆70 Updated 3 weeks ago
- ☆67 Updated 3 months ago
- [NeurIPS 2023] dynpoint: dynamic neural point for view synthesis ☆52 Updated last year
- CoS: Chain-of-Shot Prompting for Long Video Understanding ☆48 Updated 4 months ago
- ☆54 Updated last month
- Your efficient and accurate answer verification system for RL training. ☆30 Updated 2 weeks ago
- 🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'2… ☆84 Updated 2 months ago
- Panorama Generation as a Next-Token Prediction Task. ☆20 Updated 2 months ago
- [ICLR 2025] BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities ☆144 Updated 4 months ago
- A comprehensive collection of resources focused on addressing and understanding hallucination phenomena in MLLMs. ☆34 Updated last year
- Rethinking Video-Text Understanding Retrieval from Counterfactually Augmented Data ☆39 Updated 11 months ago
- ☆80 Updated 7 months ago
- [ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation" ☆152 Updated last month
- Official implementation of paper SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training ☆38 Updated 6 months ago
- Official Implementation for "Mask-based modeling for Neural Radiance Fields" (ICLR 2024) ☆37 Updated last year
- Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models ☆154 Updated 6 months ago
- Official code base for paper EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidan… ☆104 Updated last month
- A curated collection of resources, tools, and frameworks for developing GUI Agents. ☆69 Updated this week
- TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding [ACM MM'21] ☆23 Updated 3 years ago
- The official generation code and toolkits of VDW dataset (ICCV 2023) ☆35 Updated 11 months ago
- Hybrid Latent Reasoning via Reinforcement Learning ☆120 Updated 3 weeks ago
- [ECCV'24] ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation ☆39 Updated 4 months ago
- (NeurIPS 2024) Official PyTorch implementation of LOVA3 ☆89 Updated 3 months ago