CS-BAOYAN / CSInternship2025
☆28 · Updated this week
Alternatives and similar repositories for CSInternship2025
Users interested in CSInternship2025 are comparing it to the repositories listed below.
- A tiny paper-rating web app ☆38 · Updated 3 months ago
- A personal library for training Transformer models ☆22 · Updated this week
- Use Clash on Linux without sudo privileges ☆114 · Updated 7 months ago
- 📖 A repository organizing papers, code, and other resources related to unified multimodal models. ☆246 · Updated this week
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" ☆68 · Updated 4 months ago
- Study notes on the official LLaVA code ☆26 · Updated 8 months ago
- ☆44 · Updated 2 weeks ago
- BoardCaster is the database behind CSBAOYAN: it manages structured baoyan (postgraduate recommendation) information in JSON files and accepts updates via Issues to lower the barrier to contributing. ☆19 · Updated this week
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆69 · Updated this week
- Paper list, tutorial, and nano code snippets for Diffusion Large Language Models. ☆75 · Updated this week
- ☆86 · Updated 3 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆402 · Updated this week
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆96 · Updated 7 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆111 · Updated 3 weeks ago
- A Collection of Papers on Diffusion Language Models ☆82 · Updated last week
- ☆101 · Updated this week
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆120 · Updated 2 weeks ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆51 · Updated last week
- Repository for the ACL 2025 Findings paper "From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-Modalities" ☆36 · Updated last week
- A collection of papers and projects on multimodal reasoning. ☆105 · Updated 2 months ago
- [ICML'25] Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆123 · Updated 3 weeks ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆33 · Updated this week
- R1-like Video-LLM for Temporal Grounding ☆101 · Updated last week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆66 · Updated 11 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆64 · Updated 3 months ago
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… ☆89 · Updated last week
- Latest Advances on Modality Priors in Multimodal Large Language Models ☆20 · Updated last month
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆123 · Updated 7 months ago
- Code and data of We-Math ☆129 · Updated 2 months ago
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate". ☆99 · Updated 7 months ago