saxenarohit / MovieSum
☆15Updated last year
Alternatives and similar repositories for MovieSum
Users that are interested in MovieSum are comparing it to the libraries listed below
- ☆29Updated last year
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆46Updated 7 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆43Updated last year
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆127Updated 11 months ago
- ☆50Updated 4 months ago
- The open-source code of MetaStone-S1.☆107Updated 2 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆52Updated 10 months ago
- ☆97Updated 2 months ago
- Helper functions for processing and integrating visual language information with Qwen-VL series models☆15Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆91Updated last year
- Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629☆16Updated last week
- Our 2nd-gen LMM☆34Updated last year
- ☆74Updated last year
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆16Updated last week
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning☆33Updated last month
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆115Updated 5 months ago
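- The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all you need. Experiments support the claim that, for strong reasoning ability, the "think" reasoning-process content is the core of AGI/ASI.☆45Updated 8 months ago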
- The SAIL-VL2 series model developed by the BytedanceDouyinContent Group☆70Updated last month
- ☆35Updated last year
- DELT: Data Efficacy for Language Model Training☆40Updated last month
- 😊 TPTT: Transforming Pretrained Transformers into Titans☆29Updated last week
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024☆64Updated last week
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆167Updated last year
- ☆28Updated 2 weeks ago
- [ICCV2025] WikiAutoGen official page☆19Updated 3 months ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆56Updated 2 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data with a 14B LLM.☆38Updated last year
- YesBut - Multimodal Satire Comprehension Dataset☆18Updated last year
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Updated last year