[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
⭐27 · Apr 8, 2025 · Updated 11 months ago
Alternatives and similar repositories for Artemis
Users who are interested in Artemis are comparing it to the libraries listed below.
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?". · ⭐38 · Dec 5, 2024 · Updated last year
- Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin". · ⭐36 · Jul 10, 2025 · Updated 8 months ago
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective · ⭐45 · Jan 18, 2025 · Updated last year
- ⭐59 · Mar 16, 2025 · Updated 11 months ago
- 【Nature Computational Science 2025🔥】Deep peak property learning for efficient chiral molecules ECD spectra prediction · ⭐51 · Jan 12, 2025 · Updated last year
- official repo for `thinking with images through-self-calling` · ⭐21 · Dec 28, 2025 · Updated 2 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows · ⭐19 · Nov 4, 2025 · Updated 4 months ago
- [AAAI26] Next Patch Prediction · ⭐132 · Jan 2, 2025 · Updated last year
- ⭐13 · Mar 28, 2025 · Updated 11 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning · ⭐80 · Oct 25, 2024 · Updated last year
- ⭐31 · Sep 24, 2024 · Updated last year
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry · ⭐45 · Oct 9, 2025 · Updated 5 months ago
- [ACM MM 2025] HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation · ⭐157 · Sep 4, 2025 · Updated 6 months ago
- ⭐16 · Mar 25, 2024 · Updated last year
- official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input · ⭐67 · Aug 30, 2024 · Updated last year
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding · ⭐185 · Dec 19, 2025 · Updated 2 months ago
- (EMNLP 2025 Main) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives · ⭐37 · Dec 20, 2025 · Updated 2 months ago
- Envision3D: One Image to 3D with Anchor Views Interpolation · ⭐114 · May 16, 2024 · Updated last year
- An official implementation of SwapAnyone. · ⭐74 · Mar 14, 2025 · Updated 11 months ago
- ⭐44 · Oct 20, 2025 · Updated 4 months ago
- ⭐18 · Oct 28, 2025 · Updated 4 months ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) · ⭐18 · May 10, 2023 · Updated 2 years ago
- This dataset contains about 110k images annotated with the depth and occlusion relationships between arbitrary objects. It enables resear… · ⭐16 · Apr 28, 2021 · Updated 4 years ago
- [NeurIPS'24] MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts · ⭐19 · Oct 7, 2024 · Updated last year
- Video Diffusion State Space Models · ⭐19 · Mar 27, 2024 · Updated last year
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! · ⭐137 · Dec 31, 2023 · Updated 2 years ago
- ReMoMask: Retrieval-Augmented Masked Motion Generation · ⭐39 · Feb 14, 2026 · Updated 3 weeks ago
- LLMBind: A Unified Modality-Task Integration Framework · ⭐19 · Jun 16, 2024 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs · ⭐54 · Mar 9, 2025 · Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" · ⭐48 · Sep 3, 2025 · Updated 6 months ago
- ⭐58 · Feb 27, 2026 · Updated last week
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation · ⭐187 · Nov 6, 2025 · Updated 4 months ago
- [AAAI 2025🔥] Official implementation of Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle · ⭐217 · Feb 16, 2025 · Updated last year
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution · ⭐59 · Mar 4, 2025 · Updated last year
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds · ⭐96 · Jul 4, 2024 · Updated last year
- (CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling · ⭐211 · Jul 28, 2024 · Updated last year
- Video-LLaVA fine-tune for CinePile evaluation · ⭐51 · Aug 8, 2024 · Updated last year
- [ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization · ⭐57 · Nov 10, 2023 · Updated 2 years ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding · ⭐82 · Jul 4, 2025 · Updated 8 months ago