TeleAI-UAGI / TeleEgoLinks
The official repo of TeleEgo - A Benchmark for Egocentric AI Assistants.
☆56Updated last month
Alternatives and similar repositories for TeleEgo
Users that are interested in TeleEgo are comparing it to the libraries listed below
Sorting:
- [NeurIPS'25] ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and R…☆30Updated 4 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated last year
- [NIPS2023]Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector☆37Updated last year
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"☆50Updated 7 months ago
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurlPS 2024.☆34Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆33Updated 6 months ago
- ☆124Updated last year
- ☆26Updated 2 years ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆28Updated 2 years ago
- The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).☆21Updated 6 months ago
- Training Autoregressive Image Generation models via Reinforcement Learning☆49Updated 2 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆145Updated 6 months ago
- Training code for CLIP-FlanT5☆30Updated last year
- Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"☆20Updated 2 years ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆77Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆37Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Updated 7 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆46Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆170Updated last year
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"☆44Updated 9 months ago
- Introduce a novel Video Trimming (VT) task and proposes an agent-based approach (AVT) for detecting wasted footage, selecting valuable se…☆23Updated last year
- [WIP@Oct 13] 质衡-基准测试 (Q-Bench in Chinese),包含中文版【底层视觉问答】和【底层视觉描述】数据集,以及中文提示下的图片质量评价。 We will release Q-Bench in more languages in the futu…☆24Updated 2 years ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆94Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 6 months ago
- [TIP] Exploring Effective Factors for Improving Visual In-Context Learning☆19Updated 7 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆41Updated 10 months ago
- ☆24Updated 7 months ago
- ☆54Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆62Updated last year
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆44Updated last year