bytedance / F-16
F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.
☆34 · Jul 3, 2025 · Updated 7 months ago
Alternatives and similar repositories for F-16
Users who are interested in F-16 are comparing it to the repositories listed below.
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆26 · Jan 1, 2026 · Updated last month
- Understanding Convolution for Semantic Segmentation, web: 1. https://zhuanlan.zhihu.com/p/26659914 2. https://blog.csdn.net/u011974639… ☆16 · Dec 22, 2018 · Updated 7 years ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆58 · Jan 26, 2026 · Updated 2 weeks ago
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆36 · Jun 10, 2025 · Updated 8 months ago
- [CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection ☆29 · Sep 26, 2024 · Updated last year
- Structured Video Comprehension of Real-World Shorts ☆230 · Sep 21, 2025 · Updated 4 months ago
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 · Jul 10, 2025 · Updated 7 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆12 · Nov 14, 2025 · Updated 2 months ago
- Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation ☆12 · Feb 16, 2025 · Updated 11 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆45 · Jul 1, 2025 · Updated 7 months ago
- ☆11 · Dec 6, 2024 · Updated last year
- ☆24 · Jun 19, 2025 · Updated 7 months ago
- FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection ☆24 · Jan 13, 2026 · Updated last month
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆22 · Jun 23, 2025 · Updated 7 months ago
- ☆22 · Dec 23, 2025 · Updated last month
- build vgg16 with pytorch 0.4.0 for classification of CIFAR datasets ☆10 · Mar 31, 2019 · Updated 6 years ago
- ☆22 · Dec 11, 2025 · Updated 2 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding ☆73 · Dec 14, 2025 · Updated last month
- Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training fo… ☆19 · Oct 20, 2025 · Updated 3 months ago
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arx… ☆12 · Feb 6, 2023 · Updated 3 years ago
- FilterPlayer, using mediaplayer and GLSurfaceView to add filter on videos. ☆10 · Dec 3, 2020 · Updated 5 years ago
- Empowering Small VLMs to Think with Dynamic Memorization and Exploration ☆15 · Nov 18, 2025 · Updated 2 months ago
- ☆14 · Dec 2, 2025 · Updated 2 months ago
- ☆13 · May 17, 2025 · Updated 8 months ago
- Remote sensing labwork ☆12 · Feb 27, 2018 · Updated 7 years ago
- Search, download Vimeo videos and retrieve metadata in Go. ☆11 · Feb 10, 2022 · Updated 4 years ago
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) ☆12 · Mar 6, 2025 · Updated 11 months ago
- The first large scale formally verified reasoning dataset for Verilog ☆19 · May 16, 2025 · Updated 8 months ago
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024) ☆11 · May 26, 2024 · Updated last year
- Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing" ☆13 · Jun 1, 2022 · Updated 3 years ago
- ☆11 · Jan 27, 2020 · Updated 6 years ago
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin… ☆10 · Feb 9, 2025 · Updated last year
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding. ☆11 · Jul 16, 2024 · Updated last year
- Weakly Supervised Referring Video Object Segmentation with Object-Centric Pseudo-Guidance ☆10 · Aug 17, 2024 · Updated last year
- Image Text Segmentation using FAST corner detection and DBSCAN clustering with k-d tree data structure ☆13 · Feb 27, 2019 · Updated 6 years ago
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),… ☆13 · Jan 16, 2024 · Updated 2 years ago
- This repository contains the video files (download links) and corresponding annotations used in the paper "Long-Term Face Tracking for Cr… ☆14 · Dec 18, 2020 · Updated 5 years ago
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022 ☆11 · Aug 20, 2022 · Updated 3 years ago
- ☆11 · Dec 11, 2023 · Updated 2 years ago