bytedance/F-16

F-16 is a powerful video large language model (LLM) for perceiving high-frame-rate videos, developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.

☆40

Alternatives and similar repositories for F-16

Users who are interested in F-16 are comparing it to the repositories listed below.

nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
☆29 · Apr 8, 2025 · Updated last year
zai-org / MotionBench
Official code for MotionBench (CVPR 2025)
☆76 · Mar 3, 2025 · Updated last year
zinuoli / TriSense
[NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
☆27 · Feb 10, 2026 · Updated 5 months ago
zjuruizhechen / TVG-R1
[EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
☆36 · Oct 22, 2025 · Updated 9 months ago
bytedance / video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…
☆204 · Feb 23, 2026 · Updated 5 months ago
CYF-cuber / HLoRA_MER_dinov2
THE VISUAL COMPUTER “High-level LoRA and hierarchical fusion for enhanced micro-expression recognition”
☆15 · Oct 12, 2024 · Updated last year
yunzhuzhang0918 / flexselect
The official repository for paper "FlexSelect: Flexible Token Selection for Efficient Long Video Understanding".
☆31 · Sep 19, 2025 · Updated 10 months ago
dingyue772 / OmniSIFT
[ICML2026] OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
☆25 · May 21, 2026 · Updated 2 months ago
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74 · Jan 20, 2025 · Updated last year
ant-research / Awesome-Fine-Grained-Multimodal-Perception
A collection of the latest research and resources on Fine-Grained Multimodal Perception
☆30 · Jun 4, 2026 · Updated last month
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year
chrisx599 / Video-Browser
Official code repo of Video-Browser: Towards Agentic Open-web Video Browsing
☆28 · Jan 19, 2026 · Updated 6 months ago
fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago
MSIIP / Connector-S
☆13 · Apr 30, 2025 · Updated last year
EvolvingLMMs-Lab / VideoMMMU
Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
☆72 · Sep 5, 2025 · Updated 10 months ago
MIPS-COLT / MER-MCE
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
☆25 · Aug 2, 2024 · Updated last year
google-deepmind / vocap
☆17 · Sep 5, 2025 · Updated 10 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆156 · Aug 22, 2025 · Updated 11 months ago
NJU-LINK / MVU-Eval
MVU-Eval @NeurIPS DB 2025
☆18 · Nov 11, 2025 · Updated 8 months ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
RIA1159 / FacialFlowNet
Official release of FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model (ACMMM2024)
☆27 · Nov 11, 2024 · Updated last year
FudanCVL / AVI-Bench
[ICML'26] Toward Human-like Audio-Visual Intelligence of Omni-MLLMs
☆16 · Jun 20, 2026 · Updated last month
shijian2001 / Video-Thinker
Sparking "Thinking with Videos" via Reinforcement Learning
☆161 · Oct 30, 2025 · Updated 8 months ago
HKUST-KnowComp / SubeventWriter
Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere…
☆11 · Oct 16, 2022 · Updated 3 years ago
TencentARC / ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44 · Nov 19, 2025 · Updated 8 months ago
marinero4972 / VideoZeroBench
Official implementation of "VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification"
☆21 · May 7, 2026 · Updated 2 months ago
Hokhim2 / CVBench
☆19 · Aug 28, 2025 · Updated 10 months ago
Timothy-Liuxf / FrameRateTask
Frame rate stabilizer, a task executor which executes tasks at a stable frame rate.
☆10 · Jul 17, 2026 · Updated last week
yhy-2000 / MomentSeeker
☆23 · Jul 23, 2025 · Updated last year
xrenaf / MEMLENS
☆23 · Updated this week
EIT-NLP / Awesome-Streaming-LLMs
🔥This is a repository of paper list for streaming LLMs/MLLMs.
☆24 · Apr 19, 2026 · Updated 3 months ago
xuyang-liu16 / MixKV
[ICLR 2026] Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
☆29 · Mar 21, 2026 · Updated 4 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 · Jul 22, 2025 · Updated last year
maifoundations / Streamo
Streaming Video Instruction Tuning
☆83 · Feb 25, 2026 · Updated 5 months ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
gouba2333 / MA-HMR
☆17 · Nov 20, 2025 · Updated 8 months ago
TCL606 / WAVE
ICLR 2026 Oral: WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
☆40 · Updated this week
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆882 · Dec 14, 2025 · Updated 7 months ago