TencentARC/FLM

Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)

☆31

Alternatives and similar repositories for FLM

Users who are interested in FLM are comparing it to the repositories listed below.

TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33 · Jul 21, 2023 · Updated 2 years ago
zjr2000 / Untrimmed-Video-Feature-Extractor
A simple and effective feature extractor for untrimmed videos
☆13 · Sep 1, 2022 · Updated 3 years ago
TencentARC / TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
☆16 · Jun 20, 2023 · Updated 3 years ago
NIneeeeeem / LangDC
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18 · Sep 7, 2025 · Updated 10 months ago
TencentARC / Plot2Code
☆23 · Aug 17, 2024 · Updated last year
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73 · Jan 4, 2026 · Updated 6 months ago
zhangjiewu / awesome-t2i-eval
A curated list of papers and resources for text-to-image evaluation.
☆30 · Sep 6, 2023 · Updated 2 years ago
lizhaoliu-Lec / DAS
This is the official repo for Densely-Anchored Sampling for Deep Metric Learning (ECCV 22).
☆16 · May 24, 2024 · Updated 2 years ago
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34 · Sep 16, 2023 · Updated 2 years ago
uestc-xyh / ComqueryFormer
☆11 · Nov 28, 2022 · Updated 3 years ago
jason9693 / FROZEN
☆14 · May 3, 2022 · Updated 4 years ago
zjr2000 / Awesome-Multimodal-Chatbot
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction…
☆79 · Jun 18, 2023 · Updated 3 years ago
sail-sg / ptp
[CVPR2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training"
☆150 · Jun 7, 2023 · Updated 3 years ago
kugwzk / DiDE
Code for EMNLP 2022 paper "Distilled Dual-Encoder Model for Vision-Language Understanding"
☆31 · May 1, 2023 · Updated 3 years ago
yiren-jian / BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
☆26 · Dec 5, 2023 · Updated 2 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year
ttgeng233 / LongVALE
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)
☆61 · Jun 9, 2025 · Updated last year
DingchenYang99 / Pensieve
The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"
☆15 · May 4, 2024 · Updated 2 years ago
nuaa-nlp / Multimodality
☆15 · Dec 10, 2021 · Updated 4 years ago
abc403 / SMCA-replication
SMCA replication
☆21 · Jul 24, 2021 · Updated 4 years ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 · Mar 25, 2023 · Updated 3 years ago
facebookresearch / ProcedureVRL
[CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"
☆56 · Aug 8, 2023 · Updated 2 years ago
megvii-research / zipfls
This repo is the official megengine implementation of the ECCV2022 paper: Efficient One Pass Self-distillation with Zipf's Label Smoothin…
☆27 · Oct 19, 2022 · Updated 3 years ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
zjr2000 / SPES
Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"
☆23 · May 8, 2026 · Updated 2 months ago
SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
☆47 · Sep 19, 2023 · Updated 2 years ago
ChengshuaiZhao0 / The-Wolf-Within
☆13 · Updated this week
silicx / DLSA
The code accompanying our ECCV'22 paper: Constructing Balance from Imbalance for Long-tailed Image Recognition
☆18 · Jul 20, 2022 · Updated 4 years ago
NewsStoriesData / newsstories.github.io
☆22 · Sep 20, 2022 · Updated 3 years ago
JohnWuzh / UC-OWOD
☆19 · Jul 19, 2022 · Updated 4 years ago
AV-Reasoner / AV-Reasoner
☆19 · Jul 22, 2025 · Updated 11 months ago
zjr2000 / LLMVA-GEBC
Winning solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)
☆29 · Jan 1, 2024 · Updated 2 years ago
MikeWangWZHL / VidIL
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
☆117 · Sep 15, 2022 · Updated 3 years ago
HongbangYuan / OmniReward
☆47 · Dec 16, 2025 · Updated 7 months ago
zjr2000 / GVL
Official implementation for the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"
☆28 · Dec 8, 2023 · Updated 2 years ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated 2 years ago
evelinehong / 3D-Concept-Grounding
Code Release of "3D Concept Grounding on Neural Fields (NeurIPS2022)"
☆15 · Feb 13, 2023 · Updated 3 years ago
BriansIDP / AudioVisualLLM
☆19 · May 19, 2024 · Updated 2 years ago
yuleung / FPPQ
Implementation of NIPS2023: Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval
☆11 · Nov 12, 2024 · Updated last year