SMILE-data / SMILE

SMILE: A Multimodal Dataset for Understanding Laughter

☆12

Alternatives and similar repositories for SMILE

Users who are interested in SMILE are comparing it to the repositories listed below

aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44 Updated last year
Yui010206 / CREMA
[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆53 Updated 4 months ago
ilkerkesen / ViLMA
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
☆16 Updated last year
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56 Updated 2 years ago
Hritikbansal / videocon
☆57 Updated last year
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆21 Updated last year
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆18 Updated last year
krafton-ai / Rare-to-Frequent
Rare-to-Frequent (R2F), ICLR'25, Spotlight
☆51 Updated 6 months ago
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆19 Updated 10 months ago
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆35 Updated last year
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆32 Updated last year
kaist-ami / BEAF
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆20 Updated 7 months ago
Optimization-AI / FastCLIP
Distributed Optimization Infra for learning CLIP models
☆27 Updated last year
google / storybench
☆49 Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 Updated 2 years ago
TengdaHan / AutoAD
[CVPR'23 Highlight] AutoAD: Movie Description in Context.
☆99 Updated 11 months ago
eric-ai-lab / Discffusion
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
☆30 Updated last year
AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official PyTorch Implementation (CVPR'25, Highlight)
☆23 Updated 4 months ago
JiwanChung / vlis
☆24 Updated 2 years ago
jialuli-luka / SELMA
Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
☆35 Updated last year
wade3han / champagne
An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)
☆52 Updated 2 years ago
k1rezaei / Text-to-concept
☆35 Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆47 Updated 8 months ago
naver-ai / prolip
☆53 Updated 2 months ago
lzw-lzw / UnifiedMLLM
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆22 Updated last year
facebookresearch / HierVL
[CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings
☆46 Updated 2 years ago
IntelLabs / GraVi-T
Graph learning framework for long-term video understanding
☆67 Updated 3 months ago
google / video-localized-narratives
☆60 Updated 2 years ago
passing2961 / Stark
Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…
☆18 Updated 10 months ago
artemisp / LAVIS-XInstructBLIP
LAVIS - A One-stop Library for Language-Vision Intelligence
☆48 Updated last year