samuelyu2002/PACS

samuelyu2002 / PACS

Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022)

☆18

Alternatives and similar repositories for PACS

Users interested in PACS are comparing it to the repositories listed below.

samuelyu2002 / PFLD
Implementation of Practical Facial Landmark Detector (PFLD) in PyTorch
☆14 · Jul 23, 2023 · Updated 2 years ago
nlx-group / Shortcutted-Commonsense-Reasoning
Code for the article "Shortcutted Commonsense: Data Spuriousness in Deep Learning of Commonsense Reasoning", Outstanding Paper at EMNLP20…
☆10 · Nov 7, 2021 · Updated 4 years ago
rowanz / merlot_reserve
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
☆146 · Jun 1, 2022 · Updated 4 years ago
visipedia / ssw60
Sapsucker Woods 60 Audiovisual Dataset
☆19 · Oct 7, 2022 · Updated 3 years ago
linzhiqiu / continual-learning
☆15 · Mar 31, 2022 · Updated 4 years ago
shlizee / savvy
Repository for the SAVVY (Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing) benchmark and the SAVVY model
☆25 · May 30, 2026 · Updated last month
CMU-Robotics-Club / RobOrchestra
More Robots than the Swarm.
☆14 · Apr 24, 2026 · Updated 2 months ago
swarnaHub / ExplaGraphs
[EMNLP 2021] Dataset and PyTorch Code for ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning
☆14 · Nov 5, 2022 · Updated 3 years ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Apr 27, 2023 · Updated 3 years ago
Ydkwim / CTAL
Pre-training Cross-modal Transformer for Audio-and-Language Representations
☆39 · Apr 20, 2021 · Updated 5 years ago
samuelyu2002 / ImVisible
ImVisible: Pedestrian Traffic Light (PTL) Dataset, Lightweight CNN (LytNet), and Mobile Application for the Visually Impaired (CAIP '19, …
☆75 · Apr 18, 2024 · Updated 2 years ago
OpenNLPLab / FAVDBench
[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description
☆76 · Dec 4, 2023 · Updated 2 years ago
klauscc / VindLU
☆108 · Dec 23, 2022 · Updated 3 years ago
orlitany / SOSELETO
PyTorch implementation of SOSELETO
☆15 · Sep 5, 2019 · Updated 6 years ago
yoxu515 / VIPOSeg-Benchmark
The benchmark for "Video Object Segmentation in Panoptic Wild Scenes".
☆12 · Oct 17, 2023 · Updated 2 years ago
murufeng / knowledge_distillation
A plug-and-play knowledge distillation toolkit
☆13 · May 16, 2022 · Updated 4 years ago
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆21 · Jun 14, 2024 · Updated 2 years ago
HazzaCheng / AutoSpeech2019
The 1st place solution for AutoSpeech 2019.
☆17 · Jun 9, 2020 · Updated 6 years ago
Hritikbansal / videocon
☆58 · Apr 24, 2024 · Updated 2 years ago
SSyangguang / MEF-freq
Code for "A Dual Domain Multi-exposure Image Fusion Network Based on the Spatial-frequency Integration".
☆12 · Jul 25, 2024 · Updated last year
facebookresearch / rlr-audio-propagation
Audio propagation engine - Meta Reality Labs Research.
☆24 · Nov 1, 2022 · Updated 3 years ago
Wanderlust717 / CARGNet
[TGRS 2023] Point Label Meets Remote Sensing Change Detection: A Consistency-Aligned Regional Growth Network
☆15 · Jan 5, 2024 · Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
v-iashin / VoxCeleb
An attempt to replicate the results of [1706.08612] VoxCeleb: a large-scale speaker identification dataset
☆12 · Dec 11, 2019 · Updated 6 years ago
yunyikristy / global_local
☆14 · Oct 7, 2021 · Updated 4 years ago
zzzx1224 / EBTSA-ICLR2023
☆12 · Feb 17, 2025 · Updated last year
tanABCC / VABench
☆16 · Jul 8, 2026 · Updated 2 weeks ago
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 · Apr 20, 2023 · Updated 3 years ago
sayakpaul / Multimodal-Entailment-Baseline
This repository shows how to implement a basic model for multimodal entailment.
☆10 · Aug 17, 2021 · Updated 4 years ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆15 · Feb 3, 2026 · Updated 5 months ago
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published with the CVPR 23 paper: "Learning Action Changes by Me…
☆13 · May 25, 2023 · Updated 3 years ago
lartpang / UltraHighResolution
Papers about ultra-high-resolution tasks.
☆13 · Jul 12, 2024 · Updated 2 years ago
DTaoo / Simplified_DMC
A simplified version of DMC (Deep Multimodal Clustering for Unsupervised Audiovisual Learning)
☆19 · May 27, 2020 · Updated 6 years ago
registor / radarchart
A custom environment for drawing multidimensional evaluation radar charts with TikZ, making it easy to draw such charts in LaTeX.
☆12 · Jan 5, 2019 · Updated 7 years ago
soCzech / ChangeIt
ChangeIt dataset with more than 2600 hours of video with state-changing actions, published at CVPR 2022
☆11 · Mar 23, 2022 · Updated 4 years ago
nuaa-nlp / Multimodality
☆15 · Dec 10, 2021 · Updated 4 years ago
HazyResearch / augmentation_code
Reproducible code for Augmentation paper
☆17 · Jan 23, 2019 · Updated 7 years ago
ubc-vision / TriBERT
Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation" in NeurIPS…
☆14 · Dec 9, 2021 · Updated 4 years ago
sarulab-speech / SpatialCLAP
☆19 · Oct 9, 2025 · Updated 9 months ago