Kilichbek / artemis-speaker-tools-b
Artemis Speaker Tools B
☆23Updated 3 years ago
Alternatives and similar repositories for artemis-speaker-tools-b:
Users who are interested in artemis-speaker-tools-b are comparing it to the repositories listed below.
- ☆31Updated 4 years ago
- Code for Learning to Learn Language from Narrated Video☆33Updated last year
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration☆56Updated last year
- [CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)☆57Updated 3 years ago
- Code, data, models for the Sherlock corpus☆55Updated 2 years ago
- Official code for NeurIPS 2020 paper "Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D"☆27Updated last month
- [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos☆118Updated last year
- Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO☆51Updated 4 years ago
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image☆85Updated last year
- Data repository for the VALSE benchmark.☆36Updated 11 months ago
- Using LLMs and pre-trained caption models for super-human performance on image captioning.☆40Updated last year
- Code for the Globetrotter project☆23Updated 2 years ago
- Dataset and starting code for visual entailment dataset☆109Updated 2 years ago
- Official Repository for CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation"☆19Updated last year
- kdexd/coco-caption@de6f385☆26Updated 4 years ago
- Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021☆19Updated 3 years ago
- ☆117Updated last year
- Multimodal Graph Network (MGN): Code repo, examples from the paper☆23Updated 3 years ago
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)☆138Updated last year
- [EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction☆48Updated 2 years ago
- The SVO-Probes Dataset for Verb Understanding☆31Updated 3 years ago
- Code release for Park et al., "Multimodal Explanations: Justifying Decisions and Pointing to the Evidence", CVPR 2018☆48Updated 6 years ago
- Command-line tool for downloading and extending the RedCaps dataset.☆46Updated last year
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆155Updated last month
- MERLOT: Multimodal Neural Script Knowledge Models☆223Updated 2 years ago
- Multi-sense word embeddings from visual co-occurrences☆25Updated 5 years ago
- [ICCV 2021] Official code for "Learning to Generate Scene Graph from Natural Language Supervision"☆100Updated last year
- A unified framework to jointly model images, text, and human attention traces.☆78Updated 3 years ago
- A Python 3 library for NLP and image-captioning metrics: BLEU, METEOR, CIDEr, ROUGE, SPICE, WMD☆14Updated 2 years ago
- A length-controllable and non-autoregressive image captioning model.☆68Updated 3 years ago