A one-stop shop for YouCook2 info such as leaderboard and recent advances on (cooking) video retrieval and captioning.
☆41Jun 29, 2022Updated 3 years ago
Alternatives and similar repositories for YouCook2-Leaderboard
Users that are interested in YouCook2-Leaderboard are comparing it to the libraries listed below
Sorting:
- Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"☆48Jun 22, 2024Updated last year
- Data Release for VALUE Benchmark☆30Feb 16, 2022Updated 4 years ago
- PyTorch GPU distributed training code for MIL-NCE HowTo100M☆219Jul 5, 2022Updated 3 years ago
- A collection of videos annotated with timelines where each video is divided into segments, and each segment is labelled with a short free…☆29Jan 15, 2022Updated 4 years ago
- [ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning☆171Dec 4, 2020Updated 5 years ago
- Evaluation code for Dense-Captioning Events in Videos☆130Jun 11, 2019Updated 6 years ago
- CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training☆34Nov 9, 2021Updated 4 years ago
- Source code for paper "Towards Automatic Learning of Procedures from Web Instructional Videos"☆34Jan 6, 2019Updated 7 years ago
- Identifying Visible Actions in Lifestyle Vlogs☆15Aug 3, 2023Updated 2 years ago
- Referring expression comprehension on ReferIt(RefClef)☆10Nov 28, 2016Updated 9 years ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"☆26Oct 20, 2022Updated 3 years ago
- [ACL 2021] mTVR: Multilingual Video Moment Retrieval☆27Aug 20, 2022Updated 3 years ago
- ☆95Feb 14, 2022Updated 4 years ago
- Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)☆17Jan 12, 2023Updated 3 years ago
- PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision☆46Jul 29, 2020Updated 5 years ago
- Second-place solution to dense video captioning task in ActivityNet Challenge (CVPR 2020 workshop)☆75Aug 25, 2021Updated 4 years ago
- Code for Learning to Learn Language from Narrated Video☆33Oct 3, 2023Updated 2 years ago
- ☆19May 2, 2020Updated 5 years ago
- Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners☆116Sep 15, 2022Updated 3 years ago
- S3D Text-Video model trained on HowTo100M using MIL-NCE☆200Jul 3, 2020Updated 5 years ago
- Code and data for the project "Visually grounded continual learning of compositional semantics"☆22Dec 27, 2022Updated 3 years ago
- Adversarial Inference for Multi-Sentence Video Descriptions (CVPR 2019)☆34Jul 17, 2019Updated 6 years ago
- Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos☆87Nov 22, 2020Updated 5 years ago
- Video captioning baseline models on Video2Commonsense Dataset.☆57Apr 15, 2021Updated 4 years ago
- A video retrieval dataset How2R and a video QA dataset How2QA☆24Oct 15, 2020Updated 5 years ago
- [ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering☆133Oct 25, 2022Updated 3 years ago
- Easy to use video deep features extractor☆322Jul 5, 2020Updated 5 years ago
- video captioning☆24Mar 14, 2019Updated 6 years ago
- ☆43Apr 25, 2019Updated 6 years ago
- A pytorch implemetation of data augmentation method for visual question answering☆21May 25, 2023Updated 2 years ago
- The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding☆64Mar 9, 2022Updated 3 years ago
- The Pytorch implementation for "Video-Text Pre-training with Learned Regions"☆43Jul 15, 2022Updated 3 years ago
- ☆44Mar 8, 2021Updated 4 years ago
- Code for the HowTo100M paper☆293Mar 10, 2020Updated 5 years ago
- Tensorflow implementation of "Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization"[ICC…☆13Mar 29, 2019Updated 6 years ago
- ☆11Sep 7, 2020Updated 5 years ago
- Project page for "Visual Grounding in Video for Unsupervised Word Translation" CVPR 2020☆43Apr 26, 2020Updated 5 years ago
- COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning☆291Sep 6, 2022Updated 3 years ago
- Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer (NeurIPS 2021))☆56Feb 6, 2023Updated 3 years ago