Starter code for working with the YouTube-8M dataset.
☆16Jun 9, 2017Updated 8 years ago
Alternatives and similar repositories for youtube-8m
Users who are interested in youtube-8m are comparing it to the libraries listed below.
- Content-Based Video-Music Retrieval using Soft Intra-Modal Structure Constraint☆62Sep 22, 2017Updated 8 years ago
- Keras Implementation of "Look, Listen and Learn" Model☆21Nov 14, 2017Updated 8 years ago
- Cross-modality (visual-auditory) Metric Learning Project☆15Dec 19, 2017Updated 8 years ago
- Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018☆208Apr 3, 2021Updated 5 years ago
- ☆10Apr 7, 2022Updated 4 years ago
- Unofficial implementation of the music separation model by Luo et al.☆13Nov 3, 2019Updated 6 years ago
- The implementation of "End-to-End Neural Speaker Diarization with an Iterative Adaptive Attractor Estimation", which is accepted by Neura…☆11Aug 27, 2023Updated 2 years ago
- ☆21Jul 15, 2024Updated last year
- Official Implementation of BiFlow https://arxiv.org/abs/2512.10953☆48Feb 27, 2026Updated last month
- Power-Guided Grouped SRU for Real-Time Causal Audio-Visual Speech Separation☆26Nov 4, 2025Updated 5 months ago
- Code for COBRA: Contrastive Bi-Modal Representation Algorithm (https://arxiv.org/abs/2005.03687)☆15Jul 6, 2023Updated 2 years ago
- Official repository for the paper "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs"☆21Sep 7, 2025Updated 7 months ago
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…☆36Jul 3, 2025Updated 9 months ago
- build vgg16 with pytorch 0.4.0 for classification of CIFAR datasets☆10Mar 31, 2019Updated 7 years ago
- A simple, out-of-the-box, cross-platform bbox annotation tool in Python. Try it with `pip install easybox`☆10May 28, 2021Updated 4 years ago
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024)☆11May 26, 2024Updated last year
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin…☆10Feb 9, 2025Updated last year
- Official implementation of Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation (ICLR 2024).☆27May 14, 2024Updated last year
- Research project on glyph-based Chinese character embedding. Preparing for EMNLP 2019☆11Mar 18, 2019Updated 7 years ago
- Official implementation of Efficient Speech Separation Framework Based on Neural State-Space Models☆26Feb 25, 2026Updated last month
- A simple and user-friendly tool for computing STFT/DGT☆19Jun 22, 2021Updated 4 years ago
- Interactive/automated NVIDIA driver installation script supporting multiple Linux distributions☆45Apr 6, 2026Updated last week
- ☆10Nov 27, 2024Updated last year
- Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing☆24Dec 29, 2021Updated 4 years ago
- Learning Cross-Modal Embeddings with Adversarial Networks for Cooking Recipes and Food Images☆30Jun 14, 2019Updated 6 years ago
- C++ implementation of "'GrabCut': Interactive Foreground Extraction using Iterated Graph Cuts"☆12Jul 25, 2023Updated 2 years ago
- Codebase for ECCV18 "The Sound of Pixels"☆392Apr 25, 2022Updated 3 years ago
- Logs atmospheric pressure by using Android device's barometer sensor.☆16Nov 11, 2018Updated 7 years ago
- Code for Interspeech2022 paper DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion☆13May 6, 2023Updated 2 years ago
- This is a Kaggle data mining contest, link: https://www.kaggle.com/c/avazu-ctr-prediction☆11Mar 12, 2015Updated 11 years ago
- Reproducible code for Augmentation paper☆17Jan 23, 2019Updated 7 years ago
- RLBench simulation project for autonomous bin picking using Pandas robot arm☆10Mar 1, 2021Updated 5 years ago
- ☆26Aug 4, 2020Updated 5 years ago
- The paper "A Two-Stream Siamese Neural Network for Vehicle Re-Identification by Using Non-Overlapping Cameras"☆31Jan 15, 2020Updated 6 years ago
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning☆36Jun 10, 2025Updated 10 months ago
- [ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers☆19Apr 16, 2024Updated 2 years ago
- A library to manipulate Inkscape SVG content using Python 3☆12Apr 28, 2021Updated 4 years ago
- Code for Audio-Visual Target Speaker Extraction with Selective Auditory Attention (TASLP)☆32Feb 28, 2025Updated last year
- Stable-diffusion-WebUI extensions, which enable tensorrt accelerated Unet for SDXL base model☆12Oct 18, 2023Updated 2 years ago