https://avocado-captioner.github.io/
☆31 · Oct 16, 2025 · Updated 5 months ago
Alternatives and similar repositories for AVoCaDO
Users that are interested in AVoCaDO are comparing it to the libraries listed below.
- The Source Code for OmniVideoBench @ICLR 2026 ☆69 · Feb 12, 2026 · Updated last month
- ☆16 · Oct 10, 2024 · Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆43 · Mar 6, 2026 · Updated 2 weeks ago
- A collection of helpful contracts and libraries for use with Tempo for Foundry ☆64 · Updated this week
- Temporary fork of Foundry with Tempo support ☆72 · Updated this week
- Experimental implementation of regions in WebVTT building on Anne's WebVTT parser. ☆14 · Oct 19, 2014 · Updated 11 years ago
- Go SDK for the Tempo blockchain ☆62 · Updated this week
- This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities ☆39 · Updated this week
- Local windowed attention multi-instrumental music transformer tailored for music orchestration/instrumentation and stable music generatio… ☆17 · Oct 11, 2023 · Updated 2 years ago
- Knowledge Graph constructed from Wikipedia ☆17 · Dec 18, 2022 · Updated 3 years ago
- NCS-like audio visualizer for AviUtl ☆17 · May 22, 2025 · Updated 10 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆168 · Feb 23, 2026 · Updated last month
- This is a LoRA model finetuned on Wan-I2V-14B-480P. It turns things in the image into fluffy toys. ☆19 · Nov 10, 2025 · Updated 4 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding ☆45 · Mar 16, 2026 · Updated last week
- Official PyTorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?' (ICLR 2024) ☆13 · Mar 8, 2024 · Updated 2 years ago
- Collection of useful scripts for RunPod pods ☆15 · Jan 26, 2024 · Updated 2 years ago
- A Fine-Grained House Music Dataset ☆23 · Oct 28, 2022 · Updated 3 years ago
- A basic example of how to test a TypeScript Express app ☆13 · Mar 26, 2022 · Updated 4 years ago
- MCP to provide utilities for working with video and images. ☆16 · Jul 7, 2025 · Updated 8 months ago
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models ☆28 · Mar 18, 2026 · Updated last week
- A Yiddish orthographic normalizer: Standard Yiddish goes in, Hasidic Yiddish comes out ☆14 · Jun 26, 2024 · Updated last year
- The social network built with Blazor WebAssembly, ASP.NET Core and Microsoft SQL database. The network uses SignalR, code-first database … ☆11 · Feb 9, 2021 · Updated 5 years ago
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands ☆106 · Feb 6, 2026 · Updated last month
- SODA: Story Oriented Dense Video Captioning Evaluation Framework ☆14 · May 3, 2024 · Updated last year
- ☆14 · Sep 11, 2025 · Updated 6 months ago
- ☆20 · Nov 21, 2025 · Updated 4 months ago
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients ☆14 · Jan 22, 2025 · Updated last year
- ☆55 · Mar 5, 2026 · Updated 3 weeks ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆21 · Dec 22, 2025 · Updated 3 months ago
- Nanjing University Computer Networks Labs, Fall 2022 ☆22 · Jul 16, 2023 · Updated 2 years ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability ☆17 · May 8, 2025 · Updated 10 months ago
- ☆14 · Apr 9, 2023 · Updated 2 years ago
- ☆13 · Jul 3, 2024 · Updated last year
- ☆18 · Apr 4, 2025 · Updated 11 months ago
- ☆42 · Mar 10, 2026 · Updated 2 weeks ago
- ☆13 · May 15, 2025 · Updated 10 months ago
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆168 · Jan 30, 2025 · Updated last year
- ☆19 · Jun 30, 2025 · Updated 8 months ago