https://avocado-captioner.github.io/
☆30 Oct 16, 2025 Updated 4 months ago
Alternatives and similar repositories for AVoCaDO
Users that are interested in AVoCaDO are comparing it to the libraries listed below
- ☆16 Oct 10, 2024 Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆41 Jan 26, 2026 Updated last month
- The Source Code for OmniVideoBench @ICLR 2026 ☆61 Feb 12, 2026 Updated 3 weeks ago
- Temporary fork of Foundry with Tempo support ☆66 Updated this week
- A collection of helpful contracts and libraries for use with Tempo for Foundry ☆61 Feb 27, 2026 Updated last week
- Go SDK for the Tempo blockchain ☆60 Updated this week
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆161 Feb 23, 2026 Updated last week
- [CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆84 Feb 13, 2026 Updated 3 weeks ago
- PainterVRAM lets you reserve a slice of GPU memory before ComfyUI starts processing, preventing out-of-memory crashes. Switch between man… ☆27 Jan 2, 2026 Updated 2 months ago
- Experimental implementation of regions in WebVTT building on Anne's WebVTT parser. ☆14 Oct 19, 2014 Updated 11 years ago
- ☆13 Jul 3, 2024 Updated last year
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols ☆17 Nov 19, 2025 Updated 3 months ago
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues ☆13 Feb 7, 2024 Updated 2 years ago
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands ☆97 Feb 6, 2026 Updated last month
- ☆14 Sep 11, 2025 Updated 5 months ago
- a Video Quality Analysis Toolkit ☆13 May 16, 2025 Updated 9 months ago
- Reinforcing Text-Rich Video Reasoning with Visual Rumination ☆27 Nov 24, 2025 Updated 3 months ago
- FamilyTool benchmark ☆12 Sep 10, 2025 Updated 5 months ago
- This is a LoRA model finetuned on Wan-I2V-14B-480P. It turns things in the image into fluffy toys. ☆19 Nov 10, 2025 Updated 3 months ago
- ☆13 May 15, 2025 Updated 9 months ago
- NCS-like audio visualizer for AviUtl ☆17 May 22, 2025 Updated 9 months ago
- Scripts for KGIRNet model for ESWC ☆10 Jul 6, 2023 Updated 2 years ago
- Sound Separation, Omni modal ☆28 Sep 15, 2025 Updated 5 months ago
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients ☆13 Jan 22, 2025 Updated last year
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training" ☆11 Oct 27, 2025 Updated 4 months ago
- A Python tool that helps you interact with ChatGPT. ☆10 Dec 11, 2022 Updated 3 years ago
- Information Extraction related tools and models ☆10 Mar 16, 2023 Updated 2 years ago
- ComfyUI custom node implementation of VideoMaMa for video matting with mask conditioning. ☆40 Feb 9, 2026 Updated 3 weeks ago
- Collection of useful scripts for RunPod pods ☆15 Jan 26, 2024 Updated 2 years ago
- ☆20 Nov 21, 2025 Updated 3 months ago
- ☆12 Jan 2, 2024 Updated 2 years ago
- ☆16 Mar 22, 2025 Updated 11 months ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆18 Jun 19, 2025 Updated 8 months ago
- Official codebase for the NeurIPS 2023 paper: Towards Last-layer Retraining for Group Robustness with Fewer Annotations. https://arxiv.or… ☆12 May 15, 2024 Updated last year
- LLaVA-Next for STVG ☆18 Dec 5, 2025 Updated 3 months ago
- ☆17 Aug 21, 2025 Updated 6 months ago
- Collection of papers about video-audio understanding ☆22 Dec 26, 2025 Updated 2 months ago
- ☆12 Jan 25, 2024 Updated 2 years ago
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation ☆28 Dec 10, 2025 Updated 2 months ago