The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"
☆38May 21, 2025Updated 9 months ago
Alternatives and similar repositories for Cockatiel
Users that are interested in Cockatiel are comparing it to the libraries listed below
- ☆54May 6, 2025Updated 10 months ago
- ☆13May 17, 2025Updated 9 months ago
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…☆39Jun 4, 2025Updated 9 months ago
- RLHF for Stable Diffusion☆14Jul 9, 2023Updated 2 years ago
- ☆18Oct 23, 2024Updated last year
- ☆28Mar 4, 2025Updated last year
- ☆25Nov 17, 2025Updated 3 months ago
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍☆47Jul 5, 2025Updated 8 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆25Feb 21, 2025Updated last year
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆114Dec 24, 2025Updated 2 months ago
- [CVPR2025] Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think☆23Jul 1, 2025Updated 8 months ago
- [SIGGRAPH2025] Generative Video Matting☆58Aug 12, 2025Updated 6 months ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Nov 28, 2023Updated 2 years ago
- ☆121Feb 28, 2026Updated last week
- [ICML2025] LoRA fine-tune directly on the quantized models.☆39Nov 25, 2024Updated last year
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation☆120Oct 18, 2024Updated last year
- ☆68Aug 16, 2024Updated last year
- PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation☆37Oct 28, 2024Updated last year
- OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments☆14Oct 25, 2023Updated 2 years ago
- [NOTE] I do not have enough resources to maintain VMS, please use Ostris's AI-Toolkit instead☆43Oct 3, 2025Updated 5 months ago
- [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation☆47Jul 1, 2025Updated 8 months ago
- MoviiGen 1.1: Towards Cinematic-Quality Video Generative Models☆183Jul 21, 2025Updated 7 months ago
- Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation☆12Feb 16, 2025Updated last year
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 3 months ago
- ☆57Feb 2, 2026Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs☆12Nov 14, 2025Updated 3 months ago
- ☆98Jun 23, 2025Updated 8 months ago
- ☆65Jul 10, 2025Updated 7 months ago
- [CVPR 2025🔥] Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model☆198May 11, 2025Updated 9 months ago
- The raw UserRL repo under construction☆97Sep 25, 2025Updated 5 months ago
- ☆51Apr 11, 2025Updated 10 months ago
- Make your Turtlebot2 run on ROS Melodic (Ubuntu 18.04).☆10Jul 2, 2021Updated 4 years ago
- ☆23Jun 19, 2025Updated 8 months ago
- ☆12Nov 12, 2024Updated last year
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection☆55Aug 16, 2025Updated 6 months ago
- Promptopia is an open-source AI prompting tool for modern world to discover, create, and share creative prompts☆12May 27, 2023Updated 2 years ago
- 🔥🔥First-ever hour scale video understanding models☆614Jul 14, 2025Updated 7 months ago
- [ICCVW 2019] A Global-local Embedding Module for Fashion Landmark Detection☆34Apr 26, 2023Updated 2 years ago
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆64Jan 27, 2026Updated last month