qishisuren123 / AnyCap
A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable caption styles.
☆48 Updated last month
Alternatives and similar repositories for AnyCap
Users who are interested in AnyCap are comparing it to the libraries listed below.
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆62 Updated 6 months ago
- ☆122 Updated 2 months ago
- ☆105 Updated last week
- Test-time Scaling for VAR models ☆21 Updated last month
- Quick Long Video Understanding