qishisuren123/AnyCap

A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable caption styles.

☆54

Alternatives and similar repositories for AnyCap

Users who are interested in AnyCap are comparing it to the repositories listed below.

Cominclip / OmniVerifier
[ICLR 2026 Oral & ICML 2026] Generative Universal Verifier as Multimodal Meta-Reasoner
☆63 · May 29, 2026 · Updated last month
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆66 · Dec 8, 2024 · Updated last year
Yangsenqiao / Awesome-Continual-Test-Time-Adaptation
Collection of awesome Continual Test-Time Adaptation methods
☆24 · Jun 4, 2024 · Updated 2 years ago
bcmi / Granular-GRPO
[CVPR 2026] Fine-Grained GRPO for Precise Preference Alignment in Flow Models
☆64 · Jun 1, 2026 · Updated last month
mlvlab / DeepVideoR1
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆35 · Feb 22, 2026 · Updated 4 months ago
TIGER-AI-Lab / VideoScore
official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]
☆121 · Dec 4, 2025 · Updated 7 months ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated 2 years ago
thu-coai / VPO
☆25 · Jul 20, 2025 · Updated 11 months ago
jiyt17 / ReDiff
Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'
☆45 · Jun 27, 2026 · Updated 2 weeks ago
Xinxi-Zhang / Re-MeanFlow
☆48 · Mar 29, 2026 · Updated 3 months ago
zjunlp / ReCode
[AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates
☆26 · Jul 1, 2025 · Updated last year
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆32 · Nov 12, 2024 · Updated last year
zai-org / SSVAE
official implementation of the paper "Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability".
☆71 · Dec 25, 2025 · Updated 6 months ago
xinding-bot / StreamMind
[ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
☆71 · Jun 25, 2025 · Updated last year
yu-lin-li / DyToK
[NeurIPS 2025] Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
☆76 · Feb 20, 2026 · Updated 4 months ago
Quest4Science / MonoArt
The official implementation of “MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction”
☆65 · Mar 20, 2026 · Updated 3 months ago
HaroldChen19 / VistaDPO
[ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
☆41 · Jun 14, 2025 · Updated last year
OpenMOSS / rope_pp
[ICLR26] Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
☆33 · Dec 9, 2025 · Updated 7 months ago
TencentARC / OmniScript
OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
☆18 · Apr 24, 2026 · Updated 2 months ago
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆72 · Nov 24, 2024 · Updated last year
JingMog / THOR
[ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".
☆33 · Feb 26, 2026 · Updated 4 months ago
DavidFanzz / SCMoE
☆29 · May 24, 2024 · Updated 2 years ago
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 · Feb 13, 2025 · Updated last year
moonmath-ai / LiteAttention
Transforming Video Diffusion with Temporal Sparse Attention
☆54 · Apr 8, 2026 · Updated 3 months ago
ShaojieJiang / tldr
Source code repo for paper "TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation"
☆10 · Aug 11, 2023 · Updated 2 years ago
Gen-Verse / Diffusion-Sharpening
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
☆72 · May 18, 2025 · Updated last year
yisuanwang / DanceTog
[ICLR 26] DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
☆42 · Aug 3, 2025 · Updated 11 months ago
IamCreateAI / FlowCPS
An official implementation of Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching
☆79 · Sep 11, 2025 · Updated 10 months ago
IVY-LVLM / Video-MA2MBA
Official Implementation of Video-MA2MBA
☆12 · Dec 3, 2024 · Updated last year
k4rtik / uchicago-poster
Unofficial Poster Template for UChicago Computer Science
☆14 · Sep 8, 2022 · Updated 3 years ago
agwmon / frame-guidance
[ICLR 2026] Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
☆63 · Mar 3, 2026 · Updated 4 months ago
Luodian / nano-hevc
A minimal, educational HEVC (H.265) encoder written in Python.
☆53 · Feb 23, 2026 · Updated 4 months ago
GATECH-EIC / LaCache
[ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
☆17 · Nov 4, 2025 · Updated 8 months ago
wangf3014 / Patch_Scaling
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
☆25 · Feb 25, 2025 · Updated last year
Candice-yu / GeoLaux
A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
☆38 · Apr 27, 2026 · Updated 2 months ago
ali-vilab / matrix
☆34 · Apr 8, 2025 · Updated last year
zhangquanchen / SIFThinker
[AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
☆22 · Dec 2, 2025 · Updated 7 months ago
susumuota / nano-askllm
Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.
☆12 · Jun 19, 2024 · Updated 2 years ago
opendatalab / REST
☆34 · Jul 15, 2025 · Updated 11 months ago