☆97 · Sep 19, 2024 · Updated last year
Alternatives and similar repositories for fineVideo
Users that are interested in fineVideo are comparing it to the libraries listed below.
- A huge dataset for Document Visual Question Answering ☆22 · Jul 29, 2024 · Updated last year
- Video-LlaVA fine-tune for CinePile evaluation ☆51 · Aug 8, 2024 · Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆95 · May 28, 2026 · Updated 2 weeks ago
- ☆10 · Nov 18, 2024 · Updated last year
- Unofficial Implementation of Selective Attention Transformer ☆20 · Oct 31, 2024 · Updated last year
- ☆26 · Jun 5, 2026 · Updated last week
- YOLOv10: Real-Time End-to-End Object Detection ☆12 · May 24, 2024 · Updated 2 years ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆62 · Jun 6, 2025 · Updated last year
- Learning to cut end-to-end pretrained modules ☆38 · Apr 17, 2025 · Updated last year
- Using short models to classify long texts ☆21 · Mar 8, 2023 · Updated 3 years ago
- Profile your CoreML models directly from Python 🐍 ☆30 · Sep 8, 2025 · Updated 9 months ago
- Hugging Face Jobs ☆20 · Jul 11, 2025 · Updated 11 months ago
- SmolVLM2 Demo ☆188 · Mar 20, 2025 · Updated last year
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆876 · Dec 14, 2025 · Updated 6 months ago
- Chunk Dedupe Estimation ☆20 · Nov 5, 2024 · Updated last year
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆159 · Aug 8, 2025 · Updated 10 months ago
- Helper functions for processing and integrating visual language information with Qwen-VL Series Model ☆17 · Aug 30, 2024 · Updated last year
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆77 · Jul 14, 2025 · Updated 11 months ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆74 · Jan 20, 2025 · Updated last year
- ☆22 · Jun 30, 2021 · Updated 4 years ago
- ☆28 · Mar 3, 2025 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆700 · Jan 29, 2025 · Updated last year
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆215 · Aug 28, 2024 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆23 · Jul 30, 2024 · Updated last year
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆58 · Dec 28, 2025 · Updated 5 months ago
- ☆21 · Nov 18, 2024 · Updated last year
- [AAAI 2024] Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-Supervised 3D Object Detection ☆10 · Jan 24, 2025 · Updated last year
- [ICML 2025] Official PyTorch implementation of LongVU ☆427 · May 8, 2025 · Updated last year
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark ☆144 · Jul 9, 2025 · Updated 11 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆200 · Sep 26, 2025 · Updated 8 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆144 · Jun 4, 2025 · Updated last year
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆176 · Mar 23, 2025 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆31 · Dec 28, 2023 · Updated 2 years ago
- Deep Speech Distances PyTorch ☆29 · Feb 21, 2022 · Updated 4 years ago
- A Swift wrapper for the Supertone text-to-speech model ☆34 · Dec 11, 2025 · Updated 6 months ago
- Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs" ☆98 · Jun 6, 2025 · Updated last year
- ANE accelerated embedding models! ☆19 · Dec 11, 2024 · Updated last year
- Awesome papers & datasets specifically focused on long-term videos. ☆378 · Oct 9, 2025 · Updated 8 months ago