Netflix / videoannotator
☆47 Updated last year
Alternatives and similar repositories for videoannotator:
Users who are interested in videoannotator compare it to the libraries listed below.
- Video-LlaVA fine-tune for CinePile evaluation ☆51 Updated 9 months ago
- ☆75 Updated 6 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆35 Updated last year
- ☆61 Updated 9 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆63 Updated 8 months ago
- Incredibly descriptive audiovisual summaries for videos ☆40 Updated 9 months ago
- MetaCLIP module for use with Autodistill. ☆21 Updated last year
- The open source implementation of "NeVA: NeMo Vision and Language Assistant" ☆18 Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated last year
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆108 Updated 2 months ago
- ☆20 Updated 11 months ago
- ☆68 Updated 10 months ago
- ☆63 Updated 7 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated 9 months ago
- Create topological graph for image segments. ☆22 Updated 7 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆36 Updated last year
- Tools for merging pretrained large language models. ☆19 Updated 11 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆65 Updated 7 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆94 Updated 4 months ago
- ☆56 Updated 5 months ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆33 Updated 4 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 Updated 2 months ago
- Jockey is a conversational video agent. ☆76 Updated 3 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated 5 months ago
- This project breathes life into video characters by using AI to describe their personality and then chat with you as them. ☆46 Updated last year
- XmodelLM ☆39 Updated 5 months ago
- Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll… ☆27 Updated 11 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 Updated 5 months ago
- Cerule - A Tiny Mighty Vision Model ☆67 Updated 8 months ago
- ☆14 Updated last year