claudio-unipv / pvml
A library for novices who want to experiment with Machine Learning
☆11Updated last year
Alternatives and similar repositories for pvml
Users who are interested in pvml are comparing it to the libraries listed below
- YesBut - Multimodal Satire Comprehension Dataset☆17Updated 9 months ago
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding"☆10Updated 8 months ago
- Text-Guided Synthesis of Scientific Vector Graphics with TikZ☆95Updated 4 months ago
- Official Code for: "DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency"☆24Updated 2 months ago
- ☆65Updated last year
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆30Updated 3 weeks ago
- Optical Character Recognition (OCR / HTR) using Transformers☆11Updated 2 years ago
- Official repository of the paper: "A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition"☆26Updated 2 years ago
- ☆13Updated 7 months ago
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding"☆121Updated 6 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 4 months ago
- ☆16Updated 9 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated 10 months ago
- ☆45Updated last week
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"☆47Updated 2 weeks ago
- VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models☆21Updated 3 months ago
- ☆33Updated 2 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation☆44Updated 9 months ago
- Movies that people living in 110D Dryden Road have watched since the beginning of the 2018-2019 school year☆28Updated 7 months ago
- Dataset introduced in PlotQA: Reasoning over Scientific Plots☆78Updated 2 years ago
- Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout☆20Updated 5 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 11 months ago
- ☆10Updated 3 months ago
- This is IABAC Project. The project's business rationale entails utilizing the dataset's provided features to forecast employee performanc…☆11Updated 6 months ago
- Accompanying code for "Analyzing Vision Transformers in Class Embedding Space" (NeurIPS '23)☆14Updated last year
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated 5 months ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding☆18Updated 4 months ago
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆32Updated 2 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench☆17Updated last week
- [CVPR 2024] Official PyTorch implementation of "ECLIPSE: Revisiting the Text-to-Image Prior for Efficient Image Generation"☆64Updated last year