ggg0919/cantor

☆90

Alternatives and similar repositories for cantor

Users who are interested in cantor are comparing it to the repositories listed below.

zhourax / VEGA
☆38 • Jul 9, 2024 • Updated 2 years ago
SooLab / DDCOT
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆48 • Mar 18, 2024 • Updated 2 years ago
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48 • Sep 3, 2025 • Updated 10 months ago
LightChen233 / M3CoT
☆92 • Mar 12, 2026 • Updated 4 months ago
Share14 / ShareGemini
☆32 • Jul 29, 2024 • Updated last year
yixuan730 / DetToolChain
DetToolChain: A new prompting paradigm to unleash detection ability of MLLM
☆45 • Oct 12, 2024 • Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69 • Jun 9, 2024 • Updated 2 years ago
ydk122024 / Med-HallMark
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models
☆14 • Jun 24, 2024 • Updated 2 years ago
mikecheninoulu / Emotional-gesture-papers
☆23 • May 29, 2025 • Updated last year
TIMMY-CHAN / MISS
[ICANN 2024 (Oral)] MISS: A Generative Pre-training and Fine-tuning Approach for Med-VQA
☆12 • Aug 8, 2024 • Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 • Jun 20, 2024 • Updated 2 years ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆787 • Dec 8, 2025 • Updated 7 months ago
UVa-NLP / VMASK
Code for the paper "Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers"
☆18 • Dec 15, 2020 • Updated 5 years ago
billhhh / FQSR
Code for the ACM MM 2021 paper "Fully Quantized Image Super-Resolution Networks".
☆20 • Jul 25, 2021 • Updated 5 years ago
LHL3341 / ContextBLIP
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions [ACL 2024]
☆11 • May 17, 2024 • Updated 2 years ago
sangminwoo / RITUAL
Official PyTorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…
☆14 • Dec 16, 2024 • Updated last year
UARK-AICV / FG-CXR
The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…
☆12 • Jul 28, 2025 • Updated 11 months ago
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58 • Oct 28, 2024 • Updated last year
ZhangYiqun018 / StickerConv
[ACL 2024]
☆60 • Jun 20, 2024 • Updated 2 years ago
Leon1207 / 3DRefTR
A PyTorch implementation of 3DRefTR, proposed in the paper "A Unified Framework for 3D Point Cloud Visual Grounding"
☆26 • Aug 24, 2023 • Updated 2 years ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆66 • Feb 19, 2025 • Updated last year
AILab-CVC / SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆366 • Jan 14, 2025 • Updated last year
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆608 • May 8, 2024 • Updated 2 years ago
pkunlp-icler / MIC
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU
☆49 • Jul 13, 2025 • Updated last year
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆52 • Jul 16, 2024 • Updated 2 years ago
TIMMY-CHAN / MILE
[MICCAI 2024] Can LLMs' Tuning Methods Work in Medical Multimodal Domain?
☆17 • Sep 18, 2024 • Updated last year
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447 • Dec 22, 2024 • Updated last year
VITA-MLLM / VITA-Audio
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
☆682 • May 24, 2025 • Updated last year
THUNLP-MT / CODIS
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
☆13 • Oct 14, 2024 • Updated last year
Wuzheng02 / OS-Kairos
[ACL 2025] Research code for the paper "OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents"
☆21 • Jun 19, 2025 • Updated last year
bhairavmehta95 / slitherin-gym
Slither-in Inspired Snake Environment for OpenAI Gym (Part of Requests for Research 2.0)
☆12 • Mar 5, 2018 • Updated 8 years ago
sosppxo / 3D-STMN
[AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…
☆45 • Dec 20, 2023 • Updated 2 years ago
MME-Benchmarks / Video-MME-v2
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆369 • May 24, 2026 • Updated 2 months ago
passalis / probabilistic_kt
Probabilistic Knowledge Transfer for Deep Neural Networks
☆41 • Oct 22, 2018 • Updated 7 years ago
wangxu0820 / NegativePrompt
The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St…
☆25 • May 10, 2024 • Updated 2 years ago
UCSB-AI / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 • Jul 15, 2025 • Updated last year
sergiotasconmorales / locvqa
Localized questions for VQA
☆12 • May 6, 2025 • Updated last year
sarrouti / HealthVer
☆20 • Feb 3, 2022 • Updated 4 years ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆112 • Oct 10, 2024 • Updated last year