VisuRiddles: Fine-grained Perception is an important thing for Multimodal Large Models in Riddles Solving
☆18Oct 22, 2025Updated 5 months ago
Alternatives and similar repositories for VisuRiddles
Users that are interested in VisuRiddles are comparing it to the libraries listed below.
- ☆16Apr 21, 2025Updated 11 months ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- [ArXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling☆128Jun 4, 2025Updated 9 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format☆12Dec 7, 2019Updated 6 years ago
- ☆14Jun 10, 2025Updated 9 months ago
- ☆20Nov 21, 2025Updated 4 months ago
- [ICCV 2025] LIRA☆21Nov 25, 2025Updated 4 months ago
- ☆12Sep 8, 2022Updated 3 years ago
- Increasing the scale and diversity of chart de-rendering data.☆12Mar 13, 2024Updated 2 years ago
- ☆18Mar 19, 2021Updated 5 years ago
- [ICCV2025] Training-Free Diffusion Models for Geometric Image Editing☆32Jan 13, 2026Updated 2 months ago
- ☆15May 15, 2025Updated 10 months ago
- [ICLR26] ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding☆42Updated this week
- FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.☆19Nov 20, 2024Updated last year
- ☆15Oct 23, 2018Updated 7 years ago
- ☆55Jun 17, 2025Updated 9 months ago
- ☆22May 30, 2023Updated 2 years ago
- ☆27Feb 20, 2024Updated 2 years ago
- Implementation code for UniMix and Bayias Compensated Loss☆19Mar 7, 2023Updated 3 years ago
- Export Donut model to onnx and run it with onnxruntime☆23Nov 21, 2023Updated 2 years ago
- This repository is the official PyTorch implementation of Balanced Product of Calibrated Experts for Long-Tailed Recognition (CVPR 2023).☆18Mar 13, 2025Updated last year
- Caffe implementation of the paper "Deep Pyramidal Residual Networks" (https://arxiv.org/abs/1610.02915).☆27Jul 18, 2017Updated 8 years ago
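- Implements a multimodal image-text dialogue model based on Blip2RWKV+QFormer, using the Two-Step Cognitive Psychology Prompt method; a model with only 3B parameters can exhibit human-like causal chains of thought. Positioned against image-text dialogue large language models such as MiniGPT-4 and ImageBind, aiming to use less compute and fewer resources to …☆42Jul 17, 2023Updated 2 years ago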
- (ECCV'22) FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling☆19Nov 22, 2024Updated last year
- ☆12Oct 20, 2023Updated 2 years ago
- Verify CPU circuits in Logisim or Verilog against MARS simulation☆10Dec 31, 2020Updated 5 years ago
- ☆33Sep 27, 2024Updated last year
- ☆134Dec 22, 2023Updated 2 years ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 8 months ago
- ☆12Aug 23, 2019Updated 6 years ago
- [NeurIPS 2022] Source code for our paper "Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data"☆24Oct 16, 2023Updated 2 years ago
- GAIIC2024 dual-spectrum object detection from a drone's perspective - Rank 6 solution☆11Jun 17, 2024Updated last year
- Decode Neural signal as Speech☆38Oct 6, 2024Updated last year
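- This project was created to help students with backgrounds in artificial intelligence, natural language processing, and large language models find jobs. Contributions to building and maintaining the project are welcome.☆17Mar 30, 2025Updated 11 months ago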
- Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"☆19Feb 1, 2026Updated last month
- Deciphering Oracle Bone Language with Diffusion Models (ACL 2024 Best Paper)☆228Sep 17, 2025Updated 6 months ago
- Z1h is a programming language that lets you work quickly and provide simple, reliable, and flexible service.☆13Aug 17, 2025Updated 7 months ago
- This repo is an exploratory experiment to enable frozen pretrained RWKV language models to accept speech modality input. We followed the …☆54Dec 23, 2024Updated last year