MoonshotAI/WorldVQA

☆119

Alternatives and similar repositories for WorldVQA

Users who are interested in WorldVQA are comparing it to the repositories listed below.

CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆1,206 · Jul 15, 2025 · Updated last year
ByteDance-Seed / Seed-1.8
☆219 · Dec 19, 2025 · Updated 7 months ago
Sharut / canonical-multimodal-rep
☆15 · Feb 25, 2026 · Updated 4 months ago
MoonshotAI / Kimi-K2.5
Open Visual Agentic Intelligence
☆2,218 · Jan 31, 2026 · Updated 5 months ago
pkunlp-icler / SCL-RAI
Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022
☆11 · Aug 20, 2022 · Updated 3 years ago
thinking-machines-lab / tinker-project-ideas
Ideas for projects related to Tinker
☆191 · Nov 6, 2025 · Updated 8 months ago
M3-IT / YING-VLM
Vision Large Language Models trained on M3IT instruction tuning dataset
☆17 · Aug 16, 2023 · Updated 2 years ago
stepfun-ai / Step3-VL-10B
Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its…
☆407 · Jan 21, 2026 · Updated 6 months ago
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85 · Oct 29, 2025 · Updated 8 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,583 · Jun 14, 2025 · Updated last year
chenllliang / ATP-AMR
Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022
☆15 · Mar 31, 2023 · Updated 3 years ago
HydraQYH / hp_rms_norm
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30 · Jan 22, 2026 · Updated 6 months ago
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,509 · Aug 3, 2025 · Updated 11 months ago
UniPat-AI / BabyVision
We introduce BabyVision, a benchmark revealing the infancy of AI vision.
☆232 · Jan 13, 2026 · Updated 6 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
stepfun-ai / Step3
☆453 · Aug 10, 2025 · Updated 11 months ago
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
MoonshotAI / Kimi-Linear
☆1,481 · Nov 17, 2025 · Updated 8 months ago
MoonshotAI / FlashKDA
FlashKDA: high-performance Kimi Delta Attention kernels
☆466 · May 26, 2026 · Updated last month
bigcode-project / bigcodearena
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
☆61 · Oct 13, 2025 · Updated 9 months ago
MoonshotAI / Kimi-Researcher
☆80 · Jun 20, 2025 · Updated last year
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆103 · May 20, 2025 · Updated last year
zlab-princeton / vero
Vero: An Open RL Recipe for General Visual Reasoning
☆134 · Jun 19, 2026 · Updated last month
XiaomiMiMo / MiMo-VL
MiMo-VL
☆642 · Aug 21, 2025 · Updated 11 months ago
vlf-silkie / VLFeedback
☆102 · Dec 22, 2023 · Updated 2 years ago
google-deepmind / lm_act
LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
☆30 · May 21, 2025 · Updated last year
visgym / VisGym
Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
☆114 · May 3, 2026 · Updated 2 months ago
CASIA-IVA-Lab / OPT_Questioner
Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
☆15 · Aug 9, 2023 · Updated 2 years ago
deepglint / DanQing
The official repo for the DanQing dataset.
☆36 · Mar 25, 2026 · Updated 3 months ago
lancopku / clip-openness
[ACL 2023] Delving into the Openness of CLIP
☆24 · Jan 11, 2023 · Updated 3 years ago
XiaomiMiMo / lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
☆70 · Aug 8, 2025 · Updated 11 months ago
Triang-jyed-driung / i8muon
Muon in Int8 Precision Made Possible
☆20 · Jun 18, 2026 · Updated last month
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆975 · Jul 4, 2026 · Updated 2 weeks ago
chenllliang / MLS
Source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation" @ACL-2022
☆18 · May 19, 2022 · Updated 4 years ago
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆80 · Dec 10, 2024 · Updated last year
ByteDance-Seed / VeOmni
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆2,102 · Updated this week
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year