☆105Feb 4, 2026Updated 3 weeks ago
Alternatives and similar repositories for WorldVQA
Users that are interested in WorldVQA are comparing it to the libraries listed below
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024☆22Feb 15, 2024Updated 2 years ago
- a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆75Feb 7, 2026Updated 3 weeks ago
- VisPlay: Self-Evolving Vision-Language Models☆44Feb 12, 2026Updated 2 weeks ago
- ☆57Jul 8, 2025Updated 7 months ago
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆26May 26, 2025Updated 9 months ago
- Official Repository of NeurIPS2021 paper: PTR☆32Dec 17, 2021Updated 4 years ago
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily☆15Jan 6, 2025Updated last year
- TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks☆22Jan 19, 2026Updated last month
- [ICRA 2026] StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes☆20Feb 17, 2026Updated last week
- ☆213Dec 19, 2025Updated 2 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities☆1,164Jul 15, 2025Updated 7 months ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆29Sep 12, 2025Updated 5 months ago
- YOLO object detection algorithm☆15Jul 27, 2025Updated 7 months ago
- SkillX.sh — The Only Skill That Your AI Agent Needs. AI agent skills marketplace with semantic search, leaderboard, ratings, and CLI.☆24Feb 13, 2026Updated 2 weeks ago
- ☆34Nov 11, 2025Updated 3 months ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming☆15Aug 20, 2024Updated last year
- [SIGIR 2025] Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph☆16Jun 6, 2025Updated 8 months ago
- Standalone iOS app to GPS location without jailbreaks. Untethered, local, and open source.☆41Jan 20, 2026Updated last month
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆16Nov 19, 2025Updated 3 months ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆22Feb 13, 2026Updated 2 weeks ago
- DragMesh: Interactive 3D Generation Made Easy☆20Dec 28, 2025Updated 2 months ago
- Community maintained hardware plugin for vLLM on AWS Neuron☆23Updated this week
- MV-RAG combines retrieval with multi-view generation to create accurate 3D-consistent visuals. By retrieving reference images and text, i…☆24Nov 29, 2025Updated 3 months ago
- ☆17Aug 5, 2025Updated 6 months ago
- Tusk Drift Demo - Node.js Service☆58Jan 20, 2026Updated last month
- AiTer Optimized Model☆39Updated this week
- [ICLR 26] Part-X-MLLM: Part-aware 3D Multimodal Large Language Model☆111Jan 26, 2026Updated last month
- Python3 script to create Voronoi tessellations (mosaic pattern) on images☆10May 25, 2019Updated 6 years ago
- My matlab functions☆13Nov 9, 2014Updated 11 years ago
- something for paper agent☆11Dec 18, 2024Updated last year
- Quick Long Video Understanding [TMLR2025]☆76Oct 27, 2025Updated 4 months ago
- [npj Digital Medicine] A multimodal multidomain multilingual medical foundation model for zero shot clinical diagnosis☆17Feb 6, 2025Updated last year
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago
- The official implementation of the paper "Self-Updatable Large Language Models by Integrating Context into Model Parameters"☆15May 18, 2025Updated 9 months ago
- Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding☆10Jan 5, 2026Updated last month
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.☆17May 9, 2025Updated 9 months ago
- a viewer for lancedb, including some actions like CRUD etc.☆12Apr 27, 2025Updated 10 months ago
- ☆20Updated this week
- WWDC 2020 Swift Student Challenge Submission "6 Feet Between" by Tony Tang☆10Jun 17, 2020Updated 5 years ago