Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22Oct 25, 2023Updated 2 years ago
Alternatives and similar repositories for Skywork-MM
Users that are interested in Skywork-MM are comparing it to the libraries listed below.
- Partially Non-Autoregressive Image Captioning☆10Sep 30, 2021Updated 4 years ago
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022☆30Dec 1, 2022Updated 3 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Sep 15, 2023Updated 2 years ago
- Developer project for getting basic API integrations working in under 5 minutes☆11May 22, 2026Updated 2 weeks ago
- Video Diffusion State Space Models☆19Mar 27, 2024Updated 2 years ago
- Dynamic Early Exit for Image Captioning☆17Oct 25, 2022Updated 3 years ago
- ☆24Oct 8, 2023Updated 2 years ago
- Blending Custom Photos with Video Diffusion Transformers☆50Jan 21, 2025Updated last year
- Generate consistent videos with stable diffusion models☆51Jan 20, 2023Updated 3 years ago
- ☆31Mar 24, 2025Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆106Nov 9, 2023Updated 2 years ago
- music generation with perceiver-ar model☆26Jul 20, 2022Updated 3 years ago
- ☆26Jun 25, 2021Updated 4 years ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆50Jul 22, 2025Updated 10 months ago
- ☆30Jan 6, 2026Updated 5 months ago
- ☆30Feb 16, 2024Updated 2 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆107Mar 14, 2024Updated 2 years ago
- Our 2nd-gen LMM☆34May 22, 2024Updated 2 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Feb 10, 2023Updated 3 years ago
- Cyberdolphin Suite of ComfyUI nodes for wiring up OpenAI and compatible LLM APIs.☆15Jul 31, 2024Updated last year
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆19Nov 10, 2023Updated 2 years ago
- Train toy models using multi-token prediction objective☆14Apr 18, 2026Updated last month
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Sep 19, 2023Updated 2 years ago
- A simple exam generator and grader written in Python with OpenCV☆14Jan 14, 2026Updated 4 months ago
- Text-based real image editing with stable diffusion models☆27Dec 19, 2022Updated 3 years ago
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …☆13Dec 1, 2022Updated 3 years ago
- Building a multi-agent RAG system with advanced RAG methods☆13Jan 12, 2025Updated last year
- ☆13Feb 5, 2025Updated last year
- [ECCV2022] A PyTorch implementation of the paper "Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embo…☆13Mar 20, 2023Updated 3 years ago
- Gradient-Free Textual Inversion for Personalized Text-to-Image Generation☆43Jan 23, 2023Updated 3 years ago
- Lion: Kindling Vision Intelligence within Large Language Models☆51Jan 25, 2024Updated 2 years ago
- ☆11Sep 7, 2020Updated 5 years ago
- ☆59Aug 7, 2023Updated 2 years ago
- Dataset for WWW 2020 paper "Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog"☆38May 4, 2021Updated 5 years ago
- ☆29Mar 30, 2026Updated 2 months ago
- Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"☆12Aug 1, 2024Updated last year
- ☆27Jun 20, 2021Updated 4 years ago
- GLSL-like scripting language for rapid prototyping of multipass rendering techniques☆16Feb 10, 2026Updated 4 months ago
- Unifew: Unified Fewshot Learning Model☆18Sep 10, 2021Updated 4 years ago