Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22Oct 25, 2023Updated 2 years ago
Alternatives and similar repositories for Skywork-MM
Users that are interested in Skywork-MM are comparing it to the libraries listed below.
- Partially Non-Autoregressive Image Captioning☆10Sep 30, 2021Updated 4 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Sep 15, 2023Updated 2 years ago
- Developer project for getting basic API integrations working in under 5 minutes☆11Updated this week
- [ACL 2023] Delving into the Openness of CLIP☆24Jan 11, 2023Updated 3 years ago
- Video Diffusion State Space Models☆19Mar 27, 2024Updated 2 years ago
- ☆24Oct 8, 2023Updated 2 years ago
- Blending Custom Photos with Video Diffusion Transformers☆50Jan 21, 2025Updated last year
- Generate consistent videos with stable diffusion models☆51Jan 20, 2023Updated 3 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated 2 years ago
- [ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents☆60Feb 26, 2026Updated 2 months ago
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening☆73May 18, 2025Updated last year
- Agent Tool and Skills for VR Development on Meta Quest☆63May 12, 2026Updated last week
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆125Nov 25, 2024Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆106Nov 9, 2023Updated 2 years ago
- music generation with perceiver-ar model☆26Jul 20, 2022Updated 3 years ago
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- A Data Source for Reasoning Embodied Agents☆19Sep 18, 2023Updated 2 years ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆49Jul 22, 2025Updated 9 months ago
- Unity 3D Code for "Building Tilt Brush from Scratch" YouTube tutorial by Fuseman☆11Mar 1, 2017Updated 9 years ago
- ☆28Jan 6, 2026Updated 4 months ago
- 2022 WAIC Hackathon, Ant Fortune track: AntSQL large-scale financial semantic parsing Chinese Text-to-SQL challenge; a beginner's code, hehe☆14Mar 11, 2023Updated 3 years ago
- python library for reverse engineered Adobe Firefly API☆13Mar 31, 2023Updated 3 years ago
- ☆12Nov 8, 2019Updated 6 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆107Mar 14, 2024Updated 2 years ago
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆19Nov 10, 2023Updated 2 years ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆116Sep 26, 2024Updated last year
- General-purpose Visual Understanding Evaluation☆20Dec 21, 2023Updated 2 years ago
- Train toy models using multi-token prediction objective☆14Apr 18, 2026Updated last month
- An Extension for Automatic1111 Webui that makes the interface easier to use on mobile (portrait)☆16Apr 16, 2024Updated 2 years ago
- Web page for "🍅HumanTOMATO: Text-aligned Whole-body Motion Generation".☆15May 25, 2024Updated last year
- ☆13Jul 10, 2024Updated last year
- A simple exam generator and grader written in Python with OpenCV☆14Jan 14, 2026Updated 4 months ago
- Text-based real image editing with stable diffusion models☆27Dec 19, 2022Updated 3 years ago
- Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification☆17Jul 13, 2025Updated 10 months ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆26May 31, 2025Updated 11 months ago
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …☆13Dec 1, 2022Updated 3 years ago
- Implementation about a recommender System using RQ-VAE Semantic IDs☆17Apr 15, 2026Updated last month
- ☆13Feb 5, 2025Updated last year