Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22Oct 25, 2023Updated 2 years ago
Alternatives and similar repositories for Skywork-MM
Users that are interested in Skywork-MM are comparing it to the libraries listed below
- Partially Non-Autoregressive Image Captioning☆10Sep 30, 2021Updated 4 years ago
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022☆29Dec 1, 2022Updated 3 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Sep 15, 2023Updated 2 years ago
- Developer project for getting basic API integrations working in under 5 minutes☆11Jan 30, 2026Updated last month
- [ACL 2023] Delving into the Openness of CLIP☆24Jan 11, 2023Updated 3 years ago
- Video Diffusion State Space Models☆19Mar 27, 2024Updated last year
- Blending Custom Photos with Video Diffusion Transformers☆48Jan 21, 2025Updated last year
- Generate consistent videos with stable diffusion models☆51Jan 20, 2023Updated 3 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated last year
- [ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents☆58Feb 26, 2026Updated 3 weeks ago
- character recognition, textline recognition☆10Aug 31, 2019Updated 6 years ago
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening☆70May 18, 2025Updated 10 months ago
- ☆29Mar 24, 2025Updated 11 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆123Nov 25, 2024Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆105Nov 9, 2023Updated 2 years ago
- music generation with perceiver-ar model☆26Jul 20, 2022Updated 3 years ago
- ☆13Aug 24, 2023Updated 2 years ago
- TaiYiXLCheckpointLoader: An unofficial node supporting the Taiyi-Diffusion-XL (Taiyi-XL) Chinese-English bilingual language model☆11Sep 1, 2024Updated last year
- ☆26Jun 25, 2021Updated 4 years ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆46Jul 22, 2025Updated 7 months ago
- Anything Model Batch Downloader lets you batch download models from Civitai and Hugging Face using just the model URL.☆15Mar 19, 2023Updated 3 years ago
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- A Data Source for Reasoning Embodied Agents☆19Sep 18, 2023Updated 2 years ago
- ☆28Jan 6, 2026Updated 2 months ago
- python library for reverse engineered Adobe Firefly API☆13Mar 31, 2023Updated 2 years ago
- This repository is home to a Unity project with 36 different shaders and 6 different particle systems to be tested all in the same scene …☆18Apr 15, 2024Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆106Mar 14, 2024Updated 2 years ago
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Feb 10, 2023Updated 3 years ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Sep 26, 2024Updated last year
- An Extension for Automatic1111 Webui that makes the interface easier to use on mobile (portrait)☆16Apr 16, 2024Updated last year
- Web page for "🍅HumanTOMATO: Text-aligned Whole-body Motion Generation".☆15May 25, 2024Updated last year
- HSTU-BLaIR: Lightweight Contrastive Text Embedding for Generative Recommender 🌱☆23Jul 4, 2025Updated 8 months ago
- ☆13Jul 10, 2024Updated last year
- A simple exam generator and grader written in Python with OpenCV☆14Jan 14, 2026Updated 2 months ago
- Created for this model trained by Gustavosta for Stable Diffusion to create a prompt from a few words. You can submit your own text or se…☆16Feb 13, 2023Updated 3 years ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆26May 31, 2025Updated 9 months ago
- Implementation of a recommender system using RQ-VAE Semantic IDs☆16Aug 11, 2025Updated 7 months ago
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff…☆66Updated this week