193746 / VHASR
☆10Updated 7 months ago
Alternatives and similar repositories for VHASR
Users that are interested in VHASR are comparing it to the libraries listed below
- ☆29Updated 10 months ago
- Our 2nd-gen LMM☆33Updated last year
- ☆48Updated last month
- flow mirror models from JZX AI Labs☆44Updated 8 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data with a 14B LLM.☆37Updated 9 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆14Updated 6 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…☆14Updated last year
- A collection of strong multimodal models for building multimodal AGI agents☆43Updated 11 months ago
- ☆27Updated 8 months ago
- ☆19Updated 5 months ago
- ☆56Updated 11 months ago
- Fast instruction tuning with Llama2☆11Updated last year
- [ICML2025] The official implementation of "C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Gene…☆23Updated last month
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆21Updated 4 months ago
- Unified Normalization (ACM MM'22) By Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P…☆34Updated 2 years ago
- Supervoice Speaker Separation Network☆12Updated last year
- An end-to-end ASR Transformer model training repo☆13Updated 3 years ago
- ☆57Updated last year
- This repository contains the source code for SoftCTC. The original paper can be found here: https://arxiv.org/abs/2212.02135☆19Updated 2 years ago
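- [AIGC Hands-On Beginner Notes: AIGC Skyscraper] Sharing large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents, automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), automatic speech recognition (ASR), …☆14Updated 2 months ago
- This project covers experiments with and applications of Yi's multimodal model series, such as Yi-VL-6B/34B.☆13Updated last year
- A native Chinese text-to-image evaluation benchmark☆9Updated 11 months ago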
- Music large model based on InternLM2-chat.☆22Updated 6 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea…☆62Updated 3 weeks ago
- Chinese CLIP models with SOTA performance.☆55Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆62Updated 7 months ago
- Code for "An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought"☆15Updated 11 months ago
- The official implementation of Freeze-Omni.☆13Updated 7 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation.☆37Updated last year
- The official implementation of VITA, VITA15, LongVITA, and VITA-Audio.☆32Updated last month