193746 / VHASR
☆10Updated 7 months ago
Alternatives and similar repositories for VHASR
Users that are interested in VHASR are comparing it to the libraries listed below
- Our 2nd-gen LMM☆33Updated last year
- ☆29Updated 9 months ago
- ☆56Updated last year
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆37Updated 8 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…☆14Updated last year
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆14Updated 6 months ago
- Fast instruction tuning with Llama2☆11Updated last year
- flow mirror models from JZX AI Labs☆45Updated 8 months ago
- ☆27Updated 7 months ago
- ☆56Updated 10 months ago
- [ICML2025] The official implementation of "C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Gene…☆22Updated last month
- A roundup of domestic and international data competition news☆18Updated 3 years ago
- The official repository for the RealSyn dataset☆34Updated last month
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆37Updated 11 months ago
- ☆18Updated 4 months ago
- Whisper in TensorRT-LLM☆15Updated last year
- Stable Diffusion in TensorRT 8.5+☆14Updated 2 years ago
- Accelerated deployment of BERT models with TensorRT☆9Updated 3 years ago
- A Token-level Text Image Foundation Model for Document Understanding☆92Updated last month
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 7 months ago
- ☆25Updated 9 months ago
- Code for "An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought"☆14Updated 10 months ago
- This repository contains source codes for SoftCTC. Original paper can be found here: https://arxiv.org/abs/2212.02135☆19Updated 2 years ago
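- The simplest reproduction of R1 results on a small model, explaining the most important essence of O1-like models and DeepSeek R1. Think is all your need. Backed by experiments: for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.☆45Updated 3 months ago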
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 8 months ago
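- Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS☆17Updated 8 months ago
- NVIDIA TensorRT Hackathon 2023 semifinal topic: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM☆42Updated last year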
- An enterprise-grade Voice Activity Detector from modelscope and funasr.☆100Updated 2 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101Updated last year
- This project covers experiments and applications with Yi's multimodal model series, such as Yi-VL-6B/34B.☆13Updated last year