aws-samples / fine-tune-qwen2-vl-with-llama-factory
☆11 · Updated last month
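As a rough orientation, the sketch below shows one way a Qwen2-VL checkpoint fine-tuned with LLaMA-Factory might be loaded for inference: a Hugging Face Transformers + PEFT load of the base model plus a LoRA adapter. This is a minimal illustrative sketch, not code from the repository; the base model ID and the adapter directory are assumptions.

```python
# Illustrative only: load a Qwen2-VL base model and attach a LoRA adapter,
# e.g. one exported by a LLaMA-Factory fine-tuning run (path is a placeholder).
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

BASE_ID = "Qwen/Qwen2-VL-7B-Instruct"     # public base checkpoint (assumed)
ADAPTER_DIR = "./qwen2_vl_lora_adapter"   # hypothetical LoRA output directory

processor = AutoProcessor.from_pretrained(BASE_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_DIR)  # attach the fine-tuned adapter

# Text-only prompt to keep the example short; real use would also pass images.
messages = [{"role": "user",
             "content": [{"type": "text",
                          "text": "Summarize what a vision-language model does."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[prompt], return_tensors="pt").to(model.device)
out_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out_ids, skip_special_tokens=True)[0])
```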
Alternatives and similar repositories for fine-tune-qwen2-vl-with-llama-factory:
Users interested in fine-tune-qwen2-vl-with-llama-factory are comparing it to the repositories listed below.
- ☆10 · Updated 7 months ago
- vLLM Router · ☆17 · Updated 10 months ago
- Inference deployment of Llama 3 · ☆11 · Updated 9 months ago
- Open Source Projects from Pallas Lab · ☆20 · Updated 3 years ago
- ☆20 · Updated last week
- Stable Diffusion in TensorRT 8.5+ · ☆14 · Updated last year
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP · ☆41 · Updated 7 months ago
- NVIDIA TensorRT Hackathon 2023 final-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM · ☆41 · Updated last year
- ☢️ TensorRT 2023 final round: inference acceleration and optimization of the Llama model based on TensorRT-LLM · ☆45 · Updated last year
- Large Language Model Hosting Container · ☆80 · Updated last week
- ☆25 · Updated this week
- ☆25 · Updated last month
- Getting started with TensorRT-LLM using BLOOM as a case study · ☆13 · Updated 10 months ago
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators · ☆17 · Updated 2 weeks ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks · ☆15 · Updated 2 months ago
- ☆12 · Updated last week
- Tianchi NVIDIA TensorRT Hackathon 2023 generative AI model optimization competition: third-place solution in the preliminary round · ☆48 · Updated last year
- ☆49 · Updated 2 weeks ago
- TAO Toolkit deep learning networks with TensorFlow 1.x backend · ☆13 · Updated 11 months ago
- TensorRT LLM Benchmark Configuration · ☆12 · Updated 5 months ago
- Model compression for ONNX · ☆81 · Updated 2 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation · ☆46 · Updated 6 months ago
- Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference · ☆27 · Updated 2 months ago
- ☆21 · Updated 3 weeks ago
- Hands-on large-model deployment: TensorRT-LLM, Triton Inference Server, vLLM · ☆26 · Updated 10 months ago
- Multiple GEMM operators are constructed with CUTLASS to support LLM inference · ☆16 · Updated 3 months ago
- Self-host LLMs with vLLM and BentoML (see the vLLM sketch after this list) · ☆79 · Updated this week
- HunyuanDiT with TensorRT and libtorch · ☆17 · Updated 7 months ago
- Penn CIS 5650 (GPU Programming and Architecture) Final Project · ☆26 · Updated last year
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang · ☆30 · Updated 2 months ago
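Several of the entries above (the vLLM router, the vLLM + BentoML self-hosting template, and the deployment walkthrough) revolve around serving models with vLLM. As a point of reference, here is a minimal offline-inference sketch using vLLM's Python API; the tiny facebook/opt-125m checkpoint is only a stand-in so the example stays small, not a model required by any listed repository.

```python
# Minimal vLLM offline-inference sketch (illustrative; small stand-in model).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # stand-in checkpoint for a quick local test
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Large language model serving is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```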