experiments with inference on llama
☆103 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- ☆20 · Nov 23, 2022 · Updated 3 years ago
- ☆20 · Jan 27, 2024 · Updated 2 years ago
- ☆16 · Aug 10, 2022 · Updated 3 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 · Aug 29, 2023 · Updated 2 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,685 · Oct 23, 2024 · Updated last year
- ☆12 · Oct 25, 2023 · Updated 2 years ago
- ☆16 · Apr 21, 2022 · Updated 4 years ago
- Serving multiple LoRA finetuned LLM as one ☆1,160 · May 8, 2024 · Updated 2 years ago
- 9th solution ☆11 · Oct 11, 2022 · Updated 3 years ago
- extensible collectives library in triton ☆98 · Mar 31, 2025 · Updated last year
- ☆17 · Feb 19, 2024 · Updated 2 years ago
- Test pytorch code with minimal computational overhead ☆26 · Jun 8, 2023 · Updated 2 years ago
- batched loras ☆351 · Sep 6, 2023 · Updated 2 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,913 · Jan 21, 2024 · Updated 2 years ago
- ☆607 · Aug 23, 2024 · Updated last year
- Project repository of the paper "Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning wi… ☆35 · Mar 19, 2024 · Updated 2 years ago
- ☆125 · Mar 17, 2024 · Updated 2 years ago
- Mutual Information Predicts Hallucinations in Abstractive Summarization ☆13 · Nov 14, 2022 · Updated 3 years ago
- This repo lets you run mistral-7b in Google Colab. ☆16 · Oct 1, 2023 · Updated 2 years ago
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Aug 25, 2023 · Updated 2 years ago
- Merge Transformers language models by use of gradient parameters. ☆214 · Aug 8, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,922 · Sep 30, 2023 · Updated 2 years ago
- Experimenting text-embeddings-inference server on both CPU and GPU ☆18 · Oct 25, 2023 · Updated 2 years ago
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,109 · Jun 30, 2025 · Updated 10 months ago
- A reference implementation of an end to end, open-source MLOps platform. ☆15 · Nov 20, 2022 · Updated 3 years ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,338 · May 11, 2025 · Updated last year
- ComfyUI custom nodes for HappyHorse 1.0: Alibaba's #1 AI video model with native 1080p, integrated audio, text-to-video, image-to-video,… ☆28 · Apr 28, 2026 · Updated last month
- Non-local Modeling for Image Quality Assessment ☆13 · Dec 20, 2023 · Updated 2 years ago
- ☆67 · Mar 28, 2025 · Updated last year
- PyTorch interface for TrueGrad Optimizers ☆43 · Aug 8, 2023 · Updated 2 years ago
- Salesforce open-source LLMs with 8k sequence length. ☆727 · Jan 31, 2025 · Updated last year
- Experiments on speculative sampling with Llama models ☆128 · Jun 8, 2023 · Updated 2 years ago
- ☆20 · Apr 12, 2024 · Updated 2 years ago
- ☆134 · Nov 24, 2023 · Updated 2 years ago
- My solution for the "LLM - Detect AI Generated Text" kaggle competition ☆14 · Feb 2, 2024 · Updated 2 years ago
- Various transformers for FSDP research ☆38 · Nov 11, 2022 · Updated 3 years ago
- GPTQ inference Triton kernel ☆322 · May 18, 2023 · Updated 3 years ago
- Janus is an opensource IA for Star Citizen ☆11 · Dec 23, 2023 · Updated 2 years ago