experiments with inference on llama
☆103 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- ☆13 · May 25, 2023 · Updated 2 years ago
- ☆20 · Nov 23, 2022 · Updated 3 years ago
- ☆20 · Jan 27, 2024 · Updated 2 years ago
- Leverage your LangChain trace data for fine tuning ☆46 · Aug 2, 2024 · Updated last year
- ☆16 · Aug 10, 2022 · Updated 3 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 · Aug 29, 2023 · Updated 2 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,688 · Oct 23, 2024 · Updated last year
- ☆122 · Apr 22, 2024 · Updated last year
- Serving multiple LoRA finetuned LLM as one ☆1,148 · May 8, 2024 · Updated last year
- extensible collectives library in triton ☆97 · Mar 31, 2025 · Updated 11 months ago
- ☆17 · Feb 19, 2024 · Updated 2 years ago
- Test pytorch code with minimal computational overhead ☆26 · Jun 8, 2023 · Updated 2 years ago
- batched loras ☆351 · Sep 6, 2023 · Updated 2 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,905 · Jan 21, 2024 · Updated 2 years ago
- ☆125 · Mar 17, 2024 · Updated 2 years ago
- ☆20 · Jul 12, 2023 · Updated 2 years ago
- This repo lets you run mistral-7b in Google Colab. ☆16 · Oct 1, 2023 · Updated 2 years ago
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Aug 25, 2023 · Updated 2 years ago
- Merge Transformers language models by use of gradient parameters. ☆214 · Aug 8, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,912 · Sep 30, 2023 · Updated 2 years ago
- Experimenting text-embeddings-inference server on both CPU and GPU ☆18 · Oct 25, 2023 · Updated 2 years ago
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,105 · Jun 30, 2025 · Updated 8 months ago
- A reference implementation of an end to end, open-source MLOps platform. ☆15 · Nov 20, 2022 · Updated 3 years ago
- ☆22 · Jan 5, 2024 · Updated 2 years ago
- Overview and tutorials of the LlamaIndex Library ☆19 · Aug 7, 2023 · Updated 2 years ago
- Salesforce open-source LLMs with 8k sequence length. ☆726 · Jan 31, 2025 · Updated last year
- Experiments on speculative sampling with Llama models ☆128 · Jun 8, 2023 · Updated 2 years ago
- ☆135 · Nov 24, 2023 · Updated 2 years ago
- My solution for the "LLM - Detect AI Generated Text" kaggle competition ☆16 · Feb 2, 2024 · Updated 2 years ago
- Various transformers for FSDP research ☆38 · Nov 11, 2022 · Updated 3 years ago
- GPTQ inference Triton kernel ☆321 · May 18, 2023 · Updated 2 years ago
- Lyra V2 (SoundStream) running in the browser ☆19 · Sep 20, 2023 · Updated 2 years ago
- Generate beautiful, testable documentation with Jupyter Notebooks ☆21 · Jul 25, 2022 · Updated 3 years ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆54 · Updated this week
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ☆197 · May 6, 2024 · Updated last year
- ☆457 · Oct 15, 2023 · Updated 2 years ago
- ☆53 · Jan 18, 2024 · Updated 2 years ago
- Large Language Model Text Generation Inference ☆10,815 · Mar 21, 2026 · Updated last week