experiments with inference on llama
☆103Jun 6, 2024Updated 2 years ago
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- ☆14May 25, 2023Updated 3 years ago
- ☆20Nov 23, 2022Updated 3 years ago
- ☆20Jan 27, 2024Updated 2 years ago
- Leverage your LangChain trace data for fine tuning☆46Aug 2, 2024Updated last year
- ☆16Aug 10, 2022Updated 3 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code☆10Aug 29, 2023Updated 2 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀☆1,688Oct 23, 2024Updated last year
- Serving multiple LoRA finetuned LLM as one☆1,161May 8, 2024Updated 2 years ago
- Exploring NLP weak supervision approaches to train text classification models. The project is also a prototype for a semi-automated text …☆22Feb 18, 2024Updated 2 years ago
- ☆17Feb 19, 2024Updated 2 years ago
- extensible collectives library in triton☆98Mar 31, 2025Updated last year
- Test pytorch code with minimal computational overhead☆26Jun 8, 2023Updated 3 years ago
- batched loras☆351Sep 6, 2023Updated 2 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters☆1,912Jan 21, 2024Updated 2 years ago
- ☆610Aug 23, 2024Updated last year
- ☆125Mar 17, 2024Updated 2 years ago
- This repo lets you run mistral-7b in Google Colab.☆16Oct 1, 2023Updated 2 years ago
- Example of applying CUDA graphs to LLaMA-v2☆11Aug 25, 2023Updated 2 years ago
- Merge Transformers language models by use of gradient parameters.☆214Aug 8, 2024Updated last year
- Command-line script for inferencing from models such as LLaMA, in a chat scenario, with LoRA adaptations☆32Jun 1, 2023Updated 3 years ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.☆2,924Sep 30, 2023Updated 2 years ago
- Experimenting with the text-embeddings-inference server on both CPU and GPU☆18Oct 25, 2023Updated 2 years ago
- ☆75Jul 2, 2021Updated 4 years ago
- A reference implementation of an end to end, open-source MLOps platform.☆15Nov 20, 2022Updated 3 years ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.☆2,105Jun 30, 2025Updated 11 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.☆2,343May 11, 2025Updated last year
- ☆22Jan 5, 2024Updated 2 years ago
- Non-local Modeling for Image Quality Assessment☆13Dec 20, 2023Updated 2 years ago
- Salesforce open-source LLMs with 8k sequence length.☆727Jun 2, 2026Updated 2 weeks ago
- ☆68Mar 28, 2025Updated last year
- Experiments on speculative sampling with Llama models☆129Jun 8, 2023Updated 3 years ago
- ☆134Nov 24, 2023Updated 2 years ago
- Various transformers for FSDP research☆38Nov 11, 2022Updated 3 years ago
- GPTQ inference Triton kernel☆322May 18, 2023Updated 3 years ago
- Janus is an open-source AI for Star Citizen☆11Dec 23, 2023Updated 2 years ago
- Lyra V2 (SoundStream) running in the browser☆19Sep 20, 2023Updated 2 years ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines☆197May 6, 2024Updated 2 years ago
- link manager, comic style☆24Nov 11, 2024Updated last year
- ☆456Oct 15, 2023Updated 2 years ago