Experiments with inference on LLaMA
☆103 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- ☆13 · May 25, 2023 · Updated 2 years ago
- ☆20 · Jan 27, 2024 · Updated 2 years ago
- Leverage your LangChain trace data for fine-tuning · ☆46 · Aug 2, 2024 · Updated last year
- ☆16 · Aug 10, 2022 · Updated 3 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code · ☆10 · Aug 29, 2023 · Updated 2 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 · ☆1,687 · Oct 23, 2024 · Updated last year
- ☆122 · Apr 22, 2024 · Updated 2 years ago
- Serving multiple LoRA-finetuned LLMs as one · ☆1,156 · May 8, 2024 · Updated 2 years ago
- 9th solution · ☆11 · Oct 11, 2022 · Updated 3 years ago
- A block-oriented training approach for inference-time optimization · ☆34 · Aug 19, 2024 · Updated last year
- Exploring NLP weak supervision approaches to train text classification models. The project is also a prototype for a semi-automated text … · ☆22 · Feb 18, 2024 · Updated 2 years ago
- Extensible collectives library in Triton · ☆98 · Mar 31, 2025 · Updated last year
- Test PyTorch code with minimal computational overhead · ☆26 · Jun 8, 2023 · Updated 2 years ago
- Batched LoRAs · ☆351 · Sep 6, 2023 · Updated 2 years ago
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio… · ☆14 · Jan 25, 2024 · Updated 2 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters · ☆1,909 · Jan 21, 2024 · Updated 2 years ago
- ☆606 · Aug 23, 2024 · Updated last year
- Project repository of the paper "Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning wi… · ☆35 · Mar 19, 2024 · Updated 2 years ago
- ☆125 · Mar 17, 2024 · Updated 2 years ago
- This repo lets you run mistral-7b in Google Colab. · ☆16 · Oct 1, 2023 · Updated 2 years ago
- ☆20 · Jul 12, 2023 · Updated 2 years ago
- Example of applying CUDA graphs to LLaMA-v2 · ☆11 · Aug 25, 2023 · Updated 2 years ago
- Merge Transformers language models by use of gradient parameters. · ☆214 · Aug 8, 2024 · Updated last year
- Command-line script for running inference with models such as LLaMA in a chat scenario, with LoRA adaptations · ☆33 · Jun 1, 2023 · Updated 2 years ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. · ☆2,915 · Sep 30, 2023 · Updated 2 years ago
- Experimenting with the text-embeddings-inference server on both CPU and GPU · ☆18 · Oct 25, 2023 · Updated 2 years ago
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- ☆22 · Sep 11, 2023 · Updated 2 years ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. · ☆2,110 · Jun 30, 2025 · Updated 10 months ago
- A reference implementation of an end to end, open-source MLOps platform. · ☆15 · Nov 20, 2022 · Updated 3 years ago
- ☆22 · Jan 5, 2024 · Updated 2 years ago
- Non-local Modeling for Image Quality Assessment · ☆13 · Dec 20, 2023 · Updated 2 years ago
- PyTorch interface for TrueGrad Optimizers · ☆43 · Aug 8, 2023 · Updated 2 years ago
- Salesforce open-source LLMs with 8k sequence length. · ☆727 · Jan 31, 2025 · Updated last year
- Experiments on speculative sampling with Llama models · ☆128 · Jun 8, 2023 · Updated 2 years ago