experiments with inference on llama
☆103 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for llama-inference
Users that are interested in llama-inference are comparing it to the libraries listed below.
- ☆13 · May 25, 2023 · Updated 2 years ago
- ☆20 · Jan 27, 2024 · Updated 2 years ago
- Leverage your LangChain trace data for fine tuning ☆46 · Aug 2, 2024 · Updated last year
- ☆16 · Aug 10, 2022 · Updated 3 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,687 · Oct 23, 2024 · Updated last year
- ☆122 · Apr 22, 2024 · Updated last year
- ☆12 · Oct 25, 2023 · Updated 2 years ago
- Serving multiple LoRA finetuned LLM as one ☆1,152 · May 8, 2024 · Updated last year
- 9th solution ☆11 · Oct 11, 2022 · Updated 3 years ago
- ☆17 · Feb 19, 2024 · Updated 2 years ago
- extensible collectives library in triton ☆98 · Mar 31, 2025 · Updated last year
- Test pytorch code with minimal computational overhead ☆26 · Jun 8, 2023 · Updated 2 years ago
- batched loras ☆351 · Sep 6, 2023 · Updated 2 years ago
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio… ☆14 · Jan 25, 2024 · Updated 2 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,909 · Jan 21, 2024 · Updated 2 years ago
- ☆605 · Aug 23, 2024 · Updated last year
- Project repository of the paper "Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning wi… ☆35 · Mar 19, 2024 · Updated 2 years ago
- ☆125 · Mar 17, 2024 · Updated 2 years ago
- Mutual Information Predicts Hallucinations in Abstractive Summarization ☆13 · Nov 14, 2022 · Updated 3 years ago
- ☆20 · Jul 12, 2023 · Updated 2 years ago
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Aug 25, 2023 · Updated 2 years ago
- Merge Transformers language models by use of gradient parameters. ☆214 · Aug 8, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,915 · Sep 30, 2023 · Updated 2 years ago
- Experimenting text-embeddings-inference server on both CPU and GPU ☆18 · Oct 25, 2023 · Updated 2 years ago
- ☆75 · Jul 2, 2021 · Updated 4 years ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,323 · May 11, 2025 · Updated 11 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,107 · Jun 30, 2025 · Updated 9 months ago
- ☆22 · Jan 5, 2024 · Updated 2 years ago
- Overview and tutorials of the LlamaIndex Library ☆19 · Aug 7, 2023 · Updated 2 years ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 · Jan 21, 2025 · Updated last year
- Salesforce open-source LLMs with 8k sequence length. ☆726 · Jan 31, 2025 · Updated last year
- Experiments on speculative sampling with Llama models ☆128 · Jun 8, 2023 · Updated 2 years ago
- ☆20 · Apr 12, 2024 · Updated 2 years ago
- ☆135 · Nov 24, 2023 · Updated 2 years ago
- The Energy Transformer block, in JAX ☆64 · Dec 14, 2023 · Updated 2 years ago
- My solution for the ''LLM - Detect AI Generated Text'' kaggle competition ☆16 · Feb 2, 2024 · Updated 2 years ago
- GPTQ inference Triton kernel ☆321 · May 18, 2023 · Updated 2 years ago
- Janus is an opensource IA for Star Citizen ☆11 · Dec 23, 2023 · Updated 2 years ago
- Generate beautiful, testable documentation with Jupyter Notebooks ☆21 · Jul 25, 2022 · Updated 3 years ago