LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
☆135 · Jun 10, 2023 · Updated 3 years ago
Alternatives and similar repositories for llama-server
Users who are interested in llama-server are comparing it to the libraries listed below.
- Python bindings for llama.cpp ☆68 · Feb 29, 2024 · Updated 2 years ago
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio ☆38 · Jun 6, 2023 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆12 · Nov 14, 2025 · Updated 7 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆250 · Jan 22, 2024 · Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated 2 years ago
- IRIS: Demonstrator for use of LLMs in Python (outdated) ☆62 · Mar 23, 2025 · Updated last year
- 5X faster 60% less memory QLoRA finetuning ☆21 · May 28, 2024 · Updated 2 years ago
- A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI… ☆594 · Jun 12, 2023 · Updated 3 years ago
- Official Repository for "Modeling Hierarchical Structures with Continuous Recursive Neural Networks" (ICML 2021) ☆12 · Aug 18, 2021 · Updated 4 years ago
- Python scripts for AI voice changers ☆14 · Apr 25, 2023 · Updated 3 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated 2 years ago
- Structural Pruning for LLaMA ☆54 · May 20, 2023 · Updated 3 years ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper ☆14 · Aug 9, 2021 · Updated 4 years ago
- Falcon7B + Falcon40B support - in branch falcon40b. Now all good and working. But main action now in https://github.com/cmp-nct/ggllm.cpp ☆10 · Sep 30, 2023 · Updated 2 years ago
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5. ☆11 · May 26, 2023 · Updated 3 years ago
- An embeddable widget for interacting with OpenAI API compatible LLMs ☆15 · Sep 18, 2024 · Updated last year
- ☆26 · Jun 8, 2026 · Updated last week
- Plugin that allows loading of local LLMs into Auto-GPT ☆12 · Apr 21, 2023 · Updated 3 years ago
- ☆16 · May 31, 2024 · Updated 2 years ago
- An OpenAI API compatible REST server for llama. ☆207 · Feb 24, 2025 · Updated last year
- Chat²GPT is a ChatGPT (and DALL·E 2/3, and ElevenLabs) chat bot for Google Chat. 🤖💬 ☆11 · Feb 2, 2026 · Updated 4 months ago
- A Rust implementation of Andrej Karpathy's Micrograd ☆15 · Apr 28, 2025 · Updated last year
- ☆17 · Mar 11, 2025 · Updated last year
- Control a Sphero Ollie with Web Bluetooth ☆13 · Nov 7, 2016 · Updated 9 years ago
- Unsupervised multi-metric fusion for Full-Reference (FR) Image Quality Assessment (IQA) ☆11 · Jul 11, 2014 · Updated 11 years ago
- Documentation site for fast-agent ☆31 · May 10, 2026 · Updated last month
- [ACM MM 2025] Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors ☆42 · Aug 14, 2025 · Updated 10 months ago
- A modern Craft CMS starter kit for agencies and developers, featuring Vite, Tailwind, Datastar, DDEV, MCP, LLM Ready. ☆35 · Jun 9, 2026 · Updated last week
- Provides strongly-typed object model for eduHub Data Sets ☆13 · Mar 11, 2026 · Updated 3 months ago
- ☆13 · Jan 20, 2022 · Updated 4 years ago
- ☆52 · Feb 5, 2025 · Updated last year
- A curated list of awesome neural radiance fields papers ☆13 · Mar 11, 2021 · Updated 5 years ago
- Realtime News and Information Eval ☆19 · Mar 26, 2026 · Updated 2 months ago
- Temporary repository for Japanese ☆11 · Apr 25, 2019 · Updated 7 years ago
- A Python package designed to simplify the process of creating and managing function calls to OpenAI's API, as well as models using LiteLL… ☆17 · May 25, 2025 · Updated last year
- Web app for Sphero Bolt ☆14 · Sep 28, 2019 · Updated 6 years ago
- For our ISSTA'23 paper ACETest: Automated Constraint Extraction for Testing Deep Learning Operators ☆17 · Apr 28, 2026 · Updated last month
- A Chinese-native industrial evaluation benchmark ☆16 · Mar 21, 2024 · Updated 2 years ago
- Karpathy's llama2.c transpiled to MLX for Apple Silicon ☆14 · Dec 28, 2023 · Updated 2 years ago