RahulSChand / llama2.c-for-dummies
Step-by-step explanation/tutorial of llama2.c
☆207 · Updated 11 months ago
Related projects:
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆287 · Updated 3 months ago
- Easy and Efficient Quantization for Transformers ☆172 · Updated 2 months ago
- Sakura-SOLAR-DPO: Merge, SFT, and DPO ☆114 · Updated 8 months ago
- Efficient fine-tuning for ko-llm models ☆184 · Updated 6 months ago
- Newsletter bot for 🤗 Daily Papers ☆101 · Updated this week
- OSLO: Open Source for Large-scale Optimization ☆172 · Updated last year
- The Universe of Evaluation. All about evaluation for LLMs. ☆205 · Updated 2 months ago
- Evolve LLM training instructions, from English instructions to any language. ☆108 · Updated last year
- 1-Click is all you need. ☆58 · Updated 4 months ago
- Extension of LangChain for RAG. Easy benchmarking, multiple retrievals, reranker, time-aware RAG, and so on... ☆275 · Updated 8 months ago
- ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference. ☆110 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆250 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆342 · Updated 3 weeks ago
- Manage histories of LLM-applied applications ☆86 · Updated 10 months ago
- A bagel, with everything. ☆306 · Updated 5 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆163 · Updated 3 months ago
- Inference Llama/Llama2 Models in NumPy ☆18 · Updated 9 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆339 · Updated 6 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆300 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) ☆659 · Updated last week
- Train Gemma on TPU/GPU! (Codebase for training Gemma-Ko Series) ☆45 · Updated 6 months ago
- ☆170 · Updated this week
- A simple implementation of Llama 1 and 2. Llama architecture built from scratch using PyTorch; all the models are built from scratch that inc… ☆10 · Updated 4 months ago
- Comparison of Language Model Inference Engines ☆178 · Updated 2 weeks ago
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft. ☆44 · Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆106 · Updated 6 months ago
- Python bindings for ggml ☆125 · Updated 2 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆258 · Updated 10 months ago
- A collection of all available inference solutions for LLMs ☆65 · Updated 2 weeks ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆47 · Updated 11 months ago