clabrugere / scratch-llm
Implements an LLM similar to Meta's Llama 2 from the ground up in PyTorch, for educational purposes.
☆34 Updated 2 months ago
Alternatives and similar repositories for scratch-llm:
Users who are interested in scratch-llm are comparing it to the libraries listed below:
- Article about deploying machine learning models using grpc, pytorch and asyncio ☆28 Updated 2 years ago
- Inference Llama 2 in C++ ☆44 Updated 11 months ago
- Microsoft Phi 2 Streamlit App, deployed on HuggingFace Spaces is based on the Microsoft Phi 2 small language model (SLM) for text generat… ☆14 Updated 11 months ago
- Scripts for text classification with llama and bert ☆14 Updated 3 weeks ago
- Tiny C++11 GPT-2 inference implementation from scratch ☆58 Updated 2 weeks ago
- Fine-Tuning LLM and embedding models ☆27 Updated last year
- Benchmarking PyTorch 2.0 different models ☆21 Updated 2 years ago
- GGUF parser in Python ☆26 Updated 8 months ago
- Gemma2(9B), Llama3-8B-Finetune-and-RAG, code base for sample, implemented in Kaggle platform ☆21 Updated 2 months ago
- Finetuning BLOOM on a single GPU using gradient-accumulation ☆30 Updated 2 years ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 Updated 2 months ago
- Inference Llama 2 in one file of pure C++ ☆83 Updated last year
- llama.cpp to PyTorch Converter ☆33 Updated last year
- Several types of attention modules written in PyTorch for learning purposes ☆50 Updated 6 months ago
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 Updated last year
- Make triton easier ☆47 Updated 10 months ago
- ML/DL Math and Method notes ☆60 Updated last year
- ☆18 Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆111 Updated last year
- Minimal LLM scripts for 24GB VRAM GPUs. training, inference, whatever ☆38 Updated last month
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 Updated this week
- Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget ☆146 Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆56 Updated last year
- Benchmarking vision language models on face tasks ☆12 Updated 3 weeks ago
- ☆17 Updated 2 months ago
- Create a source of truth for ML model results and browse it on Papers with Code ☆30 Updated 3 years ago
- ☆35 Updated last year
- Tutorial for LLM developers about engine design, service deployment, evaluation/benchmark, etc. Provide a C/S style optimized LLM inferen… ☆19 Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 Updated 4 months ago
- Digest AI is a powerful model analysis tool that extracts insights from your models. ☆19 Updated last month