apple / ml-ane-transformers
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
☆2,608 Updated last year
Alternatives and similar repositories for ml-ane-transformers:
Users interested in ml-ane-transformers are comparing it to the libraries listed below.
- Swift app demonstrating Core ML Stable Diffusion ☆2,660 Updated 9 months ago
- Everything we actually know about the Apple Neural Engine (ANE) ☆2,186 Updated 3 weeks ago
- Export Hugging Face models to Core ML and TensorFlow Lite ☆653 Updated 8 months ago
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad… ☆6,042 Updated 7 months ago
- Run LLaMA (and Stanford-Alpaca) inference on Apple Silicon GPUs. ☆585 Updated 2 years ago
- Simple UI for LLM Model Finetuning ☆2,063 Updated last year
- Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. ☆3,647 Updated last year
- Tensor library for machine learning ☆12,237 Updated this week
- C++ implementation for BLOOM ☆809 Updated last year
- 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading ☆9,541 Updated 6 months ago
- An Extensible Deep Learning Library ☆2,006 Updated this week
- Llama 2 Everywhere (L2E) ☆1,517 Updated 2 months ago
- A language for constraint-guided and efficient LLM programming. ☆3,873 Updated 10 months ago
- ☆2,778 Updated last week
- A Bulletproof Way to Generate Structured JSON from Language Models ☆4,667 Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,614 Updated last year
- Quantized inference code for LLaMA models ☆1,052 Updated 2 years ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆6,842 Updated 8 months ago
- Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!) ☆1,253 Updated 3 months ago
- Chat with your favourite LLaMA models in a native macOS app ☆1,499 Updated last year
- Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated) ☆3,900 Updated 9 months ago
- Drive a browser with GPT-3 ☆1,922 Updated 9 months ago
- Running large language models on a single GPU for throughput-oriented scenarios. ☆9,295 Updated 5 months ago
- Numbers every LLM developer should know ☆4,193 Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,358 Updated 9 months ago
- Apple AMX Instruction Set ☆1,062 Updated 3 months ago
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,690 Updated 3 months ago
- Accessible large language models via k-bit quantization for PyTorch. ☆6,876 Updated this week
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters ☆5,847 Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,622 Updated this week