jorahn / llama-int8
Quantized inference code for LLaMA models
☆13 · Updated last year
Related projects
Alternatives and complementary repositories for llama-int8
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated last year
- Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆28 · Updated 9 months ago
- Demonstration that fine-tuning a RoPE model on longer sequences than the pre-trained model saw adapts the model's context limit ☆63 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆51 · Updated 8 months ago
- Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec… ☆16 · Updated 8 months ago
- Simple LLM inference server ☆17 · Updated 5 months ago
- Experimental sampler to make LLMs more creative ☆30 · Updated last year
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- Command-line script for inferencing from models such as falcon-7b-instruct ☆75 · Updated last year
- Jupyter Notebooks and an R Notebook for encoding Pokémon embeddings and creating data visualizations. ☆16 · Updated 4 months ago
- RWKV-7: Surpassing GPT ☆43 · Updated this week
- Image Generation API Server - Similar to https://text-generator.io but for images ☆46 · Updated 2 months ago
- Latent Large Language Models ☆16 · Updated 2 months ago
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆15 · Updated 2 weeks ago
- Training hybrid models for dummies. ☆15 · Updated 2 weeks ago
- Zeta implementation of a reusable, plug-and-play feedforward layer from the paper "Exponentially Faster Language Modeling" ☆15 · Updated this week
- Embeddings-focused small version of the Llama NLP model ☆102 · Updated last year
- Merge LLMs that are split into parts ☆25 · Updated last year
- LLM sampling method for enforcing syntax adherence in generated output ☆21 · Updated last year
- GPT2 Byte Pair Encoding implementation in Golang ☆24 · Updated last week