Preemo-Inc / text-generation-inference
★ 201 · Updated 7 months ago
Related projects:
- This is our own implementation of "Layer Selective Rank Reduction" (★ 230, updated 3 months ago)
- Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models (★ 129, updated last month)
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes (★ 81, updated last year)
- Evaluate and enhance your LLM deployments for real-world inference needs (★ 124, updated this week)
- Experiments with inference on Llama (★ 106, updated 3 months ago)
- Domain Adapted Language Modeling Toolkit, end-to-end RAG (★ 295, updated 3 months ago)
- Manage scalable open LLM inference endpoints in Slurm clusters (★ 217, updated 2 months ago)
- Toolkit for attaching, training, saving, and loading new heads for transformer models (★ 236, updated last week)
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vectors… (★ 192, updated 4 months ago)
- Tutorial for building an LLM router (★ 144, updated 2 months ago)
- Low-rank adapter extraction for fine-tuned transformer models (★ 154, updated 4 months ago)
- Some simple scripts that I use day-to-day when working with LLMs and the Hugging Face Hub (★ 154, updated 11 months ago)
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…) (★ 139, updated 11 months ago)
- Fast and more realistic evaluation of chat language models; includes a leaderboard (★ 180, updated 8 months ago)
- A bagel, with everything (★ 306, updated 5 months ago)
- Let's build better datasets, together! (★ 195, updated last month)
- Generate synthetic data using OpenAI, MistralAI, or AnthropicAI (★ 223, updated 4 months ago)
- Small fine-tuned LLMs for a diverse set of useful tasks (★ 119, updated last year)
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free (★ 217, updated 6 months ago)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (QLoRA) (★ 123, updated last year)
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines (★ 195, updated 4 months ago)
- Tune MPTs (★ 84, updated last year)
- A simple Python sandbox for helpful LLM data agents (★ 143, updated 3 months ago)
- Doing simple retrieval from LLM models at various context lengths to measure accuracy (★ 93, updated 5 months ago)