isEmmanuelOlowe / llm-cost-estimator
Estimating hardware and cloud costs of LLMs and transformer projects
☆14 · Updated last year
Alternatives and similar repositories for llm-cost-estimator:
Users who are interested in llm-cost-estimator are comparing it to the libraries listed below.
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 10 months ago
- BH hackathon ☆14 · Updated 11 months ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ☆28 · Updated 4 months ago
- The world's first fully automated VC fund. ☆20 · Updated 2 weeks ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆18 · Updated 5 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆81 · Updated 3 weeks ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 · Updated 4 months ago
- ☆63 · Updated last week
- GPT-4 Level Conversational QA Trained In a Few Hours ☆59 · Updated 7 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 10 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆20 · Updated 2 weeks ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- AI Multi-agent system for real-time, adaptive supply chain coordination and optimization leveraging responsive AI clusters. ☆16 · Updated last year
- ☆46 · Updated 8 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆26 · Updated last week
- 👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆23 · Updated last year
- Compression for Foundation Models ☆30 · Updated last week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated 4 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated 2 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated last week
- LLM reads a paper and produces a working prototype ☆51 · Updated 2 weeks ago
- ☆45 · Updated 9 months ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 · Updated 4 months ago
- Modified Beam Search with periodical restart ☆12 · Updated 6 months ago
- ☆13 · Updated 9 months ago
- LLMs as Collaboratively Edited Knowledge Bases ☆45 · Updated last year
- Cascade Speculative Drafting ☆29 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆111 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆111 · Updated 3 months ago