A side project that follows all the acceleration tricks in tinyllama, with minimal modifications to the Hugging Face transformers code.
☆13Sep 2, 2024Updated last year
Alternatives and similar repositories for tinyllama
Users that are interested in tinyllama are comparing it to the libraries listed below.
- ☆17Dec 19, 2024Updated last year
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆157Apr 7, 2025Updated 11 months ago
- ☆16Oct 16, 2024Updated last year
- ☆22Dec 1, 2021Updated 4 years ago
- Finetune GPT2 for text summarization☆17Aug 16, 2021Updated 4 years ago
- RADLADS training code☆37May 7, 2025Updated 10 months ago
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD☆33Jul 1, 2025Updated 8 months ago
- Code for Pushdown Layers from our EMNLP 2023 paper☆29Dec 3, 2023Updated 2 years ago
- Code for Repl4NLP paper "A Cross-Task Analysis of Text Span Representations"☆21Nov 4, 2022Updated 3 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- A PyTorch wrapper of parallel exclusive scan in CUDA☆12May 25, 2023Updated 2 years ago
- Cluster doctor skills☆14Feb 20, 2026Updated last month
- [NeurIPS 2024] Official implementation of NeurIPS 2024 paper "Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory …☆26Feb 24, 2025Updated last year
- ☆15Aug 19, 2024Updated last year
- [ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036)☆22May 24, 2023Updated 2 years ago
- ☆51Jan 28, 2024Updated 2 years ago
- HGRN2: Gated Linear RNNs with State Expansion☆56Aug 20, 2024Updated last year
- Here we will test various linear attention designs.☆62Apr 25, 2024Updated last year
- Fork of NACA from Google Code☆13Feb 25, 2010Updated 16 years ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs☆27Dec 17, 2024Updated last year
- A script to reorganize 'Want to go' Saved places in Google Maps into separate lists by category.☆11May 14, 2024Updated last year
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆98Oct 27, 2025Updated 4 months ago
- A single-line modification to any (dualizer-based) optimizer that allows the optimizer to adapt to the scale of the gradients as they cha…☆19Jan 11, 2025Updated last year
- ☆20Updated this week
- Tokenflood is a load testing framework for simulating arbitrary loads on instruction-tuned LLMs☆44Updated this week
- Yad2 smart scraper with a minimal setup☆17Jun 18, 2023Updated 2 years ago
- Test Orchestrator for Performance and Scalability of AI pLatforms☆16Mar 20, 2026Updated last week
- Convert any office files to pdf format☆10May 31, 2024Updated last year
- Exercise to implement DDD with CQRS using Dapr for Pub-Sub.☆13Jul 28, 2021Updated 4 years ago
- Variant optimization autoscaler for distributed inference workloads☆34Mar 19, 2026Updated last week
- Commands that will make you more comfortable with the ROCm toolkit.☆18Aug 1, 2024Updated last year
- Code for Max-Margin Contrastive Learning - AAAI 2022☆17Apr 25, 2022Updated 3 years ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics☆72Jan 13, 2026Updated 2 months ago
- Run code-llama with 50k tokens using flash attention and better transformer☆12Nov 21, 2023Updated 2 years ago
- Predict the performance of LLM inference services☆23Sep 18, 2025Updated 6 months ago
- Vocabulary Parallelism☆25Mar 10, 2025Updated last year
- ☆100Mar 15, 2026Updated last week
- This repository contains the code and data download links to reproduce building the WDC Products Benchmark.☆15Jul 13, 2023Updated 2 years ago
- ☆11Dec 3, 2020Updated 5 years ago