brave-experiments / MELT-public
codebase for "MELTing Point: Mobile Evaluation of Language Transformers"
☆18 · Updated last year
Alternatives and similar repositories for MELT-public
Users interested in MELT-public are comparing it to the libraries listed below
- One-size-fits-all model for mobile AI, a novel paradigm for mobile AI in which the OS and hardware co-manage a foundation model that is c… ☆28 · Updated last year
- (HotMobile'24) Salted Inference: Enhancing Privacy while Maintaining Efficiency of Split Inference in Mobile Computing ☆17 · Updated last year
- An LLM inference engine, written in C++ ☆15 · Updated last month
- Awesome Mobile LLMs ☆210 · Updated last week
- Simulation framework for accelerating research in Private Federated Learning ☆330 · Updated last month
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆46 · Updated 5 months ago
- EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs). ☆65 · Updated last year
- Compression for Foundation Models ☆33 · Updated 3 months ago
- Compressing Large Language Models using Low Precision and Low Rank Decomposition ☆95 · Updated 7 months ago
- ☆57 · Updated 6 months ago
- Training hybrid models for dummies. ☆25 · Updated 6 months ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆141 · Updated last year
- ☆45 · Updated last year
- Aana SDK is a powerful framework for building AI-enabled multimodal applications. ☆49 · Updated this week
- How much energy do GenAI models consume? ☆45 · Updated 2 months ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆21 · Updated 2 weeks ago
- ☆23 · Updated last year
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆50 · Updated 4 months ago
- ☆15 · Updated last year
- ☆66 · Updated last month
- Algorithms for approximate attention in LLMs ☆18 · Updated 3 months ago
- ☆79 · Updated 8 months ago
- [ICML'2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen ☆17 · Updated 10 months ago
- [EMNLP 2024 Main] Virtual Personas for Language Models via an Anthology of Backstories ☆29 · Updated 8 months ago
- Efficient LLM Inference Acceleration using Prompting ☆48 · Updated 8 months ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ☆30 · Updated 8 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated 10 months ago
- Samples of good AI-generated CUDA kernels ☆84 · Updated last month
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆94 · Updated last year
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆171 · Updated 6 months ago