JerryYin777 / PaperHelper
PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References
☆9 · Updated 3 months ago
Related projects:
- GIFT: Generative Interpretable Fine-Tuning ☆17 · Updated 2 months ago
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple. ☆15 · Updated 6 months ago
- PyTorch implementation of the ICML 2024 paper "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆21 · Updated 3 months ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆13 · Updated 8 months ago
- Tutorials on GPU programming, with reading notes. ☆10 · Updated last year
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but … ☆13 · Updated last year
- Adaptive neighbor sampling for temporal GNN ☆10 · Updated 6 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆29 · Updated 6 months ago
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… ☆19 · Updated 4 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning ☆25 · Updated last month
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆37 · Updated 2 months ago
- This repository contains code for the MicroAdam paper. ☆9 · Updated 2 months ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 · Updated 2 years ago
- TensorRT LLM Benchmark Configuration ☆10 · Updated last month
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on Multi-GPU Clusters ☆25 · Updated last month
- Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores for the decoding stage of LLM inference. ☆14 · Updated this week
- This repo contains the source code for VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks. ☆12 · Updated 3 months ago
- The implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning" ☆42 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆20 · Updated 2 months ago
- ☆29 · Updated 4 months ago
- Official repository for "QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices" (IPDPS 2024). ☆19 · Updated 6 months ago
- ☆16 · Updated this week
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment ☆9 · Updated 8 months ago
- Official repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" ☆25 · Updated 6 months ago
- RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆13 · Updated 2 months ago
- An efficient and general framework for layerwise-adaptive gradient compression ☆10 · Updated 10 months ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆27 · Updated last week
- An efficient, online dataset-growth algorithm (with cleanness and diversity awareness) for handling growing web data ☆19 · Updated last month
- Course project code for the graduate course "Network Big Data Management: Theory and Applications" (《网络大数据管理理论和应用》) ☆10 · Updated last year