seth-lu / Im2win
☆14Updated 2 years ago
Alternatives and similar repositories for Im2win
Users who are interested in Im2win are comparing it to the libraries listed below.
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!☆51Updated last week
- [ICLR 2024] Dynamic Sparse Training with Structured Sparsity☆18Updated last year
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024)☆29Updated 11 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…☆23Updated 3 weeks ago
- Dynamic Neural Architecture Search Toolkit☆30Updated 7 months ago
- ☆21Updated 2 years ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor…☆72Updated this week
- NASRec: Weight Sharing Neural Architecture Search for Recommender Systems☆30Updated last year
- KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference☆15Updated 2 months ago
- OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM☆47Updated 9 months ago
- Unified Normalization (ACM MM'22) by Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P…☆34Updated 2 years ago
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆42Updated last year
- ☆11Updated last year
- A tool to convert a TensorRT engine/plan to a fake ONNX model☆40Updated 2 years ago
- ☆33Updated last month
- ☆16Updated 2 years ago
- A block oriented training approach for inference time optimization.☆33Updated 10 months ago
- ACL 2023☆39Updated 2 years ago
- Official PyTorch implementation of LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification☆46Updated 3 years ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆19Updated 8 months ago
- FlexAttention w/ FlashAttention3 Support☆26Updated 9 months ago
- ☆77Updated 5 months ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers.☆51Updated 2 years ago
- Code for paper "ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection" (MobiSys'23)☆13Updated last year
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning☆20Updated 3 years ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA cores for the decoding stage of LLM inference.☆38Updated last month
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…☆24Updated last year
- Awesome code, projects, books, etc. related to CUDA☆19Updated this week
- An object detection codebase based on MegEngine.☆28Updated 2 years ago
- This project is the official implementation of our accepted IEEE TPAMI paper Diverse Sample Generation: Pushing the Limit of Data-free Qu…☆14Updated 2 years ago