seth-lu / Im2win
☆13 Updated last year
Related projects
Alternatives and complementary repositories for Im2win
- Explore training for quantized models ☆10 Updated last week
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 Updated last week
- Code for High-Capacity Expert Binary Networks (ICLR 2021). ☆27 Updated 2 years ago
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24) ☆27 Updated 3 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆42 Updated 2 weeks ago
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last month
- ACL 2023 ☆38 Updated last year
- Dynamic Neural Architecture Search Toolkit ☆29 Updated 5 months ago
- ☆47 Updated 2 months ago
- Triton kernels for Flux ☆17 Updated last week
- Official PyTorch implementation of LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification ☆46 Updated 2 years ago
- MLPerf™ Mobile models ☆24 Updated last month
- ☆11 Updated 3 years ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated last month
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆43 Updated last year
- Optimize tensor program fast with Felix, a gradient descent autotuner. ☆19 Updated 6 months ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… ☆31 Updated last year
- Codes of the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform. ☆12 Updated 3 years ago
- ☆14 Updated last month
- ☆11 Updated this week
- Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ☆22 Updated 3 years ago
- Describing How to Enable OpenVINO Execution Provider for ONNX Runtime ☆19 Updated 4 years ago
- benchmarking some transformer deployments ☆26 Updated last year
- Code for paper: "Privately generating tabular data using language models". ☆14 Updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 Updated 2 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 Updated 8 months ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆50 Updated this week
- ☆15 Updated last year
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆11 Updated last year