chncwang / InsNet

InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.

☆67

Alternatives and similar repositories for InsNet

Users who are interested in InsNet are comparing it to the libraries listed below.

qsyao / cudaBERT
A Fast Multi-processing BERT-Inference System
☆101 Updated 3 years ago
keithyin / read-pytorch-source-code
PyTorch source code reading, version 0.2.0
☆91 Updated 5 years ago
Oneflow-Inc / OneFlow-Benchmark
OneFlow models for benchmarking.
☆104 Updated last year
YellowOldOdd / SDBI
Simple Dynamic Batching Inference
☆145 Updated 3 years ago
tvmai / meetup-slides
Place for meetup slides
☆140 Updated 5 years ago
Oneflow-Inc / DLPerf
DeepLearning Framework Performance Profiling Toolkit
☆294 Updated 3 years ago
LiebingYu / tinyflow
A simple deep learning framework that supports automatic differentiation and GPU acceleration.
☆59 Updated 2 years ago
starmee / AI-Notes
My learning notes about AI, including Machine Learning and Deep Learning.
☆18 Updated 6 years ago
dlsys-course / tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
☆124 Updated 8 years ago
bytedance / effective_transformer
Running BERT without Padding
☆475 Updated 3 years ago
HadXu / Thunder
A small deep-learning framework with C++/Python/CUDA
☆54 Updated 7 years ago
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆134 Updated 2 years ago
LeeJuly30 / BERTCpp
Implement BERT in pure C++
☆36 Updated 5 years ago
mrcat2018 / AutodiffEngine
AutodiffEngine
☆13 Updated 6 years ago
Tencent / WeChat-TFCC
☆127 Updated 4 years ago
dlsys-course / assignment2-2018
(Spring 2018) Assignment 2: Graph Executor with TVM
☆124 Updated 7 years ago
sebgao / cTensor
A super light-weight deep learning library based on NumPy in PyTorch fashion.
☆94 Updated 4 years ago
uwsampl / dtr-prototype
Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616
☆132 Updated 2 years ago
Harry-Chen / InfMoE
Inference framework for MoE layers based on TensorRT with Python binding
☆41 Updated 4 years ago
PaddlePaddle / CINN
Compiler Infrastructure for Neural Networks
☆147 Updated 2 years ago
zakheav / automatic-differentiation-framework
An automatic differentiation framework with dynamic graph support
☆100 Updated 5 years ago
pigirons / sgemm_hsw
This is an implementation of sgemm_kernel on L1d cache.
☆231 Updated last year
tengkz / tensorflow_notes
TensorFlow source code reading notes
☆192 Updated 7 years ago
Oneflow-Inc / oneflow-documentation
OneFlow documentation
☆69 Updated last year
Oneflow-Inc / models
Models and examples built with OneFlow
☆100 Updated last year
msnh2012 / XNet
Simple CuDNN wrapper
☆30 Updated 9 years ago
BBuf / how-to-optimize-gemm
☆98 Updated 4 years ago
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆187 Updated 6 years ago
Bruce-Lee-LY / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆42 Updated 8 months ago
Oneflow-Inc / conda-env
☆12 Updated 2 years ago