dthuerck / culip
Code for the culip ("CUda for Linear and Integer Programming") project, containing GPU primitives for linear algebra, linear optimization and (someday) integer optimization.
☆19 · Updated 7 years ago
Alternatives and similar repositories for culip
Users interested in culip are comparing it to the libraries listed below.
- benchmarking some transformer deployments ☆26 · Updated last month
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆70 · Updated 9 months ago
- Nod.ai 🦈 version of 👻 . You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆107 · Updated last month
- SYCL implementation of Fused MLPs for Intel GPUs ☆51 · Updated 2 months ago
- Loop Nest - Linear algebra compiler and code generator. ☆21 · Updated 3 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Updated last year
- Experimental scripts for researching data-adaptive learning rate scheduling. ☆22 · Updated 2 years ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆149 · Updated 2 years ago
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆153 · Updated last year
- train with kittens! ☆63 · Updated last year
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 11 months ago
- A block-oriented training approach for inference-time optimization. ☆34 · Updated last year
- ☆32 · Updated last year
- A tracing JIT compiler for PyTorch ☆13 · Updated 4 years ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆30 · Updated 2 weeks ago
- ☆71 · Updated 10 months ago
- The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM ) ☆230 · Updated 2 months ago
- Inference code for LLaMA models ☆41 · Updated 2 years ago
- ☆32 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- Compression for Foundation Models ☆35 · Updated 6 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 · Updated last month
- ☆23 · Updated 3 months ago
- ☆137 · Updated last week
- ☆52 · Updated 2 years ago
- PyCUDA-based PyTorch Extension Made Easy ☆26 · Updated last year
- ML model training for edge devices ☆168 · Updated 2 years ago
- ☆55 · Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆40 · Updated 2 years ago