dthuerck / culip
Code for the culip ("CUda for Linear and Integer Programming") project, containing GPU primitives for linear algebra, linear optimization and (someday) integer optimization.
☆19 · Updated 6 years ago
Alternatives and similar repositories for culip:
Users who are interested in culip compare it to the libraries listed below:
- A tracing JIT compiler for PyTorch ☆13 · Updated 3 years ago
- benchmarking some transformer deployments ☆26 · Updated 2 years ago
- Multiple-precision GPU accelerated linear algebra routines (dense and sparse) based on residue number system ☆17 · Updated 2 years ago
- Loop Nest - Linear algebra compiler and code generator. ☆22 · Updated 2 years ago
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 · Updated 3 years ago
- ☆32 · Updated 4 years ago
- A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem. ☆16 · Updated 6 months ago
- Notes and artifacts from the ONNX steering committee ☆26 · Updated last week
- Sparse symmetric indefinite solver implemented with a runtime system ☆13 · Updated 4 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆59 · Updated 3 weeks ago
- SParse AcceleRation on Tensor Architecture ☆17 · Updated last month
- Standalone commandline CLI tool for compiling Triton kernels ☆18 · Updated 7 months ago
- CuPy Benchmark ☆12 · Updated 6 years ago
- Sparse Boolean linear algebra for Nvidia Cuda, OpenCL and CPU computations ☆14 · Updated 2 years ago
- Fork of Bliss ☆13 · Updated 2 years ago
- Matrix Algebra on GPU and Multicore Architectures (MAGMA) source releases from http://icl.cs.utk.edu/magma/index.html ☆23 · Updated 9 years ago
- An implementation of the revised simplex algorithm in CUDA for solving linear optimization problems in the form max{c*x | A*x=b, l<=x<=u} ☆27 · Updated 8 years ago
- Solver for Unconstrained Binary Quadratic Optimization (UBQO, BQO, QUBO) and Max 2-SAT, based on semidefinite relaxation with constraint … ☆15 · Updated 2 years ago
- An integer linear program solver using a Lagrange decomposition into binary decision diagrams. Lagrange multipliers are updated through d… ☆59 · Updated 11 months ago
- MPI Code Generation through Domain-Specific Language Models ☆13 · Updated 5 months ago
- ☆11 · Updated 3 years ago
- A collection of reproducible inference engine benchmarks ☆29 · Updated 2 weeks ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 7 months ago
- Some microbenchmarks and design docs before commencement ☆12 · Updated 4 years ago
- A compiler for BLOG probabilistic programming language ☆26 · Updated 7 years ago
- Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser ☆13 · Updated 4 years ago
- Development repository for integrating FlexFlow (A distributed deep learning framework that supports flexible parallelization strategies)… ☆28 · Updated 3 years ago
- ☆11 · Updated 3 months ago
- Input (scripts, etc.) and output (scripts, performance results, etc.) for Gunrock and other graph engines ☆10 · Updated last year
- ☆13 · Updated 3 years ago