ISCS-ZJU / Decentralized-inference-based-on-vLLM
☆30 Updated 5 months ago
Alternatives and similar repositories for Decentralized-inference-based-on-vLLM
Users who are interested in Decentralized-inference-based-on-vLLM are comparing it to the libraries listed below
- Source code for CSWAP-CLUSTER'21 and CSWAP+-TPDS'22 ☆25 Updated 3 years ago
- Source code for PHAST-TPDS'22 ☆29 Updated last year
- ☆27 Updated 3 months ago
- ☆31 Updated 3 months ago
- Source code for XPGraph-MICRO'22 ☆26 Updated 3 years ago
- Source code for AdaMBE-SC'24 ☆25 Updated last year
- Source code for GMBE-SC'23 ☆35 Updated 2 years ago
- ☆39 Updated 6 months ago
- Source code for AMBEA-TC'24 ☆29 Updated last year
- Source code for ChunkGraph-ATC'24 ☆28 Updated last year
- Source code for CCLBTree-EuroSys'24 ☆42 Updated last year
- Applying asynchronous tensor swapping to the PyTorch framework. ☆30 Updated 2 years ago
- Zplot demos ☆21 Updated 3 years ago
- Source code for NVAlloc-ASPLOS'22 ☆59 Updated 3 years ago
- Source code for iCache-HPCA'23 ☆49 Updated 2 years ago
- A persistent learned index for non-volatile memory with high read/write performance. ☆18 Updated 3 years ago
- ☆9 Updated 7 months ago
- Nap - NUMA-Aware Persistent Indexes ☆41 Updated 4 years ago
- A collection of awesome researchers and papers about disaggregated memory. ☆163 Updated last month
- This is the implementation repository of our SOSP'23 paper: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System. ☆36 Updated last year
- for paper @ ASPLOS'25 ☆16 Updated 4 months ago
- A low-latency, billion-scale, and updatable graph-based vector store on SSD. ☆54 Updated last week
- ☆18 Updated 10 months ago
- GPU-accelerated vector query processing system that supports large vector datasets beyond GPU memory. ☆29 Updated last year
- Code for "Baleen: ML Admission & Prefetching for Flash Caches" (FAST 2024). ☆26 Updated last year
- ☆10 Updated 6 months ago
- ☆72 Updated 2 years ago
- WIPE implementation ☆11 Updated last year
- ☆18 Updated 2 years ago
- ☆20 Updated 2 years ago