ashvardanian / cpp-cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python using PyBind11, without a single line of CMake
☆16 · Updated 3 weeks ago
Related projects:
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 · Updated 5 months ago
- GPU prices aggregator for cloud providers ☆24 · Updated this week
- Vector Database with support for late interaction and token level embeddings. ☆51 · Updated last week
- ☆18 · Updated this week
- Scripts supporting the development and serving the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search ☆10 · Updated last year
- build your own vector database -- the littlest hnsw ☆19 · Updated 9 months ago
- Tree-based indexes for neural-search ☆28 · Updated 6 months ago
- A list of awesome resources and blogs on topics related to Unum ☆30 · Updated 2 weeks ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆33 · Updated 5 months ago
- Semantic Search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retriev… ☆37 · Updated 8 months ago
- Triton backend for managing the model state tensors automatically in sequence batcher ☆13 · Updated 7 months ago
- Showcase how mxbai-embed-large-v1 can be used to produce binary embedding. Binary embeddings enabled 32x storage savings and 40x faster r… ☆14 · Updated 5 months ago
- NLP with Rust for Python 🦀🐍 ☆57 · Updated 3 months ago
- Efficient BM25 with DuckDB 🦆 ☆12 · Updated last week
- Inference Llama 2 in C++ ☆47 · Updated 4 months ago
- A file utility for accessing both local and remote files through a unified interface. ☆36 · Updated last month
- a pipeline for using api calls to agnostically convert unstructured data into structured training data ☆26 · Updated last year
- Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documen… ☆17 · Updated last year
- Feste is a free and open-source framework allowing scalable composition of NLP tasks using a graph execution model that is optimized and … ☆40 · Updated last year
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆15 · Updated last month
- Cortex-compatible model server for Python and TensorFlow ☆16 · Updated last year
- Make triton easier ☆39 · Updated 3 months ago
- A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL ☆29 · Updated 2 years ago
- ☆10 · Updated 2 weeks ago
- Experiments with Model Training, Deployment & Monitoring ☆36 · Updated 6 months ago
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆53 · Updated 3 weeks ago
- Build Agentic workflows with function calling ☆16 · Updated 2 weeks ago
- Simple and fast low-bit matmul kernels in CUDA ☆48 · Updated this week
- 🤝 Trade any tensors over the network ☆30 · Updated 11 months ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆13 · Updated 8 months ago