A scalable data preprocessing framework built on PySpark for LLM training
☆24 · Dec 9, 2025 · Updated 4 months ago
Alternatives and similar repositories for Kaiyuan-Spark
Users that are interested in Kaiyuan-Spark are comparing it to the libraries listed below.
- Python interface and preprocessing pipeline for the BBBC021 dataset of cellular images ☆14 · Sep 19, 2021 · Updated 4 years ago
- A free lunch for LLMs recognition images ☆10 · Apr 29, 2025 · Updated 11 months ago
- ☆17 · Feb 11, 2023 · Updated 3 years ago
- ☆12 · Mar 14, 2025 · Updated last year
- SAS and Python code examples for use with SAS Viya Workbench. ☆11 · Jun 26, 2024 · Updated last year
- UMM-Discovery is a fully unsupervised deep learning method to cluster cellular images with similar phenotypes together, solely based on t… ☆11 · Nov 4, 2020 · Updated 5 years ago
- A face recognition project based on the ArcFace algorithm, using the CelebA and LFW datasets. ☆14 · Nov 4, 2021 · Updated 4 years ago
- ☆39 · Jan 16, 2026 · Updated 3 months ago
- A danmaku (bullet comment) blocking rule set for Bilibili ☆11 · Jul 11, 2024 · Updated last year
- Follow nginx log, and find out bad guys! ☆24 · Mar 7, 2026 · Updated last month
- Jupyter Notebooks for the Image Data Resource ☆19 · Feb 20, 2026 · Updated last month
- LibGDX and Web port of the awesome Pixel Dungeon (1.9.2a) ☆20 · Oct 16, 2022 · Updated 3 years ago
- Saves Tenhou paifu (replays) with key information; a Tenhou paifu recorder. ☆13 · Feb 5, 2025 · Updated last year
- A minimal example of Abductive Learning ☆19 · Dec 6, 2023 · Updated 2 years ago
- Documentation for Digital Design course ☆21 · Jun 10, 2025 · Updated 10 months ago
- Code for "YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Ass… ☆20 · Nov 5, 2025 · Updated 5 months ago
- Full State Quantum Circuit Simulation Beyond Memory Limit ☆16 · Aug 5, 2024 · Updated last year
- [ICCV2025] Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning ☆24 · Nov 13, 2025 · Updated 5 months ago
- Exemplar Guided Deep Neural Network for Spatial Transcriptomics Analysis of Gene Expression Prediction ☆20 · Oct 20, 2023 · Updated 2 years ago
- laboratory assignments of cs143-Compilers ☆18 · Jun 8, 2021 · Updated 4 years ago
- Llama 2 Everywhere (L2E) ☆14 · Jun 26, 2024 · Updated last year
- Docker file templates for GZCTF, including crypto, pwn, web. ☆21 · Aug 10, 2023 · Updated 2 years ago
- ⚡ Triton implementation of Clifford algebra neural networks. ☆38 · Oct 24, 2025 · Updated 5 months ago
- ⚡ Japanese sentence splitting (Japanese sentence boundary detector), 40–250× faster via a Rust-accelerated Python library with near-perfect API compatibility with … ☆72 · Oct 14, 2025 · Updated 6 months ago
- Optimizing the Cell Painting assay for image-based profiling ☆21 · Aug 11, 2025 · Updated 8 months ago
- ☆27 · Jan 8, 2024 · Updated 2 years ago
- The official implementation of the AAAI 2024 paper Bi-ViT. ☆13 · Dec 18, 2023 · Updated 2 years ago
- WS-DINO: a novel framework to use weak label information in a self-supervised setting to learn phenotypic representations from high-conte… ☆23 · Mar 28, 2023 · Updated 3 years ago
- Deep learning examples for the Instant Super Computer ☆20 · Jan 28, 2026 · Updated 2 months ago
- Computer vision course experiment ☆13 · Feb 14, 2020 · Updated 6 years ago
- Code for Kolmogorov-Arnold Network for Quantum Architecture Search i.e., KANQAS ☆19 · Jan 9, 2025 · Updated last year
- Repository for RuDiK, a system for discovering declarative logical rules over RDF Knowledge Bases ☆27 · May 8, 2024 · Updated last year
- A framework to compress classical machine learning model during training by quantum machine learning ☆19 · Aug 1, 2024 · Updated last year
- Official PyTorch implementation of QwT—“Quantization without Tears” (CVPR 2025): fast, accurate, and hassle-free post-training network qu… ☆33 · Sep 30, 2025 · Updated 6 months ago
- A Python OOP CLI+GUI interface with the object database in the mobile game Gakuen Idolm@ster. ☆32 · Apr 10, 2026 · Updated last week
- [NapCat only] Lets unofficial QQ bots send buttons too! ☆27 · Mar 17, 2026 · Updated last month
- An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3 ☆30 · May 30, 2021 · Updated 4 years ago
- Demos for Volcengine's OpenAPIs ☆19 · Feb 25, 2026 · Updated last month
- Minimal implementations of several diffusion models, using as little code and as simple an architecture as possible ☆21 · Jan 10, 2025 · Updated last year