☆72 · Apr 20, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for dolma3
Users interested in dolma3 are comparing it to the repositories listed below.
- Data mapping framework for rust stuff ☆51 · Mar 25, 2026 · Updated last month
- Tooling for exact and MinHash deduplication of large-scale text datasets ☆81 · Mar 24, 2026 · Updated last month
- decontamination ☆30 · Mar 4, 2026 · Updated 2 months ago
- Code and data for "An Accurate Unsupervised Method for Joint Entity Alignment and Dangling Entity Detection". ☆15 · Mar 26, 2022 · Updated 4 years ago
- ☆17 · Apr 29, 2025 · Updated last year
- This is the OpenSky Daemon (OpenSkyD), which serves as an intermediate component between an ADSB receiver and the OpenSky Network, relayi… ☆10 · Dec 22, 2021 · Updated 4 years ago
- Official Implementation of wd1 ☆29 · Sep 25, 2025 · Updated 7 months ago
- ProofNet dataset ported into Lean 4 ☆30 · Jun 9, 2025 · Updated 11 months ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Apr 24, 2026 · Updated 2 weeks ago
- ☆21 · Jun 4, 2025 · Updated 11 months ago
- ☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models ☆19 · Jun 4, 2025 · Updated 11 months ago
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- ☆110 · Jul 15, 2025 · Updated 9 months ago
- Analysis on stop reasons ☆10 · Jun 17, 2024 · Updated last year
- ☆74 · Apr 15, 2026 · Updated 3 weeks ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆66 · Jan 26, 2026 · Updated 3 months ago
- Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? ☆43 · Jul 26, 2025 · Updated 9 months ago
- ☆28 · Sep 22, 2025 · Updated 7 months ago
- Fast LLM training codebase with dynamic strategy choosing [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆40 · Jan 4, 2024 · Updated 2 years ago
- slowly building a set of infinite riddle generators for data-hungry methods ☆14 · Nov 15, 2022 · Updated 3 years ago
- Run all the tests at the same time with modal.com ☆11 · Mar 2, 2024 · Updated 2 years ago
- This repository contains a series of 4 Jupyter notebooks demonstrating how AWS AI Services like Amazon Rekognition, Amazon Transcribe and… ☆12 · Nov 26, 2021 · Updated 4 years ago
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel… ☆11 · Jan 23, 2026 · Updated 3 months ago
- ☆15 · May 11, 2025 · Updated 11 months ago
- This repository contains statistics about the AI Infrastructure products. ☆17 · Feb 27, 2025 · Updated last year
- Directional diffusion models ☆43 · Oct 31, 2024 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆149 · Oct 27, 2024 · Updated last year
- Scalable PCA (sPCA) is a scalable implementation of the principal component analysis algorithm on top of Spark ☆12 · May 12, 2015 · Updated 10 years ago
- ☆27 · Mar 4, 2025 · Updated last year
- ☆70 · Apr 24, 2026 · Updated 2 weeks ago
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models ☆29 · Apr 17, 2025 · Updated last year
- Smart (Tesla) charging with a Tesla Wall Connector (and some Python code) ☆24 · Jun 19, 2018 · Updated 7 years ago
- CWTS OpenAlex ETL data pipeline. ☆20 · Oct 29, 2025 · Updated 6 months ago
- A set of distinct value estimators that give probabilistic bounds on a set's cardinality ☆22 · Dec 9, 2019 · Updated 6 years ago
- Visualize LIDAR data on Vision Pro ☆32 · May 27, 2024 · Updated last year
- a large-scale graph database created as a combination of multiple taxonomy backbones extracted from 5 existing knowledge graphs, namely: … ☆14 · Jan 23, 2024 · Updated 2 years ago
- ☆11 · Nov 16, 2022 · Updated 3 years ago
- Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, Kafka Stream API and Hazelcast Jet ☆10 · Apr 3, 2024 · Updated 2 years ago