yigitkonur / llm-dataset-prep
Python toolkit for preparing LLM fine-tuning datasets. Features category weighting, reservoir sampling, JSONL processing, and statistical analysis.
☆16Updated last year
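The description above names reservoir sampling over JSONL data. As a rough illustration of that technique only, here is a minimal sketch of the classic Algorithm R applied to a stream of JSONL lines. The function names `iter_jsonl` and `reservoir_sample` are hypothetical and are not taken from llm-dataset-prep; the toolkit's actual API and sampling variant may differ.

```python
import io
import json
import random
from typing import Any, Dict, Iterable, Iterator, List, Optional


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def reservoir_sample(
    records: Iterable[Dict[str, Any]], k: int, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Uniformly sample up to k records from a stream of unknown length.

    This is Algorithm R: keep the first k records, then replace a random
    slot with the i-th record (0-indexed) with probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir: List[Dict[str, Any]] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # An in-memory stand-in for a JSONL file; a real file handle works the same way.
    fake_file = io.StringIO("\n".join(json.dumps({"id": n}) for n in range(1000)))
    print(reservoir_sample(iter_jsonl(fake_file), k=5, seed=42))
```

Because the reservoir never holds more than `k` records, memory use stays constant however large the input file is, which is the usual reason to prefer this approach over loading a dataset and calling `random.sample`.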
Alternatives and similar repositories for llm-dataset-prep
Users who are interested in llm-dataset-prep are comparing it to the libraries listed below.
- Density-based clustering for vector embeddings using HDBSCAN and cosine similarity. Features automatic parameter search, PCA, and quality…☆17Updated 2 weeks ago
- Serverless API Gateway☆70Updated last week
- Minimalist search engine for job applications (CVs)☆61Updated last year
- deduplication☆15Updated 2 years ago
- Summarize webpages from specified URLs using the LangChain framework and the ChatOllama model☆122Updated last year
- An application to demonstrate how you can build a RAG using pgvector and PostgreSQL☆28Updated last year
- Subtitle translation API using GPT. Dynamic context windows preserve conversational flow. Features auto-fallback to DeepL, concurrent pro…☆32Updated 2 weeks ago
- High-performance Rust CLI and library achieving 10K+ req/s for LLM APIs. Features weighted load-balancing, HTTP/2 pooling, and real-time …☆17Updated 2 weeks ago
- Bulk call automation tool using Telnyx and LLM-based transcription. Dials, plays audio, records, and transcribes hundreds of concurrent c…☆137Updated 2 weeks ago
- Gen AI based travel assistant for Turkish Airlines customers☆11Updated last year
- Tiny AI is a platform to create/modify AI powered chatbots. This repository contains ChatGPT plugin and API for talk, create and modify T…☆36Updated last year
- Muninn is a fast and flexible HTML parsing tool that simplifies the process of extracting data from HTMLs.☆145Updated 9 months ago
- Türkiye Teknoloji Takımı Vakfı (Turkey Technology Team Foundation) - Artificial Intelligence Master Training Series - Regression and Classification in Machine Learning☆17Updated 5 years ago
- MCP Server for Turkish Government Tenders☆56Updated 2 weeks ago
- MCP Server for Searching Turkish Legislation☆110Updated last month
- Kubernetes logs to MongoDB☆16Updated 4 years ago
- Jumpstart Your Cursor AI Projects☆177Updated 10 months ago
- a lightweight and simple cli package☆12Updated 4 years ago
- 🌊 plugin.t4y.ai☆34Updated 2 years ago
- ☆176Updated 4 months ago
- ☆12Updated last year
- A Turkish Text-to-SQL Dataset☆12Updated 10 months ago
- Server load testing CLI tool 🏋️☆11Updated 2 years ago
- Turkish LM Tuner☆87Updated last year
- ☆27Updated 2 years ago
- ☆108Updated 2 years ago
- Bundled functions and classes for torch-based machine learning projects.☆14Updated last year
- ☆71Updated 5 months ago
- unofficial Tureng.com API☆81Updated 3 years ago
- Parser and data type for Turkish Citizenship ID numbers☆20Updated this week