mrmps/ai-chunker

Chunk your text more accurately using gpt4o-mini

☆44

Alternatives and similar repositories for ai-chunker

Users that are interested in ai-chunker are comparing it to the libraries listed below.

nreimers / beir-sparta
Re-Implementation of SPARTA model
☆13 · Oct 1, 2021 · Updated 4 years ago
mixedbread-ai / binary-embeddings
Showcase how mxbai-embed-large-v1 can be used to produce binary embedding. Binary embeddings enabled 32x storage savings and 40x faster r…
☆19 · Mar 23, 2024 · Updated 2 years ago
takara-ai / SwarmFormer
A pytorch implementation of SwarmFormer for text classification.
☆16 · Feb 28, 2026 · Updated 4 months ago
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆32 · Sep 19, 2025 · Updated 10 months ago
transferwise / wise-topic
LLM-only topic extraction and classification
☆11 · Jun 3, 2026 · Updated last month
RasaHQ / rasa-3.x-component-examples
A basic Rasa project with Custom Components
☆10 · Jan 27, 2022 · Updated 4 years ago
tmalsburg / llm_surprisal
Simple tool for generating tokens with open source transformers and/or calculate per-token surprisal.
☆14 · Jul 10, 2026 · Updated 2 weeks ago
nstawfik / MedSentEval
☆11 · Nov 19, 2020 · Updated 5 years ago
GusLovesMath / Llama3_MacSilicon
Repository for running LLMs efficiently on Mac silicon (M1, M2, M3). Features Jupyter notebook for Meta-Llama-3 setup using MLX framework…
☆11 · May 4, 2024 · Updated 2 years ago
louisbrulenaudet / ragoon
High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡
☆70 · Nov 17, 2025 · Updated 8 months ago
firecrawl / OpenManus
No fortress, purely open ground. OpenManus is Coming.
☆18 · Mar 18, 2025 · Updated last year
yang-zhang / labse-pytorch
Language-agnostic BERT Sentence Embedding (LaBSE) Pytorch Model
☆21 · Sep 2, 2020 · Updated 5 years ago
UKPLab / gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: …
☆343 · Jul 6, 2023 · Updated 3 years ago
marqo-ai / GCL
Generalised Contrastive Learning. This is a Repository for Google Shopping Dataset and Benchmarks followed by our novel fine-grained cont…
☆76 · Jul 17, 2026 · Updated last week
mixedbread-ai / wiki_demo_app
☆14 · Jun 25, 2024 · Updated 2 years ago
krypticmouse / dspy-docs
Official Documentation for DSPy Library
☆25 · Updated this week
RasaHQ / starter-pack-intentless-policy
☆12 · Jul 11, 2023 · Updated 3 years ago
GreycLab / gmic-py
Python binding for the G'MIC Image Processing Framework
☆11 · Nov 14, 2025 · Updated 8 months ago
felipebravom / EmoInt
Scripts for WASSA-2017 Shared Task on Emotion Intensity
☆14 · Oct 4, 2017 · Updated 8 years ago
favreau / Brayns
Visualizer for large-scale and interactive ray-tracing of neurons
☆10 · Jan 25, 2022 · Updated 4 years ago
philschmid / multilingual-serverless-qa-aws-lambda
☆10 · Dec 17, 2020 · Updated 5 years ago
Lhx94As / E2E-language-diarization
Source code of paper <End-to-End Language Diarization for Bilingual Code-switching Speech>
☆19 · Jan 23, 2022 · Updated 4 years ago
annomator / annomator_1.0
Annomator is a fully featured automatic image annotator. It can detect, record, edit and display masks and boxes from objects detected i…
☆15 · Nov 22, 2022 · Updated 3 years ago
maharanasarkar / whatsapp-connector-rasa
Custom channel connector to connect Rasa Open Source to WhatsApp API.
☆20 · Jul 29, 2025 · Updated 11 months ago
vbarda / pandas-rag-langgraph
☆72 · Jul 10, 2024 · Updated 2 years ago
Shef-AIRE / llms_post-ocr_correction
Leveraging LLMs for Post-OCR Correction of Historical Newspapers
☆18 · May 12, 2026 · Updated 2 months ago
mozilla-ai / visual-dspy
Visual demo of DSPy's prompt optimization on Gradio
☆15 · Apr 14, 2025 · Updated last year
uds-lsv / MCSE
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings
☆58 · Jun 10, 2024 · Updated 2 years ago
di37 / langchain-rag-basic-to-advanced-tutorials
It includes the concepts for RAG application from basics till advanced using LangChain library.
☆17 · Mar 31, 2024 · Updated 2 years ago
PrithivirajDamodaran / Route0x
Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da
☆122 · Mar 31, 2025 · Updated last year
isurulkh / RAG-application-Gemini
This repository implements a question answering system that retrieves information from uploaded PDFs using Google Generative AI and LangC…
☆12 · Dec 25, 2023 · Updated 2 years ago
utter-project / fairseq
This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.
☆21 · Nov 19, 2024 · Updated last year
benouinirachid / patterns-finder
Simple, Fast, Powerful and Easily extensible python package for extracting patterns from text, with over than 60 predefined Regular Expre…
☆25 · Nov 26, 2022 · Updated 3 years ago
cyfer0618 / kaldi-pytorch-rnnlm
Enable RNNLM lattice rescoring with Pytorch [kaldi]
☆12 · Jun 5, 2020 · Updated 6 years ago
mzbac / flux.1.app
☆21 · Oct 9, 2024 · Updated last year
bikashkumars / rasa
Build Modern Chatbot using Rasa
☆10 · Aug 21, 2022 · Updated 3 years ago
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆29 · Apr 17, 2024 · Updated 2 years ago
CAIsr / uniQC
Unified NeuroImaging Quality Control (uniQC) toolbox
☆12 · Jan 10, 2022 · Updated 4 years ago
AIAnytime / Zephyr-7B-beta-RAG-Demo
Zephyr 7B beta RAG Demo inside a Gradio app powered by BGE Embeddings, ChromaDB, and Zephyr 7B Beta LLM.
☆35 · Oct 27, 2023 · Updated 2 years ago