gioelecrispo / chunkipy
chunkipy is an extremely useful tool for segmenting long texts into smaller chunks, based on either a character or token count. With customizable chunk sizes and splitting strategies, chunkipy provides flexibility and control for various text processing tasks.
★35, updated last year
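
The description above mentions splitting by either a character count or a token count. The sketch below illustrates that general idea in plain Python; it is not chunkipy's API. The names `chunk_text`, `size_fn`, and `count_tokens` are hypothetical, and whitespace-separated words stand in for real tokens.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int,
               size_fn: Callable[[str], int] = len) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    size_fn measures a candidate chunk: len() counts characters,
    while a token counter can be passed to limit by tokens instead.
    A single word larger than chunk_size is kept as its own chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and size_fn(candidate) > chunk_size:
            # Adding this word would exceed the limit: close the chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def count_tokens(s: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(s.split())


if __name__ == "__main__":
    sample = "the quick brown fox jumps"
    print(chunk_text(sample, 10))               # limit of 10 characters
    print(chunk_text(sample, 2, count_tokens))  # limit of 2 tokens
```

Passing the size function as a parameter keeps the splitting loop identical for both modes; a real tokenizer could be swapped in for `count_tokens` without touching `chunk_text`.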
Alternatives and similar repositories for chunkipy:
Users who are interested in chunkipy are comparing it to the libraries listed below.
- Explore the use of DSPy for extracting features from PDFs ★39, updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ★67, updated 4 months ago
- ★62, updated 4 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ★45, updated 6 months ago
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ★46, updated 6 months ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ★64, updated 5 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ★29, updated 7 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ★59, updated last year
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ★101, updated this week
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ★78, updated last year
- A framework for evaluating function calls made by LLMs ★37, updated 8 months ago
- Mistral + Haystack: build RAG pipelines that rock ★103, updated last year
- ★30, updated 8 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ★34, updated 3 months ago
- Writing Blog Posts with Generative Feedback Loops! ★47, updated last year
- Build reliable, secure, and production-ready AI apps easily. ★70, updated this week
- ★47, updated 11 months ago
- Pre-train Static Word Embeddings ★51, updated 3 weeks ago
- Reference-Free automatic summarization evaluation with potential hallucination detection ★100, updated last year
- ★76, updated 9 months ago
- ★38, updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★49, updated 8 months ago
- ★24, updated last year
- POC Port of the openai-realtime-console to streamlit. ★45, updated 5 months ago
- Generalist and Lightweight Model for Text Classification ★92, updated this week
- Analysis on the cost of encoder based models ★11, updated last month
- ★45, updated 11 months ago
- ★18, updated 5 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ★22, updated 3 months ago
- Chunk your text using gpt4o-mini more accurately ★44, updated 7 months ago