wikit-ai / chunknorris
ChunkNorris is a black belt in document chunking to feed your LLMs and RAG apps 🥋🔪
☆22Updated this week
Alternatives and similar repositories for chunknorris
Users who are interested in chunknorris are comparing it to the libraries listed below:
- Using open source LLMs to build synthetic datasets for direct preference optimization☆72Updated last year
- Model implementation for the contextual embeddings project☆40Updated 8 months ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆76Updated 2 weeks ago
- Python library to use Pleias-RAG models☆68Updated 9 months ago
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …☆61Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆64Updated last year
- 0-Shot Tokenizer Transplant☆14Updated 8 months ago
- Code for the EMNLP'24 paper "Learning to Extract Structured Entities Using Language Models"☆49Updated 10 months ago
- ☆59Updated last year
- ☆27Updated 11 months ago
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond☆12Updated 3 years ago
- Efficient few-shot learning with cross-encoders.☆62Updated last year
- A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark☆32Updated last week
- ☆20Updated 10 months ago
- Writing Blog Posts with Generative Feedback Loops!☆50Updated last year
- Pre-train Static Word Embeddings☆94Updated 5 months ago
- Evaluate language models using multiple choice items☆13Updated 3 weeks ago
- 🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024☆20Updated 10 months ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal …☆32Updated 4 years ago
- ☆10Updated last year
- ☆57Updated last month
- GLiNER model in a FastAPI microservice.☆47Updated last year
- a unified framework for leveraging LLMs☆73Updated last month
- Tool to apply Legal Matter Specification Standard (LMSS) to documents☆12Updated last year
- CMU Linguistic Annotation Backend☆14Updated 4 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Updated last year
- Codebase accompanying the Summary of a Haystack paper.☆80Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆38Updated 6 months ago
- ☆32Updated last year
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents☆24Updated 3 years ago