Ighina / DeepTiling
A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive summarization and semantic search applications built on top of it.
☆48Updated 2 years ago
Alternatives and similar repositories for DeepTiling
Users that are interested in DeepTiling are comparing it to the libraries listed below
- Efficient few-shot learning with cross-encoders.☆57Updated last year
- Universal text classifier for generative models☆24Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization☆65Updated last year
- Completion After Prompt Probability. Make your LLM make a choice☆80Updated 9 months ago
- Using short models to classify long texts☆21Updated 2 years ago
- 🚀 Automatically convert unstructured data into a high-quality 'textbook' format, optimized for fine-tuning Large Language Models (LLMs)☆25Updated last year
- TextReducer - A Tool for Summarization and Information Extraction☆88Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆49Updated last year
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.☆60Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated 2 years ago
- RaKUn 2.0 - A fast keyword detection algorithm☆68Updated 3 weeks ago
- ☆64Updated 4 months ago
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati…☆41Updated 2 years ago
- ☆49Updated 6 months ago
- utilities for loading and running text embeddings with onnx☆44Updated 2 weeks ago
- ☆33Updated 2 years ago
- Entailment self-training☆25Updated 2 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q…☆89Updated last year
- Generate visual podcasts about novels using open source models☆25Updated 2 years ago
- Small python package to measure OCR quality and other related metrics.☆25Updated last year
- Tool to apply Legal Matter Specification Standard (LMSS) to documents☆12Updated last year
- Showcase how mxbai-embed-large-v1 can be used to produce binary embedding. Binary embeddings enabled 32x storage savings and 40x faster r…☆18Updated last year
- ☆67Updated last year
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆29Updated 2 years ago
- ☆86Updated 4 months ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆63Updated 3 weeks ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc…☆23Updated last year
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine☆31Updated 3 years ago
- QLoRA with Enhanced Multi GPU Support☆37Updated 2 years ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper.☆12Updated 2 years ago