dhruvilgala / tvtropes
☆50 · Updated last year
Related projects
Alternatives and complementary repositories for tvtropes
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆32 · Updated last year
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 · Updated 2 years ago
- Documentation effort for the BookCorpus dataset ☆33 · Updated 3 years ago
- A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive sum… ☆42 · Updated last year
- A BERT-based application for reusable text classification at scale ☆37 · Updated last year
- Libraries, Archives and Museums (LAM) ☆82 · Updated 2 years ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 · Updated 8 months ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆97 · Updated 6 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆22 · Updated this week
- assign color hues to a collection of text fragments based on embeddings ☆20 · Updated 5 months ago
- Completion After Prompt Probability. Make your LLM make a choice ☆69 · Updated 2 weeks ago
- Vespa application making an index of the CORD-19 dataset. ☆39 · Updated this week
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆146 · Updated 5 months ago
- Multilingual syllable annotation pipeline component for spacy ☆37 · Updated last year
- Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-l… ☆60 · Updated 2 weeks ago
- Efficient few-shot learning with cross-encoders. ☆40 · Updated 9 months ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆77 · Updated 3 months ago
- Dataset accompanying the paper "Investigating African-American Vernacular English in Transformer-Based Text Generation." ☆9 · Updated 2 years ago
- Generate visual podcasts about novels using open source models ☆23 · Updated last year
- Factored Cognition Primer: How to write compositional language model programs ☆48 · Updated last year
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆98 · Updated 10 months ago
- Small python package to measure OCR quality and other related metrics. ☆21 · Updated 9 months ago
- StAtutory Reasoning Assessment ☆11 · Updated last year
- Semantically Structured Sentence Embeddings ☆67 · Updated last month
- The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data. ☆9 · Updated last week