dhruvilgala / tvtropes
☆44 Updated last year
Related projects:
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 2 years ago
- Documentation effort for the BookCorpus dataset ☆30 Updated 3 years ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆32 Updated last year
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆38 Updated 3 years ago
- A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive sum… ☆41 Updated last year
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆89 Updated 6 years ago
- Factored Cognition Primer: How to write compositional language model programs ☆48 Updated last year
- ☆67 Updated 6 months ago
- LLM plugin for clustering embeddings ☆61 Updated 6 months ago
- Libraries, Archives and Museums (LAM) ☆81 Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆33 Updated 6 months ago
- ☆81 Updated 3 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆71 Updated last year
- RaKUn 2.0 - A fast keyword detection algorithm ☆61 Updated last month
- Code and data to support "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4" ☆67 Updated last year
- A BERT-based application for reusable text classification at scale ☆37 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆61 Updated 6 months ago
- End-to-end zero-shot entity and relation extraction ☆50 Updated last month
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆131 Updated 3 months ago
- ☆29 Updated last year
- A spaCy wrapper for GliNER ☆77 Updated 2 months ago
- Analysis of the Gutenberg dataset ☆40 Updated 5 years ago
- Completion After Prompt Probability. Make your LLM make a choice ☆68 Updated last week
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆72 Updated last month
- Repo for the paper "Detecting Logical Fallacies: From Quiz to Climate Change News" (2021) ☆69 Updated 9 months ago
- 💫 SpaCy wrapper for ConceptNet 💫 ☆88 Updated last year
- SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed to be used by researc… ☆27 Updated 5 years ago
- Detecting gibberish as a type of sentiment analysis with GPT2 ☆24 Updated 3 years ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆112 Updated 4 months ago
- Vespa application making an index of the CORD-19 dataset. ☆39 Updated 2 weeks ago