mim-solutions / bert_for_longer_texts
BERT classification model for processing texts longer than 512 tokens. The text is first split into smaller chunks, each chunk is fed to BERT, and the intermediate results are pooled. The implementation supports fine-tuning.
☆135Updated 9 months ago
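The chunk-and-pool idea described above can be illustrated with a minimal sketch. This is not the repository's API: the function names, the default `chunk_size` of 510 (leaving room for the two special tokens in a 512-token BERT input), the `stride`, and the `score_chunk` callback standing in for a BERT forward pass are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def split_into_chunks(
    token_ids: Sequence[int], chunk_size: int = 510, stride: int = 255
) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most chunk_size tokens.

    chunk_size and stride are illustrative defaults, not values taken from the repo.
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + chunk_size]))
        # Stop once the current window reaches the end of the sequence.
        if start + chunk_size >= len(token_ids):
            break
        start += stride
    return chunks


def pool_scores(scores: Sequence[float], strategy: str = "mean") -> float:
    """Combine per-chunk scores into one document-level score."""
    if not scores:
        raise ValueError("scores must not be empty")
    if strategy == "mean":
        return sum(scores) / len(scores)
    if strategy == "max":
        return max(scores)
    raise ValueError(f"unknown pooling strategy: {strategy}")


def classify_long_text(
    token_ids: Sequence[int],
    score_chunk: Callable[[List[int]], float],
    chunk_size: int = 510,
    stride: int = 255,
    strategy: str = "mean",
) -> float:
    """Score every chunk with score_chunk (a hypothetical stand-in for a
    BERT classifier) and pool the per-chunk results."""
    chunks = split_into_chunks(token_ids, chunk_size, stride)
    return pool_scores([score_chunk(chunk) for chunk in chunks], strategy)
```

In practice `score_chunk` would wrap a model call that returns, for example, the positive-class probability for one chunk. Mean pooling smooths over all chunks, while max pooling lets a single strongly scored chunk decide the document label.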
Alternatives and similar repositories for bert_for_longer_texts:
Users who are interested in bert_for_longer_texts are comparing it to the libraries listed below.
- ☆61Updated 4 years ago
- Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum…☆259Updated 4 months ago
- Creating class-based TF-IDF matrices☆83Updated 2 years ago
- ☆358Updated last year
- ☆63Updated 3 years ago
- A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning☆153Updated last year
- Code and experiments for *BERTopic: Neural topic modeling with a class-based TF-IDF procedure*☆75Updated last year
- Efficient Attention for Long Sequence Processing☆92Updated last year
- A Framework for Textual Entailment based Zero Shot text classification☆153Updated last year
- Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with L…☆34Updated last year
- A repo to explore different NLP tasks which can be solved using T5☆172Updated 4 years ago
- Building NER and RE components using HuggingFace Transformers☆50Updated 2 years ago
- Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER)☆80Updated 10 months ago
- Long Document Summarization Papers☆145Updated last year
- ☆44Updated 2 years ago
- Clustering sentence embeddings to extract message intent☆172Updated 3 years ago
- ☆158Updated 9 months ago
- Text classification with Foundation Language Model LLaMA☆115Updated 2 years ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: …☆330Updated last year
- Comparing the Performance of LLMs: A Deep Dive into Roberta, Llama, and Mistral for Disaster Tweets Analysis with Lora☆51Updated last year
- PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT☆81Updated 4 months ago
- SpanMarker for Named Entity Recognition☆422Updated 2 months ago
- Transformer-based Long Document Classification☆17Updated 2 years ago
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands.☆55Updated 10 months ago
- Zero and Few shot named entity & relationships recognition☆361Updated 4 months ago
- Data and models for the SciFact verification task.☆228Updated last year
- Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document …☆184Updated last year
- Guideline following Large Language Model for Information Extraction☆355Updated 4 months ago
- A collection of topic diversity measures for topic modeling☆45Updated 3 years ago
- ☆38Updated 2 years ago