mim-solutions / bert_for_longer_texts
BERT classification model for processing texts longer than 512 tokens. The text is first split into smaller chunks, each chunk is fed to BERT, and the per-chunk results are then pooled into a single prediction. The implementation allows fine-tuning.
☆145 · Updated last year
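The chunk-then-pool idea in the description can be illustrated with a minimal sketch. This is not the repository's code: the function names (`split_into_chunks`, `classify_long_text`), the `score_chunk` callback standing in for a BERT forward pass, the default chunk size of 510 (512 minus the `[CLS]` and `[SEP]` tokens), and the choice of mean or max pooling are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, List, Sequence


def split_into_chunks(
    token_ids: Sequence[int], chunk_size: int = 510, stride: int = 510
) -> List[List[int]]:
    """Split a token sequence into windows of at most `chunk_size` tokens.

    A `stride` smaller than `chunk_size` yields overlapping windows.
    Both defaults are illustrative assumptions, not the repository's values.
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), stride):
        chunks.append(list(token_ids[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(token_ids):
            break
    return chunks


def classify_long_text(
    token_ids: Sequence[int],
    score_chunk: Callable[[List[int]], float],
    pooling: str = "mean",
    chunk_size: int = 510,
    stride: int = 510,
) -> float:
    """Score every chunk with `score_chunk` and pool the scores into one value.

    `score_chunk` is a hypothetical stand-in for a model call that returns
    one probability (or logit) per chunk.
    """
    chunks = split_into_chunks(token_ids, chunk_size, stride)
    if not chunks:
        raise ValueError("cannot classify an empty token sequence")
    scores = [score_chunk(chunk) for chunk in chunks]
    if pooling == "mean":
        return mean(scores)
    if pooling == "max":
        return max(scores)
    raise ValueError(f"unknown pooling strategy: {pooling!r}")


if __name__ == "__main__":
    # Toy scorer: the fraction of even token ids in a chunk (not a real model).
    def toy_score(chunk: List[int]) -> float:
        return sum(1 for t in chunk if t % 2 == 0) / len(chunk)

    fake_token_ids = list(range(1200))  # pretend this is a tokenized long text
    print(len(split_into_chunks(fake_token_ids)))
    print(classify_long_text(fake_token_ids, toy_score, pooling="mean"))
```

In a real setup, `score_chunk` would wrap a tokenizer and a fine-tuned BERT classifier. An overlapping stride reduces the chance that a decisive phrase is cut in half at a chunk boundary, at the cost of more forward passes.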
Alternatives and similar repositories for bert_for_longer_texts
Users who are interested in bert_for_longer_texts compare it to the repositories listed below.
- ☆369 · Updated last year
- Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum… ☆265 · Updated 11 months ago
- Efficient Attention for Long Sequence Processing ☆97 · Updated last year
- OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track) ☆788 · Updated last year
- Clustering sentence embeddings to extract message intent ☆174 · Updated 4 years ago
- A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning ☆155 · Updated last year
- Code and experiments for *BERTopic: Neural topic modeling with a class-based TF-IDF procedure* ☆82 · Updated last year
- Guideline following Large Language Model for Information Extraction ☆404 · Updated last year
- Zero and Few shot named entity & relationships recognition ☆391 · Updated last month
- Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer… ☆395 · Updated 2 years ago
- TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24) ☆355 · Updated 7 months ago
- [ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction ☆310 · Updated 2 years ago
- [ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links ☆447 · Updated 3 years ago
- PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT ☆97 · Updated 11 months ago
- Multimodal model for text and tabular data with HuggingFace transformers as building block for text data ☆609 · Updated last year
- Calculate perplexity on a text with pre-trained language models. Support MLM (eg. DeBERTa), recurrent LM (eg. GPT3), and encoder-decoder … ☆162 · Updated 4 months ago
- Repository for TweetEval ☆386 · Updated 3 years ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: … ☆338 · Updated 2 years ago
- ☆69 · Updated 4 years ago
- Multilingual/multidomain question generation datasets, models, and python library for question generation. ☆364 · Updated last year
- Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER) ☆86 · Updated last year
- Active Learning for Text Classification in Python ☆628 · Updated last week
- SpanMarker for Named Entity Recognition ☆460 · Updated 9 months ago
- Multi-label text classification using BERT ☆65 · Updated 4 years ago
- A curated list of resources on document similarity measures (papers, tutorials, code, ...) ☆253 · Updated 3 years ago
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆196 · Updated last month
- simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models. ☆399 · Updated 2 years ago
- Long Document Summarization Papers ☆152 · Updated 2 years ago
- Text classification with Foundation Language Model LLaMA ☆113 · Updated 2 years ago
- ☆169 · Updated last year