mim-solutions / bert_for_longer_texts
BERT classification model for processing texts longer than 512 tokens. The text is first split into smaller chunks; each chunk is fed to BERT, and the intermediate results are then pooled. The implementation supports fine-tuning.
☆128 Updated 5 months ago
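The chunk-then-pool idea in the description can be illustrated with a minimal sketch. It is not the repository's actual API: the function names, the 510-token default (512 minus the `[CLS]` and `[SEP]` special tokens), the `stride` parameter, and the mean/max pooling options are all assumptions made for illustration, and `score_chunk` is a hypothetical stand-in for a BERT forward pass that returns one probability per chunk.

```python
from typing import Callable, List, Sequence


def split_into_chunks(
    token_ids: Sequence[int], chunk_size: int = 510, stride: int = 510
) -> List[List[int]]:
    """Split token ids into chunks of at most `chunk_size` tokens.

    Consecutive chunks start `stride` tokens apart, so a stride smaller
    than `chunk_size` produces overlapping chunks.
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), stride):
        chunks.append(list(token_ids[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(token_ids):
            break
    return chunks


def pool_scores(scores: Sequence[float], strategy: str = "mean") -> float:
    """Combine per-chunk scores into a single score for the whole text."""
    if not scores:
        raise ValueError("no chunk scores to pool")
    if strategy == "mean":
        return sum(scores) / len(scores)
    if strategy == "max":
        return max(scores)
    raise ValueError(f"unknown pooling strategy: {strategy}")


def classify_long_text(
    token_ids: Sequence[int],
    score_chunk: Callable[[List[int]], float],  # hypothetical per-chunk model call
    chunk_size: int = 510,
    stride: int = 510,
    strategy: str = "mean",
) -> float:
    """Score every chunk separately, then pool the results."""
    chunks = split_into_chunks(token_ids, chunk_size, stride)
    return pool_scores([score_chunk(chunk) for chunk in chunks], strategy)
```

With these defaults, a 1,200-token text yields three chunks of 510, 510, and 180 tokens. Mean pooling treats every chunk equally, while max pooling lets a single strongly positive chunk decide the label; which one fits better depends on the task.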
Related projects
Alternatives and complementary repositories for bert_for_longer_texts
- ☆334 Updated 11 months ago
- Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum… ☆254 Updated 2 weeks ago
- Creating class-based TF-IDF matrices ☆82 Updated 2 years ago
- Comparing the Performance of LLMs: A Deep Dive into Roberta, Llama, and Mistral for Disaster Tweets Analysis with Lora ☆45 Updated last year
- HDBSCAN Tuning for BERTopic Models ☆42 Updated last year
- Guideline following Large Language Model for Information Extraction ☆313 Updated 3 weeks ago
- Text classification with Foundation Language Model LLaMA ☆110 Updated last year
- Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with L… ☆30 Updated last year
- A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning ☆143 Updated 8 months ago
- ☆40 Updated last year
- Efficient Attention for Long Sequence Processing ☆89 Updated 11 months ago
- A repo to explore different NLP tasks which can be solved using T5 ☆169 Updated 3 years ago
- Code and experiments for *BERTopic: Neural topic modeling with a class-based TF-IDF procedure* ☆70 Updated 11 months ago
- ☆59 Updated 3 years ago
- 🔍 A statutory article retrieval dataset in French. (ACL 2022) ☆38 Updated last year
- Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to… ☆34 Updated last year
- [ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links ☆421 Updated 2 years ago
- Clustering sentence embeddings to extract message intent ☆167 Updated 3 years ago
- TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24) ☆222 Updated last week
- Named Entity Recognition in PyTorch on CoNLL2003 dataset ☆16 Updated 2 years ago
- A collection of topic diversity measures for topic modeling ☆45 Updated 3 years ago
- Long Document Summarization Papers ☆137 Updated last year
- Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer… ☆377 Updated last year
- ☆147 Updated 5 months ago
- ☆60 Updated 3 years ago
- ☆42 Updated 2 years ago
- Applying BERT to named entity recognition in English and Russian. ☆160 Updated last year
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆161 Updated 2 weeks ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: … ☆323 Updated last year
- ☆68 Updated last year