bloomberg / koan
A word2vec negative sampling implementation with correct CBOW update.
★260, updated 3 years ago
Related projects
Alternatives and complementary repositories for koan
- fastText + Bloom embeddings for compact, full-coverage vectors with spaCy (★286, updated last year)
- Create interactive textual heat maps for Jupyter notebooks (★196, updated 5 months ago)
- More interactive weak supervision with FlyingSquid (★314, updated 4 years ago)
- Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings (★76, updated 2 years ago)
- Self-Supervision for Named Entity Disambiguation at the Tail (★213, updated 2 years ago)
- Easy training and deployment of seq2seq models. (★229, updated 3 years ago)
- SummVis is an interactive visualization tool for text summarization. (★251, updated 2 years ago)
- Natural language processing (NLP) newsletter (★301, updated 4 years ago)
- Flexible classic and NeurAl Retrieval Toolkit (★214, updated 3 months ago)
- Misspelling Oblivious Word Embeddings (★202, updated 5 years ago)
- Labelling platform for text using weak supervision. (★260, updated 2 years ago)
- NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT (★231, updated last year)
- xfspell - the Transformer Spell Checker (★187, updated 4 years ago)
- SpikeX - SpaCy Pipes for Knowledge Extraction (★398, updated 3 years ago)
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… (★363, updated 2 years ago)
- Deep learning with text doesn't have to be scary. (★275, updated last year)
- LASER multilingual sentence embeddings as a pip package (★225, updated last year)
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime. (★126, updated 3 years ago)
- Live Python Notebooks with any Editor (★276, updated last year)
- LM Pretraining with PyTorch/TPU (★132, updated 5 years ago)
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … (★106, updated 8 months ago)
- Google USE (Universal Sentence Encoder) for spaCy (★177, updated last year)
- Question-answers, collected from Google (★124, updated 3 years ago)
- Machine Learning for Information Retrieval (★85, updated this week)
- Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis. (★145, updated 3 years ago)
- Ensemble topic modelling with pLSA (★112, updated 3 years ago)
- A library to synthesize text datasets using Large Language Models (LLM) (★151, updated last year)
- Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/ (★187, updated last year)
- skweak: A software toolkit for weak supervision applied to NLP tasks (★918, updated 2 months ago)
- Toolkit to help understand "what lies" in word embeddings. Also benchmarking! (★469, updated last year)