commonsearch/cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.

☆122

Alternatives and similar repositories for cosr-back

Users interested in cosr-back are comparing it to the libraries listed below.

commonsearch / cosr-front
Frontend of Common Search. Go server for fetching and rendering results + HTML5 UI to browse them.
☆59 · Feb 17, 2017 · Updated 9 years ago
commonsearch / gumbocy
Python binding for gumbo-parser using Cython
☆14 · Aug 16, 2016 · Updated 9 years ago
ikreymer / cc-index-server
Deployment of pywb as a CommonCrawl Index Server
☆22 · Oct 6, 2017 · Updated 8 years ago
anuzzolese / oke-challenge-2016
☆22 · Aug 24, 2017 · Updated 8 years ago
trivio / common_crawl_index
Index URLs in Common Crawl
☆197 · Sep 19, 2017 · Updated 8 years ago
diffbot / wikistatsextractor
Extract statistics from Wikipedia Dump files.
☆26 · Aug 2, 2021 · Updated 4 years ago
strangerlabs / tantivy
Node.js bindings to Tantivy Search
☆13 · Dec 8, 2022 · Updated 3 years ago
semanticize / st
Semanticizest: dump parser and client
☆20 · May 11, 2016 · Updated 10 years ago
idio / spotlight-model-editor
Tool for tweaking dbpedia spotlight's models
☆16 · Dec 1, 2017 · Updated 8 years ago
getlantern / flashlight-build
Repeatable builds for Lantern, using docker.
☆14 · Mar 7, 2024 · Updated 2 years ago
gfjreg / CommonCrawl
A distributed system for mining common crawl using SQS, AWS-EC2 and S3
☆22 · Jun 24, 2014 · Updated 12 years ago
VIDA-NYU / domain_discovery_tool_deprecated
Seed acquisition tool to bootstrap focused crawlers
☆23 · Apr 24, 2017 · Updated 9 years ago
freedombox / freedombox.org
Source code for the freedombox.org website. Read-only mirror of https://salsa.debian.org/freedombox-team/freedombox.org
☆11 · Aug 24, 2017 · Updated 8 years ago
dedupeio / doublemetaphone
Python wrapper for a C++ Double Metaphone
☆15 · Jan 12, 2026 · Updated 5 months ago
tomayac / wikipedia-live-monitor
Wikipedia Live Monitor
☆22 · Dec 21, 2024 · Updated last year
iai-group / nordlys
Nordlys: Toolkit for entity-oriented and semantic search
☆31 · Mar 23, 2021 · Updated 5 years ago
wunderalbert / prod-neural-materials
Background materials for the article "Productivity Assessment of Neural Code Completion"
☆16 · Jul 11, 2023 · Updated 2 years ago
SocialGouv / legi-data
Legi Data
☆16 · Jun 27, 2026 · Updated last week
PeARSearch / PeARS
The development of PeARS has been moved to https://github.com/PeARSearch/PeARS-orchard
☆54 · Jul 22, 2017 · Updated 8 years ago
seomoz / g-crawl-py
Gevent Crawling in Python, with Utilities
☆22 · Mar 12, 2015 · Updated 11 years ago
dkpro / dkpro-c4corpus
DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…
☆53 · Jun 12, 2020 · Updated 6 years ago
newsreader / eso-and-ceo
Events and Situations Ontology
☆14 · Apr 20, 2018 · Updated 8 years ago
graphific / Fear-and-Loathing-experiment
unprocessed and processed frames of Fear and Loathing in Las Vegas #deepdream experiment
☆24 · Jul 9, 2015 · Updated 10 years ago
nightkr / klocka
Smart Doorbell? Pi-Powered "Smart" Doorbell!
☆53 · Jun 5, 2018 · Updated 8 years ago
odie5533 / WarcMiddleware
WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.
☆48 · Mar 19, 2018 · Updated 8 years ago
searxng / fasttext-predict
fasttext with wheels and no external dependency, but only the predict method (<1MB)
☆20 · Nov 23, 2024 · Updated last year
internetarchive / surt
Sort-friendly URI Reordering Transform (SURT) python module
☆45 · Sep 11, 2025 · Updated 9 months ago
Matt-Deacalion / systemd-django
systemd service files for Django related daemons.
☆12 · Jul 30, 2013 · Updated 12 years ago
saschagrunert / fosdem20
Demo material used for the Podman talk at FOSDEM 2020
☆20 · Feb 2, 2020 · Updated 6 years ago
dapete42 / vcat
vCat Java code
☆11 · Updated this week
LuminosoInsight / assoc-space
Compute association strength over semantic networks in a dimensionality-reduced form.
☆32 · Aug 14, 2015 · Updated 10 years ago
DataBrewery / learn-data-brewing
Step-by-step introduction to the traditional data warehousing with examples.
☆11 · Mar 14, 2018 · Updated 8 years ago
xiaoganghan / wikientities
Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts
☆59 · Sep 5, 2012 · Updated 13 years ago
commoncrawl / cc-crawl-statistics
Statistics of Common Crawl monthly archives mined from URL index files
☆225 · Jun 23, 2026 · Updated last week
TheGadflyProject / TheGadflyProject
The core NLP library for automatic question generation
☆17 · Mar 7, 2017 · Updated 9 years ago
hypriot / rpi-swarm
Raspberry Pi compatible Docker image with Docker Swarm - https://github.com/docker/swarm
☆15 · Oct 29, 2017 · Updated 8 years ago
autonull / telepathine
p2p gossip protocol w/ incremental diffs & failure detection for a fault-tolerant, self-managing cluster or mesh (for node.js)
☆14 · Sep 22, 2014 · Updated 11 years ago
asssaf / urbit-shipyard
Ship name utilities for Urbit
☆11 · Mar 14, 2020 · Updated 6 years ago
crawler-commons / crawler-commons
A set of reusable Java components that implement functionality common to any web crawler
☆259 · Updated this week