Thimbron

July 18, 2020

July Search Update

Filed under: Uncategorized — thimbronion @ 4:03 p.m.

Work is progressing on the search bot.

I now have a configuration that will index the btcbase.org logs. The resulting index is not perfect: anything from Reddit is excluded because my IP is blocked, and many archive.is pages fail to index because archive.is periodically goes offline. Indexing any Bitcointalk link fails with an error that I haven't been able to resolve. Also, given the timespan the logs cover, many links have rotted and are lost for good, and most shortened links no longer work. The results of this crawl should show up in the bot's index in about one week.

Work on the encyclopedia crawl is progressing as well. Apify delivered a partially functional crawling script that runs on their platform. At the moment I don't have a configuration that lets the crawl reach all volumes of the encyclopedia, and I am working with their support to resolve this.
