Crawling btcbase.org/log

The results of lekythion’s first crawl of btcbase.org are in. The resulting index is of little use, because many sites, including archive.is, reddit and tardstalk, blocked the crawler for not complying with their robots.txt files.
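For reference, here is a minimal sketch of the robots.txt check a polite crawler performs before fetching a page. It uses Python's standard `urllib.robotparser`. This is not lekythion's own code. The robots.txt bodies, the user-agent string and the URLs are hypothetical examples, not the files actually served by the sites named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body that refuses every crawler everywhere;
# not the real file of any site mentioned in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # "lekythion" is used as an illustrative user-agent string only.
    print(allowed(ROBOTS_TXT, "lekythion", "https://example.com/some/page"))
```

With a blanket `Disallow: /`, every URL on that host is refused. That matches the situation described above, where whole sites dropped out of the index.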

Nevertheless, I now have a comprehensive list of links from the #trilema logs. I think the index should include the logs themselves, but it might be convenient to split the crawling of external links off into a separate task/configuration.
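One way to do that compartmentalization is to split the harvested links into log pages and external links before handing each list to its own crawl task. The sketch below assumes the links are pulled from plain-text log lines with a rough regular expression. The function name `split_links`, the default host and the sample lines are hypothetical. This is not how lekythion is actually configured.

```python
import re
from urllib.parse import urlsplit

# Rough URL matcher; real log lines may need a stricter pattern.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def split_links(log_lines, log_host="btcbase.org"):
    """Partition URLs found in log lines into (log_links, external_links).

    Duplicates are dropped and first-seen order is kept.
    """
    log_links, external = [], []
    seen = set()
    for line in log_lines:
        for url in URL_RE.findall(line):
            # Strip sentence punctuation that the regex swallows at the end.
            url = url.rstrip(".,;:!?")
            if url in seen:
                continue
            seen.add(url)
            host = (urlsplit(url).hostname or "").lower()
            if host == log_host or host.endswith("." + log_host):
                log_links.append(url)
            else:
                external.append(url)
    return log_links, external

if __name__ == "__main__":
    sample = [
        "see http://btcbase.org/log/2019-01-01#123 and https://archive.is/abc.",
        "https://reddit.com/r/x https://archive.is/abc",
    ]
    logs, ext = split_links(sample)
    print(logs)
    print(ext)
```

The two lists could then go into separate seed files. That way, crawling external sites (and running into their robots.txt rules) does not hold up indexing of the logs themselves.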

4 Responses to “Crawling btcbase.org/log”

  1. It'd be rather useful to have a combined searchable index of the logs and linked items, yes. I only care about the linked items insofar as they contributed to whichever interesting conversation in the forum.

  2. thimbronion says:

    Alright - noted.

  3. spyked says:

    At some point Lobbes had a similar thing going for links in logs, also with archival support, which I found quite useful. I don't know if his thing is still active (and if so, for what channels), but IMHO this direction would be worth exploring, given how often things tend to disappear from the web.

  4. thimbronion says:

    Awesome. Correct me if I'm wrong, but his thing searches urls only, and not the content of urls, correct?
