Search Prototype

So the search project produced a prototype, which is available in #exusiae.

It works like this:

18:00:07 thimbronion !s Bitcoin
18:00:08 lekythion 10 results
18:00:08 lekythion ³http://trilema.com/2013/bitcoin-prices-bitcoin-inflexibility/
18:00:08 lekythion Bitcoin prices, Bitcoin inflexibility on Trilema - A blog by Mircea Popescu.
18:00:08 lekythion …keeping the Bitcoin). Other than this ~4% of the Bitcoin monetary…
18:00:08 lekythion …Bitcoin. Will people stop throwing dollars at Bitcoin because Bitcoin
18:00:09 lekythion …Will people start throwing Bitcoin at dollars because Bitcoin prices…
18:00:10 lekythion ²http://trilema.com/2015/introducing-the-bitcoin-isp/
18:00:11 lekythion Introducing the Bitcoin ISP on Trilema - A blog by Mircea Popescu.
18:00:12 lekythion …Bitcoin, The Most Serene Republic Of ~. In any case, Bitcoin ISP will…
18:00:13 lekythion …Bitcoin ISP, your only avenue is to voice your concerns in #bitcoin
18:00:14 lekythion …Soon to become a Bitcoin registered company, trading as S.BISP…
18:00:15 lekythion All results can be found at ¹http://paste.deedbot.org/?id=ZwnE.

The bot currently only searches an index of trilema.com. The !s command accepts Apache Lucene queries.
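Because the input is parsed as a Lucene query, characters such as ( ) : and / act as operators rather than literal text. Below is a minimal sketch of escaping a literal term before handing it to !s. It assumes the bot uses Lucene's classic query parser syntax, which the bot's actual configuration may not match, and escape_lucene is a hypothetical helper, not part of the bot.

```python
# Characters with special meaning in Lucene's classic query syntax.
# Assumption: the bot parses !s input with the classic query parser.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\/&|')


def escape_lucene(term):
    """Backslash-escape query-syntax characters so the term is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


# Left unescaped, the same characters build ordinary classic-syntax queries:
#   "bitcoin isp"               phrase query
#   bitcoin AND inflexibility   both terms required
#   bitcoin -dollars            exclude a term
#   inflexib*                   prefix wildcard
print(escape_lucene("S.BISP (registered)"))
```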

I now confront some problems.

  1. Fine-tuning was required. I had to tune the indexer to extract specific elements from Trilema to get the quality of the results somewhere near acceptable, which means every site is going to need its own tuning. For example, the good stuff is all in the div.entry element in mp-wp, while trinque.org has it somewhere else.

  2. I don’t yet know how to let others add sites they want to search. This is partly due to the first issue: if I just take lists of sites from people and don’t customize the indexer, the results won’t be great. It’s also because I don’t know the best way to let users configure their lists of sites to index. The first thing that popped into my head was to have users sign a text file listing all the sites they want indexed and send it to me. I would then do the configuration on the server manually and associate that index with their nick, so that it would be the default index whenever they search. Perhaps at some point users could also specify, by WoT identity, others’ indexes they’d like to search.
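The per-site tuning from the first problem could be kept in a small table that maps each host to the element holding the article body. This is a rough sketch using only Python's standard library, not the prototype's actual indexer code. CONTENT_CLASS, EntryText and extract_entry_text are hypothetical names, and only the trilema.com entry comes from the div.entry observation above.

```python
from html.parser import HTMLParser

# Hypothetical per-site table: host -> class of the div holding the article body.
# Only trilema.com (div.entry in mp-wp) is known; other sites would need their own entry.
CONTENT_CLASS = {"trilema.com": "entry"}


class EntryText(HTMLParser):
    """Collect text found inside the first-level div carrying the given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # div nesting depth inside the target div; 0 means outside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1   # nested div inside the target
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1    # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_entry_text(html, host):
    """Return whitespace-normalized text of the content div configured for host."""
    parser = EntryText(CONTENT_CLASS[host])
    parser.feed(html)
    parser.close()
    # Chunks are joined with spaces, which suits indexing more than display.
    return " ".join(" ".join(parser.chunks).split())


page = ('<div class="sidebar">nav</div>'
        '<div class="entry"><p>Hello <b>world</b></p><div>inner</div> tail</div>'
        '<div>footer</div>')
print(extract_entry_text(page, "trilema.com"))
```

With a table like this, adding a site means adding one entry rather than changing the indexer, though sites whose content is not in a single classed div would need a richer selector.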

One positive result is that after futzing around trying to use Google to find particular Trilema articles, I find using my own index to be much more productive.

Sites and documents I personally want to index:

trilema.com
loper-os.org
the blogs of everyone in #ossasepia
thebitcoin.foundation
the naggum archive
Encyclopedia Britannica, 11th Edition
bitcointalk.org

3 Responses to “Search Prototype”

  1. Diana Coman says:

    How does it end up with only 10 results for Bitcoin on trilema.com? I'd expect there are way more for that specific term.

  2. thimbronion says:

    Diana Coman:

    I haven't figured out what's going on there - but my guess is somewhere in one of the libraries I'm using there is a magic limit. The actual hitcount for many terms is much higher - in the hundreds. It's something I'm going to have to iron out.

  3. thimbronion says:

    Diana Coman: I upped the limit to 100.
