Archive for May, 2021

Alethepedia Improvements

Monday, May 3rd, 2021

Alethepedia is an encyclopedia based on an OCR scan of the 11th edition of the Encyclopedia Britannica. It is currently hosted on an mp-wp instance running on a DigitalOcean VM. Mp-wp is a good foundation for an encyclopedia CMS, but much work needs to be done to make it better suited to the task. The following are the things I'd like to accomplish over the next 6 months or so:

Categorize articles alphabetically

I want every article to be associated with a letter category corresponding to the first letter of its title.

  • Deliverable: a Python script I can run against the database.
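As a starting point for discussion, here is a minimal sketch of the title-to-letter rule only. It does not touch the database; the function name `letter_category` and the catch-all category `#` are my own placeholders, and the handling of accents and leading punctuation is an assumption about what I'd want rather than a settled requirement.

```python
import string
import unicodedata


def letter_category(title: str) -> str:
    """Return the A-Z category for a title, or '#' when no letter applies.

    Assumptions (not settled requirements): leading punctuation and
    spaces are ignored, accented letters are filed under their base
    letter, and anything else (digits, ligatures such as 'Æ') goes
    into a catch-all '#' category.
    """
    for ch in title:
        if not ch.isalnum():
            continue  # skip leading quotes, spaces, punctuation
        # NFKD splits 'É' into 'E' plus a combining accent; keep the base.
        base = unicodedata.normalize("NFKD", ch)[0].upper()
        return base if base in string.ascii_uppercase else "#"
    return "#"
```

The real script would loop over the post titles in the database and assign each post to the category this function returns.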

Order articles alphabetically

I want articles to be listed in alphabetical order by title within categories. See: for inspiration.

  • Deliverable: a patch against my mp-wp.

Prioritize title matches in search results

Currently it is very difficult to find an article by searching for words in its title, since matches seem to be made only against the body. I have no idea how the results are sorted. I'm open to other proposals to improve search via the web interface.

  • Deliverable: a patch against my mp-wp.
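To make the intent concrete, here is a rough sketch of the ranking I have in mind, written in Python for illustration even though the actual deliverable is a patch to mp-wp. The weights, the function names `score` and `rank`, and the article dictionaries are all hypothetical; this is not how mp-wp search works today.

```python
import re

# Hypothetical weights: one title hit outranks any single body hit.
TITLE_WEIGHT = 10
BODY_WEIGHT = 1


def _words(text):
    """Lowercased set of the words in a piece of text."""
    return set(re.findall(r"\w+", text.casefold()))


def score(query, title, body):
    """Sum the weights of every query term found in the title or body."""
    title_words = _words(title)
    body_words = _words(body)
    total = 0
    for term in re.findall(r"\w+", query.casefold()):
        if term in title_words:
            total += TITLE_WEIGHT
        if term in body_words:
            total += BODY_WEIGHT
    return total


def rank(query, articles):
    """Return titles of matching articles, best score first.

    Ties are broken alphabetically by title.
    """
    scored = [(score(query, a["title"], a["body"]), a) for a in articles]
    hits = [(s, a) for s, a in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]["title"].casefold()))
    return [a["title"] for _, a in hits]
```

With this scoring, a search for "abacus" lists the article titled "Abacus" ahead of articles that only mention the word in passing.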

Any word in an article body that matches an article title should link to that article, with exceptions for things like the article on the letter A.

  • Deliverable: a Python script I can run against the database.
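A minimal sketch of the linking step, assuming the script already has a mapping from article titles to URLs. The function name `autolink`, the `skip` set, and the example URLs are hypothetical. It treats the body as plain text, so a real script would also need to avoid text that is already inside a link or an HTML tag, and to avoid linking an article to itself.

```python
import html
import re


def autolink(body, titles, skip=frozenset({"a"})):
    """Wrap every occurrence of a known title in a link to its article.

    `titles` maps article title -> URL. Titles whose lowercase form is
    in `skip` (for example the article on the letter A) are never linked.
    """
    candidates = {
        title.lower(): url
        for title, url in titles.items()
        if title.lower() not in skip
    }
    if not candidates:
        return body
    # Longest titles first, so "New York" wins over "York".
    ordered = sorted(candidates, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )

    def link(match):
        text = match.group(0)
        url = candidates.get(text.lower())
        if url is None:
            return text  # leave unusual case-folding matches unlinked
        return f'<a href="{html.escape(url, quote=True)}">{text}</a>'

    return pattern.sub(link, body)
```

The skip set is the simplest form of the exception list; it would probably need to grow to cover other very short or very common titles.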

Develop a minimalist, monochrome, responsive encyclopedia theme for mp-wp

I kind of like the WordPress default theme, but I want a theme that is responsive and looks good on mobile devices.

  • Article dates are irrelevant and shouldn't be displayed.
  • The next/prev links shouldn't say 'Older entries'; they should say something like just 'Next' and 'Previous', since the date of the article isn't relevant.

  • Deliverable: WordPress theme or patch to existing WordPress theme.

Dump db to public file

  • Deliverable: Python script that can be called by cron to periodically dump the contents of the db into a sanitized .sql file that can be downloaded by interested parties.
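A possible shape for the dump script, sketched under several assumptions: the database is MySQL and `mysqldump` is on the PATH, credentials come from the invoking user's MySQL option file, and "sanitized" means leaving out the tables that hold private data. The table list below is my guess at the sensitive ones under the standard `wp_` prefix and would need checking against the real schema; the function names are placeholders.

```python
import subprocess
from pathlib import Path

# Assumed to hold private data (password hashes, e-mail addresses, IPs).
# This list is a guess and must be checked against the actual schema.
PRIVATE_TABLES = ("wp_users", "wp_usermeta", "wp_comments")


def build_dump_command(database, out_path, private_tables=PRIVATE_TABLES):
    """Build the mysqldump argument list without running it."""
    cmd = [
        "mysqldump",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_path}",
    ]
    # --ignore-table takes one db.table pair per occurrence.
    cmd += [f"--ignore-table={database}.{table}" for table in private_tables]
    cmd.append(database)
    return cmd


def dump(database, final_path):
    """Dump to a temporary file, then move it into place.

    Writing to a temporary name first means a download in progress
    never sees a half-written file.
    """
    final = Path(final_path)
    tmp = final.with_name(final.name + ".tmp")
    subprocess.run(build_dump_command(database, tmp), check=True)
    tmp.replace(final)
```

Cron would then call something like `dump("alethepedia", "/var/www/dumps/alethepedia.sql")`, where both the database name and the path are placeholders. Skipping whole tables is the bluntest form of sanitizing; columns such as post author IDs may need a closer look.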

Future work:

I am accepting proposals for work on cleaning up the text of the entries themselves. Here are some of the problems:

  • The OCR software mangled many dates within articles, as well as tables.
  • Many words have been mangled as well.