Archive for the ‘alethepedia’ Category

Log link index

Sunday, June 13th, 2021

A .sql file containing an index for searching the logs by the content they link to is available here. It is only about 200 MB compressed.

Alethepedia Improvements

Monday, May 3rd, 2021

Alethepedia is an encyclopedia based on an OCR scan of the 11th edition of Encyclopedia Britannica. It is currently hosted in an mp-wp instance running on a Digital Ocean VM. Mp-wp is a good foundation for an encyclopedia CMS, but much work needs to be done to make it better suited to the task. The following are the things I'd like to accomplish over the next 6 months or so:

Categorize articles alphabetically

I want all articles to be associated with a letter category corresponding to the first letter of the title.

  • Deliverable: a python script I can run against the database.
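The core of such a script is the mapping from a title to its letter. Below is a minimal sketch of that mapping only, with no database access. The function name `letter_category` and the '#' bucket for titles containing no Latin letter are assumptions made for illustration, not part of the spec above.

```python
import unicodedata

def letter_category(title):
    """Return the uppercase A-Z letter a title files under, or '#' if it has none."""
    # Decompose accented characters so that 'É' becomes 'E' plus a combining mark.
    normalized = unicodedata.normalize("NFKD", title)
    for ch in normalized:
        # Skip leading spaces, punctuation, digits and combining marks.
        if ch.isascii() and ch.isalpha():
            return ch.upper()
    # Hypothetical fallback bucket for titles with no Latin letter at all.
    return "#"
```

A full script would call this for each post title and then write the post-to-category association into the database.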

Order articles alphabetically

I want articles to be listed in alphabetical order by title within categories. See: https://codex.wordpress.org/Alphabetizing_Posts for inspiration.

  • Deliverable: a patch against my mp-wp.

Prioritize title matches in search results

Currently it is very difficult to find an article by searching for words in its title, since matches seem to be made only against the body. I have no idea how the results are sorted. I'm open to other proposals for improving search via the web interface.

  • Deliverable: a patch against my mp-wp.
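The actual deliverable is a patch to mp-wp itself, but the desired ordering can be sketched in a few lines of Python. This is only an illustration of the ranking, not mp-wp's search code. The function name `rank_results` and the three-tier scoring are assumptions, and articles are represented as dicts with 'title' and 'content' keys.

```python
def rank_results(query, articles):
    """Return matching articles with title matches ahead of body-only matches."""
    q = query.lower()

    def score(article):
        title = article["title"].lower()
        if title == q:
            return 0  # exact title match
        if q in title:
            return 1  # query appears somewhere in the title
        if q in article["content"].lower():
            return 2  # query appears only in the body
        return 3      # no match at all

    matches = [a for a in articles if score(a) < 3]
    # Sort by tier first, then alphabetically by title within a tier.
    return sorted(matches, key=lambda a: (score(a), a["title"].lower()))
```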

Any word that matches an article title should link to the article, with exceptions for things like the article on the letter A.

  • Deliverable: a python script I can run against the database.
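One way the linking step might work is sketched below, under stated assumptions: `link_titles`, the `title_urls` mapping and the `skip` set are hypothetical names, and the exception list contains only the letter A as an example. The sketch does not skip text already inside HTML tags or existing links, which a real script would have to handle.

```python
import re
from html import escape

def link_titles(text, title_urls, skip=frozenset({"a"})):
    """Wrap whole-word occurrences of known article titles in links."""
    # Lower-cased lookup table, minus any title on the exception list.
    lookup = {t.lower(): url for t, url in title_urls.items()
              if t.lower() not in skip}
    if not lookup:
        return text
    # Try longer titles first so 'New Rome' wins over 'Rome'.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        word = match.group(0)
        url = lookup.get(word.lower())
        if url is None:
            return word  # leave the text untouched if the lookup misses
        return '<a href="%s">%s</a>' % (escape(url, quote=True), word)

    return pattern.sub(replace, text)
```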

Develop a minimalist, monochrome, responsive encyclopedia theme for mp-wp.

I kind of like the WordPress default theme, but I want a theme that is responsive and looks good on mobile devices.

  • Article dates are irrelevant and shouldn't be displayed.
  • The next/prev links shouldn't say 'Older entries'; they should say something like just next/previous, since the date of the article isn't relevant.

  • Deliverable: WordPress theme or patch to existing WordPress theme.

Dump db to public file

  • Deliverable: Python script that can be called by cron to periodically dump the contents of the db into a sanitized .sql file that can be downloaded by interested parties.
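A rough sketch of such a script, assuming `mysqldump` is installed and that credentials live in an option file rather than on the command line. "Sanitized" is interpreted here as leaving out whole tables. The function names and the idea of excluding a table such as `wpmp_users` are assumptions, not something the spec above defines.

```python
import subprocess

def build_dump_command(database, out_path, defaults_file, exclude_tables=()):
    """Build the mysqldump argument list without running anything."""
    cmd = [
        "mysqldump",
        # Option file holding user/password; mysqldump requires it to come first.
        "--defaults-extra-file=" + defaults_file,
        "--single-transaction",
        "--result-file=" + out_path,
    ]
    for table in exclude_tables:
        # Tables left out of the public dump (hypothetical choice of tables).
        cmd.append("--ignore-table=%s.%s" % (database, table))
    cmd.append(database)
    return cmd

def dump_database(database, out_path, defaults_file, exclude_tables=()):
    """Run mysqldump; raises CalledProcessError if the dump fails."""
    cmd = build_dump_command(database, out_path, defaults_file, exclude_tables)
    subprocess.run(cmd, check=True)
```

Cron would then invoke a small wrapper that calls `dump_database` with the paths used on the server.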

Future work:

I am accepting proposals for work on cleaning up the text of the entries themselves. Here are some of the problems:

  • The OCR software mangled many dates within articles, as well as tables.
  • Many words have been mangled as well.

Simple Mp-wp Article Import Script

Sunday, September 20th, 2020

Below is a simple script for importing an article directly into mp-wp via the db, inspired by lobbes' previous work on importing logs into mp-wp. It is a necessary step towards being able to import the entire Encyclopedia Britannica into mp-wp. I thought it might be useful for anyone wishing to automate creation of articles.

#!/usr/bin/python

import mysql.connector
import time
import json

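def add_article(cnx, article):
        """Insert one article dict (with 'title' and 'content' keys) as a published post."""
        cursor = cnx.cursor()

        content = article['content']

        # Every post gets the same fixed 1910 timestamp rather than the real
        # current time; only the first six tuple fields affect this format.
        current_date = time.strftime("%Y-%m-%d %H:%M:%S",
                                     (1910, 1, 1, 1, 3, 38, 1, 48, 0))
        current_date_gmt = current_date
        title = article['title']
        # Build the slug by replacing runs of whitespace with hyphens.
        post_url_relative = '-'.join(title.split())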

        query = '''INSERT INTO wpmp_posts(post_author, post_date,
                        post_date_gmt, post_content, post_title,
                        post_category, post_status, comment_status, ping_status,
                        post_name, post_modified, post_modified_gmt, post_parent,
                        guid, menu_order, post_type, comment_count,
                        post_excerpt, to_ping, pinged, post_content_filtered,
                        post_mime_type, post_password) VALUES (%s,%s,%s,%s,
                                %s,%s,%s,%s,
                                %s,%s,%s,%s,
                                %s,%s,%s,%s,
                                %s,%s,%s,%s,
                                %s,%s,%s) ; '''
        cursor.execute(
                query,
                (0, current_date, current_date_gmt, content, title, 0, "publish",
                 "open", "open", post_url_relative, current_date,
                 current_date_gmt, 0, "", 0, "post", 0, "", "", "", "", "", "")
        )

        cnx.commit()
        cursor.close()

cnx = mysql.connector.connect(user='jwz', password='justwantedto',
                host='127.0.0.1',
                database='alethepedia')
json_file = '../data/encyclopedia.json'
with open(json_file) as json_data:
    data = json.load(json_data)
add_article(cnx, data[0]['articles'][0])
cnx.close()
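The script above imports only the first article. Extending it to the whole file might look like the sketch below, which assumes every top-level entry in the JSON has the same shape as `data[0]`, that is, a dict with an 'articles' list. The helper name `iter_articles` is hypothetical.

```python
def iter_articles(data):
    """Yield every article dict from every top-level entry in the parsed JSON."""
    for entry in data:
        # Assumed layout: each entry carries its articles under the 'articles' key.
        for article in entry.get("articles", []):
            yield article

# Possible use with the script above (requires its open connection):
#   for article in iter_articles(data):
#       add_article(cnx, article)
```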

Rule Britannica

Wednesday, May 27th, 2020

Absolutely Nobody:

Me: Let’s scrape the 1910 Encyclopedia Britannica and load it up into an mp-wp instance.

When writing qntra articles I often found myself depending on Pediwiki to find original sources when doing background checks. This was a revolting experience to say the least.

To address this, I am standing up an mp-wp instance containing the entire 11th edition of Encyclopedia Britannica, which is otherwise available only in unusable form from the Internet Archive and Project Gutenberg. It goes without saying that this will be added to lekythion’s search index.

Obviously it won’t contain bios of more recent public figures, but it is a start, and would perhaps have come in handy when writing the unpersoning piece.

I look forward to being able to reference specific phrases/lines in the best encyclopedia ever.