Author: Ethan J. Eldridge
Website: ejehardenberg.github.io
A Python tool that migrates WordPress content to JSON for use in Harp.
Current Features:
- Creates JSON for Posts
- Creates JSON for Comments
- Creates JSON for Nav
- Creates JSON for Pages
- Creates .md files for the Posts
- Creates .md files for the Pages
- If PULL_TYPES is enabled, pulls down the entire wp_posts table and converts it into usable _data.json files.
To Run:
Fill out the database credentials at the top of the file
$ python wp2json4harp.py
You'll now have an example.jade file, and a few directories with _data.json inside.
Once you've run the script, you'll get a folder for each of the _data.json files. This makes things a bit easier to coordinate, and the example Jade file shows how to access the information pulled from your blog.
It's pretty heavy on I/O from all the writes, but I pulled down a sizable WordPress database (347915 rows in the postmeta table and 34617 in the posts table) in under a minute, so it works alright.
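If you'd like to peek at the generated data from Python before wiring it into a template, something like the following should work. It is only a sketch: the helper name `load_harp_data` is made up for illustration, and it assumes the file was written as UTF-8 (see OUTPUT_ENCODING) and that each folder holds a single _data.json.

```python
import json
import os


def load_harp_data(directory):
    """Return the parsed contents of <directory>/_data.json.

    Hypothetical helper, not part of wp2json4harp.py.
    Assumes the file is UTF-8 encoded.
    """
    path = os.path.join(directory, "_data.json")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_harp_data("blog")` would read `blog/_data.json`, where `blog` stands in for whatever BLOG_DIR is set to.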
If you'd like to try it out:
- Install WordPress
- Install Harp
- Download this script
- Configure it to your liking using the options below
- Run the script!
- Move the folders and files into your harp site area.
Some configuration details:
Configuration is at the top of the script, where you'll need to enter your database credentials. Optionally, you can fully configure the script using the constants below:
Constant | What it does |
---|---|
MYSQL_HOST | Defines the host of the database to connect to. |
MYSQL_USER | Defines the user to connect to the database as. |
MYSQL_PASS | Defines the password to the database. |
MYSQL_DB | Defines the database name connected to on the host. |
WP_PREFIX | The prefix of your WordPress tables; typically this is `wp_`. |
ONLY_PUBLISHED | Only retrieve posts and pages that have been published. |
GENERATE_PAGES | Generate a markdown file for each page pulled from the WP database. These files are written to the PAGES_DIR. |
GENERATE_POSTS | Generate a markdown file for each post pulled from the WP database. These files are written to the BLOG_DIR. |
ROOT_DIR | Where to generate all the files this script creates. Leave it empty (the default) to use the directory the script is run from. |
ENCODING | The encoding used to decode content from the database. I've defaulted it to latin to handle some annoying Unicode errors. |
OUTPUT_ENCODING | The encoding to encode the _data.json files in. |
STRIP_NON_ASCII | Strips out non-ASCII characters from data being written into _data.json. |
PULL_TYPES | Set this to true and all post types will be pulled out of the database, with _data files created for each. If you use this, the *_DIR constants are ignored. |
PAGES_DIR | The directory name where the pages will be stored. |
BLOG_DIR | The directory name where the blog posts will be stored. |
NAV_DIR | The directory where the navigation JSON will be stored. |
COMMENTS_DIR | The directory where comments will be stored. |
EXAMPLE_FILE | The name of the file that will be generated to show some of the posts and pages. |
STOP_ON_ERR | Boolean value that causes errors to stop the script. |
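
To make the table a little more concrete, here is a rough sketch of what a filled-in configuration block might look like. The constant names come from the table; every value is a placeholder I've made up (including `latin-1` as the spelling of the latin encoding), and `clean_text` is a hypothetical helper showing one way ENCODING and STRIP_NON_ASCII could interact, not the script's actual code.

```python
# Placeholder values for illustration only; adjust to your own setup.
MYSQL_HOST = "localhost"
MYSQL_USER = "wp_user"
MYSQL_PASS = "change-me"
MYSQL_DB = "wordpress"
WP_PREFIX = "wp_"
ONLY_PUBLISHED = True
ROOT_DIR = ""             # empty: write next to the script
ENCODING = "latin-1"      # assumed codec name for "latin"
OUTPUT_ENCODING = "utf-8"
STRIP_NON_ASCII = True


def clean_text(raw, encoding=ENCODING, strip_non_ascii=STRIP_NON_ASCII):
    """Decode bytes from the database and optionally drop non-ASCII characters.

    Hypothetical helper; the real script may handle this differently.
    """
    text = raw.decode(encoding)
    if strip_non_ascii:
        # Encoding with errors="ignore" silently discards non-ASCII characters.
        text = text.encode("ascii", "ignore").decode("ascii")
    return text
```

Note that stripping is lossy: accented characters are removed rather than transliterated, so leave STRIP_NON_ASCII off if you need them preserved.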
TODO:
- Taxonomy?
- Add nav stuff to the PULL_TYPES area as well to help out with navigation
- How does one pull in the comments to a post?
- Use getopt to make cmd line arguments instead of constants
- More examples!
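
For the getopt item above, one possible shape is sketched below. Nothing here exists in the script yet: the option letters, the long option names, and the `parse_args` function are all assumptions, and only the four database constants are covered.

```python
import getopt

# Maps each (hypothetical) command-line flag to the constant it would override.
FLAG_TO_CONSTANT = {
    "-H": "MYSQL_HOST", "--host": "MYSQL_HOST",
    "-u": "MYSQL_USER", "--user": "MYSQL_USER",
    "-p": "MYSQL_PASS", "--pass": "MYSQL_PASS",
    "-d": "MYSQL_DB", "--db": "MYSQL_DB",
}


def parse_args(argv):
    """Return a dict of configuration values built from command-line options."""
    # Placeholder defaults; the real script defines its own values.
    config = {
        "MYSQL_HOST": "localhost",
        "MYSQL_USER": "",
        "MYSQL_PASS": "",
        "MYSQL_DB": "",
    }
    opts, _extra = getopt.getopt(
        argv, "H:u:p:d:", ["host=", "user=", "pass=", "db="]
    )
    for flag, value in opts:
        config[FLAG_TO_CONSTANT[flag]] = value
    return config
```

Calling `parse_args(sys.argv[1:])` at startup would then let something like `python wp2json4harp.py --host db.example.com -u blog` replace editing the constants by hand.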