
Migrating from MySQL Spatial Extensions to PostGIS


Many of the following steps only set up an environment in which to test the migration; when the actual migration occurs, they can be skipped.

From the AWS Console, boot an Ubuntu AMI on an EC2 Extra Large instance, making sure to increase the root volume size (doing so will create an EBS volume); an uncompressed MySQL dump is 6 GB. Comparison of a few EC2 instances (a command-line alternative to the console is sketched after the list):

  • Extra Large: $0.312/hr, 8 ECU, 15 GB RAM
  • High-Memory Extra Large: $0.450/hr, 6.5 ECU, 17.1 GB RAM
  • High-CPU Extra Large: $0.660/hr, 20 ECU, 7 GB RAM
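
If you prefer the command line to the console, the launch can be sketched with the AWS CLI; the AMI ID and the 20 GB root volume size below are placeholders, so treat this purely as an illustration:

aws ec2 run-instances \
  --image-id ami-XXXXXXXX \
  --instance-type m1.xlarge \
  --key-name ec2-keypair \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":20}}]'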

Login to EC2 instance

ssh -o "StrictHostKeyChecking no" -i ~/.ec2/ec2-keypair.pem ubuntu@<instance-public-hostname>
sudo -i
# Avoid locale warnings.
locale-gen en_CA.UTF-8
locale-gen en_US.UTF-8
# Install and start a screen session.
aptitude update
aptitude install screen
screen

Change the path to your keypair and the public hostname of the instance as necessary. If you chose an Amazon Linux AMI instead, log in as the ec2-user user (the commands below assume Ubuntu, however, as the Amazon Linux AMI uses Yum).

MySQL

Backup

LOAD DATA INFILE is "usually 20 times faster than using INSERT statements". Log in to the MySQL slave and run:

mkdir migration
chmod a+w migration
mysql -u root -p -e "GRANT FILE ON *.* to twfyl_user;"
# Takes about 30 minutes.
mysqldump -u twfyl_user -p --opt twfy_local_production -T migration

Tuning for fast restoration

Potentially useful variables that may sacrifice integrity:

System variables that others mention, but we don't need:

Installation

Log in to the EC2 instance, become root (sudo -i) and re-attach your screen session (screen -r):

# http://www.dbasquare.com/2012/05/22/install-and-configure-mysql-on-ec2-with-redhat-linux/
# Leave MySQL password blank (you'll be prompted 3 times).
aptitude install -y mysql-server libmysqlclient-dev
# Tune MySQL for fast restoration, assuming 15 GB of RAM.
# MySQL will not start if innodb_log_file_size is set to 2 GB.
sed -i -e "/\[mysqld\]/a\
innodb_file_per_table\n\
innodb_buffer_pool_size = 10G\n\
innodb_log_file_size = 1G\n\
innodb_log_buffer_size = 8M\n\
innodb_flush_log_at_trx_commit = 0\n\
innodb_flush_method = O_DIRECT\n\
innodb_thread_concurrency = 0" /etc/mysql/my.cnf
stop mysql
# http://dba.stackexchange.com/questions/1261/how-to-safely-change-mysql-innodb-variable-innodb-log-file-size
rm -f /var/lib/mysql/ib_logfile[01]
start mysql

Restore from mysqldump -T

Downloading the migration directory to the EC2 instance directly is faster than creating a tar.gz archive of it, downloading that, and extracting it. In fact, just creating the archive takes almost as long as downloading the directory.

This is the faster method (about 30 minutes).

mkdir migration
# You will need frankg's password. Takes about 40 minutes.
scp -o "StrictHostKeyChecking no" -P 7012 -r frankg@<host>:~/migration .
mysql -u root -e "CREATE DATABASE twfy_local_production;"
for i in migration/*.sql; do mysql -u root twfy_local_production < $i;done
# mysqlimport requires absolute paths (hence `pwd`). Takes about 20 minutes.
mysqlimport -L -u root twfy_local_production `pwd`/migration/*.txt
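
For reference, mysqlimport is a command-line wrapper around LOAD DATA INFILE; the equivalent for a single table would look roughly like this (the councils table name is illustrative):

mysql -u root twfy_local_production -e "LOAD DATA LOCAL INFILE '`pwd`/migration/councils.txt' INTO TABLE councils;"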

For posterity, this is a slower method (hours).

# You will need frankg's password. Takes 1-2 minutes.
scp -o "StrictHostKeyChecking no" -P 7012 frankg@<host>:~/backups/openlylocal/twfy_local_production_A.sql.gz .
# Takes 2-3 minutes.
gunzip twfy_local_production_A.sql.gz
mysql -u root -e "CREATE DATABASE twfy_local_production;"
# Takes hours.
mysql -u root twfy_local_production < twfy_local_production_A.sql

Convert to UTF-8

# http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
# Takes under 5 minutes.
for i in `grep -l CHARSET=latin1 migration/*.sql | sed 's/migration\///' | sed 's/\.sql//'`; do
  echo "mysql -u root -e 'ALTER TABLE $i CONVERT TO CHARACTER SET utf8;' twfy_local_production"
  sh -c "mysql -u root -e 'ALTER TABLE $i CONVERT TO CHARACTER SET utf8;' twfy_local_production"
done
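
As a quick sanity check, confirm that no tables still report a latin1 collation:

mysql -u root -e "SELECT TABLE_NAME, TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'twfy_local_production' AND TABLE_COLLATION LIKE 'latin1%';"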

PostgreSQL

aptitude install -y postgresql-9.1-postgis postgresql-client postgresql-client-common libpq-dev

Set up the PostGIS template, create a superuser named root with password pass, and create the twfy_local_production production database.
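
A minimal sketch of those three steps, assuming the PostGIS 1.5 scripts shipped with Ubuntu's postgresql-9.1-postgis package (paths may differ for other versions):

sudo -u postgres createdb template_postgis
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql
sudo -u postgres psql -c "UPDATE pg_database SET datistemplate = true WHERE datname = 'template_postgis';"
sudo -u postgres psql -c "CREATE ROLE root SUPERUSER LOGIN PASSWORD 'pass';"
sudo -u postgres createdb -T template_postgis -O root twfy_local_production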

Taps

We will use Taps to convert from MySQL to PostgreSQL. Start a taps server for MySQL.

# From running `rvm requirements`.
aptitude install -y build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion
curl -L https://get.rvm.io | bash -s stable --ruby
# Add root to the rvm group (useradd would fail because root already exists).
usermod -a -G rvm root
# Some RVM documentation uses:
# source /etc/profile.d/rvm.sh
source /usr/local/rvm/scripts/rvm
gem install --no-rdoc --no-ri taps sqlite3 mysql2 pg
taps server mysql2://root@localhost/twfy_local_production user pass

Migration

Migrate all tables except boundaries, which contains spatial data. You may want to open a new screen window (Ctrl-A C).

# Takes hours.
taps pull -g -e boundaries postgres://root:pass@localhost/twfy_local_production http://user:pass@localhost:5000

Taps seems to misreport the number of records, which you can check with the query below (note that TABLE_ROWS is only an estimate for InnoDB tables):

mysql -u root -e "SELECT TABLE_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'twfy_local_production';"
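
To compare against the migrated data, count rows on the PostgreSQL side for any given table (the councils table name is illustrative):

psql -U root -h localhost twfy_local_production -c "SELECT count(*) FROM councils;"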

Testing

Setup

aptitude install -y libcurl4-openssl-dev
git clone git@github.com:CountCulture/OpenlyLocal.git
rvm install ruby-1.8.7-p370
rvm 1.8.7
cd OpenlyLocal
bundle
cp doc/examples/initializers/authentication_details.rb doc/examples/initializers/geokit_config.rb doc/examples/initializers/resque.rb config/initializers/
cp doc/examples/database.yml doc/examples/resque.yml doc/examples/smtp_gmail.yml config/
vi config/database.yml

Edit config/database.yml to connect to the local PostgreSQL server. In Rails 2.3, the template option in database.yml seems to have no effect, so you cannot run bundle exec rake db:create:all. (The activerecord-postgis-adapter gem, available to Rails 3+ only, respects the template option.)
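
The production stanza should point at the local PostgreSQL server; a minimal sketch, assuming the root superuser and pass password created above:

production:
  adapter: postgresql
  encoding: unicode
  database: twfy_local_production
  host: localhost
  username: root
  password: pass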

Boundaries

CREATE TABLE boundaries (
    id integer NOT NULL,
    area_type character varying(255),
    area_id integer,
    created_at timestamp without time zone,
    updated_at timestamp without time zone,
    hectares double precision
);
SELECT AddGeometryColumn('public', 'boundaries', 'boundary_line', 4326, 'POLYGON', 2);

CREATE SEQUENCE boundaries_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER SEQUENCE boundaries_id_seq OWNED BY boundaries.id;
SELECT pg_catalog.setval('boundaries_id_seq', 1, false);
ALTER TABLE ONLY boundaries ALTER COLUMN id SET DEFAULT nextval('boundaries_id_seq'::regclass);
ALTER TABLE ONLY boundaries ADD CONSTRAINT boundaries_pkey PRIMARY KEY (id);
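
AddGeometryColumn also registers the column in PostGIS's geometry_columns metadata table; you can confirm the registration with:

SELECT f_table_name, f_geometry_column, srid, type FROM geometry_columns;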

Get a download link for the "Boundary-Line" file from Ordnance Survey. Note: The new file uses NESS IDs, but the Rake task was written for SNAC IDs. You will likely have to use an old file we have. Then, from the Rails root:

curl -o bdline_gb.zip DOWNLOAD_LINK
unzip bdline_gb.zip
mv Data/district_borough_unitary_ward_region.* db/data/boundary_line
RAILS_ENV=production bundle exec rake import_ward_boundaries

Tests

RAILS_ENV=production bundle exec rake db:schema:dump
RAILS_ENV=production bundle exec rake db:structure:dump
sudo -u postgres createdb -T template_postgis openlylocal_development
sudo -u postgres createdb -T template_postgis twfy_local_test
#RAILS_ENV=production bundle exec db:test:clone
#RAILS_ENV=test rake db:fixtures:load
#bundle exec rake db:test:load
bundle exec rake db:test:prepare
bundle exec rake