# Migrating from MySQL Spatial Extensions to PostGIS
Most of the following steps set up an environment in which to test the migration; when the actual migration occurs, many of them can be skipped.
From the AWS Console, boot an Ubuntu AMI on an EC2 Extra Large instance, making sure to change the root volume size (doing so will create an EBS volume). An uncompressed MySQL dump is 6 GB. Comparison of a few EC2 instances:
- Extra Large: $0.312, 8 ECU, 15 GB
- High-Memory Extra Large: $0.450, 6.5 ECU, 17.1 GB
- High-CPU Extra Large: $0.660, 20 ECU, 7 GB
```shell
ssh -o "StrictHostKeyChecking no" -i ~/.ec2/ec2-keypair.pem [email protected]
sudo -i

# Avoid locale warnings.
locale-gen en_CA.UTF-8
locale-gen en_US.UTF-8

# Start a screen session.
aptitude update
aptitude install screen
screen
```
Change the path to your keypair and the public hostname of the instance as necessary. If you chose an Amazon Linux AMI instead, log in as the ec2-user user (the commands below assume Ubuntu, however, as the Amazon Linux AMI uses Yum).
`LOAD DATA INFILE` is "usually 20 times faster than using INSERT statements". Log in to the MySQL slave and run:
```shell
mkdir migration
chmod a+w migration
mysql -u root -p -e "GRANT FILE ON *.* to twfyl_user;"
# Takes about 30 minutes.
mysqldump -u twfyl_user -p --opt twfy_local_production -T migration
```
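The `-T` option writes, for each table, a `.sql` file containing the schema and a tab-separated `.txt` file containing the data, which is what makes the fast `LOAD DATA INFILE` path possible. A minimal sketch of the data-file format, using a made-up `councils` table:

```shell
# Hypothetical sample of a tab-separated data file as produced by mysqldump -T:
# one line per row, columns separated by tabs.
printf '1\tAnytown Council\t2009-01-01\n2\tOtherton Council\t2009-02-01\n' > councils.txt

# One line per row, so wc -l gives the row count mysqlimport will load.
wc -l < councils.txt
# → 2
```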
Notes on InnoDB variables to tune for the restore:

- Reasons to use innodb_file_per_table explains that it can save hard drive space.
- `innodb_buffer_pool_size`: "up to 80% of the machine physical memory size"
- `innodb_log_file_size`: "Sensible values range from 1MB" to 1/2 the size of the buffer pool, but less than 4GB. Large values save disk I/O.
- `innodb_log_buffer_size`: "Sensible values range from 1MB to 8MB."
- `innodb_flush_log_at_trx_commit`: can be set to 0 or 2 to improve performance.
- `innodb_flush_method`
- `innodb_thread_concurrency`

Potentially useful variables that may sacrifice integrity:

- `innodb_checksums`
- `innodb_doublewrite`
- `innodb_support_xa`
- Setting `innodb_data_home_dir` to a tmpfs or ramfs
System variables that others mention, but we don't need:

- Variable's Day Out #8: innodb_additional_mem_pool_size says it's unneeded. Running `SHOW ENGINE INNODB STATUS \G` also shows no need to increase its size.
- `innodb_log_files_in_group`: Beginning PHP and MySQL 5 writes, "MySQL developers recommend setting this parameter to 3." MySQL documentation writes, "the default (and recommended) value is 2."
Log in to the EC2 instance, become root (`sudo -i`) and re-attach your screen session (`screen -r`):
```shell
# http://www.dbasquare.com/2012/05/22/install-and-configure-mysql-on-ec2-with-redhat-linux/
# Leave MySQL password blank (you'll be prompted 3 times).
aptitude install -y mysql-server libmysqlclient-dev

# Tune MySQL for fast restoration, assuming 15 GB total.
# MySQL will not start if innodb_log_file_size is set to 2GB.
sed -i -e "/\[mysqld\]/a\
innodb_file_per_table\n\
innodb_buffer_pool_size = 10G\n\
innodb_log_file_size = 1G\n\
innodb_log_buffer_size = 8M\n\
innodb_flush_log_at_trx_commit = 0\n\
innodb_flush_method = O_DIRECT\n\
innodb_thread_concurrency = 0" /etc/mysql/my.cnf

stop mysql
# http://dba.stackexchange.com/questions/1261/how-to-safely-change-mysql-innodb-variable-innodb-log-file-size
rm -f /var/lib/mysql/ib_logfile[01]
start mysql
```
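After restarting, it is worth confirming that the server picked up the new settings (a quick sanity check; the size variables are reported in bytes):

```shell
# Confirm the tuned InnoDB settings are live.
mysql -u root -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
mysql -u root -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"
```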
Downloading the `migration` directory to the EC2 instance is faster than creating a compressed `tar.gz` archive of the `migration` directory, downloading it, and uncompressing and unarchiving it. In fact, just creating the archive takes almost as long as downloading the directory.

This is the faster method (about 30 minutes):
```shell
mkdir migration
# You will need frankg's password. Takes about 40 minutes.
scp -o "StrictHostKeyChecking no" -P 7012 -r [email protected]:~/migration .

mysql -u root -e "CREATE DATABASE twfy_local_production;"
for i in migration/*.sql; do mysql -u root twfy_local_production < $i; done
# mysqlimport uses absolute paths. Takes about 20 minutes.
mysqlimport -L -u root twfy_local_production `pwd`/migration/*.txt
```
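`mysqlimport` derives the target table name from each data file's basename (directory and extension stripped), which is why the per-table `.txt` files can be passed to it directly. A self-contained sketch of that mapping, with made-up file names:

```shell
# mysqlimport loads migration_demo/councils.txt into the table `councils`, etc.
mkdir -p migration_demo
touch migration_demo/councils.txt migration_demo/members.txt

# Reproduce the file-to-table mapping: strip the directory and the extension.
for f in "$(pwd)"/migration_demo/*.txt; do
  basename "$f" .txt
done
# → councils
# → members
```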
For posterity, this is the slower method (hours):

```shell
# You will need frankg's password. Takes 1-2 minutes.
scp -o "StrictHostKeyChecking no" -P 7012 [email protected]:~/backups/openlylocal/twfy_local_production_A.sql.gz .
# Takes 2-3 minutes.
gunzip twfy_local_production_A.sql.gz
mysql -u root -e "CREATE DATABASE twfy_local_production;"
# Takes hours.
mysql -u root twfy_local_production < twfy_local_production_A.sql
```
```shell
# http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
# Takes under 5 minutes.
for i in `grep -l CHARSET=latin1 migration/*.sql | sed 's/migration\///' | sed 's/\.sql//'`; do
  echo "mysql -u root -e 'ALTER TABLE $i CONVERT TO CHARACTER SET utf8;' twfy_local_production"
  sh -c "mysql -u root -e 'ALTER TABLE $i CONVERT TO CHARACTER SET utf8;' twfy_local_production"
done
```
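The loop above discovers which tables to convert by grepping the per-table schema dumps for `CHARSET=latin1`, then stripping the path and extension to recover the table name. A self-contained sketch of that extraction, with made-up dump files:

```shell
# Two fake schema dumps: only the latin1 one should be selected.
mkdir -p demo
echo 'CREATE TABLE councils (id int) DEFAULT CHARSET=latin1;' > demo/councils.sql
echo 'CREATE TABLE members (id int) DEFAULT CHARSET=utf8;' > demo/members.sql

# grep -l prints matching file names; sed strips the directory and extension.
grep -l CHARSET=latin1 demo/*.sql | sed 's/demo\///' | sed 's/\.sql//'
# → councils
```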
```shell
aptitude install -y postgresql-9.1-postgis postgresql-client postgresql-client-common libpq-dev
```
Set up the PostGIS template, create a superuser `root` with password `pass`, and create a `twfy_local_production` production database.
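A sketch of those steps, assuming the PostGIS 1.5 scripts land in the usual location for Ubuntu's postgresql-9.1-postgis package (verify the paths on your instance before running):

```shell
# Create the template database and load the PostGIS functions into it.
sudo -u postgres createdb template_postgis
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql
sudo -u postgres psql -c "UPDATE pg_database SET datistemplate = true WHERE datname = 'template_postgis';"

# Create the superuser and the production database from the template.
sudo -u postgres psql -c "CREATE USER root SUPERUSER PASSWORD 'pass';"
sudo -u postgres createdb -T template_postgis -O root twfy_local_production
```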
We will use Taps to convert from MySQL to PostgreSQL. Start a taps server for MySQL.
```shell
# From running `rvm requirements`.
aptitude install -y build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion
curl -L https://get.rvm.io | bash -s stable --ruby
# Add root to the rvm group (useradd fails on an existing user).
usermod -a -G rvm root
# Some RVM documentation uses:
# source /etc/profile.d/rvm.sh
source /usr/local/rvm/scripts/rvm
gem install --no-rdoc --no-ri taps sqlite3 mysql2 pg
taps server mysql2://root@localhost/twfy_local_production user pass
```
Migrate all tables except `boundaries`, which contains spatial data. You may want to open a new screen window (`Ctrl-A C`).
```shell
# Takes hours.
taps pull -g -e boundaries postgres://root:pass@localhost/twfy_local_production http://user:[email protected]:5000
```
Taps seems to misreport the number of records (which you can find with):

```shell
mysql -u root -e "SELECT TABLE_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'twfy_local_production';"
```
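Note that `TABLE_ROWS` is only an estimate for InnoDB tables, so for a suspect table an exact count on each side is the fairer comparison. A sketch of the Postgres side (`councils` is a hypothetical table name; `psql` will prompt for the `pass` password):

```shell
# Exact row count for one table on the Postgres side; compare with
# SELECT count(*) on the MySQL side for the same table.
psql -U root -h localhost -d twfy_local_production -c "SELECT count(*) FROM councils;"
```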
```shell
aptitude install -y libcurl4-openssl-dev
git clone [email protected]:CountCulture/OpenlyLocal.git
rvm install ruby-1.8.7-p370
rvm 1.8.7
cd OpenlyLocal
bundle
cp doc/examples/initializers/authentication_details.rb doc/examples/initializers/geokit_config.rb doc/examples/initializers/resque.rb config/initializers/
cp doc/examples/database.yml doc/examples/resque.yml doc/examples/smtp_gmail.yml config/
vi config/database.yml
```
Edit `config/database.yml` to connect to the local PostgreSQL server. In Rails 2.3, the `template` option in `database.yml` seems to have no effect, so you cannot run `bundle exec rake db:create:all`. (The `activerecord-postgis-adapter` gem, available to Rails 3+ only, respects the `template` option.)
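A minimal sketch of the relevant `database.yml` entry, assuming the `root`/`pass` superuser created earlier (keys follow standard Rails 2.3 conventions):

```yaml
# config/database.yml -- point the app at the local PostgreSQL server.
production:
  adapter: postgresql
  host: localhost
  database: twfy_local_production
  username: root
  password: pass
  encoding: utf8
```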
Create the `boundaries` table by hand, using `AddGeometryColumn` to add the spatial column:

```sql
CREATE TABLE boundaries (
    id integer NOT NULL,
    area_type character varying(255),
    area_id integer,
    created_at timestamp without time zone,
    updated_at timestamp without time zone,
    hectares double precision
);
SELECT AddGeometryColumn('public', 'boundaries', 'boundary_line', 4326, 'POLYGON', 2);
CREATE SEQUENCE boundaries_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER SEQUENCE boundaries_id_seq OWNED BY boundaries.id;
SELECT pg_catalog.setval('boundaries_id_seq', 1, false);
ALTER TABLE ONLY boundaries ALTER COLUMN id SET DEFAULT nextval('boundaries_id_seq'::regclass);
ALTER TABLE ONLY boundaries ADD CONSTRAINT boundaries_pkey PRIMARY KEY (id);
```
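To sanity-check the new geometry column, a made-up polygon can be inserted and measured. This is only a sketch: the row values are hypothetical, and the SRID 4326 matches the `AddGeometryColumn` call above (PostGIS enforces the SRID and geometry type via constraints):

```sql
-- Insert a small test polygon (WGS 84, SRID 4326); id comes from the sequence.
INSERT INTO boundaries (area_type, area_id, boundary_line)
VALUES ('Ward', 1, ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 0 0))', 4326));

-- Area in square degrees; computing hectares would need a projected SRID.
SELECT ST_Area(boundary_line) FROM boundaries;

-- Remove the test row before the real import.
DELETE FROM boundaries WHERE area_type = 'Ward' AND area_id = 1;
```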
Get a download link for the "Boundary-Line" file from Ordnance Survey. Note: The new file uses NESS IDs, but the Rake task was written for SNAC IDs. You will likely have to use an old file we have. Then, from the Rails root:
```shell
curl -o bdline_gb.zip DOWNLOAD_LINK
unzip bdline_gb.zip
mv Data/district_borough_unitary_ward_region.* db/data/boundary_line
RAILS_ENV=production bundle exec rake import_ward_boundaries
RAILS_ENV=production bundle exec rake db:schema:dump
RAILS_ENV=production bundle exec rake db:structure:dump
```
```shell
sudo -u postgres createdb -T template_postgis openlylocal_development
sudo -u postgres createdb -T template_postgis twfy_local_test
#RAILS_ENV=production bundle exec db:test:clone
#RAILS_ENV=test rake db:fixtures:load
#bundle exec rake db:test:load
bundle exec rake db:test:prepare
bundle exec rake
```