Download | Project Page | Andy Balaam

My Great Data (migratedata)

migratedata allows you to write a simple XML file to define how to copy data from one database to another, and perform very simple transformations on the way.

Currently, it is immediately able to help you migrate a PostNuke web site to a WordPress one, but to perform any other transformations you will need to create your own XML spec file (which is reasonably simple).

Migrating from PostNuke to WordPress

Download the migratedata tarball from the Download page, and unzip it somewhere.

Copy the file database_details_sample.xml and name it database_details.xml.

Edit the database_details.xml file to contain the hosts, usernames, passwords and database names of your databases.

If you have remote access to the MySQL databases for both your source and your target databases, the process is relatively simple. Open a terminal and cd into the directory containing migratedata.py, and run this command:

./migratedata.py database_details.xml postnuke0.75_to_wordpress2.2.xml

If you don't have remote access to the target database, run this command instead:

./migratedata.py --dump-insert database_details.xml postnuke0.75_to_wordpress2.2.xml > insert.sql

and execute the insert.sql script on the database server of your target database (probably through your hosting provider's web interface).

If you can't run queries remotely against your source database either, do this:

./migratedata.py --dump-select database_details.xml postnuke0.75_to_wordpress2.2.xml > select.sql

This will create the file select.sql which you need to run on your source database (probably through your hosting provider's web interface). The output of running this query will be some more SQL, which you will need to copy and run on your target database, again probably through your new hosting provider's web interface.

Migrating other things: creating your own spec file

A spec file looks something like this:

<migrate_data_spec>
	<table_copy source="nuke_users" target="wp_users" where="pn_uid > 1">
		<column target="ID"                  type="int">pn_uid + 100</column>
		<column target="user_login"          type="str">pn_uname</column>
		<column target="user_pass"           type="str">pn_pass</column>
		<column target="user_nicename"       type="str">pn_name</column>
		<column target="user_email"          type="str">pn_email</column>
		<column target="user_url"            type="str">pn_url</column>
		<column target="user_registered"     type="sql">CURDATE()</column>
		<column target="user_activation_key" type="int">1</column>
		<column target="user_status"         type="int">1</column>
		<column target="display_name"        type="str">pn_name</column>
	</table_copy>
	<table_copy source="nuke_stories_cat" target="wp_categories">
		<column target="cat_ID"               type="int">pn_catid</column>
		<column target="cat_name"             type="str">SUBSTRING( pn_title, 0, 55 )</column>
		<column target="category_nicename"    type="str">pn_title</column>
		<column target="category_description" type="str">''</column>
	</table_copy>
</migrate_data_spec>

You can have multiple <table_copy> tags that tell migratedata how to copy data from a source table to a target table.

For the type attribute you should normally use the int type for literal numbers or columns containing numbers, and the str type for most other things (including dates).

Just examine your database tables and see which tables and columns map to each other in the two systems. It is possible to insert simple SQL into the copytable XML, but more complex transformations are not possible. If you need to perform merges, pivots or anything like that, migratedata is not the right tool for the job. You have two options: a) add the required functionality to migratedata e.g. by creating a <merge> tag and implementing the functionality you need or b) finding a different tool. migratedata is designed to be simple to use, and especially easy to create spec files, to encourage people to contribute spec files for lots of different migrations.

NOTE: If you want to evaluate some SQL, you will normally still want to use the str type. The only time you want the sql type is when you want your SQL to be evaluated at the very end, just as the data is inserted into the target table. Any SQL you want evaluated when you're sucking data out of the source table should have a type of str, or int if it's going to return an integer value.

The above example shows the different types of SQL evaluation. The cat_name column of the second table uses the SUBSTRING SQL function to shorten the value of pn_title in the source to ensure it fits in the target column. This SQL will be executed when the data is being sucked out of the source, so it has a str type. Similarly, the ID column of the first table uses some very simple SQL to add 100 to the value of pn_uid in the source table to ensure the ids don't clash with anything currently in the target database.

In contrast, the user_registered column in the first table contains some SQL we want to preserve until the very last moment. We don't want the date to be the time when we sucked the data out of the old database - we want it to be the date when we actually add the user to the target, so we give it a sql type, so the literal SQL "CURDATE()" appears in our INSERT script and is only evaluated when we run the insert.

Migrating non-MySQL databases: creating an md_<database>.py file

To modify migratedata to support a new database type, all you need to do is create a new file called md_<database>.py, where <database> is the name of your database.

For example, if we were adding support for postgresql, we would create a file called md_postgres.py and put it inside the migratedata directory, next to md_myqsl.py.

Then we would need to implement two functions: get_db_cmd and generate_table_copy. The easiest way to see how they work is to look at the versions in md_sql.py and modify them for your new platform.

Probably a lot of things will be the same, but there are likely to be differences with different SQL syntax, and there will certainly be differences with which database client programs are run.

To test your database type, add an import statement at the top of migratedata.py, e.g.

import md_postgres

and modify the line just below that reads

md_db = md_mysql

To make it say, e.g.

md_db = md_postgres

Now you should be able to run migratedata as normal and see it using your new platform. If you get another platform (partially) working, please do send your code to the mailing list and we'll incorporate it into the next release when we can get it working reliably.

If variations in SQL syntax need to be reflected inside the SQL in actual spec files, code changes will be needed in migratedata itself to support this (and we'll need to decide on an appropriate syntax).

Check out Andy Balaam's home page, and Andy Balaam's blog for news of this and other projects.

Licence

migratedata is Copyright (C) 2007 by Andy Balaam.

migratedata is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This licence is contained in the file COPYING.txt.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 US