Scott Weston on March 10, 2014
Why is retaining the same IDs important when upgrading Drupal 6 to Drupal 7?
One of our clients is in the process of migrating from Drupal 6 to 7. One of the requirements for this migration is to retain the same IDs from the previous version of Drupal. In Part Two of this series I will detail the 5 steps I took to complete the migration. Did you miss post #1? Click back to set the scene.
Step One: Content Model
In the case of this project, the content model pretty much stayed the same. This was a big win for the data migration because there were going to be no major changes to the site. We did tweak the content model, though, so that the client's future goals (post-D7 launch) would be easier to attain than if we had stayed on Drupal 6 or just done a straight port from D6 to D7.
Step Two: Figuring Out How to Migrate with the Same IDs
Once the content model was built out in Drupal 7, I was able to start formulating my plan for the migration. I tried a few different approaches to keep the same IDs that either failed miserably or kind of worked. Here are a few failed attempts that I can look back and laugh about now:
● First, I did a migration with the Migrate module, then ran a number of scripts to update the database directly, replacing the 'new' ID with the 'old' ID. I initially tested this with just one content type. It felt like a fragile approach and, even more importantly, I felt that I was violating the "Don't Hack" ethos that is at the core of Drupal. This was an example of how not to approach my problem in the Drupal Way.
● Next, I looked into 'customizing' the Migrate and Feeds Importer modules to get the IDs in, but I ran into either WSODs (white screens of death) or pages full of errors, which made me quickly realize that this wasn't going to work.
I realized that I was probably going to have to build my own solution from scratch. Personally, I had a few requirements that my migration process would need to accommodate based on how the client's site is architected:
● A straightforward process for getting data out of the Drupal 6 site,
● The ability to run the import and then 'roll back' if something needed to be changed,
● Have separate processes for the initial import and then an 'update' of entities (more on this later), and
● Access to all of the referenced nodes and groups that were in the Drupal 6 site.
Step Three: The E is for Extract
Since I was building this migrate process from scratch, I wanted to make it as easy as possible to get the data out of Drupal 6. While the ability to write your own queries to get at data with the Migrate module is powerful, it would add complexity to my process that I wanted to avoid.
As I was looking at the old site while trying to come up with a scheme for the data export, I kept coming back to the object dumps shown by the Devel module. That's when I had the 'Ah-ha!' moment: simply load up each object (node, taxonomy term, user) and save it as a JSON-encoded string in the database.
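Here is a minimal sketch of that idea in plain PHP. It assumes a hand-built stand-in for the object node_load() would return on the Drupal 6 site; the property values and the field_foo field are illustrative only. One detail worth knowing before relying on the round trip: json_decode() returns nested stdClass objects by default, and nested arrays only when its second argument is TRUE.

```php
<?php
// Hypothetical stand-in for a loaded Drupal 6 node. A real one would come
// from node_load() and carry many more properties.
$node = new stdClass();
$node->nid = 42;
$node->type = 'story';
$node->title = 'Hello world';
$node->field_foo = array(array('value' => 'bar'));

// Serialize the whole object into a JSON string for storage.
$json = json_encode($node);

// Decoding with TRUE yields nested arrays, so field items stay arrays.
$as_array = json_decode($json, TRUE);

// Decoding without it turns every JSON object into a stdClass instance.
$as_object = json_decode($json);

echo $as_array['field_foo'][0]['value'], "\n";
echo $as_object->field_foo[0]->value, "\n";
```

Picking one decoding mode and using it consistently on the Drupal 7 side avoids mixing array and object access in the import code.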
I created a table to hold the JSON strings; it serves as the container ship that carries each object from D6 to D7. For those who want the gritty details, here's the MySQL table schema for the my_migrate table:
CREATE TABLE `my_migrate` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`nid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`uid` int(11) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
`tid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `vid` (`vid`),
KEY `tid` (`tid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
Next, I created a module with a batch that essentially looped through each of the users, nodes, vocabularies, and taxonomy terms. For each item, I loaded the object using user_load, node_load, taxonomy_get_term, etc., and then inserted it into my_migrate table with the appropriate identifying information (nid, content type, tid, vid, uid, etc.).
These attributes were added to the table to make it easy to find the data as needed once it's over in Drupal 7.
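The per-item step of that batch might look roughly like the sketch below. It is plain PHP so it runs outside Drupal, and several things in it are assumptions rather than the original code: the function name my_migrate_build_row() is hypothetical, the 'data' key is a made-up name for the column holding the JSON string (the schema listing does not show that column), 'type' is assumed to hold the content type for nodes and the kind of object otherwise, and 'vid' is assumed to mean the vocabulary ID.

```php
<?php
/**
 * Builds one my_migrate row from an already-loaded Drupal 6 object.
 *
 * $kind is one of 'node', 'user', 'vocabulary' or 'term'.
 */
function my_migrate_build_row($kind, $object) {
  $row = array(
    'nid' => NULL,
    'type' => $kind,
    'uid' => NULL,
    'vid' => NULL,
    'tid' => NULL,
    // Hypothetical column name for the JSON payload.
    'data' => json_encode($object),
  );
  switch ($kind) {
    case 'node':
      $row['nid'] = (int) $object->nid;
      $row['type'] = $object->type;
      $row['uid'] = (int) $object->uid;
      break;

    case 'user':
      $row['uid'] = (int) $object->uid;
      break;

    case 'vocabulary':
      $row['vid'] = (int) $object->vid;
      break;

    case 'term':
      $row['tid'] = (int) $object->tid;
      $row['vid'] = (int) $object->vid;
      break;

    default:
      throw new InvalidArgumentException("Unknown kind: $kind");
  }
  return $row;
}

// Usage with a hand-built stand-in for a taxonomy term object.
$term = (object) array('tid' => 7, 'vid' => 2, 'name' => 'News');
$row = my_migrate_build_row('term', $term);
print_r($row);
```

Inside the real batch callback, each returned row would then be written to my_migrate with the database API of the Drupal 6 site.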
With 65,000+ objects in the Drupal 6 database, the Extract phase of my ETL (Extract, Transform, Load) did take a little while to run, but the batch progress meter gave me confidence that things were happening. Once the extraction was completed, it was time to move on to the Transform and Load phases.
Step Four: T and L
As I mentioned earlier, the project didn't have any major changes to the content model between Drupal 6 and Drupal 7. There were some minor tweaks to make the site fit better with how the client uses it and with the new features that Drupal 7 offers. These included renaming two content types and adding, removing, or moving a few fields.
So as far as the Transform phase of my ETL went, I decided to merge the Transform and Load into one step, transforming the data as needed just before it is loaded into the new Drupal 7 site.
In Drupal 7, I did pretty much the reverse of what I did in Drupal 6. I made a batch script for each type of entity I was importing; it loaded up each row and decoded the object into a temporary variable. Then, taking cues from the Migrate module, I wrote a number of pretty simple classes that transformed the object from the Drupal 6 convention of $old_node->field_foo[0]['value'] to the Drupal 7 convention of $new_node->field_foo[LANGUAGE_NONE][0]['value'].
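As an illustration, one such transform class could be sketched like this. It is not the original code: the class name MyMigrateNodeTransform, the 'story' to 'article' rename, and the sample row are all hypothetical. The sketch assumes the row was decoded with json_decode($json, TRUE), that field items are keyed by delta (0, 1, ...) in both versions, and it defines LANGUAGE_NONE itself (Drupal 7 sets it to 'und') so it can run outside Drupal.

```php
<?php
// Drupal 7 defines LANGUAGE_NONE as 'und'; define it so this runs standalone.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

/**
 * Hypothetical transformer: copies base properties and re-nests fields.
 */
class MyMigrateNodeTransform {
  // Assumed example of a renamed content type (old name => new name).
  protected $typeMap = array('story' => 'article');

  public function transform(array $old) {
    $new = new stdClass();
    // Carry the original IDs across so the load step can reuse them.
    $new->nid = $old['nid'];
    $new->uid = $old['uid'];
    $new->title = $old['title'];
    $new->type = isset($this->typeMap[$old['type']])
      ? $this->typeMap[$old['type']]
      : $old['type'];
    $new->language = LANGUAGE_NONE;

    foreach ($old as $key => $items) {
      // Drupal 6 CCK fields are arrays of deltas stored under field_* keys.
      if (strpos($key, 'field_') === 0 && is_array($items)) {
        $new->{$key} = array(LANGUAGE_NONE => array_values($items));
      }
    }
    return $new;
  }
}

// Usage with a hand-written row standing in for one my_migrate record.
$json = '{"nid":42,"uid":1,"type":"story","title":"Hello","field_foo":[{"value":"bar"}]}';
$old = json_decode($json, TRUE);
$transformer = new MyMigrateNodeTransform();
$new = $transformer->transform($old);
echo $new->field_foo[LANGUAGE_NONE][0]['value'], "\n";
```

Keeping one small class per entity type makes it easy to add the field moves and renames for that type without touching the others.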
Step Five: Test and Verify
Once the nodes and users were in Drupal 7, I gave myself plenty of time to poke around the site to make sure that I hadn't missed anything along the way. Being intimately familiar with the source data and site, along with being patient and meticulous in programming the import process, went a long way toward making sure that once the data got into Drupal, it was in really good shape.
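Since the whole point was to keep the same IDs, one simple automated check to run alongside the manual poking is to compare the set of IDs that were extracted with the set that ended up in Drupal 7. The helper below is a hypothetical sketch, not part of the original process; it takes two plain arrays of IDs, however they were collected, and reports the differences.

```php
<?php
/**
 * Compares IDs extracted from Drupal 6 with IDs found in Drupal 7.
 *
 * Returns the IDs that never arrived and the IDs that appeared unexpectedly.
 */
function my_migrate_compare_ids(array $source_ids, array $destination_ids) {
  return array(
    'missing' => array_values(array_diff($source_ids, $destination_ids)),
    'unexpected' => array_values(array_diff($destination_ids, $source_ids)),
  );
}

// Usage with made-up ID lists.
$report = my_migrate_compare_ids(array(1, 2, 3, 5), array(1, 2, 5, 8));
print_r($report);
```

An empty 'missing' list and an empty 'unexpected' list for nodes, users, and terms is a quick sign that the IDs survived the trip.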