SS 4.x file migration performance tips

Overview

This guide is aimed at developers upgrading large sites to SilverStripe 4.x. It describes custom approaches to file migration which can reduce go-live timeframes. The information here is a starting point for advanced developers. It is not supported by the SilverStripe open source project, or by SilverStripe Ltd. If you find any improvements or bugs, please note them here and improve the guide (it’s a wiki).

Problem Statement

When upgrading to SilverStripe 4.x, files in the assets/ folder need to be migrated into a new asset storage mechanism. This is achieved through a file migration task. It relocates files into a new folder structure, updates database records, moves and/or generates thumbnails, and fixes <img> and <a> references. Depending on the size of your assets folder, this can take hours or even days (see system requirements).

By default, the task runs as a single-threaded PHP script, so it is limited to the speed of one CPU core for PHP and one CPU core for the sequential database queries. Faster cores, a fast file system and plenty of RAM can help.

During the migration, content editing is often “locked” in order to avoid conflicts with already migrated data. Locking can be achieved by temporarily restricting access to the CMS, or simply agreeing to cease editing with your content authors. The potentially long migration runtimes can complicate such locking, since sites often rely on being updated frequently, or during emergencies.

There are two strategies to speed up migration, and reduce the impact/duration of locking the site for content editing. They can be combined for maximum effect.

Caveat: The recommendations below have been extracted from a successful migration on a website with ~150GB assets, but are considered an advanced usage scenario. You will need to understand how SilverStripe assets work in 4.x, have a general knowledge of how files are migrated, and tailor the guide to your specific website and hosting environment.

Caveats

  • SilverStripe Platform and CWP: If you are a customer of SilverStripe Platform or the New Zealand Common Web Platform (CWP), you don’t have direct access to the webserver filesystem or database. You will need to use database and asset snapshots. Large snapshots can cause issues (e.g. timeouts, disk space, download/upload), so we recommend that you work with Helpdesk to go through the steps outlined below. You should consult the CWP Upgrade Guide for helpful tips in this environment.
  • Always migrate to the latest available SilverStripe 4.x release
  • The TagsToShortcode task is significantly faster with this pull request which hasn’t been released yet (as of 04/09/2019). You could choose to run silverstripe/assets:1.4.x-dev to benefit from the performance improvements.

Strategy 1: Incremental Migration

Requirements

You need a separate environment, and the ability to copy files and database content to it.

Overview

The idea here is to run the bulk of the migration in the lead-up to go-live as a background task, without locking content editing. Once this is complete, you are ready for go-live. After locking your content editing for go-live, find all the file records which have been modified since that initial migration started, and incrementally migrate those. Since it’s unlikely that the majority of file records will have been modified in a few days, this migration should run a lot faster, minimising the time in which content authors can’t update the website. The partial migration relies on the fact that the file migration task only processes files which don’t have a value in the FileHash database column.

Caveats

  • Note that this migration does not account for files which are replaced or deleted after the “preparation” step below
  • The TagsToShortcode task can’t be run incrementally, and can take a long time - please check this ahead of time.
  • If you are using the silverstripe/secureassets module (default in CWP), files in protected folders will be moved to assets/.protected by the migration tooling. Any subsequent file sync (e.g. through rsync) will re-create these files in their original location as unprotected public duplicates.

Preparation

  • Copy your 3.x assets and database to the staging environment for your new 4.x website
  • Run the usual migration steps as part of 4.x (excl. dev/tasks/TagsToShortcode)
  • Run the file migration tasks (this might take a while)
  • Check that the migration has been successful

Go-Live

  • Lock content editing (talk to your content authors)
  • Copy new and updated 3.x assets into your 4.x staging environment (e.g. through rsync). Since filesystem locations on public files are the same between 3.x and 4.x, this should not cause duplicates.
  • Option 1: Copy database tables (on custom hosting)
    • Copy database tables into new ones with a _Partial suffix (in the same database)
  • Option 2: Export/import database tables (on SilverStripe Platform or CWP)
    • Export the already migrated File, File_Live and File_Versions tables from the 4.x database
    • In the exported files, suffix the table names with _Partial (to avoid overwriting existing data when reimporting)
    • Import the 3.x database into your 4.x staging environment (which will overwrite the previously migrated File* tables)
    • Import the exported files (resulting in three new tables: File_Partial, File_Live_Partial, File_Versions_Partial)
  • Run the SQL commands below to port over any previously migrated file records
  • Run other file migration tasks: dev/tasks/MigrateFileTask?only=move-thumbnails,fix-secureassets,fix-folder-permissions
  • Run dev/tasks/TagsToShortcodeTask
  • Optional: Run dev/tasks/MigrateFileTask?only=generate-cms-thumbnails. This can also run in the background after go-live, since missing CMS thumbnails don’t directly affect your website operation.
  • Check that the file migration has been successful
  • Delete the three temporary *_Partial database tables

SQL commands to port over previously migrated file records:

-- Update File table from File_Partial
replace into File
select File_Partial.*
from File left join File_Partial
on File.ID = File_Partial.ID
where File_Partial.FileHash is not null;

-- Update File_Live table from File_Live_Partial
replace into File_Live
select File_Live_Partial.*
from File_Live left join File_Live_Partial
on File_Live.ID = File_Live_Partial.ID
where File_Live_Partial.FileHash is not null;

-- Update File_Versions table from File_Versions_Partial
replace into File_Versions
select File_Versions_Partial.*
from File_Versions left join File_Versions_Partial
on File_Versions.ID = File_Versions_Partial.ID
where File_Versions_Partial.FileHash is not null;
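
The two ways of creating the *_Partial tables in the go-live steps could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes a MySQL or MariaDB database (CREATE TABLE ... LIKE is MySQL syntax) and, for Option 2, a plain SQL dump in which table names are backtick-quoted. The function names partial_copy_sql and suffix_partial_tables are made up for illustration.

```shell
#!/bin/sh
# Tables touched by the file migration, as listed in the guide.
TABLES="File File_Live File_Versions"

# Option 1 (custom hosting): print SQL that copies each migrated table
# into a *_Partial sibling in the same database.
# Assumption: MySQL/MariaDB syntax.
partial_copy_sql() {
    for t in $TABLES; do
        echo "CREATE TABLE ${t}_Partial LIKE ${t};"
        echo "INSERT INTO ${t}_Partial SELECT * FROM ${t};"
    done
}

# Option 2 (export/import): read a SQL dump on stdin and suffix the three
# table names with _Partial. Only the first backtick-quoted match on each
# line is renamed, which is where the table name sits in DROP/CREATE/INSERT
# statements. Column names such as `FileHash` do not match.
suffix_partial_tables() {
    sed -E 's/`(File|File_Live|File_Versions)`/`\1_Partial`/'
}
```

Possible usage, with my_database and the dump file names as placeholders: `partial_copy_sql | mysql my_database` for Option 1, or `suffix_partial_tables < file_tables.sql > file_tables_partial.sql` for Option 2. Review the generated SQL before running it against real data.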

Strategy 2: Parallelise Tasks

Requirements

A separate 4.x staging environment on different infrastructure which doesn’t interfere with your current 3.x production website. No direct access to the hosting environment is required, but we recommend that you keep an eye on your server capacity (monitoring, logging). You’ll need the ability to run queuedjobs on your environment.

Overview

The fastest approach is to parallelise the task. The built-in file migration task doesn’t support this, but we’ve seen successful migrations with the customisations outlined below.

Caveats

  • The parallelisation only applies to the main “move-files” subtask. Other subtasks such as dev/tasks/TagsToShortcodeTask aren’t easy to parallelise.
  • You will need to determine what level of parallelisation is appropriate for your environment, which is mostly a factor of available memory. Disk and CPU performance also play a role. We recommend starting with as many parallel processes as CPU cores on your environment.
  • This migration task will use a lot of server resources, and should not be run on an environment that serves production traffic (which shouldn’t really be the case on your 4.x staging environment).
  • Always migrate to the latest available SilverStripe 4.x release
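
As a rough starting point for the level of parallelisation, the core count can be capped by available memory. The sketch below is only an illustration of that rule of thumb: the function name suggest_workers is made up, and the per-process memory figure is an assumption you would need to measure on your own environment.

```shell
#!/bin/sh
# Suggest a number of parallel processes: one per CPU core, reduced if
# memory would not cover that many processes, and never less than 1.
#   $1 = CPU cores
#   $2 = memory available for the migration, in MB
#   $3 = assumed memory use per migration process, in MB (measure this)
suggest_workers() {
    cores="$1"
    avail_mb="$2"
    per_proc_mb="$3"
    by_mem=$(( avail_mb / per_proc_mb ))
    workers="$cores"
    if [ "$by_mem" -lt "$workers" ]; then
        workers="$by_mem"
    fi
    if [ "$workers" -lt 1 ]; then
        workers=1
    fi
    echo "$workers"
}
```

For example, `suggest_workers "$(getconf _NPROCESSORS_ONLN)" 8192 512` uses the number of online cores with 8 GB of memory and a hypothetical 512 MB per process. Treat the result as a first guess and adjust it while watching your monitoring.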

Preparation

Go-Live

  • Run dev/tasks/queue/FilePaginated?planQueue=1&subtasks=100&only=move-files. This will create jobs for your queue, which you can inspect via admin/queuedjobs. Your queue should start picking them up automatically.
  • Run other file migration tasks: dev/tasks/MigrateFileTask?only=move-thumbnails,fix-secureassets,fix-folder-permissions
  • Run dev/tasks/TagsToShortcodeTask
  • Optional: Run dev/tasks/MigrateFileTask?only=generate-cms-thumbnails. This can also run in the background after go-live, since missing CMS thumbnails don’t directly affect your website operation.
  • Check that the file migration has been successful
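
The go-live steps above can also be run from the command line instead of the browser. The following is a sketch under assumptions, not a verified script: it assumes the sake CLI is available at vendor/bin/sake and that URL query parameters are passed as separate key=value arguments. The task paths and parameters are copied from the steps above. The function names are made up, and by default the commands are only printed.

```shell
#!/bin/sh
# Assumption: sake is installed at vendor/bin/sake in the project root.
SAKE="${SAKE:-vendor/bin/sake}"
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run_task() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$SAKE $*"
    else
        "$SAKE" "$@"
    fi
}

# Step 1: create the queued jobs for the parallel "move-files" subtask.
queue_move_files() {
    run_task dev/tasks/queue/FilePaginated planQueue=1 subtasks=100 only=move-files
}

# Remaining steps: run these only once the queue has finished all
# move-files jobs (check admin/queuedjobs first).
post_move_tasks() {
    run_task dev/tasks/MigrateFileTask only=move-thumbnails,fix-secureassets,fix-folder-permissions || return 1
    run_task dev/tasks/TagsToShortcodeTask || return 1
}
```

Call `queue_move_files` first, wait until the queue is empty, then call `post_move_tasks`. The optional generate-cms-thumbnails step is left out here since it can run after go-live.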