MigrateFileTask misses out about 33% of files


#1

Silverstripe Version: 4.4.0-rc1

Question:

I am working on a migration of a large site to 4.x. We are using 4.4 because we need un-hashed file URLs (although this problem exists for us in 4.3 as well). About 1 out of 3 files ends up with the red “file can not be found” indication in the assets admin, and we cannot find any cause or pattern. It is extremely difficult to debug because each run of the migration task takes over 15 hours (on a server dedicated to the task, with 4 CPUs, 8 GiB of RAM and 10 GiB of swap space). MigrateFileTask runs to completion but fails silently on a large proportion of files.

I’m looking for anyone else who has experienced anything like this for mutual assistance.

My estimate of the failure rate is based on this query:

SELECT COUNT(*)
FROM File
WHERE ClassName <> 'SilverStripe\\Assets\\Folder'
AND (FileFilename IS NULL OR FileFilename NOT LIKE '%/%');

#2

Have you tried running 4.4.x-dev? There has been a lot of work done since the RC which addresses issues with file migrations.


#3

Hi @odnoc,

Can you post a sample of DB entries for files that did not migrate correctly? It might help us identify a pattern.

Also if you could confirm that files that are not migrating are actually present in your file system that would be helpful.

There’s a known issue where, if the filename in your database uses different casing than the file system, the task will think the file doesn’t exist (e.g. “myfile.txt” vs “MyFile.TXT”).
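A rough way to check both points (whether the files exist on disk, and whether only the casing differs) is to compare a list of filenames exported from the database against a walk of the assets directory. The sketch below is only an illustration: `classify_paths`, the `db_paths` list and the `assets_root` argument are hypothetical names, and it assumes the database paths are relative to the assets directory and use forward slashes.

```python
import os


def classify_paths(db_paths, assets_root):
    """Classify each database path as 'exact', 'case_mismatch' or 'missing'.

    Compares against names returned by os.walk rather than os.path.exists,
    so the result does not depend on the filesystem's case sensitivity.
    """
    # Collect every file on disk as a path relative to assets_root.
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(assets_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, assets_root).replace(os.sep, "/")
            on_disk.add(rel)

    # Map lower-cased paths to the first actual on-disk spelling found.
    lowered = {}
    for rel in on_disk:
        lowered.setdefault(rel.lower(), rel)

    result = {}
    for path in db_paths:
        if path in on_disk:
            result[path] = ("exact", path)
        elif path.lower() in lowered:
            result[path] = ("case_mismatch", lowered[path.lower()])
        else:
            result[path] = ("missing", None)
    return result
```

Feeding it the filenames of the records flagged as broken would show at a glance how many are truly absent and how many differ only by case.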


#4

Just looking back at your query: the condition FileFilename NOT LIKE '%/%' will match any file that is not inside a directory, because FileFilename contains the full name of your file including any directories.
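To see which rows that predicate selects, here is a small sketch that runs the same WHERE clause against an in-memory SQLite table. The table is a one-column stand-in for the real File table, the sample filenames are made up, and it assumes SQLite's LIKE behaves like the production database's for this pattern.

```python
import sqlite3


def rows_matching_predicate(filenames):
    """Return the FileFilename values selected by the WHERE clause in the query."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the File table: only the column under test.
    conn.execute("CREATE TABLE File (FileFilename TEXT)")
    conn.executemany(
        "INSERT INTO File (FileFilename) VALUES (?)",
        [(name,) for name in filenames],
    )
    cur = conn.execute(
        "SELECT FileFilename FROM File "
        "WHERE FileFilename IS NULL OR FileFilename NOT LIKE '%/%' "
        "ORDER BY rowid"
    )
    rows = [row[0] for row in cur.fetchall()]
    conn.close()
    return rows


# Hypothetical sample data: one unset value, one top-level file, two nested files.
sample = [None, "logo.png", "Uploads/report.pdf", "Uploads/2019/photo.jpg"]
print(rows_matching_predicate(sample))
```

With this sample, only the NULL row and the top-level `logo.png` are selected; both files under `Uploads/` are skipped because their names contain a slash.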


#7

Right. That bit was to exclude files in the top level of assets.


#5

Are you aware of either devs or authors running a “filesystem sync” in 3.x? We suspect that might have something to do with it. It used to be a button in the 3.x assets UI; I think we removed it in later releases. Or was there any manual uploading of files (via SFTP etc.) which would have required this syncing?

Did you ever migrate the site between servers which might have different case sensitivity settings in the filesystem? Or download and re-upload asset snapshots on Windows?

By the way, we’re also working on better formatting of those errors, and are trying to make it really obvious when something couldn’t be migrated. How many files are you migrating, and how large is the assets folder? We’re still collecting real-world figures on how long migrations take.


#6

By the way, these are the cards in our “smoother upgrading” epic. Unfortunately I made this an internal epic, so can’t link to it. But you can infer the issue URLs from the screenshot.


#8

This site has been running for years. I would assign a high probability to filesystem sync having been run at some point.

As for case sensitivity, we are running production and migrations entirely on FreeBSD with either UFS or ZFS filesystems. Historically, long ago, the site would have passed through Mac OS. However, none of the problems we’ve seen have involved case mismatches.