Migration to Universal File Field and GCS

Migration of the existing client data to GCS consists of two main steps:
- DB Migration
- Sync with GCS

Step 1: Empty directory clean-up (v. 19.0.5)

Starting from version 19.0.5, a background job removes the empty folders that were historically created whenever an item was created with no file uploaded to its file field.

This clean-up can potentially reduce the time needed for migration in v.20.

The clean-up process starts automatically and runs in the background. No performance impact on the running system is expected. However, the process can be stopped and resumed in System Configuration > Scripts > Migrate Single File Fields to Multiple.

The clean-up runs only once and places a completion marker file in the Files folder.

Note that on very large systems the clean-up process is time-consuming and can take 24+ hours.

It is recommended to start the dry run (next step) only after the folder clean-up is completed.

Step 2: Dry-run

Before starting the dry run, it is recommended to make sure that the empty-directory clean-up job, which starts automatically when the worker is up, has finished. This gives a more precise estimate of the migration time.

The dry run is available to assess the migration duration and to reveal potential issues or failures.

Note!

Although a dry-run is unlikely to impose significant load on the system, it is recommended to perform it outside of peak hours.


The dry run does not put the system into maintenance mode, so the system can be used while the dry run is in progress.

A rough estimate of the actual migration time can be calculated as:

Actual duration = dry-run time / 3
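As a worked illustration of the formula above, the following minimal Python sketch turns a measured dry-run duration into an estimate. The function name and the 9-hour input are illustrative assumptions, not part of the product; only the divisor of 3 comes from the formula.

```python
from datetime import timedelta


def estimate_migration_duration(dry_run: timedelta, divisor: float = 3.0) -> timedelta:
    """Rough estimate only: actual duration = dry-run time / 3."""
    if dry_run < timedelta(0):
        raise ValueError("dry-run duration must not be negative")
    return dry_run / divisor


if __name__ == "__main__":
    # Hypothetical input: a dry run that took 9 hours.
    print(estimate_migration_duration(timedelta(hours=9)))  # prints 3:00:00
```

The result is an approximation for planning the maintenance window; the real duration depends on system load and on whether the empty-directory clean-up has finished.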

 

Step 3: DB migration (offline)

 

Please note!

By default, the migration process uses 16 parallel threads, which will increase CPU load.

There is no empty-directory clean-up in 20.0.2. If the clean-up was not run on v.19.0.5, the migration can take longer.

 

  • Migration is an upgrade task starting from v.20. 

  • Each File Field is converted to a Universal File Field (a Multiple File field with a limit of 1).

  • After migration is finished:
    - all single file fields in all modules and sites will have been converted to the Universal File Field
    - maintenance mode will be disabled and the system will be available for use
    - the option to add single file fields will be disabled by default; it can be enabled again from the migration page

  • Server logs will indicate conversion progress. For each module:field, the start, the finish and progress at 2-minute intervals are logged. Additional detailed logs are written when the migration of a single item takes > 60s, the number of versions exceeds 250 or the number of files is > 1000.

  • Verbose, item-level migration details are available at the debug level. Because of the extensive log output, this level is intended solely for testing purposes.

  • Both active and inactive modules will be migrated.

  • Modules with enabled EFS will be skipped.

  • All the files remain in the same location (MediaFileRoot) after migration.

  • Preview and proofing preview files of the latest version of the file will be migrated.

  • Previously existing proofing previews will be relinked, not regenerated. Existing previews for a single file will be migrated to the new media holder. Proofing and previous version previews are not expected to be impacted.

  • No action is applied to preview files associated with previous file versions. However, when a previous version is made current, preview generation for that specific file version starts automatically, for both renditions and the proofing preview.

  • Migration state is stored in the DB in MigrationStatsFieldSffToMff table.

  • Starting from v.20.0.2, if the migration process fails because the container goes down, the process is resumed automatically. Rollback is executed only for the field whose migration was interrupted.

  • Note that the rollback of the FileEntity table is heavy and time-consuming when a large number of FileEntity entries need to be cleaned. Therefore, it may be faster to restore the DB and start the migration anew.

  • It is possible to configure the number of parallel threads for migration in webapp.yaml. Default number of threads is 16.

  • If a deadlock occurs during migration, the operation is retried 5 times with a 100 ms interval.
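
The deadlock handling in the last item of the list above follows a common fixed-interval retry pattern. The Python sketch below only illustrates that pattern and is not the product's implementation: `DeadlockError` and `run_with_deadlock_retry` are hypothetical names, and "5 times" is interpreted here as one initial attempt followed by up to five retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for the database driver's deadlock exception."""


def run_with_deadlock_retry(
    operation: Callable[[], T],
    retries: int = 5,            # from the text: retry 5 times
    interval_s: float = 0.1,     # from the text: 100 ms between retries
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation; on a deadlock, wait and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return operation()
        except DeadlockError:
            if attempt >= retries:
                raise  # retries exhausted, surface the deadlock to the caller
            attempt += 1
            sleep(interval_s)
```

A fixed interval keeps the behaviour predictable; any other exception is not retried and propagates immediately.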

 

Step 4: Sync to GCS (online)

  • Can only be started after DB migration has been successfully performed.

  • The GCS feature flag must be enabled for the system and File Storage must be set to "GOOGLE". If either of the two is missing, synchronization will not start.

  • Synchronization must be started manually from System Configuration > Scripts > Sync to GCS.

  • The files are moved by a background process, and the system can be used in the meantime.

  • All file entities with type LOCAL, size > 0 and no expiration will be processed.

  • Files will be moved to Google Cloud Storage one by one until all have been processed. Internally, data is fetched in pages of 100 but processed one file at a time.

  • After a file has been successfully migrated to GCS, it is deleted from the local FS. This means that the clean-up of migrated files is done per file.

  • Progress is written to the log every 100 files.

  • If the connection to the bucket is lost during synchronization, the process continues automatically as soon as the bucket is available again.

  • After successful synchronization to GCS, it is possible to perform a File System clean-up that deletes the subfolders and remaining .xml files under the /MediaFileRoot directory. It can be started from the same System Configuration > Scripts page. Preconditions are:
    - no single file fields present in any of the modules/sites
    - files migrated to GCS

If for any reason any of the files has not been migrated to GCS, the clean-up will not be performed and the following error will appear in the logs:

Validation failed. N files have not been migrated to Google Cloud Storage

In order to identify what exactly has not been migrated the following query can be used:

SELECT FROM_UNIXTIME(fe.uploadedTime / 1000) AS uploadedDate,
       FROM_UNIXTIME(fe.expiryTime / 1000) AS expiryDate,
       fe.name, fe.size, fe.location
FROM FileEntity fe
WHERE fe.fileStorage = 'LOCAL' AND fe.size > 0;
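
To make the Step 4 behaviour easier to follow (selection rule, pages of 100, one-by-one processing, per-file local delete, progress every 100 files), here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual implementation: the `FileEntity` class, the `fetch_page`, `upload` and `delete_local` callables and the representation of "no expiration" as `None` are all hypothetical, and reconnecting after a lost bucket connection is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

PAGE_SIZE = 100   # page size mentioned in the text
LOG_EVERY = 100   # progress interval mentioned in the text


@dataclass
class FileEntity:
    """Hypothetical, simplified stand-in for a FileEntity row."""
    name: str
    size: int
    file_storage: str           # "LOCAL" or "GOOGLE"
    expiry_time: Optional[int]  # assumption: None means "no expiration"


def is_eligible(entity: FileEntity) -> bool:
    """Selection rule from the text: type LOCAL, size > 0, no expiration."""
    return (entity.file_storage == "LOCAL"
            and entity.size > 0
            and entity.expiry_time is None)


def sync_to_gcs(
    fetch_page: Callable[[int, int], List[FileEntity]],
    upload: Callable[[FileEntity], None],
    delete_local: Callable[[FileEntity], None],
    log: Callable[[str], None] = print,
) -> int:
    """Move eligible files one by one and return how many were moved."""
    moved = 0
    offset = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)  # fetch up to 100 entities
        if not page:
            return moved
        for entity in page:
            if not is_eligible(entity):
                continue
            upload(entity)                    # copy to the bucket first
            delete_local(entity)              # then remove the local copy
            entity.file_storage = "GOOGLE"
            moved += 1
            if moved % LOG_EVERY == 0:
                log(f"Synced {moved} files")
        offset += len(page)
```

Deleting the local copy only after a successful upload is what makes the clean-up per file: an interruption leaves each file either fully local or fully migrated, and anything still local shows up in the query above.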
