Data Backup and Testing Best Practices

1. Purpose

This document outlines best practices for protecting university data through proper backup configuration, monitoring, and recovery testing.

The UTORrecover central backup platform provides the technical capability to store and retain backup copies of data. However, the service or data owner remains fully responsible for:

Determining what data must be backed up
Scheduling backups
Monitoring backup success and failures
Validating backup integrity
Performing and documenting recovery tests
Ensuring that source data has not been compromised

Backup is not a set-it-and-forget-it activity. It is an ongoing operational responsibility.

2. Roles and Responsibilities

EIS Provides:

Backup infrastructure and storage
Approved backup tools and configuration guidance
Retention policy options
Technical support for platform-related issues
Secure storage and access controls

Service / Data Owner Is Responsible For:

Identifying critical data and systems
Scheduling backup jobs appropriately
Monitoring job success and investigating failures
Reviewing alerts and logs
Validating backup coverage
Performing regular restore tests
Ensuring data integrity before and after backup
Documenting recovery procedures
Confirming compliance with applicable regulatory requirements

If backups are not scheduled, monitored, and tested by the service owner, data protection cannot be assumed.

3. Define What Must Be Protected

Before configuring backups, service owners must clearly define:

What systems contain authoritative data
Where that data resides
How often it changes
Acceptable data loss window (Recovery Point Objective, RPO)
Acceptable downtime window (Recovery Time Objective, RTO)

Without defined RPO and RTO targets, backup frequency and retention cannot be properly configured.
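
To illustrate how an RPO target constrains scheduling, the following Python sketch checks whether a given backup interval can meet a stated RPO. The function name and the example values are assumptions for illustration, not UTORrecover settings, and the check ignores the time a backup job takes to complete.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case data loss stays within the RPO.

    In the worst case a failure occurs just before the next backup starts,
    so up to one full interval of changes can be lost.
    """
    return backup_interval <= rpo


# Hypothetical example: a 4-hour RPO cannot be met by daily backups.
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))  # False
print(meets_rpo(timedelta(hours=1), timedelta(hours=4)))   # True
```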

4. Backup Scheduling Best Practices

Frequency

Backups should be scheduled based on business impact:

Critical transactional systems: frequent or continuous backups
Active file shares: daily minimum
Moderate-use systems: at least daily
Low-change systems: weekly may be sufficient

If in doubt, default to daily backups.

Retention

Retention should reflect:

Operational recovery needs
Ransomware resilience
Legal or regulatory requirements

A typical baseline approach:

Daily backups retained for at least 30 days
Weekly backups retained for 3 months
Monthly backups retained for 12 months or longer as required

Longer retention improves resilience against delayed ransomware detection.
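
The baseline retention tiers above can be expressed as a simple rule. The Python sketch below is illustrative only: treating the Sunday backup as the weekly restore point and the backup taken on the 1st as the monthly restore point are assumptions, as is approximating 3 months as 90 days. Actual retention is configured in the backup platform.

```python
from datetime import date

DAILY_DAYS = 30      # daily backups retained for at least 30 days
WEEKLY_DAYS = 90     # weekly backups retained for about 3 months (assumed 90 days)
MONTHLY_DAYS = 365   # monthly backups retained for 12 months


def should_retain(backup_day: date, today: date) -> bool:
    """Return True if a backup taken on backup_day is still inside a tier."""
    age = (today - backup_day).days
    if age < 0:
        raise ValueError("backup date is in the future")
    if age <= DAILY_DAYS:
        return True
    # Assumption: the Sunday backup serves as the weekly restore point.
    if backup_day.weekday() == 6 and age <= WEEKLY_DAYS:
        return True
    # Assumption: the backup taken on the 1st serves as the monthly restore point.
    if backup_day.day == 1 and age <= MONTHLY_DAYS:
        return True
    return False
```

A rule like this makes it easy to confirm that a restore point older than the suspected start of an incident would still exist.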

5. Monitoring and Verification

A successful backup configuration today does not guarantee a successful backup tomorrow.

Service owners must:

Review backup job reports daily, or after each run for jobs scheduled less often than daily
Investigate failed or partially successful jobs immediately
Confirm that backup sizes and durations are consistent with expectations
Validate that new data sources are included in protection scope

Warning signs that require investigation:

Repeated warnings or partial successes
Sudden reduction in backup size
Unexpected increases in backup size
Jobs completing unusually quickly

Backup jobs that run but are never reviewed are effectively unverified.
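
Some of the warning signs listed above can be checked automatically. The following Python sketch compares the latest job against the median of recent jobs. The input format, the function name, and the 50% tolerance are assumptions for illustration. It does not read real UTORrecover reports, so job data would have to be exported from whatever reporting the platform provides.

```python
from statistics import median


def job_warnings(history, latest, tolerance=0.5):
    """Compare the latest job with the median of recent jobs.

    history:   list of (size_bytes, duration_seconds) for past jobs
    latest:    (size_bytes, duration_seconds) for the newest job
    tolerance: allowed fractional deviation (0.5 = 50%), an assumed value
    """
    if not history:
        return ["no history to compare against"]
    typical_size = median(size for size, _ in history)
    typical_duration = median(duration for _, duration in history)
    size, duration = latest
    warnings = []
    if size < typical_size * (1 - tolerance):
        warnings.append("sudden reduction in backup size")
    if size > typical_size * (1 + tolerance):
        warnings.append("unexpected increase in backup size")
    if duration < typical_duration * (1 - tolerance):
        warnings.append("job completed unusually quickly")
    return warnings
```

An automated check of this kind supplements, but does not replace, a person reviewing the job reports.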

6. Recovery Testing

Backups are only useful if data can be restored successfully.

Minimum Testing Standard

Service owners should perform recovery testing:

At least quarterly for critical systems
At least annually for non-critical systems
After significant infrastructure or application changes
After backup configuration changes
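
As a rough illustration of the minimum testing standard, the Python sketch below decides whether a restore test is due. Treating "quarterly" as 91 days and "annually" as 365 days, and the function and parameter names, are assumptions made for this example.

```python
from datetime import date, timedelta

# Assumed intervals: "quarterly" as 91 days, "annually" as 365 days.
MAX_DAYS_BETWEEN_TESTS = {"critical": 91, "non-critical": 365}


def restore_test_due(last_test: date, today: date, criticality: str,
                     changed_since_test: bool = False) -> bool:
    """Return True if a restore test should be performed now.

    changed_since_test covers significant infrastructure, application,
    or backup configuration changes made after the last test.
    """
    if changed_since_test:
        return True
    limit = timedelta(days=MAX_DAYS_BETWEEN_TESTS[criticality])
    return today - last_test >= limit
```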

What Recovery Testing Should Include

Restoring files to an alternate location
Validating file integrity
Verifying application startup (if applicable)
Running database consistency checks
Confirming permissions and metadata

A restore that has not been tested should not be assumed to work.

Document all restore tests, including:

Date of test
Scope of restore
Time to recovery
Issues encountered
Corrective actions taken
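
One way to keep restore test records consistent is to store them in a fixed structure. The Python sketch below mirrors the fields listed above. The class name, field names, and JSON layout are assumptions for illustration, not a required format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date, timedelta


@dataclass
class RestoreTestRecord:
    test_date: date
    scope: str
    time_to_recovery: timedelta
    issues: list = field(default_factory=list)
    corrective_actions: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record, converting date and duration to plain values."""
        data = asdict(self)
        data["test_date"] = self.test_date.isoformat()
        data["time_to_recovery_seconds"] = data.pop("time_to_recovery").total_seconds()
        return json.dumps(data, indent=2)


# Hypothetical example record.
record = RestoreTestRecord(
    test_date=date(2024, 3, 1),
    scope="Departmental file share, restored to an alternate location",
    time_to_recovery=timedelta(minutes=45),
    issues=["Permissions on one folder were not preserved"],
    corrective_actions=["Enabled ACL backup for the share"],
)
print(record.to_json())
```

Recording time to recovery as a number makes it straightforward to compare each test against the RTO.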

7. Ransomware and Data Integrity

Backups are a key control against ransomware, but only if properly managed.

Critical Risk

If source data has been encrypted, corrupted, or silently modified and this goes undetected, the backup system will faithfully copy that compromised data.

Without regular integrity validation, the compromise may be discovered only when a recovery is attempted.

Best Practices for Ransomware Resilience

Monitor for abnormal file change rates
Monitor for unusual file extensions or encryption patterns
Maintain sufficient retention to roll back beyond initial infection
Review backup logs for sudden increases in changed data volume
Test restoring data from older restore points

The service owner must ensure that:

Source systems are monitored for compromise
Compromise detection is not solely dependent on backup reports
Older restore points remain available long enough to recover clean data

Backup infrastructure is not a substitute for endpoint protection, patching, or monitoring.
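
The link between retention and ransomware recovery can be shown with a short Python sketch. Given the restore points that still exist and the estimated time of initial infection, it returns the newest restore point taken before the infection, or None if retention was too short. The function name and the example dates are hypothetical.

```python
from datetime import datetime


def latest_clean_restore_point(restore_points, infection_time):
    """Return the newest restore point taken before infection_time, or None."""
    candidates = [point for point in restore_points if point < infection_time]
    return max(candidates) if candidates else None


# Hypothetical example: weekly restore points, infection estimated on 10 May.
points = [datetime(2024, 5, 1), datetime(2024, 5, 8), datetime(2024, 5, 15)]
print(latest_clean_restore_point(points, datetime(2024, 5, 10)))  # 2024-05-08 00:00:00
```

If the estimated infection time is earlier than every remaining restore point, no clean copy exists, which is why retention must exceed the likely detection delay.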

8. Data Integrity Validation

Backup success does not equal data integrity.

Service owners should implement:

Periodic hash or checksum validation for critical datasets
Database consistency checks
Application-level validation after restore
Verification of file counts and directory structures

Where possible, automate validation processes.
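
As a minimal sketch of automated checksum validation, the Python example below builds a SHA-256 manifest for a directory tree and compares two manifests, for instance one taken from the source and one from a test restore. The function names and the manifest layout are assumptions for illustration. It also assumes the source data itself was known to be good when the first manifest was created.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected: dict, actual: dict) -> dict:
    """Report files that are missing, unexpected, or changed."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(n for n in shared if expected[n] != actual[n]),
    }
```

Comparing manifests also covers the file count and directory structure checks, since missing or unexpected paths are reported explicitly.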

9. Change Management

Backups must be reviewed whenever:

A system is upgraded
Storage locations change
New volumes are added
Applications are migrated
Cloud services are adopted

Many backup failures occur because systems change and backup scope does not.

Every infrastructure change should trigger a review of backup configuration.
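
A backup scope review can be partly automated by comparing what exists on a system with what the backup policy covers. The Python sketch below assumes both lists are already available as plain path or volume names. How they are obtained (system inventory, platform export) is outside the example, and the names used are hypothetical.

```python
def coverage_gaps(discovered, protected):
    """Compare discovered data locations with the backup scope.

    discovered: volumes or paths that currently exist on the system
    protected:  volumes or paths included in the backup policy
    """
    discovered, protected = set(discovered), set(protected)
    return {
        # Exists on the system but is not backed up.
        "unprotected": sorted(discovered - protected),
        # In the backup policy but no longer present on the system.
        "stale": sorted(protected - discovered),
    }


# Hypothetical example: a new volume was added but not put in scope.
print(coverage_gaps(["/data", "/home", "/data2"], ["/data", "/home", "/archive"]))
# {'unprotected': ['/data2'], 'stale': ['/archive']}
```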

10. Common Failure Scenarios

The following are common causes of failed recoveries:

Backups were configured but never monitored
Jobs were failing silently
New storage volumes were not added to the backup policy
Ransomware encrypted data weeks before detection
Retention window was too short
Restore procedures were never documented
Staff responsible for backups left and knowledge was not transferred

These failures are preventable with disciplined oversight.

11. Documentation Requirements

Each service owner should maintain a documented backup and recovery plan including:

Systems in scope
Backup schedule
Retention configuration
Location of backup logs
Restore procedures
Recovery test history
Responsible contacts

This documentation should be reviewed annually.

12. Accountability Statement

Participation in the central backup service provides technical capability and secure storage.

It does not transfer operational responsibility.

The service or data owner is accountable for:

Ensuring backups are configured correctly
Monitoring job outcomes
Testing restores
Confirming data integrity
Detecting compromise of source systems

If these responsibilities are not actively fulfilled, data recovery cannot be guaranteed.

13. Recommended Operational Checklist

Monthly:

Review job success rates
Review storage consumption trends
Validate inclusion of new data sources

Quarterly:

Perform documented restore tests
Validate RPO and RTO alignment
Review ransomware resilience posture

Annually:

Review retention policies
Update documentation
Reconfirm responsible contacts

Information Technology Services (ITS)