Information Technology Services (ITS)
Data Backup and Testing Best Practices
1. Purpose
This document outlines best practices for protecting university data through proper backup configuration, monitoring, and recovery testing.
The UTORrecover central backup platform provides the technical capability to store and retain backup copies of data. However, the service or data owner remains fully responsible for:
- Determining what data must be backed up
- Scheduling backups
- Monitoring backup success and failures
- Validating backup integrity
- Performing and documenting recovery tests
- Ensuring that source data has not been compromised
Backup is not a set-it-and-forget-it activity. It is an ongoing operational responsibility.
2. Roles and Responsibilities
EIS Provides:
- Backup infrastructure and storage
- Approved backup tools and configuration guidance
- Retention policy options
- Technical support for platform-related issues
- Secure storage and access controls
Service / Data Owner Is Responsible For:
- Identifying critical data and systems
- Scheduling backup jobs appropriately
- Monitoring job success and investigating failures
- Reviewing alerts and logs
- Validating backup coverage
- Performing regular restore tests
- Ensuring data integrity before and after backup
- Documenting recovery procedures
- Confirming compliance with applicable regulatory requirements
If backups are not scheduled, monitored, and tested by the service owner, data protection cannot be assumed.
3. Define What Must Be Protected
Before configuring backups, service owners must clearly define:
- What systems contain authoritative data
- Where that data resides
- How often it changes
- Acceptable data loss window (Recovery Point Objective, RPO)
- Acceptable downtime window (Recovery Time Objective, RTO)
Without defined RPO and RTO targets, backup frequency and retention cannot be properly configured.
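The link between RPO and backup frequency can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and are not part of UTORrecover or any backup tool. It assumes the worst-case data loss is roughly one full interval between successful backups.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A schedule can only meet the RPO if backups run at least that often."""
    return backup_interval <= rpo


def rpo_breached(last_success: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest successful backup is older than the RPO allows."""
    return now - last_success > rpo
```

For example, a weekly schedule cannot satisfy a 24-hour RPO, and a daily job that has failed for two days in a row has already breached it.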
4. Backup Scheduling Best Practices
Frequency
Backups should be scheduled based on business impact:
- Critical transactional systems: frequent or continuous backups
- Active file shares: daily minimum
- Moderate-use systems: at least daily
- Low-change systems: weekly may be sufficient
If in doubt, default to daily backups.
Retention
Retention should reflect:
- Operational recovery needs
- Ransomware resilience
- Legal or regulatory requirements
A typical baseline approach:
- Daily backups retained for at least 30 days
- Weekly backups retained for 3 months
- Monthly backups retained for 12 months or longer as required
Longer retention improves resilience against delayed ransomware detection.
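The baseline above can be expressed as a small lookup table, which makes it easy to check whether a restore point from a given date should still exist. This is a minimal sketch: the names are hypothetical, and "3 months" and "12 months" are approximated as 90 and 365 days, which is an assumption rather than a platform setting.

```python
from datetime import date, timedelta

# Baseline retention from this section; day counts are approximations.
BASELINE_RETENTION = {
    "daily": timedelta(days=30),
    "weekly": timedelta(days=90),    # roughly 3 months
    "monthly": timedelta(days=365),  # roughly 12 months
}


def is_within_retention(kind: str, taken_on: date, today: date) -> bool:
    """True if a backup of this kind, taken on this date, should still be kept."""
    return today - taken_on <= BASELINE_RETENTION[kind]


def oldest_restore_point(today: date) -> date:
    """Earliest date that the longest retention tier can still reach."""
    return today - max(BASELINE_RETENTION.values())
```

If ransomware was active for longer than the span returned by `oldest_restore_point`, no clean copy remains, which is why the retention window should be chosen with detection delays in mind.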
5. Monitoring and Verification
A successful backup configuration today does not guarantee a successful backup tomorrow.
Service owners must:
- Review backup job reports daily, or after each run for jobs scheduled less often
- Investigate failed or partially successful jobs immediately
- Confirm that backup sizes and durations are consistent with expectations
- Validate that new data sources are included in protection scope
Warning signs that require investigation:
- Repeated warnings or partial successes
- Sudden reduction in backup size
- Unexpected increases in backup size
- Jobs completing unusually quickly
Backup jobs that run but are never reviewed are effectively unverified.
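Several of the warning signs listed above can be checked automatically by comparing the latest job against recent history. The sketch below is illustrative: the function name, the input shape (a list of size and duration pairs taken from job reports), and the 50% tolerance are all assumptions to be tuned per system.

```python
from statistics import median


def flag_anomalies(history, latest, tolerance=0.5):
    """Compare the latest job to the median of recent successful jobs.

    history: non-empty list of (size_bytes, duration_seconds) tuples
    latest:  (size_bytes, duration_seconds) for the job under review
    """
    base_size = median(size for size, _ in history)
    base_duration = median(duration for _, duration in history)
    size, duration = latest

    warnings = []
    if size < base_size * (1 - tolerance):
        warnings.append("sudden reduction in backup size")
    if size > base_size * (1 + tolerance):
        warnings.append("unexpected increase in backup size")
    if duration < base_duration * (1 - tolerance):
        warnings.append("job completed unusually quickly")
    return warnings
```

A check like this supplements human review of job reports. It does not replace it, and an empty result does not prove the backup is restorable.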
6. Recovery Testing
Backups are only useful if data can be restored successfully.
Minimum Testing Standard
Service owners should perform recovery testing:
- At least quarterly for critical systems
- At least annually for non-critical systems
- After significant infrastructure or application changes
- After backup configuration changes
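The time-based part of this standard lends itself to a simple overdue check. In the sketch below the names are hypothetical, and "quarterly" and "annually" are approximated as 92 and 366 days. Change-triggered tests still have to be tracked through change management.

```python
from datetime import date, timedelta

# Maximum age of the last restore test, keyed by whether the system is critical.
MAX_TEST_AGE = {
    True: timedelta(days=92),    # critical: at least quarterly (approximation)
    False: timedelta(days=366),  # non-critical: at least annually (approximation)
}


def restore_test_overdue(last_test, today, critical):
    """True if no test is on record or the last one is older than allowed."""
    if last_test is None:
        return True
    return today - last_test > MAX_TEST_AGE[critical]
```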
What Recovery Testing Should Include
- Restoring files to an alternate location
- Validating file integrity
- Verifying application startup (if applicable)
- Running database consistency checks
- Confirming permissions and metadata
A restore that has not been tested should not be assumed to work.
Document all restore tests, including:
- Date of test
- Scope of restore
- Time to recovery
- Issues encountered
- Corrective actions taken
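One way to keep these records consistent is to capture them in a fixed structure. The sketch below mirrors the fields listed above; the class and field names are hypothetical, and JSON is only one possible storage format (it requires Python 3.9 or later for the `list[str]` annotations).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class RestoreTestRecord:
    test_date: date
    scope: str
    minutes_to_recovery: float
    issues: list[str] = field(default_factory=list)
    corrective_actions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record, writing the date in ISO 8601 form."""
        data = asdict(self)
        data["test_date"] = self.test_date.isoformat()
        return json.dumps(data)
```

Recording time to recovery as a number makes it straightforward to compare each test against the RTO defined for the system.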
7. Ransomware and Data Integrity
Backups are a key control against ransomware, but only if properly managed.
Critical Risk
If source data has been encrypted, corrupted, or silently modified and this goes undetected, the backup system will faithfully copy that compromised data.
Without regular integrity validation, you may only discover compromise when attempting recovery.
Best Practices for Ransomware Resilience
- Monitor for abnormal file change rates
- Monitor for unusual file extensions or encryption patterns
- Maintain sufficient retention to roll back beyond initial infection
- Review backup logs for sudden increases in changed data volume
- Test restoring data from older restore points
The service owner must ensure that:
- Source systems are monitored for compromise
- Compromise detection is not solely dependent on backup reports
- Older restore points remain available long enough to recover clean data
Backup infrastructure is not a substitute for endpoint protection, patching, or monitoring.
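As a rough illustration of monitoring file change rates and unusual extensions on a source system, the sketch below summarizes a directory tree. Everything in it is an assumption made for illustration: the suffix list, the one-hour window, and the 30% threshold are placeholders, and a check like this is no replacement for dedicated endpoint protection.

```python
from pathlib import Path

# Illustrative suffixes only; real ransomware families vary widely.
SUSPICIOUS_SUFFIXES = {".locked", ".encrypted", ".crypt"}


def scan(root):
    """Yield (file name, modification time) for every file under root."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path.name, path.stat().st_mtime


def summarize(entries, now, window_seconds=3600):
    """Count files, recently changed files, and files with suspicious suffixes."""
    total = changed = suspicious = 0
    for name, mtime in entries:
        total += 1
        if now - mtime <= window_seconds:
            changed += 1
        if Path(name).suffix.lower() in SUSPICIOUS_SUFFIXES:
            suspicious += 1
    return {"total": total, "changed": changed, "suspicious": suspicious}


def looks_abnormal(summary, max_changed_ratio=0.3):
    """Flag any suspicious suffix, or an unusually high share of changed files."""
    if summary["total"] == 0:
        return False
    ratio = summary["changed"] / summary["total"]
    return summary["suspicious"] > 0 or ratio > max_changed_ratio
```

A typical call would be `looks_abnormal(summarize(scan("/path/to/share"), time.time()))` after `import time`, run on a schedule and compared against what is normal for that share.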
8. Data Integrity Validation
Backup success does not equal data integrity.
Service owners should implement:
- Periodic hash or checksum validation for critical datasets
- Database consistency checks
- Application-level validation after restore
- Verification of file counts and directory structures
Where possible, automate validation processes.
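Checksum validation and file-count verification can be automated with a manifest: hash every file in a known-good copy, then compare against the restored copy. The following is a minimal sketch using SHA-256 from the Python standard library. The function names are hypothetical, and the whole manifest is held in memory, which may not suit very large datasets.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected, actual):
    """Report files that are missing, unexpected, or different in content."""
    return {
        "missing": sorted(set(expected) - set(actual)),
        "extra": sorted(set(actual) - set(expected)),
        "changed": sorted(
            k for k in expected.keys() & actual.keys() if expected[k] != actual[k]
        ),
    }
```

Building a manifest of the source before backup and of the restored copy after a test restore, then comparing the two, covers both content integrity and directory structure. A manifest built from already-compromised source data will match a compromised restore, so this check complements source monitoring rather than replacing it.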
9. Change Management
Backups must be reviewed whenever:
- A system is upgraded
- Storage locations change
- New volumes are added
- Applications are migrated
- Cloud services are adopted
Many backup failures occur because systems change and backup scope does not.
Every infrastructure change should trigger a review of backup configuration.
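A scope review can be partly automated by comparing the data locations that exist on a system with the paths named in its backup policy. The sketch below assumes POSIX-style paths and that both lists are gathered by the service owner; the function names are hypothetical and do not query any backup platform.

```python
from pathlib import PurePosixPath


def is_covered(path, protected_roots):
    """True if path equals, or sits underneath, one of the protected roots."""
    p = PurePosixPath(path)
    return any(
        p == root or root in p.parents
        for root in map(PurePosixPath, protected_roots)
    )


def uncovered(discovered, protected_roots):
    """Return discovered paths that no protected root covers."""
    return sorted(p for p in discovered if not is_covered(p, protected_roots))
```

Running a comparison like this after every change listed above surfaces new volumes or migrated directories that the backup policy has not caught up with.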
10. Common Failure Scenarios
The following are common causes of failed recoveries:
- Backups were configured but never monitored
- Jobs were failing silently
- New storage volumes were not added to the backup policy
- Ransomware encrypted data weeks before detection
- Retention window was too short
- Restore procedures were never documented
- Staff responsible for backups left and knowledge was not transferred
These failures are preventable with disciplined oversight.
11. Documentation Requirements
Each service owner should maintain a documented backup and recovery plan including:
- Systems in scope
- Backup schedule
- Retention configuration
- Location of backup logs
- Restore procedures
- Recovery test history
- Responsible contacts
This documentation should be reviewed annually.
12. Accountability Statement
Participation in the central backup service provides technical capability and secure storage.
It does not transfer operational responsibility.
The service or data owner is accountable for:
- Ensuring backups are configured correctly
- Monitoring job outcomes
- Testing restores
- Confirming data integrity
- Detecting compromise of source systems
If these responsibilities are not actively fulfilled, data recovery cannot be guaranteed.
13. Recommended Operational Checklist
Monthly:
- Review job success rates
- Review storage consumption trends
- Validate inclusion of new data sources
Quarterly:
- Perform documented restore tests
- Validate RPO and RTO alignment
- Review ransomware resilience posture
Annually:
- Review retention policies
- Update documentation
- Reconfirm responsible contacts