How to make your critical data last decades

  • Post by Maxime Cote
  • Aug 03, 2020

Panic! The computer doesn’t boot anymore! It just gives us error messages. All the family photos were on this computer! What to do? We have some recent ones in the camera, but the years of photos are gone? What do you mean, backups? We back them up on the computer from the camera…

Sadly, many of us have had to live through a similar phone call at one point or another. Losing data is never good, especially valuable data like photos or notes. That’s why the longevity of your information is a critical concept to think about. The problem is that it can be tricky to know where to start, in part because there are a lot of moving pieces and in part because it requires some understanding of the services you use. The flip side is that taking the time to evaluate this is one of the best ways to prevent or mitigate disaster. Below are some of the concepts to look into or be aware of to help you get started.

Important Data

First, what is “important” data? There are a lot of possibilities: photos, notes, legal documents, tax filings, etc. Those types of data are essential because they are often impossible to replace. Uniqueness, or difficulty in recreating, is the main factor in the importance of data, since data can be copied infinitely but often cannot be created again. Photos of years long gone can’t be retaken, and it’s the same with notes about hard-earned knowledge. That’s why you want them to last forever, or as close to forever as possible. Leaving them untouched makes the chance of losing them quite high over the years. So how does one preserve pet photos for many years across different services and applications?

Lock-in

Let’s start with the most complex and tricky one, since a lot of the rest depends on it. Lock-in is when an application makes it very hard or impossible for you to move away. Imagine a grumpy guard who takes your data, registers it on your account, and then puts it in a locked safe that you don’t have the keys to. It’s a “feature” of many established applications that don’t want too many competitors. If you trap all the customers, there’s no need to fear competitors or innovate as much. Thankfully, it’s often seen as bad practice (which it is, in my opinion), but that means they’re just subtler about it, so you don’t realize until it’s too late. Let’s see what it looks like in practice.

No export

The most obvious and dangerous one is no export capability: if you can’t get your data out of the service, you either stay trapped or give up your data. That same guard is in a soundproof booth, and the only opening available to you is the slot for depositing your data. Nowadays, this one is rare, thanks to laws like the GDPR that require users to be able to access their data. It’s still critical to look into because, without export, there’s nothing else you can do but trust the company. You need to believe that they will keep your data safe and won’t ever have bugs or outages. Sadly, if the service you’re using or planning to use doesn’t have export, there’s not much you can do. The best defense is not using that application and voting with your wallet to signal to them that this is not right.

Feature lock-in

There’s also feature lock-in, where the application has one or more exceptional features that your usage depends on. The line often gets a bit blurry with features, since they are a big differentiator between services and applications. A way to think about it is that no critical use or workflow should require any unique feature. If the only way you can take an excellent pet picture is with this unique camera and nothing else, it’s a problem. What can help against that is using the standard features instead of the unique ones. You can also look at alternative applications and only use the features they share. Finally, you can rely on yourself and your skills instead of the service for that workflow, which will let you navigate any app. As much as possible, your abilities and system shouldn’t depend on one application.

Price lock-in

Sometimes there’s also price lock-in. It is often signaled by a very long commitment to get a reasonable price: “Pay for five years in advance to save! Our price will go from $20/month to $5/month.” In some cases, those are legitimate deals, but they need to be looked into in detail first. The central problem with those deals is that they make you feel pressure to keep using the application “since I paid for it already.” A simple defense is to wait before giving in to those deals. If you’re a new user of the service, it’s probably better to start with a monthly subscription, then move to a long-term deal. Starting with monthly service will also give you a sense of the pacing of updates and of reliability, so you have a better idea of whether the service will be around for the duration of the deal.
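To put numbers on a deal like the one quoted above, here is a small sketch of the break-even arithmetic. It assumes the discounted price is paid upfront for the whole period and that there are no refunds; the function name and parameters are made up for the illustration.

```python
def break_even_months(monthly_price, discounted_monthly_price, prepaid_months):
    """Months of regular monthly billing that cost as much as the prepaid deal.

    Assumption: the whole discounted period is paid upfront, with no refund.
    """
    upfront_cost = discounted_monthly_price * prepaid_months
    return upfront_cost / monthly_price


# The example deal: $20/month normally, $5/month if you prepay five years.
print(break_even_months(20, 5, 60))  # 15.0
```

In other words, under those assumptions the five-year deal only pays off if you are still using the service after 15 months. If you leave earlier, paying monthly would have been cheaper.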

Export

Great, there’s no lock-in in the application you’re using, and you can even export your data, so all is good, right? Not quite yet; we need to make sure those exports are correct and usable. There are two main categories of exports: backing up the current app and migrating out of the app. If the software only offers backup exports and no migration, you are still in trouble.

Backup only export

What does an export for backup look like? It will often be in either a binary format or a proprietary format made by the developer. You cannot open it outside the application or look at what’s inside. Imagine that guard again, giving you the box of your data to bring home but no key. Those formats are suitable for backup, but because they require the app to work, you cannot move away to another service. An easy way to test for that is to export from the application and look at the resulting data. If it’s a binary blob that nothing else can open, then you could be in trouble. If that’s the case, it’s still worth checking whether there’s another, sometimes hidden, export option that would work instead.
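If you’d rather not judge an export by eye, the test above can be roughly automated. This is a minimal sketch using only Python’s standard library; the function name and the three labels are my own, and “opaque binary” only means “not a zip, not a tar, and not plain text”, so a file flagged that way could still be a format some other tool understands.

```python
import codecs
import tarfile
import zipfile


def describe_export(path):
    """Return a rough guess at what kind of file an export is."""
    path = str(path)
    if zipfile.is_zipfile(path):
        return "zip archive"
    if tarfile.is_tarfile(path):
        return "tar archive"
    # Not a known archive: read a sample and see if it looks like plain text.
    with open(path, "rb") as handle:
        sample = handle.read(4096)
    if b"\x00" in sample:
        return "opaque binary"
    try:
        # The incremental decoder tolerates a character cut off at the sample edge.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return "opaque binary"
    return "readable text"
```

Calling `describe_export("my-export.bin")` (a placeholder file name) on a fresh export gives a first hint about whether it is a backup-only format or something you can open elsewhere.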

Migration export

Migration export is the other way of exporting your data, and the one you want. It will often create an archive (zip, 7z, tar, etc.) of all your data and allow you to download it. You should be able to open this archive with any compatible third-party tool and look at the data inside. The data will often be JSON for the metadata plus the files themselves, but the actual format is very service dependent. In JSON’s case, you might not be able to use it right away, but you can easily create scripts and tools to extract the content in case of a problem. You just want to make sure the archive also contains all the attachments and other files; otherwise, it’s useless. If you can find all, or at least most, of your data inside, then it’s okay. Now you also know how to handle a migration to another service.
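As an illustration of the kind of script mentioned above, here is a sketch that checks whether every attachment listed in the metadata is really inside the archive. The layout is entirely hypothetical: it assumes a zip file containing a `metadata.json` that holds a list of entries, each with an optional `attachments` list of file paths. Real services structure their exports differently, so the names would need adapting.

```python
import json
import zipfile


def missing_attachments(archive_path, metadata_name="metadata.json"):
    """List attachment paths named in the metadata but absent from the archive.

    Assumes a hypothetical export layout: a zip with one JSON metadata file
    holding a list of entries, each with an optional "attachments" list.
    """
    with zipfile.ZipFile(archive_path) as archive:
        present = set(archive.namelist())
        entries = json.loads(archive.read(metadata_name))

    missing = []
    for entry in entries:
        for attachment in entry.get("attachments", []):
            if attachment not in present:
                missing.append(attachment)
    return missing
```

An empty list means everything the metadata mentions came along in the export. Anything else is a file you would lose in a migration.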

Backup

Now that the application is not locking you in and you can safely export your data, we’re all good, right? All of that is a big step in the right direction, but it’s not the full picture, because not all your data lives in an application, and not all software lives in the cloud. You don’t want to forget some poor data that turns out to be pretty important. So you also need to take care of your local data with backups. I see backups as falling into two main families: “at rest” data and “working” data. The main difference is how often you need to back the data up and how large the “window of possible data loss” can be before it hurts.

“At rest” backup

“At rest” data is things like games, operating system configuration, older documents, photos, and other files that don’t change often. Those are fine if they are backed up once a day. In that case, the “window of possible data loss” is around 24h, meaning you would lose up to 24h of changes if you had to roll back after a problem. Since those documents often don’t change for a long time, losing a day should be fine.

“Working” backup

The other family is the “working” documents. Those are things you’re actively working on, and because of that, they need constant backup. The “window of possible loss” needs to be as small as possible, because losing 24h of work would be very painful and hard to redo. Things that change often, like your budget, planning, work documents, a presentation, or code, would be devastating to lose a day on.
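To make the “window of possible loss” concrete, the sketch below lists the files in a folder that were modified after the last backup, which is exactly what you would lose if the disk died right now. It assumes you know when the last backup ran (as a Unix timestamp) and that file modification times are trustworthy. The function name is made up for the example.

```python
from pathlib import Path


def changed_since(folder, last_backup_time):
    """Return files under `folder` modified after `last_backup_time`.

    `last_backup_time` is a Unix timestamp (seconds), an assumption of this
    sketch; paths are returned relative to `folder`, sorted by name.
    """
    root = Path(folder)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime > last_backup_time
    )
```

For an “at rest” folder this list is usually empty. For a “working” folder it fills up within minutes, which is why a once-a-day backup is not enough there.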

Three copy rule

In both cases, the principle for backups is the same, just with different tools. For critical data, you want to have three copies: one local, one remote, and one in the cloud.

Local

The local one will be on an external drive that you connect to your computer for the backup, or on another machine entirely (that old computer you never could part with that is gathering dust). That backup is the fastest to use in case of a problem: no download, no syncing, just copy it back and go. The downside is that if both your external disk and your computer break, it’s game over. That might seem unlikely, but natural disasters and fires are very good at making it real.

Remote

That’s where the remote part comes in: you want your backups to exist in a remote location, such as a friend’s house, your work office, or a family member’s home. Just make sure you send the data there at regular intervals for safekeeping. Since that copy is the fallback for getting your data back, if you wait a month between updates, you could lose up to a month of data. Natural disasters like flooding or a tornado could once again make you lose it all, but the chances are smaller now.

Cloud

Finally, the cloud part is the most common one: simply send your data to safe storage in the cloud. It is usually the most time-consuming, because you need to send your data across the internet and then download it again if you need it in an emergency. That’s why the first two methods are useful, as they are usually faster for sizable data archives. But with that, finally, our data is pretty much as safe as can be.
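Three copies only help if they actually hold the same data. One possible way to check, sketched below, is to compare a SHA-256 checksum of the original file with each copy. This assumes the copies are reachable as ordinary file paths (a mounted external drive, a synced folder, a downloaded cloud copy); the function names are my own.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copies):
    """Map each copy's path to True if its content matches the original."""
    expected = sha256_of(original)
    return {copy: sha256_of(copy) == expected for copy in copies}
```

A `False` in the result means that copy is stale or damaged and should be refreshed before you need it.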

Backup software

The whole three copy rule can seem very daunting, and it can be, but thankfully, many backup applications nowadays can help. They will often handle at least two of the three copies automatically (usually local and cloud). Some also offer peer-to-peer sync for the remote copy: the other computer installs the software and, with permission, receives the data to keep it safe. That removes most of the overhead in setting it up.

Those applications work best for the “at rest” backup, but they will often only run once a day or on an interval, so they’re not as suited to the working files. For the “working” backup, you can instead use a sync service like Dropbox, Google Drive, OneDrive, or Tresorit to achieve most of it, since those sync and back up the data live as you change things. They can also possibly cover the three locations if you sync across multiple computers in multiple locations, like work and home.

Encryption

The one final thing to make sure of, no matter what you choose, is that the data is encrypted. It needs to be encrypted locally before leaving your computer to be sent to the cloud or to another computer. As soon as the data leaves your computer, there are chances for something malicious to happen, and local encryption makes sure that anyone who intercepts or stores the data cannot read it without your key. That is especially important if you trust the service with all of your data.