Storage

In my current setup, I have a single FUSE mountpoint that I can point software at, and the software is none the wiser about what is really behind it.

My /data mountpoint

My /data consists of:

  • An ext2 partition on an mdraid stripe of 2x300GB NVMe SSDs for caching and things that need to be really fast, mounted as /scratch.
  • A ZFS RAIDZ1/"RAID5" of 3x8TB, mounted as /local. It has redundancy and is fast enough. This is where I store most things initially.
  • A ZFS stripe of 2x4TB, mounted as /mnt/raid1. This has very fast sequential speeds, but low IOPS. This is for off-loading from the main pool.
  • A ZFS stripe of 3x1TB + 1x2TB, mounted as /mnt/raid2. This has more disks with slower sequential speeds, but higher IOPS. This is for off-loading from the main pool.
  • Google Drive mounted as /remote. This has very high overhead per file, but has unlimited storage (df will show 1PB). Great for long-term storage.

With this I combine high speed with redundancy and high IOPS, all in one mount, without any application knowing which disk holds what.

Local storage

I trust that you know how to set up your own local storage, but if you're looking to combine Google Drive with your local storage, I suggest you mount the local storage at a different location before starting. For instance, if you have /data now, move it to something like /local. You'll still be using it later to move things to Google, so you might as well use an easy-to-remember name.

Google Suite with Google Drive and Unlimited Storage

Please note that you can just sign up with 1 user, and it will give you unlimited storage, regardless of the 5-user requirement that they state on the website!

Getting Google Drive

On the Google Suite website, they sell "Google for Business" packages. You need a domain name to sign up. It includes e-mail and all the other Google features. For unlimited storage you will need the most expensive version per user, which is about € 10,40 excl. VAT. Yeah, 10 bucks per month for unlimited storage, and no longer having to worry about redundancy, disk replacements and the like.

If you haven't already, just sign up with a domain you already have, or get one. Google also sells domains and lets you manage things quickly, but any local domain provider will do (including my favorite, TransIP).

What to do and not to do

I just use the Drive of the user itself, not a Team Drive. Team Drives can have additional benefits, but also additional disadvantages. I've seen many people run into problems with Team Drives, so be careful. I'd stick with the regular Drive of the user.

If you had the idea of putting a lot of media files on there: Google has stated that your Drive is like your local hard drive and that they don't care what you put on it. However, Google being Google, I'd recommend using encryption. rclone (which I'll get back to in a moment) comes with built-in encryption.

Do not use the "Share..." functionality on rclone data on Drive if you can avoid it, this has multiple reasons, which I won't get into here.

Create a folder on Drive in which you will store your rclone data; I call mine "Backups".

Rate limits

  • You can upload 1TB per authorized user per day.
  • You can download 5TB per authorized user per day.
  • There's a rate limit on actions per second, so you can't just keep hammering the API. rclone is your friend here (see the throttling example below).
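
rclone can throttle itself to stay safely under such limits. A minimal sketch: the flag values below are purely illustrative, and remote: stands for whichever remote you'll configure in a moment:

rclone copy /local remote: --bwlimit 10M --tpslimit 8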

Setting up rclone

You can get rclone from its website. Debian/Ubuntu packages are available. Keep it up to date! It's for your own good!

Basics

This can seem a little complicated, and there is an official guide. What you basically do is this:

  • rclone config
  • Choose n for "New remote".
  • Give it a name; I call mine d, but you can pick whatever. You won't be using this remote often once encryption is set up!
  • Tell rclone it's a "drive" storage (Google Drive).
  • You can leave the client_id and client_secret empty, though you can set up your own (see the guide).
  • Give rclone full access (option 1).
  • Since you're probably working remotely, choose "headless" or N here. Copy/paste the URL into your browser, log in with your new Google Suite account and authorize it.
  • Say no to "configure this as a team drive".
  • It will print the config on the screen and ask you to accept.
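
For reference, the resulting section in your rclone config file will look roughly like this (a sketch; the token is filled in by the OAuth flow, not typed by you):

[d]
type = drive
scope = drive
token = {"access_token":"...","token_type":"Bearer","refresh_token":"...","expiry":"..."}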

You now have a remote that can access your Google Drive storage, as of yet unencrypted.
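
If you haven't created that "Backups" folder through the web interface yet, you can create it now via the new remote:

rclone mkdir d:Backups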

Encryption

Time to set up encryption by creating another remote (this, too, has a guide):

  • rclone config
  • Choose n for "New remote".
  • Give it a name; I call mine x since I type it quite often. You will be using this one relatively often, so pick something short and easy.
  • Tell rclone it's a "crypt" storage ("Encrypt/Decrypt a remote").
  • If you created that "Backups" folder on Drive earlier, now tell it to point to "d:Backups" (assuming you picked d in the one above).
  • Encrypt filenames (option 2).
  • Encrypt directory names (option 1).
  • You can enter your own password and a passphrase for the salt. YOU CANNOT CHANGE THESE LATER, SO PICK VERY SECURE ONES IF YOU DON'T LET RCLONE GENERATE THEM FOR YOU.
  • It will print the config on the screen and ask you to accept.
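
For reference, the config section will end up looking roughly like this (a sketch; rclone stores both passwords in obscured form, not plaintext):

[x]
type = crypt
remote = d:Backups
filename_encryption = standard
directory_name_encryption = true
password = <obscured>
password2 = <obscured>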

Now the "x"-remote will be fully encrypted and stuff will appear like hashed names in the Backups folder of your Google Drive when you start putting things there.

Caching

Google has pretty strict rate limits, and you don't want (or need!) to ask it about everything on the Drive every single time.

If you're using the Drive in only one location (read: server), you can set up a pretty nice caching mechanism that rclone has. Do note that my setup requires about 25GB worth of (fast!) cache/scratch space, and I put mine on NVMe SSDs.

Anyway, time to configure it:

  • rclone config
  • Choose n for "New remote".
  • Give it a name; I call mine google since it's easy to remember. You won't be using this name very often, mostly for setting up a mount point after this.
  • Tell rclone that you want one of type "cache" ("Cache a remote").
  • Point it to "x:" here, as that's your encrypted remote.
  • The Plex options can be skipped, they're useless without Plex, and not so terribly useful with Plex, either.
  • I kept chunk sizes default, and you can configure how much space you want to let the chunk cache have on disk. I use 25GB.
  • It will print the config on the screen and ask you to accept.
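
The resulting config section will look something like this (a sketch; I left the chunk size at its default and gave the chunk cache 25GB):

[google]
type = cache
remote = x:
chunk_size = 5M
info_age = 6h
chunk_total_size = 25G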

Note that rclone automatically keeps the cache up to date, but only if you upload or change things through the cache remote itself. More on that later.

In summary

We now have:

  • Local storage mounted on /local.
  • The actual unencrypted Drive available to us as remote d:.
  • The Backups directory on Drive available to use with on-the-fly en-/decryption as remote x:.
  • A cached version of that available to use as remote google:.

Testing

Some commands (in order) to check if everything is working:

rclone ls d:
rclone ls x:
rclone ls google:
mkdir /local/test
echo "This is a test." | tee /local/test/this_is_a_test.txt
rclone move /local/test google: # this moves the contents of /local/test into the root-level directory of google:
rclone cat google:this_is_a_test.txt

None should return errors, and the last command should show "This is a test.", pulled from the remote.
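
As an extra sanity check that the encryption layer is doing its job, compare the encrypted and decrypted views of the same file:

rclone ls d:Backups # shows a scrambled name
rclone ls x: # shows this_is_a_test.txt in the clear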

Mounting the remote

To test, you can do:

rclone mount --allow-other google: /remote

You can look there and you should find this_is_a_test.txt. You can delete it, too. When you're done, run fusermount -uz /remote.

Persistent mount:

I don't recommend mounting it as root. I use a dedicated user, data (uid/gid 1001004), for it. First, edit /etc/fuse.conf and make sure user_allow_other is enabled.
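
A minimal sketch of that preparation (the user name data and uid/gid 1001004 are my choices, and the /scratch cache paths match the unit below; adjust all of these to your setup):

# dedicated user for the mount
groupadd -g 1001004 data
useradd -u 1001004 -g 1001004 -M -s /usr/sbin/nologin data
# let non-root FUSE mounts use allow_other
grep -q '^user_allow_other' /etc/fuse.conf || echo user_allow_other >> /etc/fuse.conf
# cache directories and the mountpoint itself
mkdir -p /scratch/cache/rclone/{chunks_mount,db_mount,vfs}
mkdir -p /remote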

Be sure to replace my specific settings below with yours (like the uid/gid). Also adapt the cache paths to suit your needs. Pick a fast disk (that gets TRIMmed) and/or tmpfs (RAM). Make sure /remote exists (and is empty), and then create a new systemd unit at /etc/systemd/system/rclone.service as follows:

[Unit]
Description=RClone Mount
AssertPathIsDirectory=/remote

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/bin/rclone mount --config=/path/to/your/rclone/config/file --allow-other --log-file=/var/log/rclone.log --gid=1001004 --uid=1001004 --dir-cache-time 1h --fast-list --cache-chunk-path=/scratch/cache/rclone/chunks_mount --cache-db-path=/scratch/cache/rclone/db_mount --cache-dir=/scratch/cache/rclone/vfs google: /remote
ExecStop=/bin/fusermount -uz /remote
Restart=always
RestartSec=5
StartLimitInterval=60s
StartLimitBurst=3

[Install]
WantedBy=default.target

Do a systemctl daemon-reload, followed by a systemctl start rclone. If your Drive has grown big, it might take a while, but /remote should be populated soon, show up in df, and contain this_is_a_test.txt. If you're satisfied, make it start at boot by issuing systemctl enable rclone.
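
In short:

systemctl daemon-reload
systemctl start rclone
df -h /remote # the google: mount should be listed
systemctl enable rclone # once you're satisfied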

Another summary

You should now have two main mounts:

  • /local with your, well, locally stored things.
  • /remote, a cached and encrypted Google Drive mount.

Combining the two (or more) with MergerFS

For this, you will want mergerfs, which is available in Ubuntu as a package of that name. So: apt install mergerfs. MergerFS has multiple advantages over the older UnionFS, one of which is hardlinks that just work. It's also more versatile.

Next, try:

mergerfs -o defaults,allow_other,use_ino,hard_remove,category.create=ff,category.action=ff,category.search=ff,fsname=data: /local:/remote /data

You can add other disks if you want, for instance my /mnt/raid1 and /mnt/raid2. Just :-separate them into the /local:/remote list, as shown below. I appended mine after those two, as they're less important and I only do things on them manually.
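
With my two extra pools appended, the command becomes the following (branch order matters: with category.create=ff, new files still land on the first branch, /local):

mergerfs -o defaults,allow_other,use_ino,hard_remove,category.create=ff,category.action=ff,category.search=ff,fsname=data: /local:/remote:/mnt/raid1:/mnt/raid2 /data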

You should now have a combined mount of /local and /remote as /data. You can point applications to that and they won't know where's what. Unmount it with fusermount -uz /data.

Next, add it to systemd in a similar fashion. Create /etc/systemd/system/merger.service and do this:

[Unit]
Description=MergerFS Mount
AssertPathIsDirectory=/data
After=rclone.service
Requires=rclone.service
PartOf=rclone.service

[Service]
Type=forking
User=data
Group=data
ExecStart=/usr/bin/mergerfs -o defaults,allow_other,use_ino,hard_remove,category.create=ff,category.action=ff,category.search=ff,fsname=data: /local:/remote /data
ExecStop=/bin/fusermount -uz /data
Restart=on-abort
RestartSec=5
StartLimitInterval=60s
StartLimitBurst=3

[Install]
WantedBy=default.target

Do a systemctl daemon-reload, followed by a systemctl start merger. It should be immediately available as /data and show up in df. If you're satisfied, make it start at boot by issuing systemctl enable merger.

Note: I make this service depend on rclone.service, so the merged mount only comes up (and goes down) together with the rclone mount.

Using rclone to move data

I've been using it in many ways and for a long time; the workflow I eventually settled on:

  • Copy data from /local to /remote with rclone copy /local google:.
  • Later, move data from /local to /remote with rclone move /local google:.
  • I usually cherry-pick big items and move them manually rather than doing the whole thing in one go, but that's me. Or just copy and delete data manually, too.

I often add --stats=5s -v -v to see what it's doing and when it's hitting limits, and you can also specify a log file (see below).
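
For example (the log path is just an illustration):

rclone move /local google: --stats=5s -v -v --log-file=/var/log/rclone-move.log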

LEAVE MOVING TO TRASH ENABLED

LEAVE THE TRASH ENABLED; DO NOT DISABLE RCLONE'S "MOVE TO TRASH". It's easy to make mistakes, and the Trash is sometimes the only way to fix them.