Getting off Big Tech: Hosting a personal cloud
- Eric Newbury
Maybe the removal of Google’s free photo storage has you looking for alternatives. Or maybe, like me, you want to start using Open Source software for reasons of privacy and personal values. In my previous post in the Getting off Big Tech series, we learned how to de-Google your phone, but we haven’t gained much in the way of privacy or cost savings, and we still trust proprietary cloud services with all of our file, photo, and messaging needs.
The first option, while not the best fit for most people, is to simply live without cloud services. Store your photos on an external hard drive with daily backups, sync your photos over Bluetooth or USB, and send messages with SMS. Most of us remember a time when these were the only options, and some of us still happily use them. If you can live without the latest and greatest and the old way works for you, there’s nothing wrong with keeping it simple.
Option two is to use and pay for alternative cloud services built sustainably on Open Source and utilize the expertise and infrastructure of these managed services to ensure a pain-free experience. If cost is your number one concern, this isn’t the ideal solution. But if you are already paying for cloud services (or paying with your personal information), then using Open Source projects is a fantastic alternative. You have certainty that your information is being treated in a way you are comfortable with, and you are contributing to an economy that is based on expertise and working hours.
The third option is to host your own cloud services from a home server or a server that you rent. While this option can cost next to nothing, don’t underestimate the amount of time you will put in initially setting up these services, as well as the ongoing maintenance, upgrades, and troubleshooting. Even as I write this, my home server is offline after a power surge fried the device, and I’m out of the country and unable to do repairs. Thanks to the work I’ve put in to automate my home server setup, plus nightly offsite backups, I’ve been able to restore my cloud to a virtual server on DigitalOcean until I can return home and make repairs.
Despite my cautionary tale, my last year of self-hosting has been relatively painless with almost no downtime—but you need to be someone who enjoys tinkering and be fairly diligent about security and backups. If this sounds like you, then carry on.
You’ve got another choice to make: renting a cloud server from a provider like Linode or DigitalOcean, or running your own server from your home. Again, this is a trade-off between convenience and reliability on the one hand and frugality and control on the other.
Renting a cloud server

Pros:
- No upfront cost
- Easy networking with a static public IP address
- Uptime guarantees from the cloud provider
- Global access to the server, even without working SSH
- Easy snapshot backups

Cons:
- Expensive; pricing is not optimized for large on-disk storage
- Performance is not great on budget tiers because of CPU sharing

Running a home server

Pros:
- No monthly cost
- The personal satisfaction of knowing your data is completely in your hands

Cons:
- Upfront cost of a server
- Server noise can be an issue, depending on the device
- Extra complexity of setting up services within a local network with no public IP
- ISPs usually charge extra for a static public IP address, so dynamic DNS will be needed
- Onsite maintenance; not accessible from the internet if there is a networking issue
- No automatic backups
The “cloud server” option seems like the better deal on the surface, especially now, considering my current outage, but the expense alone was enough to tip me toward the home server. A file storage/drive service runs best when it can store data on local disk, rather than in external cloud “object storage” like S3, and virtual servers with large amounts of disk space are not cheap. I’m currently paying just under $50 a month for a server with 4 cores, 8GB of RAM, and 150GB of storage. I don’t need all those specs, but the cheaper plans don’t let me optimize for disk space alone. Performance also isn’t the best at times, since the affordable plans usually share physical CPUs.
Setting up a home server
I started my journey using what I already had: an old tower computer with around 4GB of memory and a couple of terabytes of hard disk space. But it quickly became clear that a good home server has one main deal-breaker … it must be QUIET! You never quite realize how loud a computer is until you’re trying to have a cup of coffee and a conversation in the living room but can’t hear the other person over what sounds like a Boeing trying to achieve takeoff thrust.
I did some research and purchased an Intel NUC 8 with 8GB of RAM, a quad-core i5 CPU, and around 2.5TB of hard drive space for around $650. Some of those specs are overkill for a server, but I’ll definitely need the storage space, and since I’ll be running several services, some of which do image classification, the RAM and CPU power are nice to have. The other main advantage is that NUCs are known for being incredibly quiet and fit in the palm of your hand, so you can tuck one away just about anywhere.
Even though it’s very quiet by default, I made a few BIOS changes to run the NUC in “quiet mode,” which keeps the fan off most of the time but still spins it up as needed for heavier workloads. I also configured it to auto-start when power is restored, so that if the house experiences a power outage while I’m away, the server boots back up as soon as power returns. Another nice-to-have was dimming the LED lights on the front of the device, since my server sits in the living room and I don’t like it splashing blue light across the room when it’s dark.
Additionally, I know there is a “Wake on LAN” setting, which gives the network card the ability to boot the server when it receives a special network packet. This would let me remotely boot the server if it were manually powered off for some reason and I wasn’t home to push the button. I’ve yet to run into a scenario where this was necessary, so I haven’t bothered to set it up, but it’s an interesting feature.
Now that you have a server, you need to consider the extra hurdles that come from running it behind a home router.
You only have one public IP address for your entire home network, so even though your server is listening for requests inside that network, you need to configure your router to send incoming traffic to the server and not to some other device on your network, like your roommate’s phone.
Static local IP & port forwarding
Router settings differ dramatically by model. In general, you’ll need to log into the router’s admin portal and give your server a static/reserved local IP address; that way, even after a reboot, it will be assigned the same IP every time, and no other device can steal it. Second, you need to set up “port forwards” for all the ports your services will use, so that the router hands that traffic off to the server listening for it. For instance, I’ve forwarded port 443 for HTTPS traffic, port 22 for SSH, and a few other ranges of ports for WebRTC traffic.
One thing to note is that some Internet Service Providers will block certain ports they deem risky. For instance, my ISP blocks port 80, the one typically used for unencrypted HTTP traffic. All my HTTP services run on SSL, so this isn’t a huge problem, but it means I can’t get an automatic redirect to HTTPS if I type in mydomain.com without specifying the https:// prefix. My browser settings automatically redirect everything I type to HTTPS, so this doesn’t really affect me, but if you’re sharing a link with someone over a messaging app, keep in mind that you’ll need to send them the full https:// URL.
Another big “gotcha” I ran into was “NAT-Loopback.” By default, many routers, like the one that came from my ISP, don’t know how to forward traffic to themselves. For instance, if you’re on your home wifi and you try to load mydomain.com, which resolves to your own public IP address, the router doesn’t know how to send the traffic out the public gateway and route it right back in. A common solution is to set up an internal DNS server for your network that resolves mydomain.com to the server’s local IP address. Unfortunately, in my case, our default router also didn’t support setting a custom DNS server for the network.
For me, the only solution was to purchase a new router that supported “NAT-Loopback,” which just means it knows how to forward traffic back to itself, passing through the public gateway and back in with the proper port forwarding. When shopping for a router, NAT-Loopback never seems to appear in the specifications as a feature it does or doesn’t have; I had to rely on forums and some obscure, potentially outdated support pages. Alternatively, you could get a router that supports setting an internal DNS server. Mine supports both, but I rely on NAT-Loopback.
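If your router does let you hand out a custom DNS server, the internal-DNS workaround can be as small as a dnsmasq instance running on the server itself. A sketch (the domain and LAN IP are placeholders for your own values):

```conf
# /etc/dnsmasq.conf
# Answer queries for the domain with the server's LAN address,
# so LAN clients never need NAT loopback (192.168.1.10 is a placeholder).
address=/mydomain.com/192.168.1.10
# Forward everything else to a public resolver.
server=1.1.1.1
```

Then point the router’s DHCP “DNS server” setting at the box running dnsmasq.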
Another annoyance with running a server off a home network is that ISPs don’t guarantee that your public IP address won’t change randomly, unless you sign up for one of their business services, which are prohibitively expensive for personal use. This means that your DNS records for mydomain.com could end up pointing at an IP address that your server is no longer sitting behind. In my case, my IP address hasn’t changed in over a year, so you could just pretend it’s static and, if it ever changes, update the DNS records manually with the new IP (if you’re traveling, hopefully you have a roommate who can tell you the new IP by logging into the router).
Otherwise, you’ll need to set up some kind of Dynamic DNS. This is basically just a daemon that runs on your server, checks your public IP address periodically, and, if it differs from what it was before, notifies your nameservers of the updated IP. Some DNS providers have existing Dynamic DNS clients, like the namecheap-ddns-client. In my case, I’m using DigitalOcean’s DNS service, so I wrote a simple Python script, run on a cron job, that updates the records using DigitalOcean’s API.
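The script itself isn’t shown, but the idea is small enough to sketch. Here is a minimal Python version of that cron job, assuming DigitalOcean’s v2 DNS API and the ipify service for IP discovery; the token, domain, record ID, and state-file path are placeholders, not the author’s actual values.

```python
#!/usr/bin/env python3
"""Minimal dynamic-DNS sketch: keep a DNS A record pointed at this
network's current public IP. Assumes DigitalOcean's v2 API; the token,
domain, and record id below are placeholders."""
import json
import os
import urllib.request

DO_TOKEN = "your-api-token"    # placeholder: a DigitalOcean API token
DOMAIN = "mydomain.com"        # placeholder: your domain
RECORD_ID = 12345678           # placeholder: the id of the A record
STATE_FILE = "/var/tmp/last_ip"


def current_public_ip() -> str:
    """Ask an external service what our public IP looks like."""
    with urllib.request.urlopen("https://api.ipify.org") as resp:
        return resp.read().decode().strip()


def last_seen_ip() -> str:
    """Read the IP we last pushed to DNS; empty string on first run."""
    try:
        with open(STATE_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return ""


def needs_update(public_ip: str, recorded_ip: str) -> bool:
    """Only touch the DNS API when the address has actually changed."""
    return public_ip != recorded_ip


def update_record(ip: str) -> None:
    """PUT the new address into the A record via DigitalOcean's API."""
    url = (f"https://api.digitalocean.com/v2/domains/{DOMAIN}"
           f"/records/{RECORD_ID}")
    req = urllib.request.Request(
        url,
        data=json.dumps({"data": ip}).encode(),
        headers={"Authorization": f"Bearer {DO_TOKEN}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)


if __name__ == "__main__" and os.environ.get("DDNS_RUN"):
    ip = current_public_ip()
    if needs_update(ip, last_seen_ip()):
        update_record(ip)
        with open(STATE_FILE, "w") as f:
            f.write(ip)
```

A crontab entry like `*/5 * * * * DDNS_RUN=1 /usr/bin/python3 /opt/cloud/ddns.py` would check every five minutes; the environment-variable guard just keeps the sketch safe to import or dry-run.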
Running your own server can be a very messy business. My needs are always changing, and, surprise surprise, I usually don’t know what I’m doing the first few times I try to set up a service, so I tend to create a mess along the way. In my first iteration, I went free-form, running services directly on the box using systemd and littering configuration files and data all over the place. Before long, it became clear that I needed something a bit more organized.
I considered Canonical’s “snaps” that ship with Ubuntu Server because they provide nice isolated environments for applications to run in, including all the dependent services like databases, caches, and web servers. However, this seemed a little too restrictive for me since I planned on adding a lot of different services. The perfectionist in me wanted to only run single instances of redis, postgres, and nginx, not one per service. Also, many of the services I wanted to run didn’t have snaps since it’s still a rather niche technology mainly used in the Ubuntu world.
Instead, I opted to dockerize everything. Just about everything out there has a dockerized version, which let me try out random services on my server temporarily; if one didn’t work out, I could just delete the container. Any data I wanted to keep around or back up periodically, I could expose with a docker volume/mount, while everything else stayed cleanly hidden away inside the container. Most docker images are pretty slim, containing only one service (or configurable to do so), which let me point a container at an existing database or web server.
I’ve also set up a service called Portainer that exposes an admin portal for Docker, allowing me to restart containers from a web portal if needed. This is handy if I’m traveling and one of my services is misbehaving: I can log in from my phone and restart the container without needing to SSH in from a workstation.
Because I am running many HTTP services on one server, I have a lot of proxying needs. For example, subdomains like chat.mydomain.com are routed to different services listening on different ports on the same box. Rather than using something like nginx, which would require new configuration every time I wanted to try out a service, I started using Traefik, which listens to the docker daemon for containers being added, removed, or changed, and uses labels on those containers to determine the routing. To try out a new service, I can just add a container with a few routing-rule labels, and Traefik will automatically start directing traffic to it. It can also fetch SSL certificates automatically, so you don’t need to rely on an external daemon like certbot.
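To make the label pattern concrete, here is a docker-compose sketch of a Traefik container plus one labeled service. The service image, domain, and email are placeholder assumptions, not the author’s actual setup.

```yaml
# docker-compose.yml sketch; image names and domains are placeholders.
services:
  traefik:
    image: traefik:v2.10
    command:
      - --providers.docker=true
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=you@mydomain.com  # placeholder
      - --certificatesresolvers.le.acme.storage=/acme.json
    ports:
      - "443:443"
    volumes:
      # read-only socket access lets Traefik watch containers come and go
      - /var/run/docker.sock:/var/run/docker.sock:ro

  chat:
    image: some/chat-service  # placeholder image
    labels:
      - traefik.http.routers.chat.rule=Host(`chat.mydomain.com`)
      - traefik.http.routers.chat.entrypoints=websecure
      - traefik.http.routers.chat.tls.certresolver=le
```

Adding or removing the `chat` service requires no change to Traefik itself; the labels carry the routing rules.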
This is the big one. Even using docker, there are many configuration files, cron jobs, and system settings that I’ve settled on over time, and keeping track of them all manually would be a nightmare. I use Ansible to automate the setup of my server, which gives me a single, version-controlled source of truth that I can update over time. My setup is available at github.com/enewbury/cloud. I like Ansible so much that I’ve automated the setup of my Linux workstation too. Feel free to base your Ansible playbook off of mine, but, as with dotfiles, eventually you’ll want to make it your own and do something that really works for you.
Not only does Ansible organize and document my setup in one place, it also lets me spin up a new server in minutes. Granted, downloading backups takes a few hours, but the actual setup of an entire server is simply a matter of updating my secret inventory file with the new server’s IP (or updating the DNS record to point at the new server) and running the playbook.
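The author’s real playbook lives at github.com/enewbury/cloud; purely as an illustration of the shape, a minimal playbook might look like this. The host group, package name, paths, and schedule are all placeholder assumptions.

```yaml
# playbook.yml sketch; host group, paths, and schedule are placeholders.
- hosts: cloud
  become: true
  tasks:
    - name: Ensure docker is installed
      apt:
        name: docker.io
        state: present

    - name: Lay down the docker-compose definition
      copy:
        src: files/docker-compose.yml
        dest: /opt/cloud/docker-compose.yml

    - name: Install the nightly backup cron job
      cron:
        name: nightly backup
        minute: "0"
        hour: "3"
        job: /opt/cloud/backup.sh
```

Running `ansible-playbook -i inventory playbook.yml` against a fresh box converges it to the same state every time, which is exactly the “single source of truth” property described above.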
Speaking of backups: yes, backups, please. A “cloud” isn’t a backup in itself; think of it as a synchronization service between your devices. If you accidentally delete a file on your computer, that change is synced immediately to your cloud drive, and the file is gone from all your other devices for good. So yes, you still want an offsite backup of all your cloud data.
I chose Backblaze because it is super cheap: I pay one or two cents a month for a few hundred GB of data. It is also S3 compatible, which makes streaming data to it easy with rclone. The actual backup itself is handled with Restic, which creates encrypted snapshots, similar to Apple’s Time Machine, and integrates with rclone to stream them to Backblaze. Just be sure not to lose your secrets ;)
On my server, I have a directory, /opt/cloud, that contains all of my docker volume mounts, as well as a database_dump directory. Every night, a cron job pauses my services so they don’t create more data, dumps all my databases into the dump directory, and then runs a restic backup of the /opt/cloud directory. Finally, it cleans up old snapshots and pushes the new restic “repo” to Backblaze to be persisted.
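A nightly script along these lines could implement that sequence. This is a sketch, not the author’s actual script: the container names, paths, and retention policy are placeholders, and it assumes restic is configured via `RESTIC_REPOSITORY` (for example, an rclone backend pointed at Backblaze) and `RESTIC_PASSWORD`.

```shell
#!/bin/sh
# Nightly backup sketch. Assumptions: docker containers named "app" and
# "postgres" (placeholders), and restic configured via RESTIC_REPOSITORY
# (e.g. rclone:b2remote:bucket) and RESTIC_PASSWORD in the environment.
set -eu

CLOUD_DIR=/opt/cloud
DUMP_DIR="$CLOUD_DIR/database_dump"

# Retention policy kept in one place so it is easy to adjust.
retention_flags() {
    echo "--keep-daily 7 --keep-weekly 4 --keep-monthly 6"
}

run_backup() {
    docker pause app                 # stop services from writing new data
    docker exec postgres pg_dumpall -U postgres > "$DUMP_DIR/all.sql"
    restic backup "$CLOUD_DIR"       # encrypted snapshot, streamed offsite
    # word-splitting of the flags below is intentional
    restic forget --prune $(retention_flags)
    docker unpause app
}

# Only do real work when invoked as "backup.sh run" (e.g. from cron),
# so the script is safe to source or lint.
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

A crontab line such as `0 3 * * * /opt/cloud/backup.sh run` would fire it nightly; pausing the containers first keeps the dumps and the file snapshot consistent with each other.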
Recently, because of the power surge, I had to do a restore in real life for the first time. The process was relatively easy, and in the future I could build it into my Ansible playbook with a restore tag. For now: I point the DNS records at my new server, manually install rclone and restic, initialize the repository using the secrets I store separately in my password manager (I purposefully don’t self-host my password manager, since I don’t want all my eggs in one basket), and pull down my /opt/cloud directory. Then I simply run my playbook to set up all my infrastructure and load my database dumps back in using docker exec -i postgres psql -U postgres < dump.sql.
The backup download took a few hours and cost a few dollars, since Backblaze charges more for downloads than for uploads and storage. The setup itself took only a few minutes, and I was up and running.
If you go the route of running your own server, your costs after the initial setup are pretty much zero, though you will spend some time doing admin work. If you rent a server, your mileage may vary, but you’ll likely spend a bit more than you do currently. Either way, you can add as many services as you want, and since the load is light, you probably won’t incur any additional cost.
Running your own personal cloud is not without its problems, but in the end, it gets the job done. As all my friends will tell you, my favorite phrase is “embrace the jank!”
This journey is not for everyone. But based on the fact that I went over a year with no significant downtime, and was able to get everything up and running again on a new server in an afternoon, it’s definitely doable. And I’m never going back.