The following page describes a number of best practices to consider when managing your own instance of the Realm Object Server.
When deploying in a self-hosted manner, we recommend using the Realm Kubernetes Helm chart, which deploys the Realm Object Server in a distributed cluster with best-practice default settings. Always deploy the distributed cluster version of the Realm Object Server, not the single Node.js app, which is geared toward local development and testing. It can be difficult to predict an app's usage pattern until it is actually launched, and an unexpected pattern can lead to excessive resource consumption. By deploying in cluster mode, the Realm Object Server can be scaled horizontally while the app is live in production, whereas a single-node ROS must be brought down in a maintenance window and then migrated to a distributed cluster deployment. Always automate the deployment of ROS with a configuration management tool such as Ansible, Terraform, or Chef; building ROS into a Docker image is also recommended.
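As a sketch, installing the chart follows the standard Helm workflow. The repository URL, chart name, and values below are placeholders for illustration; substitute the ones documented for your ROS distribution and consult the chart's values.yaml for the real tunables:

```shell
# Register the chart repository (URL and repo name are placeholders).
helm repo add realm https://charts.example.com/realm
helm repo update

# Install the Realm Object Server chart into its own namespace.
# The release name and --set flag shown here are hypothetical.
helm install realm/realm-object-server \
  --name realm-object-server \
  --namespace realm \
  --set persistence.enabled=true
```

Keeping the release under Helm also makes later horizontal scaling a matter of upgrading the release with a higher replica count rather than hand-editing the cluster.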
The Realm Object Server is a stateful application, so always deploy it attached to a persistent volume - any server-side apps that use Realm SDKs should likewise be mounted with persistent storage. While you can and should use Realm's built-in synchronous replication to enable a hot standby for sub-second failover, you should also take regular file system backups of the volumes backing the Realm Object Server. Systems such as AWS EC2 snapshots let you automate a backup once a day or more frequently, enabling you to redeploy in a disaster recovery scenario.
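For example, on AWS a nightly EBS snapshot of the ROS data volume can be scheduled with cron on a host that has the AWS CLI configured. The volume ID below is a placeholder:

```shell
# /etc/cron.d/realm-backup -- illustrative; the volume ID is a placeholder.
# Snapshot the ROS data volume every night at 03:00.
0 3 * * * root aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "Nightly Realm Object Server backup"
```

Pair this with a retention policy (for example, pruning snapshots older than your recovery window) so snapshot storage does not grow unbounded.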
Realm keeps a log of all transactions that have occurred since the Realm was first created. Over time, the log accumulates superfluous transactions that no longer affect the final state on the sync-client device - for instance, one transaction inserts an object and modifies a field, and a later transaction sets that field back to its original value; the end result is the original object. While these semantics are important for conflict resolution, once the final state has been reached, the logs should be truncated to improve performance for the sync-client. This is done by setting a historyTtl in the ROS configuration: the server keeps untruncated logs in the Realm only for the trailing window of time it specifies.
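A minimal sketch of setting this in a Node.js start script, assuming your ROS release accepts a historyTtl option in the start configuration and that the value is expressed in seconds - check the configuration reference for your version, as option names and units may differ:

```javascript
// Illustrative only - option names and units may vary by ROS release.
const ros = require('realm-object-server');

const server = new ros.BasicServer();
server.start({
    dataPath: './data',
    // Keep transaction history for 30 days; entries older than this
    // window become eligible for truncation.
    historyTtl: 30 * 24 * 60 * 60,
}).catch(err => console.error('ROS failed to start:', err));
```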
If the historyTtl is too low and a user does not launch the app within the retained history window, that user must perform a full Realm download and will lose any unsynced data created on their device. If the historyTtl is too high, infrequent users will not need to do a full Realm download, but the Realm data may grow large and the initial download may take longer. The real decision when setting the historyTtl is how many days you are willing to wait for a typical user to sync. How likely is a user to create data and then not sync over WiFi or cellular? The longer the window, the less likely users are to lose unsynced data, but the more likely you are to impact performance for the rest of your users. This is ultimately a product and user-experience question.
Once logs have been truncated, the actual payload of the logs shrinks, but the size of the file on disk remains the same. To reclaim the free space in the Realm file, the operator should regularly vacuum their Realms - we recommend every hour or every day. This is covered in detail here.
When deploying any enterprise application, including ROS, we always recommend integrating an external monitoring and logging system. A detailed tutorial can be found here. In addition, logging needs differ between the development phase and production; tips for configuration can be found here.
We practice and recommend a modern version of DevOps that prioritizes application uptime, so having a process monitor such as pm2 or Linux's systemd restart ROS is paramount. However, restarting the application may stabilize the system while obscuring the error and the root cause of the crash. Exporting logs and metrics to an external system allows operators and support to find and fix the underlying problem.
Be sure to enable process start on server boot - for instance, with pm2 this is done with the pm2 startup command.
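For example, assuming ROS is installed globally and launched with the ros CLI, a typical pm2 setup looks like the following; the process name is our choice, and the exact start invocation depends on how ROS is installed on your host:

```shell
# Run "ros start" under pm2 with a recognizable process name.
pm2 start ros --name realm-object-server -- start

# Print (and then run) the command that installs the boot-time init script.
pm2 startup

# Persist the current process list so pm2 resurrects it after a reboot.
pm2 save
```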
Before heading into production, make sure you have been through our preparation checklist. This document is full of critical topics that need to be addressed and implemented before going live.