Guide - How to Build a Production App with Realm Sync

Introduction

The purpose of this guide is to lead you down a path for building an app with Realm Sync that sets you up with a performant and scalable design. We will be prescriptive about how to design and build an app with Realm, without exposing you to the minutiae of Realm’s inner workings, so that the app scales efficiently and refactoring, particularly on the client side, is kept to a minimum.

Explanation of Terms

Realm - Realm is a type of database known as an object database. It is relational in the sense that it has a schema, but it does not use SQL to access, retrieve, and write data. Instead, queries are written in the native language of each platform (e.g. iOS, Android), and Realm APIs are used to edit and write new data. Objects are stored in structures called realms, which preserve the object graph of relationships between objects. A realm file is a representation of this data stored on disk in an ACID-compliant manner. Access to the realm file is controlled by the Realm process, which the developer interacts with through Realm APIs in the language of their choice. It is not uncommon to have a handful of realms in an app and hundreds of thousands of realms on a Realm Object Server.

Realm Object Server - the Realm Object Server is a node.js application that serves as the head-end or server-side component of the Realm Platform. It performs administrative functions such as authentication and permissions, and it stores and merges all data received from Realm Sync-enabled mobile clients. The Realm Object Server can be deployed as a single node for development or as a distributed cluster via Kubernetes for production and horizontal scale-out. Because the Realm Object Server is a node.js application, it is configured and started just like any other node.js app - via an index.js or index.ts file run from the project directory.

Realm Platform - the Realm Platform is the combination of the Realm Object Server with a Realm Sync-enabled mobile app. The platform enables a developer to abstract away the network, synchronization, and conflict resolution and focus on building apps with great user experiences. The automatic synchronization of data from a mobile app to your head-end Realm Object Server, on either your infrastructure or our cloud, makes use cases like offline and real-time a breeze.

Realm Cloud - the Realm Cloud is a multi-tenant implementation of the Realm Object Server managed by the Realm operations team across multiple regions of AWS. The Realm team manages all of the server-side operations for you including log management, file size growth, high availability, failover, cluster deployment, and auto-scaling. Additionally, our cloud provides built-in monitoring, logging, and alerting that helps us identify root cause for errors or exceptions that may come up when running the app. Realm Cloud is the recommended way for testing Realm Sync with a proof of concept or running in production. A dedicated cloud instance is available for customers with performance or governance concerns.

Realm Sync Types - Follow this link for a more detailed explanation

Realm URLs - a Realm URL is the domain name or IP address where your Realm Object Server can be reached, whether deployed on your server or managed via Realm Cloud. When authenticating a user in a mobile app powered by Realm Sync, an HTTP or HTTPS call is made to the /auth endpoint of the Realm Object Server, using the http:// or https:// form of the server’s URL. After successful authentication, the Realm Sync mobile app creates a Realm Sync websocket connection using the realm:// or realms:// form of the server’s URL. The Realm Sync URL contains a path, after the domain name or IP address of the Realm Object Server, to a particular Realm file on the server.
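
As a rough illustration of the two URL forms described above, the sketch below derives the authentication URL and the sync URL from one server address. The helper name realmUrls, the host example.com, and the catalog path are hypothetical; only the /auth endpoint and the http(s):// and realm(s):// schemes come from the text.

```typescript
// Hypothetical helper: derives both URL forms from a single server address.
interface RealmUrls {
  authUrl: string; // HTTP(S) URL of the /auth endpoint
  syncUrl: string; // realm:// or realms:// URL of one realm on the server
}

function realmUrls(host: string, realmPath: string, secure: boolean): RealmUrls {
  const httpScheme = secure ? "https" : "http";
  const syncScheme = secure ? "realms" : "realm";
  // Normalize the path so it always starts with exactly one slash.
  const path = "/" + realmPath.replace(/^\/+/, "");
  return {
    authUrl: `${httpScheme}://${host}/auth`,
    syncUrl: `${syncScheme}://${host}${path}`,
  };
}

// Example with placeholder values.
const urls = realmUrls("example.com", "catalog", true);
// urls.authUrl is "https://example.com/auth"
// urls.syncUrl is "realms://example.com/catalog"
```

Keeping the scheme choice in one place makes it harder to mix a secure authentication call with an unencrypted sync connection.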

More details can be found here.

Realm Types - a realm on the Realm Object Server can be one of three different types, and these types correspond to the type of Realm Sync you are using. A reference realm is used for query-based synchronization - it is the realm that queries are run against in order to create what is known as a partial realm, which holds the results of a particular query for a certain user. It is important to keep an eye on disk usage, since partial realms are created per device per user, and with a large user base this can add up fast. A full-type realm is used for full sync, and the entire realm is returned to the client.

General Design

The most important thing to consider when building an app with Realm is the design of the schema and how data will be shared between users.

The first thing to think about is how the app will generate data and how that data will be distributed to different users of the app along with how users will then interact with said data. Data can be either read-only or read/write to end-users or to a specific group of users of the app. Data can be read-only to some users and read/write to others - for instance, for all mobile users of an app the data could be read-only but then a backend application could have read/write permissions and the ability to insert new data which would then be distributed to all users subscribing to the data. This is known as a data-push or data-caching use case.

Read-only data is common information that is necessary for the app to function or boot-up, such as type tables, items in the inventory for a retail app, or roster information for a sales or a team management app.

Read/write data is data that is generated by end-users and either shared with other users or sent to the backend for safekeeping or to trigger processing by server-side applications. One of the key advantages of building an app with Realm Sync is the automatic conflict resolution - as end-users share data they will always have a deterministic view of the data, even when going offline and making changes. The local Realm will always accept new changes even when offline; when a connection is restored, the changes are synced to the Realm Object Server, where a master record is merged and then distributed to all users subscribed to the data.

Full Sync and Query-Based Sync

The schema design is predicated on the method of sync used. Realm has two modes of sync: Full and Query-Based.

Query-Based Sync is analogous to a GET request in a REST API. The mobile client submits a query to the Realm Object Server (ROS), which processes the query against a master or reference realm and syncs the result set down to the client. The design of query-based sync makes the queries easy to reason about because the developer simply needs to make a query on a parent object and all child objects will be pulled in automatically, which is an advantage when designing a greenfield app. The backend architecture is also easy to reason about, as everything goes into a single realm, which is analogous to one large database. This makes it easy for an admin portal or BI tool to open the reference Realm and run queries across the entire dataset.

However, because queries must be run on the backend server first and then synced down to the client, query-based sync has more latency and processing than full sync and consumes more physical resources such as memory, CPU, and disk; depending on the queries, it can be a lot more. This is especially important because the sync-clients create the queries, and the sync-server will always honor them and return results. An easy way to overload your sync-server is to architect your app so that user input becomes a query parameter that creates a query subscription: as the app is used more and more, these subscriptions pile up and the server keeps updating old subscriptions that are no longer relevant to the user. Because of this, you must always keep track of your subscriptions and be sure to unsubscribe when the user is done with that data in your app, for example when a subscription was created for a particular project task and the user has completed that task.

We are always working to improve the performance of the Realm system, but as of December 2018 we recommend query-based sync for use cases where only a smaller number of clients will be connected concurrently, and we always encourage developers to use our cloud when considering query-based sync. We expect the performance to continually improve in 2019, so we encourage you to confirm your app design with your Realm technical contacts. Additionally, the efficiency (big-O complexity) of a query directly determines how fast query-based sync data can be returned to the sync-client and propagated to other sync-clients. For instance, a query that uses the IN predicate to match against an array of fields will be far less performant than a single predicate against a list of field values.
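
To make the advice about tracking and releasing subscriptions concrete, here is a minimal sketch of a registry that the app could keep next to its query code. The SubscriptionHandle interface is a hypothetical stand-in for whatever subscription object your binding returns; the class and method names are assumptions for illustration, not part of any Realm API.

```typescript
// Hypothetical stand-in for a query subscription handle; a real binding
// exposes its own subscription type with its own unsubscribe call.
interface SubscriptionHandle {
  unsubscribe(): void;
}

// Tracks open subscriptions by a name chosen by the app (for example a task id)
// so each one can be released as soon as the user is done with that data.
class SubscriptionRegistry {
  private readonly active = new Map<string, SubscriptionHandle>();

  // Registers a subscription, releasing any older one stored under the same name.
  track(name: string, handle: SubscriptionHandle): void {
    this.release(name);
    this.active.set(name, handle);
  }

  // Unsubscribes and forgets one subscription; returns false if it was unknown.
  release(name: string): boolean {
    const handle = this.active.get(name);
    if (handle === undefined) return false;
    handle.unsubscribe();
    this.active.delete(name);
    return true;
  }

  // Number of subscriptions the app currently holds open.
  get size(): number {
    return this.active.size;
  }
}
```

In the project-task example, the app would call track with the task id when the subscription is created and release with the same id once the user completes the task, so the server stops maintaining results nobody reads.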

Full Sync is analogous to fetching a file at a particular path - your mobile client gets all data contained within the realm found at that URI path. This makes full sync ideal for apps that need to scale to tens of thousands of concurrent connections or more; it is also ideal for scaling out the ROS system horizontally as load increases. However, because each realm is a compartmentalized bucket of data to which the user has either full access or none, the developer must reason from the beginning about how the app will split the data into these buckets. This can be complicated because at this time objects can only be linked across realms manually, which loses the automatic conflict resolution built into every realm; querying across multiple realms can also only be achieved manually.

The conflict resolution algorithm is CPU intensive and can be taxed if many writers are inserting objects or updating objects in the same realm. Every use case is different, but for a shared realm we recommend a team size of 20-30 writers per realm. If you are generating a lot of merge conflicts, the concurrency number will be lower than if each user is just uploading unrelated data. The volume of data and the frequency of changes also play a role.

An ideal use case for full sync is a single writer, such as a backend application that inserts data into a global Realm to which all mobile users have read-only permissions - this can easily scale past tens of thousands of concurrent users. Another ideal use case is where each user has their own individual or private realm; for instance, this could be used for user account info or preferences, or for an individual user’s shopping cart in a retail app. As long as only the user and/or a backend application interacts with the realm, performance is ideal and scales linearly.

As your app grows, you may start to hit scaling challenges for a full-sync realm. When this happens, it’s time to start creating clones of your full-sync read-only realms. Let’s say you have a read-only /catalog realm for your retail and shopping app. You will need to create a clone of the catalog realm so that there are two - /catalog1 and /catalog2. You can do this either by performing a file copy and rename or, if you are using Realm Cloud, by building a script that copies the data out of one and into the other. Any time your backend app needs to update the catalog, it should now open and update both catalogs. On the client side you will need to add app code that determines which catalog the sync-client should connect to - a simple check of the first character of each user’s random GUID would let you send half of the users to one realm and half to the other. This approach requires an app update if further scale-out is needed. A more flexible method is to have the server assign each user a URL. This can be done through a per-user realm and allows you to transparently scale out with more replicas as needed.
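
The client-side check on the first character of the user’s GUID could look roughly like the sketch below. The function name is hypothetical, and the sketch assumes the user id starts with a hexadecimal character, as a GUID or UUID does.

```typescript
// Maps a user id (a GUID/UUID string) to one of `replicaCount` catalog realms
// by looking only at its first hexadecimal character.
function pickCatalogRealm(userId: string, replicaCount: number): string {
  if (!Number.isInteger(replicaCount) || replicaCount < 1) {
    throw new Error("replicaCount must be a positive integer");
  }
  const firstDigit = parseInt(userId.charAt(0), 16);
  if (Number.isNaN(firstDigit)) {
    throw new Error("userId must start with a hexadecimal character");
  }
  // Realm paths in the text are numbered from 1: /catalog1, /catalog2, ...
  return `/catalog${(firstDigit % replicaCount) + 1}`;
}

// Example with a placeholder id: "3" is odd, so this user lands on /catalog2.
const catalogPath = pickCatalogRealm("3f2504e0-4f89-11d3-9a0c-0305e82c3301", 2);
```

With two replicas, the eight even hex digits go to /catalog1 and the eight odd ones to /catalog2. The split is only perfectly even when the replica count divides 16, which is one more reason to prefer a server-assigned URL once more replicas are needed.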

Users

When using Realm Sync, every user should be identified by a unique client ID called the userId, which is a GUID or UUID auto-generated by ROS when it successfully logs in a user. The concept of a SyncUser is important when building an app with Realm Sync because you cannot get a realm reference and start writing data without a successfully authenticated user. The developer can override this userId with their own if needed for administrative purposes - for instance, to match a Realm userId with a user id, such as an email, stored as part of a custom authentication process. We always recommend giving a unique userId to each sync-client that may sign on and use your app, because you may want to identify certain misbehaving users in your system. Even if your app does not require the user to log in and create a unique account, we still recommend that your app code logs in a randomly generated user in the background while the user is using the app; we created the anonymous user explicitly for this purpose. Once you have successfully authenticated a user, Realm caches the SyncUser object locally, and the developer can quickly access this cached user by calling SyncUser.current if they need to open a realm on, say, a new ViewController. It is important to check for this cached user first because unnecessary calls to logIn can overwhelm the backend authentication process and bring down system performance.

Opening Realms

On first app load, we always recommend using the asyncOpen API of the respective binding to open the realm. One reason is that it is the only way to open a read-only realm; another advantage is that it is the most efficient way to download the realm, since no client-side processing or merging kicks in. The drawback is that the app must be online and connected to ROS in order to return a realm reference in the callback. You must also consider the case where there is a large amount of data to download - the user could be left waiting. Care must be taken to make sure that the first realm can be downloaded in a reasonable amount of time. Many realms can be used in the client-side app - in fact, you should use several. Consider which data is necessary for the functioning of the app - the realm containing it should be downloaded first. For instance, in a field worker app, the user should download type tables and a list of projects they could work on - once the user selects a particular project, the app then opens the project realm, which contains all of the relevant data associated with the project.

Once the realm has been opened asynchronously, we recommend opening it synchronously from then on. The merge algorithm on the client side will no longer be triggered once the schema is established. Synchronously opening the realm once it has been downloaded also enables an offline-first use case. We also recommend using synchronous open for all large data sets that the app does not depend on in order to function. For instance, in a retail app, you may want to asyncOpen the types of products, the locations of stores, and other business logic data, but the actual inventory of clothing should be opened synchronously and downloaded in the background by the Realm process while the UI updates continuously.

If the app must be able to start in offline mode, we recommend starting the app with a non-synced realm, because a synced realm requires a valid sync user, which can only be obtained through successful online communication with ROS. Once the app comes online and is able to obtain a valid sync user and open a permitted synced realm, the data contained in the non-synced realm should be transferred to the synced realm by user code.

Sync-client Design

One of the great advantages of Realm is that any object which is presented in the UI to the user always represents the correct state of that object - an object never needs to be refreshed or re-fetched, a concept that we call “live objects.” However, in many Android or iOS architectures, especially in Reactive or LiveData designs, it is common to detach or abstract away the data layer from the UI. Realm supports these common practices, and a Realm object may be detached from the data layer and loaded as an unmanaged object to the view, but it will not always represent the current state of that object, since it is a copy that has been loaded into memory. Care must be taken when writing an unmanaged object back into Realm because, by default, the write of the realm object will be interpreted by Realm Sync as a completely new object. In order to avoid this we always recommend using the .copyToRealmOrUpdate API, which performs an intelligent diff on the object before insertion and only updates the fields if necessary.
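
The insert-or-update behaviour described above can be pictured with a small in-memory sketch. This is not Realm’s implementation: a Map keyed by primary key stands in for the realm, and all type and function names are hypothetical. It only illustrates the idea of writing just the fields that actually differ instead of treating the detached copy as a brand-new object.

```typescript
// A plain (unmanaged) object: an id acting as primary key plus scalar fields.
type Fields = Record<string, string | number | boolean>;

interface Unmanaged {
  id: string;
  fields: Fields;
}

// Inserts the object if its id is unknown; otherwise writes only the fields
// whose values differ. Returns the names of the fields that were written.
function upsert(store: Map<string, Fields>, incoming: Unmanaged): string[] {
  const existing = store.get(incoming.id);
  if (existing === undefined) {
    store.set(incoming.id, { ...incoming.fields });
    return Object.keys(incoming.fields);
  }
  const changed: string[] = [];
  for (const key of Object.keys(incoming.fields)) {
    const value = incoming.fields[key];
    if (value !== undefined && existing[key] !== value) {
      existing[key] = value;
      changed.push(key);
    }
  }
  return changed;
}
```

Writing back a detached ticket whose only change is its done flag touches one field rather than producing a second object, which is the effect you want the sync layer to see.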

Authentication

Always use JWT to authenticate the client because this ensures identity, and always use HTTPS in your app because it mandates encryption on the wire. We recommend leveraging a 3rd party token issuer to manage users and issue tokens - it is quick, easy, and secure. You should always disable anonymous and nickname authentication after you complete your proof of concept. Every installation of your app should be uniquely identified as an individual user even if your app does not explicitly require the user to log in - your app should perform a silent login in the app code that generates a GUID for that app installation.

Errors

We always recommend establishing a custom error handler for the Realm Sync protocol in the app so that you can handle exceptions in your user code. The most important error to handle is a client reset. This can occur when the history of the data on the sync-client is ahead of the history of the data on the sync-server (ROS). This situation most commonly occurs after a backup and recovery on the server side. In order to handle a client reset, the developer must have user code in the client reset callback that takes the old realm, compares it to the realm just downloaded from ROS, and inserts the additive objects into the new realm.
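
The comparison step in a client reset handler can be sketched as a set difference on primary keys. The sketch below works on plain arrays rather than real realms, and assumes every object has a string id acting as its primary key; the function name is hypothetical.

```typescript
// Anything with a primary key the two copies can be compared on.
interface Identified {
  id: string;
}

// Returns the objects that exist in the old local copy but are missing from
// the copy freshly downloaded from the server, i.e. the additive objects.
function additiveObjects<T extends Identified>(oldLocal: readonly T[], fresh: readonly T[]): T[] {
  const known = new Set<string>();
  for (const obj of fresh) known.add(obj.id);
  return oldLocal.filter((obj) => !known.has(obj.id));
}
```

In a real handler the returned objects would then be inserted into the newly downloaded realm inside a write transaction. Objects that exist in both copies but differ are deliberately left alone here, since the text only calls for re-inserting additive objects.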

Storing Images, Video, and Other Blob Data

While the Realm database can be used to store binary data we generally do not recommend storing blob data in Realm because it can be inefficient for Realm sync. The diffing algorithm for syncing changes works on a field level, so if a single piece of the image changes, the entire image will need to be re-synced instead of just the individual bits that changed. This same logic can be applied for large JSON blobs as well which are stored as large string fields -- if a single key-value changes, the entire JSON must be re-synced. Instead, if you wanted to use Realm sync for transferring images, we recommend using Realm sync to transfer the image to ROS and then have the Event Handler react to the new image, remove the image from Realm and store it in some object storage such as S3, and then simply store a reference to the URL in the Realm database. If the client wants to see the full image again they can pull the image from the URL via REST or file access.

Writing Data for Best Performance

When building an app with Realm, care must be taken when deciding when to use a write transaction and which data should be modified per transaction. This is because Realm Sync works atomically -- it downloads or uploads the data for a single transaction and then commits that transaction to disk before moving on to the next transaction. This can become problematic if you bulk-load a lot of data in a single transaction, because old mobile devices may not have the memory or compute available to hold the entire transaction in memory before committing it to disk. Additionally, acquiring and releasing the write lock for a transaction consumes resources, so you should look for a middle ground between a full bulk load and a transaction per object. The right number of objects per transaction will depend on your app, schema, and design, but a good rule of thumb is 10,000 objects or 1MB per transaction.
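
One way to apply the rule of thumb is to split a bulk load into batches and run one write transaction per batch. In the sketch below, writeTransaction is a hypothetical callback standing in for a single write transaction of your binding; only the 10,000-object figure comes from the text.

```typescript
// Splits items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Runs one (hypothetical) write transaction per batch and returns how many
// transactions were needed.
function writeInBatches<T>(
  items: readonly T[],
  batchSize: number,
  writeTransaction: (batch: T[]) => void,
): number {
  const batches = toBatches(items, batchSize);
  for (const batch of batches) {
    writeTransaction(batch);
  }
  return batches.length;
}
```

Loading 25,000 objects with a batch size of 10,000 results in three transactions of 10,000, 10,000, and 5,000 objects. If objects are large, a smaller batch size keeps each transaction closer to the 1MB guideline.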

Working with Threads

One important thing to consider when using Realm is that a realm is thread-confined; references to objects cannot be passed across threads. Because of this, most bindings offer several convenience APIs that help with performing writes on background threads, such as executeTransactionAsync on Android. If no such convenience API is available, we recommend dispatching the work to a background thread, attaching a notification token so that you know when the background job is done, opening a new realm reference on the background thread, performing the work (such as a large write transaction or a long query), and then returning the result to the main or UI thread. Be sure to close the realm or null out the realm reference opened on the background thread so that it can be garbage collected. A common cause of client-side realms ballooning in size is forgetting to clean up all those realm references made on background threads. Be sure to use the compactOnLaunch option when opening realms to help mitigate this problem, but be careful because this will block the app if the realm is massive.

Permissions

One of the more complicated but necessary features of the Realm Platform is permissions. When first building a Realm enabled app, by default the Realm data access is wide open. This is to allow for quick development cycles in building your prototype. However, before going into production be sure to lock down the data with our permissions system so that users only have access to the data they need. We always recommend first getting the basic design of your app done to ensure that everything is syncing and then starting to lock down the objects and realms specific users can view. Extensive documentation on how to use different permission systems is found in the section of the docs linked below. It is important to note that full sync and query-based sync use different permission models.

Query-Based Sync Permissions

Our recommendation is to always remove the canModifySchema permission so that users cannot change the schema design as this could affect other users of the app. While it may be useful during the initial app development and the POC phase to allow developers to quickly iterate on the schema, once multiple developers are using the same realm data for development testing, a fat-fingered schema can bring the rest of the developers to a halt. And certainly, this is something only a server-side app with a Realm admin account should have the ability to change in production.

General Recommendations

Knowing where to apply the permissions code depends on the use case of your application. For instance, if data is created by mobile users and must then be shared with other users, the permissions code should be kept on the sync-client. A common use case is a field-service app where the worker notices a job that needs to get done, creates a ticket, assigns the ticket to a group of users using object-level permissions, and assigns the canRead permission to the role that contains the group.

If the data is mission critical with extensive security requirements and is often pushed down to mobile clients then it is recommended to keep the permissions in a server-side admin-only app. For example, a medical app might have different levels of access for their treatment data such as doctor, nurse, and patient. Each role would only see a subset of the treatment data contained in the treatment realm.

Server-side Design (Realm Object Server)

Years of experience running the Realm Object Server have gone into the production deployment of Realm Cloud. We always recommend going first to the Realm Cloud for the development of your app to take advantage of our knowledge and best practices. There are dedicated plans available for apps that require special customizations. If for compliance or governance reasons you cannot make use of a public or private cloud, please reach out to sales@realm.io to see if we can accommodate your specific need, and otherwise follow these best practices as closely as possible.

Deployment for Production

When deploying in a self-hosted manner, we recommend using the Realm Kubernetes Helm chart, which deploys the Realm Object Server in a distributed cluster with default best-practice settings. Always deploy the distributed cluster version of the Realm Object Server, not the single Node.js app, which is geared more toward local development and testing. It can be difficult to predict the usage pattern of the app until it is actually launched, and unexpected usage could lead to excessive resource consumption. By deploying in cluster mode, the Realm Object Server can be scaled horizontally while the app is live in production, whereas a single-node ROS needs to be brought down in a maintenance window and then shifted to a distributed cluster deployment. Always automate the deployment of ROS using a configuration management tool like Ansible, Terraform, or Chef - building ROS into a Docker image is also recommended.

Storage

The Realm Object Server is a stateful application, so always deploy it attached to a persistent volume - any server-side apps that use Realm SDKs should also always be mounted with persistent storage. While you can and should use the built-in synchronous replication of Realm to enable a hot standby for sub-second failover, you should also regularly take file system backups of the volumes used by the Realm Object Server. Using a system such as Amazon EBS snapshots allows you to automate the backup to happen once a day or more often, which will enable you to redeploy in a disaster recovery scenario.

Disk Consumption

Realm keeps a log of all transactions that have occurred since the Realm was first instantiated. After some time, the log will contain superfluous transactions that no longer matter to the final state on the sync-client device - for instance, a transaction that inserts an object and modifies an object field, followed by another transaction that sets the field back to its original value - the end result is the same as the original object. While these semantics are important for conflict resolution, once the final state has been achieved, the logs should be truncated to improve performance for the sync-client. This can be done by setting a historyTtl in the ROS configuration, which defines how far into the past the server keeps untruncated logs in the Realm.

If the historyTtl is too low and users do not launch the app within the transaction history timeframe, those users will need to perform a full realm data download and will lose any new data created by their sync-client. If the historyTtl is too high, users that do not frequently use the application will not need to do a full Realm download, but the Realm data may get large and the initial download may take longer. The real decision to make when setting the historyTtl is how many days you want to wait for a typical user to sync. How likely is it for a user to create data and not sync via WiFi or cellular? The longer you wait, the less likely users are to lose data that has not been synced, but the more likely you are to impact the performance of the rest of your users. This is really a product/user experience question.
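
If you have analytics on how long each user typically goes between syncs, you can estimate how many users a candidate historyTtl would push into a full download. The sketch below assumes the interval is measured in days (the text above does not fix a unit) and that the sample values come from your own data; the function name is hypothetical.

```typescript
// Fraction of users (0..1) whose last sync is older than the candidate TTL
// and who would therefore need a full download.
function shareBeyondTtl(daysSinceLastSync: readonly number[], ttlDays: number): number {
  if (daysSinceLastSync.length === 0) return 0;
  const beyond = daysSinceLastSync.filter((days) => days > ttlDays).length;
  return beyond / daysSinceLastSync.length;
}

// Example with made-up sample data: four users last synced 1, 3, 10 and 40 days ago.
const sample = [1, 3, 10, 40];
const shareAt7Days = shareBeyondTtl(sample, 7);
const shareAt30Days = shareBeyondTtl(sample, 30);
```

Comparing the result for a few candidate values against the growth in realm size you observe gives the product discussion something concrete to work with.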

Once logs have been truncated, the actual payload of the logs shrinks, but the size of the file on disk remains the same. In order to reclaim the free space in the Realm file, the operator should regularly vacuum their Realms; we recommend every hour or every day - this is covered in detail here.

When deploying any enterprise application, including ROS, we always recommend deploying with an integrated external monitoring and logging system. A detailed tutorial can be found here.

We practice and recommend a modern version of DevOps that prioritizes uptime of the application, so having a process monitor such as pm2 or Linux systemd restart ROS is paramount. But restarting the application, while it may stabilize the system, can obscure the error and root cause of the crash. Exporting logs and metrics to an external system allows operators and support to find and fix the problem.

Be sure to enable process start on server bootup - for instance, in pm2 this is done with the pm2 startup command.

Adapter, Event Handler, GraphQL

The Adapter and global notifier APIs are built upon the Realm Sync SDKs, and as such they sync the realm they are connecting to and then react to changes, which enables the developer to push or pull data into native or 3rd party APIs. Because of this, we always recommend mounting a persistent volume to apps which run the Adapter or Event Handler Realm APIs. Commonly, these APIs are used through the node.js Realm SDK. Because of the non-blocking and asynchronous nature of JavaScript, events can fire before the user code of the previous event completes. This can lead to a situation where the Realm for each event data change is copied to disk, causing the disk usage of the event handler to increase dramatically, because the realm is only garbage collected upon the successful return of the user code response. Because of this, we recommend using a JavaScript promise or async/await to hold back the next event, so that the change callback does not fire again before the previous user code has returned a result.
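
A generic way to make sure one event’s user code finishes before the next one starts is to chain the handlers on a single promise. The sketch below is plain TypeScript and does not use any Realm API; the class name and the way it would be wired into a change callback are assumptions.

```typescript
// Runs asynchronous jobs strictly one after another, in arrival order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Schedules `job` to start only after every earlier job has settled.
  enqueue(job: () => Promise<void>): Promise<void> {
    const run = this.tail.then(job);
    // Keep the chain alive even if a job rejects, so later events still run.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// Hypothetical wiring: every change event is pushed through the same queue,
// so the user code for one event never overlaps with the next.
const eventQueue = new SerialQueue();

function onChange(event: string, userCode: (event: string) => Promise<void>): Promise<void> {
  return eventQueue.enqueue(() => userCode(event));
}
```

The promise returned by enqueue still rejects if the job fails, so errors can be logged by the caller, while the internal chain swallows the rejection only to keep later events flowing.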

The Realm Adapter and Event Handler APIs take, as an argument, a regular expression based on the Realm path URI. The best way to scale these Realm server-side apps is to reduce the scope of realms they react to. For instance, you may have an account realm such as account1, account2, … accountN. You can then deploy a cluster of Adapter/EventHandler apps that react to a subset of account realms based on their number.
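
As one possible way to split numbered account realms across a cluster of workers, each worker can be given a regular expression that matches only account numbers ending in certain digits. The leading slash in the path, the function name, and the two-worker split are assumptions for illustration.

```typescript
// Builds a path pattern matching /account<N> realms whose number ends in one
// of the given digits, so each worker can be handed a disjoint slice.
function accountShardPattern(lastDigits: string): RegExp {
  if (!/^[0-9]+$/.test(lastDigits)) {
    throw new Error("lastDigits must contain only the characters 0-9");
  }
  return new RegExp(`^/account[0-9]*[${lastDigits}]$`);
}

// Two hypothetical workers that together cover every numbered account realm.
const workerA = accountShardPattern("01234");
const workerB = accountShardPattern("56789");
// workerA matches "/account12", workerB matches "/account7",
// and neither matches "/catalog".
```

Sharding on the last digit keeps the slices roughly balanced as new accounts are added, and adding a worker only means redistributing the ten digits among the patterns.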

Because the conflict resolution engine of Realm Sync can only scale to a certain factor, we recommend using per-user private realms: the Event Handler reacts to new object insertions in them and inserts a meta object into a global read-only realm. For instance, you may have a Realm per team for working on projects, but members of one team may want to know about updates to other teams’ projects. Instead of having a user open every team realm they want updates from, we recommend using the Event Handler to insert new updates into a global read-only project-updates Realm, which sync-clients can subscribe to and then open the Realms of interest on demand.

GraphQL

The Realm GraphQL service is a separate component of ROS that is useful for integrating with native or 3rd party APIs written in a language other than those of the bindings Realm provides (.NET and node.js). It is especially useful for pushing or bulk-loading data via REST from other systems in your backend. For scaling and performance reasons, we recommend running it as a stand-alone instance when running in a self-hosted distributed cluster. It can also be used to enable an Admin Portal or Web API for accessing and editing Realm data, or for BI purposes. Because of the eventually consistent nature of Realm Sync, we recommend monitoring the pending uploaded bytes endpoint on the GraphQL service to determine that the data you write via mutation has been synced to the Realm.