Merge pull request #416 from bigchaindb/docs-on-rethinkdb-storage-setup

Docs on RethinkDB storage setup
Troy McConaghy 2016-07-05 14:13:22 +02:00 committed by GitHub
commit 471f032c1a
5 changed files with 86 additions and 37 deletions


@ -0,0 +1,38 @@
# Example RethinkDB Storage Setups
## Example 1: A Partition of an AWS Instance Store
Many [AWS EC2 instance types](https://aws.amazon.com/ec2/instance-types/) come with an [instance store](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html): temporary storage that disappears when the instance disappears. The size and setup of an instance store depend on the EC2 instance type.
We have some scripts for [deploying a _test_ BigchainDB cluster on AWS](../clusters-feds/deploy-on-aws.html). Those scripts include commands to set up a partition (`/dev/xvdb`) on an [instance store](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html) for RethinkDB data. Those commands can be found in the file `/deploy-cluster-aws/fabfile.py`, under `def install_rethinkdb()` (i.e. the Fabric function to install RethinkDB).
An AWS instance store is convenient, but it's intended for "buffers, caches, scratch data, and other temporary content." Moreover:
* You pay for all the storage, regardless of how much you use.
* You can't increase the size of the instance store.
* If the instance stops or terminates, you lose the data in the associated instance store. (The data survives a reboot.)
* Instance store data isn't replicated, so if the underlying disk drive fails, you lose the data in the instance store.
* "You can't detach an instance store volume from one instance and attach it to a different instance."
The [AWS documentation says](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html), "...do not rely on instance store for valuable, long-term data. Instead, you can build a degree of redundancy (for example, RAID 1/5/6), or use a file system (for example, HDFS and MapR-FS) that supports redundancy and fault tolerance."
**Even if you don't use an AWS instance store partition to store your node's RethinkDB data, you may find it useful to read the steps in `def install_rethinkdb()`: [see fabfile.py](https://github.com/bigchaindb/bigchaindb/blob/master/deploy-cluster-aws/fabfile.py).**
## Example 2: An Amazon EBS Volume
TODO
Note: Amazon EBS volumes are always replicated.
## Example 3: Using Amazon EFS
TODO
## Other Examples?
TODO
Maybe RAID, ZFS, ... (over EBS volumes, i.e. a DIY Amazon EFS)

1
docs/source/appendices/index.rst Normal file → Executable file

@ -12,5 +12,6 @@ Appendices
the-Bigchain-class
consensus
ntp-notes
example-rethinkdb-storage-setups
local-rethinkdb-cluster
licenses


@ -2,13 +2,11 @@
This section explains a way to deploy a cluster of BigchainDB nodes on Amazon Web Services (AWS). We use some Bash and Python scripts to launch several instances (virtual servers) on Amazon Elastic Compute Cloud (EC2). Then we use Fabric to install RethinkDB and BigchainDB on all those instances.
**NOTE: At the time of writing, these scripts _do_ launch a bunch of EC2 instances, and they do install RethinkDB plus BigchainDB on each instance, but don't expect to be able to use the cluster for anything useful. There are several issues related to configuration, networking, and external clients that must be sorted out first. That said, you might find it useful to try out the AWS deployment scripts, because setting up to use them, and using them, will be very similar once those issues get sorted out.**
## Why?
You might ask why one would want to deploy a centrally-controlled BigchainDB cluster. Isn't BigchainDB supposed to be decentralized, where each node is controlled by a different person or organization?
That's true, but there are some reasons why one might want a centrally-controlled cluster: 1) for testing, and 2) for initial deployment, after which the control of each node can be handed over to a different entity.
Yes! These scripts are for deploying _test_ clusters, not production clusters.
## Python Setup


@ -18,36 +18,31 @@ We don't test BigchainDB on Windows or Mac OS X, but you can try.
* If you have Mac OS X and want to experiment with BigchainDB, then you could do that [using Docker](run-with-docker.html).
## Memory Requirements
## Storage Requirements
Every OS has memory requirements; check the memory requirements of your OS.
When it comes to storage for RethinkDB, there are many things that are nice to have (e.g. SSDs, high-speed input/output [IOPS], replication, reliability, scalability, pay-for-what-you-use), but there are few _requirements_ other than:
There is [documentation about RethinkDB's memory requirements](https://rethinkdb.com/docs/memory-usage/). In particular: "RethinkDB requires data structures in RAM on each server proportional to the size of the data on that server's disk, usually around 1% of the size of the total data set." ([source](https://rethinkdb.com/limitations/))
1. have enough storage to store all your data (and its replicas), and
2. make sure your storage solution (hardware and interconnects) can handle your expected read & write rates.
For RethinkDB's failover mechanisms to work, [every RethinkDB table must have at least three replicas](https://rethinkdb.com/docs/failover/) (i.e. a primary replica and two others). For example, if you want to store 10 GB of unique data, then you need at least 30 GB of storage. (Indexes and internal metadata are stored in RAM.)
As for the read & write rates, what do you expect those to be for your situation? It's not enough for the storage system alone to handle those rates: the interconnects between the nodes must also be able to handle them.
## Memory (RAM) Requirements
The [RethinkDB FAQ](https://rethinkdb.com/faq/) recommends that "RethinkDB servers have at least 2GB of RAM... RethinkDB has a custom caching engine and can run on low-memory nodes with large amounts of on-disk data..."
In particular: "RethinkDB requires data structures in RAM on each server proportional to the size of the data on that server's disk, usually around 1% of the size of the total data set." ([source](https://rethinkdb.com/limitations/))
Also, "The storage engine is used in conjunction with a custom, B-Tree-aware caching engine which allows file sizes many orders of magnitude greater than the amount of available memory. RethinkDB can operate on a terabyte of data with about ten gigabytes of free RAM." ([source](https://www.rethinkdb.com/docs/architecture/))
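As a rough illustration of the figures quoted above, the following sketch turns them into a back-of-the-envelope RAM floor for one server. Combining the "around 1%" figure and the "at least 2GB" recommendation with `max()` is an assumption made here for illustration, and the function name and parameters are hypothetical; this is not a formula from the RethinkDB documentation.

```python
def estimate_rethinkdb_ram_gb(data_on_disk_gb, overhead_fraction=0.01, minimum_gb=2.0):
    """Very rough lower bound on RAM (in GB) for one RethinkDB server.

    overhead_fraction: the "around 1%" figure quoted above (an approximation).
    minimum_gb: the "at least 2GB" recommendation quoted above.
    Taking the larger of the two is an assumption of this sketch.
    """
    if data_on_disk_gb < 0:
        raise ValueError("data_on_disk_gb must be non-negative")
    return max(minimum_gb, data_on_disk_gb * overhead_fraction)


if __name__ == "__main__":
    # One terabyte of data on disk, as in the architecture quote above.
    print(estimate_rethinkdb_ram_gb(1000))
```

Treat the result as a starting point only; the RethinkDB memory usage page gives a more careful way to estimate.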
## Storage Requirements
The RethinkDB storage engine has a number of SSD optimizations, so you can benefit from using SSDs. ([source](https://www.rethinkdb.com/docs/architecture/))
If you want a RethinkDB cluster to store an amount of data D, with a replication factor of R (on every table), and the cluster has N nodes, then each node will need to be able to store R×D/N data plus the storage required for the OS and various other software (RethinkDB, Python, etc.). The secondary indexes also require some storage.
For failover to work, [every RethinkDB table must have at least three replicas](https://rethinkdb.com/docs/failover/), i.e. R ≥ 3.
Also, RethinkDB tables can have [at most 64 shards](https://rethinkdb.com/limitations/). For example, if you have only one table and more than 64 nodes, some nodes won't have the primary of any shard, i.e. they will have replicas only. In other words, once you pass 64 nodes, adding more nodes won't provide storage space for new data; it will only add more space for shard replicas. If the biggest single-node storage available is d, then the most you can store in a RethinkDB cluster is < 64×d: accomplished by putting one primary shard in each of 64 nodes, with all replica shards on other nodes. (This is assuming one table. If there are T tables, then the most you can store is < 64×d×T.)
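The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical and the code only restates the R×D/N and 64×d×T expressions from the text; it ignores the space needed for the OS, other software, and secondary indexes.

```python
MAX_SHARDS_PER_TABLE = 64  # the RethinkDB shard limit cited above


def storage_per_node_gb(data_gb, replication_factor, num_nodes):
    """R x D / N: data each node must hold, assuming an even spread."""
    if replication_factor < 1 or num_nodes < 1:
        raise ValueError("replication_factor and num_nodes must be >= 1")
    return replication_factor * data_gb / num_nodes


def max_cluster_data_bound_gb(biggest_node_storage_gb, num_tables=1):
    """Strict upper bound 64 x d x T on unique data the cluster can store."""
    if num_tables < 1:
        raise ValueError("num_tables must be >= 1")
    return MAX_SHARDS_PER_TABLE * biggest_node_storage_gb * num_tables


if __name__ == "__main__":
    # 10 GB of unique data, three replicas, three nodes.
    print(storage_per_node_gb(10, 3, 3))
    # Nodes with at most 1000 GB each, two tables.
    print(max_cluster_data_bound_gb(1000, num_tables=2))
```

The second function returns a bound that real clusters stay below, since each node also needs room for software and indexes.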
RethinkDB has [documentation about its memory requirements](https://rethinkdb.com/docs/memory-usage/). You can use that page to get a better estimate of how much memory you'll need.
## Compatible File Systems
## Filesystem Requirements
RethinkDB "supports most commonly used file systems." ([source](https://www.rethinkdb.com/docs/architecture/))
It has [issues with BTRFS](https://github.com/rethinkdb/rethinkdb/issues/2781) (B-tree file system).
It's best to have a file system that supports direct I/O, because that will improve RethinkDB performance (if you tell RethinkDB to use direct I/O). Many compressed or encrypted file systems don't support direct I/O.
## CPU Requirements
Most servers will have enough CPUs (or vCPUs) to run a BigchainDB node. The more you have, the higher throughput will be.
RethinkDB "supports most commonly used file systems" ([source](https://www.rethinkdb.com/docs/architecture/)) but it has [issues with BTRFS](https://github.com/rethinkdb/rethinkdb/issues/2781) (B-tree file system).
It's best to use a filesystem that supports direct I/O, because that will improve RethinkDB performance (if you tell RethinkDB to use direct I/O). Many compressed or encrypted filesystems don't support direct I/O.

37
docs/source/nodes/setup-run-node.md Normal file → Executable file

@ -28,17 +28,36 @@ NTP is a standard protocol. There are many NTP daemons implementing it. We don't
Please see the [notes on NTP daemon setup in the Appendices](../appendices/ntp-notes.html).
## Set Up the File System for RethinkDB
## Set Up Storage for RethinkDB Data
Ideally, use a file system that supports direct I/O (Input/Output), a feature whereby file reads and writes go directly from RethinkDB to the storage device, bypassing the operating system read and write caches.
Below are some things to consider when setting up storage for the RethinkDB data. The appendices have a [section with concrete examples](../appendices/example-rethinkdb-storage-setups.html).
TODO: What file systems support direct I/O? How can you check? How do you enable it, if necessary?
We suggest you set up a separate storage "device" (partition, RAID array, or logical volume) to store the RethinkDB data. Here are some questions to ask:
See `def install_rethinkdb()` in `deploy-cluster-aws/fabfile.py` for an example of configuring a file system on an AWS instance running Ubuntu.
* How easy will it be to add storage in the future? Will I have to shut down my server?
* How big can the storage get? (Remember that [RAID](https://en.wikipedia.org/wiki/RAID) can be used to make several physical drives look like one.)
* How fast can it read & write data? How many input/output operations per second (IOPS)?
* How does IOPS scale as more physical hard drives are added?
* What's the latency?
* What's the reliability? Is there replication?
* What's in the Service Level Agreement (SLA), if applicable?
* What's the cost?
Mount the partition for RethinkDB on `/data`: we will tell RethinkDB to store its data there.
There are many options and tradeoffs. Don't forget to look into Amazon Elastic Block Store (EBS) and Amazon Elastic File System (EFS), or their equivalents from other providers.
TODO: This section needs more elaboration
**Storage Notes Specific to RethinkDB**
* The RethinkDB storage engine has a number of SSD optimizations, so you _can_ benefit from using SSDs. ([source](https://www.rethinkdb.com/docs/architecture/))
* If you want a RethinkDB cluster to store an amount of data D, with a replication factor of R (on every table), and the cluster has N nodes, then each node will need to be able to store R×D/N data.
* RethinkDB tables can have [at most 64 shards](https://rethinkdb.com/limitations/). For example, if you have only one table and more than 64 nodes, some nodes won't have the primary of any shard, i.e. they will have replicas only. In other words, once you pass 64 nodes, adding more nodes won't provide more storage space for new data. If the biggest single-node storage available is d, then the most you can store in a RethinkDB cluster is < 64×d: accomplished by putting one primary shard in each of 64 nodes, with all replica shards on other nodes. (This is assuming one table. If there are T tables, then the most you can store is < 64×d×T.)
* When you set up storage for your RethinkDB data, you may have to select a filesystem. (Sometimes, the filesystem is already decided by the choice of storage.) We recommend using a filesystem that supports direct I/O (Input/Output). Many compressed or encrypted filesystems don't support direct I/O. The ext4 filesystem supports direct I/O (but be careful: if you enable the `data=journal` mode, then direct I/O support will be disabled; the default is `data=ordered`). If your chosen filesystem supports direct I/O and you're using Linux, then you don't need to do anything to request or enable direct I/O. RethinkDB does that.
<p style="background-color: lightgrey;">What is direct I/O? It allows RethinkDB to write directly to the storage device (or use its own in-memory caching mechanisms), rather than relying on the operating system's file read and write caching mechanisms. (If you're using Linux, a write-to-file normally writes to the in-memory Page Cache first; only later does that Page Cache get flushed to disk. The Page Cache is also used when reading files.)</p>
* RethinkDB stores its data in a specific directory. You can tell RethinkDB _which_ directory using the RethinkDB config file, as explained below. In this documentation, we assume the directory is `/data`. If you set up a separate device (partition, RAID array, or logical volume) to store the RethinkDB data, then mount that device on `/data`.
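One way to probe whether the filesystem under a directory (such as `/data`) accepts direct I/O is to try opening a file there with the Linux `O_DIRECT` flag. The sketch below does that from Python; the function name is hypothetical, and it is only a probe written for illustration, not a description of how RethinkDB itself checks or enables direct I/O.

```python
import errno
import os
import tempfile


def supports_direct_io(directory):
    """Return True if a file in `directory` can be opened with O_DIRECT."""
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        # This platform has no O_DIRECT flag (for example, Mac OS X).
        return False
    # Create a scratch file in the target directory, then reopen it directly.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        fd = os.open(path, os.O_RDWR | o_direct)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # open(2) reports EINVAL when the filesystem lacks O_DIRECT support.
            return False
        raise
    else:
        os.close(fd)
        return True
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(supports_direct_io(tempfile.gettempdir()))
```

Run it against the directory you plan to use for RethinkDB data; the process needs write permission there because it creates and deletes a temporary file.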
## Install RethinkDB Server
@ -50,7 +69,6 @@ If you don't already have RethinkDB Server installed, you must install it. The R
Create a RethinkDB configuration file (text file) named `instance1.conf` with the following contents (explained below):
```text
server-tag=original
directory=/data
bind=all
direct-io
@ -61,10 +79,9 @@ join=node2_hostname:29015
# continue until there's a join= line for each node in the federation
```
* `server-tag=original` is an optional line, but you'll be glad you included it later if you decide to create a set of backup-only servers as described in [the section on continuous backup](../clusters-feds/backup.html#incremental-or-continuous-backup).
* `directory=/data` tells the RethinkDB server process to store its share of the database data in `/data`.
* `directory=/data` tells the RethinkDB node to store its share of the database data in `/data`.
* `bind=all` binds RethinkDB to all local network interfaces (e.g. loopback, Ethernet, wireless, whatever is available), so it can communicate with the outside world. (The default is to bind only to the loopback interface.)
* `direct-io` tells RethinkDB to use direct I/O (explained earlier).
* `direct-io` tells RethinkDB to use direct I/O (explained earlier). Only include this line if your file system supports direct I/O.
* `join=hostname:29015` lines: A cluster node needs to find out the hostnames of all the other nodes somehow. You _could_ designate one node to be the one that every other node asks, and put that node's hostname in the config file, but that wouldn't be very decentralized. Instead, we include _every_ node in the list of nodes-to-ask.
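If you manage several nodes, you may prefer to generate the config file rather than type it by hand. The following is a minimal sketch, assuming the same options described above; the function name and its parameters are hypothetical, and the hostnames are placeholders.

```python
def render_rethinkdb_conf(join_hostnames, directory="/data",
                          use_direct_io=True, cluster_port=29015):
    """Build the text of a RethinkDB config file like the one shown above.

    join_hostnames: hostnames of the nodes in the federation.
    use_direct_io: set to False if the filesystem lacks direct I/O support.
    """
    lines = ["directory={}".format(directory), "bind=all"]
    if use_direct_io:
        lines.append("direct-io")
    # One join= line per node, so no single node is a central point of contact.
    for host in join_hostnames:
        lines.append("join={}:{}".format(host, cluster_port))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder hostnames; replace them with the real ones.
    text = render_rethinkdb_conf(["node1_hostname", "node2_hostname"])
    with open("instance1.conf", "w") as conf_file:
        conf_file.write(text)
```

Keeping `use_direct_io` as a parameter makes it easy to omit the `direct-io` line on nodes whose filesystem does not support it.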
If you're curious about the RethinkDB config file, there's [a RethinkDB documentation page about it](https://www.rethinkdb.com/docs/config-file/). The [explanations of the RethinkDB command-line options](https://rethinkdb.com/docs/cli-options/) are another useful reference.