mirror of https://github.com/bigchaindb/bigchaindb.git
synced 2024-10-13 13:34:05 +00:00

Merge pull request #5 from bigchaindb/improve-documentation: Improve documentation (commit 6a08b72a41)
# Server/Cluster Deployment and Administration

This section covers everything which might concern a BigchainDB server/cluster administrator:

* deployment
* security
* monitoring
* troubleshooting


## Deploying a local cluster

One of the advantages of RethinkDB as the storage backend is its easy installation. Developers like to have everything locally, so let's install a local storage backend cluster from scratch.

Here is an example of running a cluster, assuming RethinkDB is already [installed](installing.html#installing-and-running-rethinkdb-server-on-ubuntu-12-04) on your system:

```shell
# prepare two additional nodes
# remember that the user who starts rethinkdb must have write access to the paths
mkdir -p /path/to/node2
mkdir -p /path/to/node3

# then start your additional nodes
rethinkdb --port-offset 1 --directory /path/to/node2 --join localhost:29015
rethinkdb --port-offset 2 --directory /path/to/node3 --join localhost:29015
```

That's all, folks! The cluster is up and running. Check it out in your browser at http://localhost:8080, which opens the management console.

Now you can install BigchainDB and run it against the storage backend!

## Securing the storage backend

We have turned on the `bind=all` option to connect other nodes and make RethinkDB accessible from outside the server. Unfortunately this is insecure, so we need to block RethinkDB off from the Internet while still allowing authorized computers to access its services.

For the cluster port, we will use a firewall to enclose our cluster. For the web management console and the driver port, we will use SSH tunnels to access them from outside the server. SSH tunnels redirect requests on a client computer to a remote computer over SSH, giving the client access to all of the services only available on the remote server's localhost namespace.
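For example, such a tunnel might look like this; the user and server address are placeholders, not values from this document:

```shell
# forward local port 8080 to the web management console on the server;
# afterwards the console is reachable at http://localhost:8080 on the client
ssh -N -L 8080:localhost:8080 user@your-bigchaindb-server
```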
Repeat these steps on all your RethinkDB servers.

First, block all outside connections:

```shell
# the web management console
sudo iptables -A INPUT -i eth0 -p tcp --dport 8080 -j DROP
sudo iptables -I INPUT -i eth0 -s 127.0.0.1 -p tcp --dport 8080 -j ACCEPT

# the driver port
sudo iptables -A INPUT -i eth0 -p tcp --dport 28015 -j DROP
sudo iptables -I INPUT -i eth0 -s 127.0.0.1 -p tcp --dport 28015 -j ACCEPT

# the communication port
sudo iptables -A INPUT -i eth0 -p tcp --dport 29015 -j DROP
sudo iptables -I INPUT -i eth0 -s 127.0.0.1 -p tcp --dport 29015 -j ACCEPT
```

Save the iptables config:

```shell
sudo apt-get update
sudo apt-get install iptables-persistent
```

You can find more about iptables in the [man pages](http://linux.die.net/man/8/iptables).

## Monitoring the storage backend

Monitoring is pretty simple. You can do this via the [monitoring URL](http://localhost:8080), where you can see the complete behaviour of all nodes.

One note: if you play around with replication, the number of transactions will increase. For the real throughput you should divide the number of transactions by the number of replicas.
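The replica adjustment is simple arithmetic; the numbers below are made up for illustration:

```python
# the console counts each replicated write, so divide the reported number
# by the replication factor to get the real throughput
reported_writes_per_second = 3000
replicas = 3

real_throughput = reported_writes_per_second / replicas
print(real_throughput)  # 1000.0
```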
## Troubleshooting

Since any software may have minor issues, note that we are not responsible for the storage backend itself.

If the nodes in your sharded and replicated cluster are not in sync, it may help to delete the data of the affected node and restart that node. If there are no other problems, the node will come back and sync within a couple of minutes. You can verify this by monitoring the cluster via the [monitoring URL](http://localhost:8080).
```python
import hashlib

data = "message"
tx_hash = hashlib.sha3_256(data.encode()).hexdigest()
```

## Signature algorithm and keys

The signature algorithm used by BigchainDB is ECDSA with the secp256k1 curve using the python [cryptography](https://cryptography.io/en/latest/) module.

The private key is the base58 encoded hexadecimal representation of the private number. The public key is the base58 encoded hexadecimal representation of the compressed public numbers.
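As an illustration of the base58 encoding mentioned above, here is a minimal encoder; the Bitcoin-style alphabet and the omission of leading-zero-byte handling are assumptions, and real implementations usually rely on a base58 library:

```python
# Bitcoin-style base58 alphabet (an assumption; no 0, O, I or l)
ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def base58_encode(number):
    """Encode a non-negative integer (e.g. a private number) in base58."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 58)
        digits.append(ALPHABET[remainder])
    return ''.join(reversed(digits))

base58_encode(58)  # '21'
```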
###################
Developer Interface
###################

The Bigchain class is the top-level API for BigchainDB. If you want to create and initialize a BigchainDB database, you create a Bigchain instance (object). Then you can use its various methods to create transactions, write transactions (to the object/database), read transactions, etc.

.. autoclass:: bigchaindb.Bigchain
   :members:

.. automethod:: bigchaindb.core.Bigchain.__init__
# Frequently Asked Questions (FAQ)

**Why do we use blocks and not just create the chain with transactions?**

With distributed data stores there are no guarantees about the order in which transactions will be committed to the database. Without knowing which transaction will be committed before the current one, we cannot include its hash in the current transaction to build the chain.

To solve this problem we decided to use blocks and create the chain with the blocks.
# Getting Started

With BigchainDB and RethinkDB [installed and running](installing.html) we can start creating and transferring digital assets.

##### Importing BigchainDB

First, let's open a python shell
```shell
$ python
```

Then we can import and instantiate BigchainDB
```python
from bigchaindb import Bigchain
b = Bigchain()
```

When instantiating `Bigchain` without arguments it reads the configuration stored in `$HOME/.bigchaindb`.

##### Creating digital assets

In BigchainDB only the federation nodes are allowed to create digital assets, using the `CREATE` operation on the transaction. Digital assets are usually created and assigned to a user, which in BigchainDB is represented by a public key.

BigchainDB allows you to define your digital asset as a generic python dict that will be used as the payload of the transaction.

```python
# create a test user
testuser1_priv, testuser1_pub = b.generate_keys()

# define a digital asset
digital_asset = {'msg': 'Hello BigchainDB!'}

# a create transaction uses the operation `CREATE` and has no inputs
tx = b.create_transaction(b.me, testuser1_pub, None, 'CREATE', payload=digital_asset)

# all transactions need to be signed by the user creating the transaction
tx_signed = b.sign_transaction(tx, b.me_private)

# write the transaction to the bigchain
# the transaction will be stored in a backlog where it will be validated,
# included in a block and written to the bigchain
b.write_transaction(tx_signed)
```

##### Reading transactions from the bigchain

After a couple of seconds we can check if the transaction was included in the bigchain.

```python
# retrieve a transaction from the bigchain
tx_retrieved = b.get_transaction(tx_signed['id'])

{ 'id': '6539dded9479c47b3c83385ae569ecaa90bcf387240d1ee2ea3ae0f7986aeddd',
  'transaction': { 'current_owner': 'pvGtcm5dvwWMzCqagki1N6CDKYs2J1cCwTNw8CqJic3Q',
                   'data': { 'hash': '872fa6e6f46246cd44afdb2ee9cfae0e72885fb0910e2bcf9a5a2a4eadb417b8',
                             'payload': {'msg': 'Hello BigchainDB!'}},
                   'input': None,
                   'new_owner': 'ssQnnjketNYmbU3hwgFMEQsc4JVYAmZyWHnHCtFS8aeA',
                   'operation': 'CREATE',
                   'timestamp': '1455108421.753908'}}
```

The new owner of the digital asset is now `ssQnnjketNYmbU3hwgFMEQsc4JVYAmZyWHnHCtFS8aeA`, which is the public key of `testuser1`.

##### Transferring digital assets

Now that `testuser1` has a digital asset assigned to him, he can transfer it to another user. Transfer transactions require an input: the id of a transaction of a digital asset that was assigned to `testuser1`, which in our case is `6539dded9479c47b3c83385ae569ecaa90bcf387240d1ee2ea3ae0f7986aeddd`.

```python
# create a second testuser
testuser2_priv, testuser2_pub = b.generate_keys()

# create a transfer transaction
tx_transfer = b.create_transaction(testuser1_pub, testuser2_pub, tx_retrieved['id'], 'TRANSFER')

# sign the transaction
tx_transfer_signed = b.sign_transaction(tx_transfer, testuser1_priv)

# write the transaction
b.write_transaction(tx_transfer_signed)

# check if the transaction is already in the bigchain
tx_transfer_retrieved = b.get_transaction(tx_transfer_signed['id'])

{ 'id': '1b78c313257540189f27da480152ed8c0b758569cdadd123d9810c057da408c3',
  'signature': '3045022056166de447001db8ef024cfa1eecdba4306f92688920ac24325729d5a5068d47022100fbd495077cb1040c48bd7dc050b2515b296ca215cb5ce3369f094928e31955f6',
  'transaction': { 'current_owner': 'ssQnnjketNYmbU3hwgFMEQsc4JVYAmZyWHnHCtFS8aeA',
                   'data': None,
                   'input': '6539dded9479c47b3c83385ae569ecaa90bcf387240d1ee2ea3ae0f7986aeddd',
                   'new_owner': 'zVzophT73m4Wvf3f8gFYokddkYe3b9PbaMzobiUK7fmP',
                   'operation': 'TRANSFER',
                   'timestamp': '1455109497.480323'}}
```

##### Double Spends

BigchainDB makes sure that a digital asset assigned to a user cannot be transferred multiple times.

If we try to create another transaction with the same input as before, the transaction will be marked invalid and the validation will throw a double spend exception:

```python
# create another transfer transaction with the same input
tx_transfer2 = b.create_transaction(testuser1_pub, testuser2_pub, tx_retrieved['id'], 'TRANSFER')

# sign the transaction
tx_transfer_signed2 = b.sign_transaction(tx_transfer2, testuser1_priv)

# check if the transaction is valid
b.validate_transaction(tx_transfer_signed2)
Exception: input `6539dded9479c47b3c83385ae569ecaa90bcf387240d1ee2ea3ae0f7986aeddd` was already spent
```
@ -7,7 +7,6 @@ BigchainDB Documentation
|
||||
Table of Contents
|
||||
-----------------
|
||||
|
||||
Note to reviewers of this documentation: For now, *all* documentation files (.rst and .md) are at the same level in the Table of Contents heirarchy. Later, we can organize them into a more sensible heirarchy.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 5
|
||||
@ -20,12 +19,10 @@ Note to reviewers of this documentation: For now, *all* documentation files (.rs
|
||||
faq
|
||||
release-notes
|
||||
software-architecture
|
||||
the-bigchain-class
|
||||
cryptography
|
||||
models
|
||||
json-serialization
|
||||
transaction-validation
|
||||
rethinkdb-benchmarks
|
||||
developer-interface
|
||||
|
||||
|
||||
Indices and Tables
|
||||
|
# Installing BigchainDB

BigchainDB works on top of [RethinkDB](http://rethinkdb.com/) server. In order to use BigchainDB we first need to install RethinkDB server.

##### Installing and running RethinkDB server on Ubuntu >= 12.04

RethinkDB provides binaries for all major distros. For Ubuntu we only need to add the [RethinkDB repository](http://download.rethinkdb.com/apt/) to our list of repositories and install via `apt-get`

```shell
source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
wget -qO- https://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install rethinkdb
```

For more information, RethinkDB provides [detailed instructions](http://rethinkdb.com/docs/install/) on how to install on a variety of systems.

RethinkDB does not require any special configuration. To start the RethinkDB server, just run this command in the terminal:

```shell
$ rethinkdb
```

##### Installing and running BigchainDB
BigchainDB is distributed as a python package. Installing is simple using `pip`

```shell
$ pip install bigchaindb
```

After installing BigchainDB we can run it with:

```shell
$ bigchaindb start
```

During the first run BigchainDB takes care of configuring a single node environment.

##### Installing from source

BigchainDB is in its early stages and is being actively developed on its [GitHub repository](https://github.com/BigchainDB/bigchaindb). Contributions are highly appreciated.

Clone the public repository
```shell
$ git clone git@github.com:BigchainDB/bigchaindb.git
```

Install from the source
```shell
$ python setup.py install
```

##### Installing with Docker
Coming soon...
We needed to clearly define how to serialize a JSON object to calculate the hash.

The serialization should produce the same byte output independently of the architecture running the software. If there are differences in the serialization, hash validations will fail although the transaction is correct.

For example, consider the following two methods of serializing `{'a': 1}`:

```python
...
deserialize(serialize(data)) == data
True
```

After looking at this further, we decided that the python json module is still the best bet because it complies with the RFC. We can specify the encoding and separators used, and enforce ordering by the keys, to make sure that we obtain maximum interoperability.

```python
import json

json.dumps(data, skipkeys=False, ensure_ascii=False,
           ...)
```

- `skipkeys`: With `skipkeys` set to `False` the serialization will fail if the provided keys are not strings. This way we enforce all keys to be strings.
- `ensure_ascii`: The RFC recommends `utf-8` for maximum interoperability. By setting `ensure_ascii` to `False` we allow unicode characters and force the encoding to `utf-8`.
- `separators`: We need to define a standard separator to use in the serialization. If we did not do this, different implementations could use different separators for serialization, resulting in a still valid transaction but with a different hash; e.g. an extra whitespace introduced in the serialization would still create a valid JSON object, but the hash would be different.

Every time we need to perform some operation on the data, like calculating the hash or signing/verifying the transaction, we need to use the previous criteria to serialize the data and then use the `byte` representation of the serialized data (if we treat the data as bytes we eliminate possible encoding errors, e.g. unicode characters). For example:

```python
...
signature = sk.sign(tx_serialized)

# verify signature
tx_serialized = bytes(serialize(tx))
vk.verify(signature, tx_serialized)
```
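A runnable sketch of such a deterministic serializer; the exact `separators` and `sort_keys` values are assumptions derived from the criteria above:

```python
import json

def serialize(data):
    # sorted keys and fixed separators make the byte output deterministic,
    # so the same dict always hashes to the same value
    return json.dumps(data, skipkeys=False, ensure_ascii=False,
                      separators=(',', ':'), sort_keys=True)

# key order no longer affects the serialized form (and therefore the hash)
serialize({'b': 2, 'a': 1}) == serialize({'a': 1, 'b': 2})  # True
```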
Transactions, blocks and votes are represented using JSON documents with the following models.

## The Transaction Model

```json
{
    "id": "<sha3 hash>",
    "transaction": {
        ...
        "new_owner": "<pub-key>",
        "input": "<sha3 hash>",
        "operation": "<string>",
        "timestamp": "<timestamp>",
        "data": {
            "hash": "<sha3 hash>",
            ...
```

This can be changed in the future to allow multiple inputs per transaction.

- `operation`: String representation of the operation being performed (REGISTER, TRANSFER, ...). This will define how the transactions should be validated.
- `timestamp`: Time of creation of the transaction in UTC.
- `data`: JSON object describing the asset (digital content). It contains at least the field `hash`, which is a sha3 hash of the digital content.
- `signature`: ECDSA signature of the transaction with the `current_owner` private key.
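As an illustration only (not the actual BigchainDB implementation), a document shaped like the transaction model above can be assembled and hashed like this; the field values and the serialization settings are made up:

```python
import hashlib
import json

# assemble a transaction body following the model above (made-up values)
transaction = {
    'current_owner': '<pub-key of the current owner>',
    'new_owner': '<pub-key of the new owner>',
    'input': None,                      # CREATE transactions have no input
    'operation': 'CREATE',
    'timestamp': '1455108421.753908',
    'data': {'payload': {'msg': 'Hello BigchainDB!'}},
}

# the id is a sha3 hash of the deterministically serialized body
serialized = json.dumps(transaction, sort_keys=True, separators=(',', ':'))
tx = {'id': hashlib.sha3_256(serialized.encode()).hexdigest(),
      'transaction': transaction}
```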
Still to be defined when new blocks are created (after x number of transactions, after x amount of time, or both).

A block contains a group of transactions and includes the hash of the previous block to build the chain.

- `id`: sha3 hash of the current block. This is also a RethinkDB primary key; this way we make sure that all blocks are unique.
- `block`: The actual block
- `timestamp`: timestamp when the block was created
- `transactions`: the list of transactions included in the block
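A toy sketch of the chaining idea described above, assuming a sha3 hash over the serialized block contents; the serialization details are made up:

```python
import hashlib
import json

def block_id(transactions, previous_block_id):
    # include the previous block's id in the hash so blocks form a chain:
    # tampering with any block changes every id after it
    serialized = json.dumps({'transactions': transactions,
                             'previous_block': previous_block_id},
                            sort_keys=True, separators=(',', ':'))
    return hashlib.sha3_256(serialized.encode()).hexdigest()

genesis = block_id(['tx1'], None)
second = block_id(['tx2'], genesis)
```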
This is the structure that each node will append to the block `votes` list.

```json
    ...
    "previous_block": "<id of the block previous to this one>",
    "is_block_valid": "<true|false>",
    "invalid_reason": "<None|DOUBLE_SPEND|TRANSACTIONS_HASH_MISMATCH|NODES_PUBKEYS_MISMATCH>",
    "timestamp": "<timestamp of the voting action>"
  },
  "signature": "<ECDSA signature of vote block>"
}
```
# Release Notes

This section has the release notes for each version of BigchainDB.
# RethinkDB Benchmarks

## Goal

The goal was to test RethinkDB scalability properties, to understand its limits, and to see if we could reach a speed of 1M transactions per second.

## Terminology

### Settings

To test the writing performance of RethinkDB we have a process that inserts a block in the database in an infinite loop.

The block is a valid block with small transactions (transactions without any payload). The entire block is around 900KB.

```python
while True:
    r.table(table).insert(r.json(BLOCK_SERIALIZED), durability='soft').run(conn)
```

In `hard` durability mode, writes are committed to disk before acknowledgments are sent; in `soft` mode, writes are acknowledged immediately after being stored in memory.

This means that the insert will block until RethinkDB acknowledges that the data was cached. On each server we can start multiple processes.

### Write units

Let's define `1 write unit` as being 1 process. For example, in a 32 node cluster with each node running 2 processes we would have `64 write units`. This will make it easier to compare different tests.

### Sharding

Sharding in distributed datastores means partitioning a table so that the data can be evenly distributed between all nodes in the cluster. In RethinkDB and most distributed datastores there is a maximum limit of 32 shards per table.

In RethinkDB a `shard` is also called a `primary replica`, since by default the replication factor is 1. Increasing the replication factor produces `secondary replicas` that are used for data redundancy: if a node holding a primary replica goes down, another node holding a secondary replica of the same data can step up and become the primary replica.

For these tests we are using 32 core EC2 instances with SSD storage and 10Gbps network connections (`c3.8xlarge`). For the tests we used either 32 or 64 node clusters, all running in the same AWS region.

These tests show RethinkDB performance and what we can expect from the database. This does not show the performance of the bigchain.
## Tests

### Test 1

- **number of nodes**: 32
- **number of processes**: 2 processes per node
- **write units**: 32 x 2 = 64 write units
- **output**: stable 1K writes per second

This was the most successful test. We were able to reach a stable output of 1K blocks per second. The load on the machines is stable and the IO is at an average of 50-60%.

Other tests have shown that increasing the number of write units per machine can lead to a stable performance of up to 1.5K writes per second, but the load on the nodes would increase until a node would eventually fail. This means that we are able to handle bursts for a short amount of time (10-20 min).

This test can be used as a baseline for the future, where 64 write units equal 1K transactions per second. In other words, each write unit produces an output of `1000/64` writes per second, approximately 16 writes per second.
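The baseline arithmetic can be sketched as a quick estimator; the helper name is made up, and the linear extrapolation only holds while the nodes stay below their IO limit:

```python
# Test 1 baseline: 64 write units produced 1000 writes per second
PER_WRITE_UNIT = 1000 / 64  # ~15.6 writes per second per write unit

def expected_writes_per_second(nodes, processes_per_node):
    # linear extrapolation from the Test 1 baseline
    return nodes * processes_per_node * PER_WRITE_UNIT

# e.g. 64 nodes x 2 processes = 128 write units
print(expected_writes_per_second(64, 2))  # 2000.0
```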
### Test 2

- **number of nodes**: 32
- **number of processes**:
  - 16 nodes running 2 processes
  - 16 nodes running 3 processes
- **write units**: 16 x 3 + 16 x 2 = 80 write units
- **expected output**: 1250 writes per second
- **output**: stable 1.2K writes per second

Slightly increasing the number of write units shows an increase in output close to the expected value, but in this case the IO is around 90%, close to the limit that the machine can handle.

### Test 3

- **number of nodes**: 32
- **number of processes**:
  - 16 nodes running 2 processes
  - 16 nodes running 4 processes
- **write units**: 16 x 4 + 16 x 2 = 96 write units
- **expected output**: 1500 writes per second
- **output**: stable 1.4K writes per second

This test produces results similar to the previous one. The reason why we don't reach the expected output may be that RethinkDB needs time to cache results, and at some point increasing the number of write units will not result in a higher output. Another problem is that as the RethinkDB cache fills (because RethinkDB is not able to flush the data to disk fast enough due to IO limitations), the performance will decrease because the processes will take more time inserting blocks.

### Test 4

- **number of nodes**: 64
- **number of processes**: 1 process per node
- **write units**: 64 x 1 = 64 write units
- **expected output**: 1000 writes per second
- **output**: stable 1K writes per second

In this case we are increasing the number of nodes in the cluster by 2x. This won't have an impact on the write performance because the maximum number of shards per table in RethinkDB is 32 (RethinkDB will probably increase this limit in the future). What this provides is more CPU power (and storage for replicas; more about replication in the next section). We just halved the number of write units per node while maintaining the same output. The IO in the nodes holding the primary replicas is the same as in Test 1.

### Test 5

- **number of nodes**: 64
- **number of processes**: 2 processes per node
- **write units**: 64 x 2 = 128 write units
- **expected output**: 2000 writes per second
- **output**: unstable 2K (peak) writes per second

In this case we are doubling the number of write units. We are able to reach the expected output, but the output performance is unstable because we reached the IO limit on the machines.

### Test 6

- **number of nodes**: 64
- **number of processes**:
  - 32 nodes running 1 process
  - 32 nodes running 2 processes
- **write units**: 32 x 2 + 32 x 1 = 96 write units
- **expected output**: 1500 writes per second
- **output**: stable 1.5K writes per second

This test is similar to Test 3. The only difference is that now the write units are distributed between 64 nodes, meaning that each node is writing to its local cache and we don't overload the cache of the nodes like we did in Test 3. This is another advantage of adding more nodes beyond 32.
## Testing replication

Replication is used for data redundancy. In RethinkDB we are able to specify the number of shards and replicas per table. Data in secondary replicas is not directly used; it's just a mirror of a primary replica, used in case the node holding the primary replica fails.

RethinkDB does a good job trying to distribute data evenly between nodes. We ran some tests to check this.

Note that by increasing the number of replicas we also increase the number of writes in the cluster. With a replication factor of 2 we double the amount of writes on the cluster, with a replication factor of 3 we triple the amount of writes, and so on.

With 64 nodes, and since we can only have 32 shards, we have 32 nodes holding shards (primary replicas).

With a replication factor of 2 we will have 64 replicas (32 primary replicas and 32 secondary replicas). Since we already have 32 nodes holding the 32 shards/primary replicas, RethinkDB uses the other 32 nodes to hold the secondary replicas. So in a 64 node cluster with 32 shards and a replication factor of 2, 32 nodes will be holding the primary replicas and the other 32 nodes will be holding the secondary replicas.
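The placement described above can be written as a toy model; this is a simplification for illustration, and real RethinkDB placement is more sophisticated:

```python
# toy model: 64 node cluster, 32 shards, replication factor 2
nodes = ['node%d' % i for i in range(64)]
shards = 32

# primary replicas land on the first 32 nodes,
# secondary replicas on the remaining 32
placement = {s: {'primary': nodes[s],
                 'secondary': nodes[shards + s]}
             for s in range(shards)}

# every node ends up holding exactly one replica
held = ([p['primary'] for p in placement.values()] +
        [p['secondary'] for p in placement.values()])
assert sorted(held) == sorted(nodes)
```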
With this setup, if we run Test 4 with a replication factor of 2, we will have twice the amount of writes. A nice result is that the IO in the nodes holding the primary replicas does not increase when compared to Test 4, because all of the excess writing is now being done by the 32 nodes holding the secondary replicas.

Another fact about replication: if we have a 64 node cluster and create a table with 32 shards, 32 nodes will be holding primary replicas and the other nodes do not hold any data. If we create another table with 32 shards, RethinkDB will create the shards on the nodes that were not holding any data, evenly distributing the data.
# BigchainDB Software Architecture

Here we define the components needed for the software implementation of the prototype.

## bigspool

Bigchain implementation of the spool protocol

## bigchain

API to create, read, and push transactions to the bigchain

## validator

Transaction validator. Decides which transactions to include in the bigchain. Each node in the federation will be running this code, and a transaction will be valid as long as more than half of the nodes decide that the transaction is valid.
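The majority rule for the validator can be sketched as follows; the function name is made up, and the vote dicts only borrow the `is_block_valid` field from the vote model:

```python
def block_is_valid(votes):
    # a block (or transaction) is accepted when more than half of the
    # federation nodes voted that it is valid
    valid_votes = sum(1 for vote in votes if vote['is_block_valid'])
    return valid_votes > len(votes) / 2

votes = [{'is_block_valid': True},
         {'is_block_valid': True},
         {'is_block_valid': False}]
block_is_valid(votes)  # True: 2 of 3 nodes voted valid
```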
## processor

Creates the blocks from the transactions and cleans the backlog.
# Transaction Validation

## Generic Validation

1. Query the bigchain and check if `current_owner` actually owns the `hash`.
2. Check if the transaction was signed with the `current_owner` private key.
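A sketch of the two generic checks; `get_owner` and `verify_signature` are hypothetical stand-ins for the bigchain query and the cryptographic check, passed in so the sketch stays self-contained:

```python
def validate_generic(tx, get_owner, verify_signature):
    """Generic validation sketch; the helpers are hypothetical stand-ins."""
    transaction = tx['transaction']
    # 1. check that current_owner actually owns the asset's hash
    if get_owner(transaction['data']['hash']) != transaction['current_owner']:
        return False
    # 2. check that the transaction was signed with current_owner's key
    return verify_signature(tx)

# toy usage with stub helpers
owners = {'<sha3 hash>': 'alice-pub-key'}
tx = {'transaction': {'current_owner': 'alice-pub-key',
                      'data': {'hash': '<sha3 hash>'}}}
validate_generic(tx, owners.get, lambda t: True)  # True
```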
## Specific Validation

1. Query the bigchain and check if `current_owner` actually owns the `hash`.
2. Check if the transaction was signed with the `current_owner` private key.
3. Depending on the `operation`, additional checks may need to be performed. These will be specified by the protocol running in the chain, e.g. the [Spool protocol](https://github.com/ascribe/spool).