Mirror of https://github.com/bigchaindb/bigchaindb.git, synced 2024-10-13 13:34:05 +00:00

Merge 6d46ca43bc192138d8fabd7c4ad81aeb565fde12 into 67598872ef07b8d580d6938e6d4002f2c1bd4b5d

This commit is contained in commit 0ee62c8f0e.

@@ -19,7 +19,7 @@ tx_hash = hashlib.sha3_256(data).hexdigest()

 ## Keys

-For signing and veryfing signatures we are using the ECDSA with 192bit key lengths and
+For signing and verifying signatures we are using the ECDSA with 192bit key lengths and
 [python-ecdsa](https://github.com/warner/python-ecdsa) as the python implementation.

 The public-key or verification key are converted to string and hex encoded before storing them to the blockchain. For example:

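As an illustrative aside (not part of the diff above), a minimal sketch of the described key handling with python-ecdsa might look like the following; it assumes NIST P-192 as the 192-bit curve and uses hypothetical variable names:

```python
import binascii

from ecdsa import SigningKey, NIST192p

# Generate a 192-bit ECDSA key pair (assuming the NIST P-192 curve).
signing_key = SigningKey.generate(curve=NIST192p)
verifying_key = signing_key.get_verifying_key()

# Convert the verification key to a string and hex-encode it before
# storing it in the blockchain, as described above.
verifying_key_hex = binascii.hexlify(verifying_key.to_string()).decode()
print(verifying_key_hex)
```
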
@@ -14,6 +14,6 @@ Answer 2.

 **Why do we use blocks and not just create the chain with transactions?**

-With distributed data stores there is no guarantees in the order in which transactions will be commited to the database. Witouth knowing what is previous transactions to be commited to the database we cannot include its hash in the current transaction to build the chain.
+With distributed data stores there is no guarantees in the order in which transactions will be committed to the database. Without knowing what is previous transactions to be committed to the database we cannot include its hash in the current transaction to build the chain.

 To solve this problem we decided to use blocks and create the chain with the blocks.

@@ -7,7 +7,7 @@ BigchainDB Documentation
 Table of Contents
 -----------------

-Note to reviewers of this documentation: For now, *all* documentation files (.rst and .md) are at the same level in the Table of Contents heirarchy. Later, we can organize them into a more sensible heirarchy.
+Note to reviewers of this documentation: For now, *all* documentation files (.rst and .md) are at the same level in the Table of Contents hierarchy. Later, we can organize them into a more sensible hierarchy.

 .. toctree::
    :maxdepth: 5

@@ -2,7 +2,7 @@

 We needed to clearly define how to serialize a JSON object to calculate the hash.

-The serialization should produce the same byte output independently of the architecture running the software. If there are diferences in the serialization, hash validations will fail although the transaction is correct.
+The serialization should produce the same byte output independently of the architecture running the software. If there are differences in the serialization, hash validations will fail although the transaction is correct.

 For example, consider the following two methods of serializing `{'a': 1}`:
 ```python

@@ -24,7 +24,7 @@ deserialize(serialize(data)) == data
 True
 ```

-After looking at this further, we decided that the python json module is still the best bet because it complies with the RFC. We can specify the encoding, separators used and enforce it to order by the keys to make sure that we obtain maximum interopelability.
+After looking at this further, we decided that the python json module is still the best bet because it complies with the RFC. We can specify the encoding, separators used and enforce it to order by the keys to make sure that we obtain maximum interoperability.

 ```python
 import json

@@ -35,8 +35,8 @@ json.dumps(data, skipkeys=False, ensure_ascii=False,
 ```

 - `skipkeys`: With skipkeys `False` if the provided keys are not a string the serialization will fail. This way we enforce all keys to be strings
-- `ensure_ascii`: The RFC recommends `utf-8` for maximum interoperability. By setting ensure_ascii to `False` we allow unicode characters and force the encoding to `utf-8`.
-- `separators`: We need to define a standard separator to use in the serialization. We did not do this different implementations could use different separators for serialization resulting in a still valid transaction but with a different hash e. g. an extra whitespace introduced in the serialization would not still create a valid json object but the hash would be different.
+- `ensure_ascii`: The RFC recommends `utf-8` for maximum interoperability. By setting `ensure_ascii` to `False` we allow unicode characters and force the encoding to `utf-8`.
+- `separators`: We need to define a standard separator to use in the serialization. We did not do this different implementations could use different separators for serialization resulting in a still valid transaction but with a different hash e.g. an extra whitespace introduced in the serialization would not still create a valid JSON object but the hash would be different.

 Every time we need to perform some operation on the data like calculating the hash or signing/verifying the transaction, we need to use the previous criteria to serialize the data and then use the `byte` representation of the serialized data (if we treat the data as bytes we eliminate possible encoding errors e.g. unicode characters). For example:
 ```python

@@ -52,4 +52,4 @@ signature = sk.sign(tx_serialized)
 # verify signature
 tx_serialized = bytes(serialize(tx))
 vk.verify(signature, tx_serialized)
-```
+```

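Putting the serialization hunks above together, here is a hedged end-to-end sketch (not from the commit) of how the canonical serialization could feed into hashing and signing; the `separators` value, the transaction fields and the use of `hashlib.sha3_256` (Python 3.6+) are assumptions made for illustration:

```python
import hashlib
import json

from ecdsa import SigningKey, NIST192p

def serialize(data):
    # Deterministic serialization: keys must be strings, unicode is kept
    # as-is, separators are fixed and keys are sorted so every node
    # produces exactly the same string for the same object.
    return json.dumps(data, skipkeys=False, ensure_ascii=False,
                      separators=(',', ':'), sort_keys=True)

# Hypothetical transaction used only for this example.
tx = {'operation': 'REGISTER',
      'data': {'hash': hashlib.sha3_256(b'digital content').hexdigest()}}

# Work on the byte representation of the serialized data.
tx_serialized = serialize(tx).encode('utf-8')
tx_hash = hashlib.sha3_256(tx_serialized).hexdigest()

# Sign and verify the same byte string.
sk = SigningKey.generate(curve=NIST192p)
vk = sk.get_verifying_key()
signature = sk.sign(tx_serialized)
assert vk.verify(signature, tx_serialized)
```
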
@@ -35,7 +35,7 @@ This can be changed in the future to allow multiple inputs per transaction.
 - `operation`: String representation of the operation being performed (REGISTER, TRANSFER, ...) this will define how
 the transactions should be validated
 - `timestamp`: Time of creation of the transaction in UTC
-- `data`: Json object describing the asset (digital content). It contains at least the field `hash` which is a
+- `data`: JSON object describing the asset (digital content). It contains at least the field `hash` which is a
 sha3 hash of the digital content.
 - `signature`: ECDSA signature of the transaction with the `current_owner` private key

@@ -59,7 +59,7 @@ Still to be defined when new blocks are created (after x number of transactions,
 or both).
 A block contains a group of transactions and includes the hash of the hash of the previous block to build the chain.

-- `id`: sha3 hash of the current block. This is also a rethinkdb primary key, this way we make sure that all blocks are unique.
+- `id`: sha3 hash of the current block. This is also a RethinkDB primary key, this way we make sure that all blocks are unique.
 - `block`: The actual block
 - `timestamp`: timestamp when the block was created
 - `transactions`: the list of transactions included in the block

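To make the block model concrete, here is a small hedged sketch (not part of the commit) of how a block `id` could be derived as the sha3 hash of the block body; the field name used for the previous block's hash and the `serialize` helper are assumptions:

```python
import hashlib
import json
import time

def serialize(data):
    # Same deterministic serialization rules as in the serialization document.
    return json.dumps(data, skipkeys=False, ensure_ascii=False,
                      separators=(',', ':'), sort_keys=True)

def create_block(transactions, previous_block_id):
    # Build the block body, then use its sha3 hash as the `id`, which also
    # serves as the RethinkDB primary key so that all blocks are unique.
    block = {
        'timestamp': str(time.time()),
        'transactions': transactions,
        'previous_block': previous_block_id,  # hash of the previous block (field name assumed)
    }
    block_id = hashlib.sha3_256(serialize(block).encode('utf-8')).hexdigest()
    return {'id': block_id, 'block': block}
```
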
@@ -1,3 +1,3 @@
 # Release Notes

-This section has the release notes for each version of BigChainDB.
+This section has the release notes for each version of BigchainDB.

@@ -8,7 +8,7 @@ The goal was to test RethinkDB scalability properties, to understand its limits,

 ### Settings

-To test the writing performance of rethinkdb we have a process that inserts a
+To test the writing performance of RethinkDB we have a process that inserts a
 block in the database in an infinite loop

 The block is a valid block with small transactions (transactions without any

@@ -23,7 +23,7 @@ In `hard` durability mode, writes are committed to disk before acknowledgments
 are sent; in `soft` mode, writes are acknowledged immediately after being stored
 in memory.

-This means that the insert will block until rethinkdb acknowledges that the data
+This means that the insert will block until RethinkDB acknowledges that the data
 was cached. In each server we can start multiple process.

 ### Write units

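For reference, a minimal sketch (not part of the commit) of the benchmark's insert loop with the durability setting described in the hunk above, using the RethinkDB Python driver; the connection details and the database/table names are assumptions:

```python
import uuid

import rethinkdb as r  # classic driver import style

conn = r.connect(host='localhost', port=28015)

while True:
    # Placeholder block; in the benchmark this would be a pre-built valid block.
    block = {'id': str(uuid.uuid4()), 'block': {'transactions': []}}
    # durability='hard' waits until the write is committed to disk before
    # acknowledging; durability='soft' acknowledges once the write is held
    # in memory, so the insert blocks only until the data is cached.
    r.db('bigchain').table('bigchain').insert(block, durability='soft').run(conn)
```
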
@@ -35,10 +35,10 @@ easier to compare different tests.
 ### Sharding

 Sharding in distributed datastores means partitioning a table so that the data
-can be evenly distributed between all nodes in the cluster. In rethinkdb and
+can be evenly distributed between all nodes in the cluster. In RethinkDB and
 most distributed datastores there is a maximum limit of 32 shards per table.

-In rethinkdb a `shard` is also called a `primary replica`, since by default the
+In RethinkDB a `shard` is also called a `primary replica`, since by default the
 replication factor is 1. Increasing the replication factor produces `secondary
 replicas` that are used for data redundancy (if a node holding a primary replica
 goes down another node holding a secondary replica of the same data can step up

@@ -48,7 +48,7 @@ For these tests we are using 32 core ec2 instances with SSD storage and 10Gbps
 network connections (`c3.8xlarge`). For the tests we used either 32 or 64 node
 clusters all running on the same aws region.

-These tests show rethinkdb performance and what we can expect from the database.
+These tests show RethinkDB performance and what we can expect from the database.
 This does not show the performance of the bigchain

 ## Tests

@@ -100,10 +100,10 @@ the machine can handle.
 - **output**: stable 1.4K writes per second

 These test produces results similar to previous one. The reason why we don't
-reach the expected output may be because rethinkdb needs time to cache results
+reach the expected output may be because RethinkDB needs time to cache results
 and at some point increasing the number of write units will not result in an
-higher output. Another problem is that as the rethinkdb cache fills (because the
-rethinkdb is not able to flush the data to disk fast enough due to IO
+higher output. Another problem is that as the RethinkDB cache fills (because the
+RethinkDB is not able to flush the data to disk fast enough due to IO
 limitations) the performance will decrease because the processes will take more
 time inserting blocks.

@@ -118,7 +118,7 @@ time inserting blocks.

 In this case we are increasing the number of nodes in the cluster by 2x. This
 won't have an impact in the write performance because the maximum amount of
-shards per table in rethinkdb is 32 (rethinkdb will probably increase this limit
+shards per table in RethinkDB is 32 (RethinkDB will probably increase this limit
 in the future). What this provides is more CPU power (and storage for replicas,
 more about replication in the next section). We just halved the amount write
 units per node maintaining the same output. The IO in the nodes holding the

@@ -158,7 +158,7 @@ is another advantage of adding more nodes beyond 32.

 ## Testing replication

-Replication is used for data redundancy. In rethinkdb we are able to specify the
+Replication is used for data redundancy. In RethinkDB we are able to specify the
 number of shards and replicas per table. Data in secondary replicas is no
 directly used, its just a mirror of a primary replica and used in case the node
 holding the primary replica fails.

@@ -177,7 +177,7 @@ shards (primary replicas)

 With a replication factor of 2 we will have 64 replicas (32 primary replicas and
 32 secondary replicas). Since we already have 32 nodes holding the 32
-shards/primary replicas rethinkdb uses the other 32 nodes to hold the secondary
+shards/primary replicas RethinkDB uses the other 32 nodes to hold the secondary
 replicas. So in a 64 node cluster with 32 shards and a replication factor of 2,
 32 nodes will be holding the primary replicas and the other 32 nodes will be holding
 the secondary replicas.

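As a hedged illustration (not part of the commit), this is how a table with 32 shards and a replication factor of 2 could be configured with the RethinkDB Python driver; the database, table and connection names are assumptions:

```python
import rethinkdb as r

conn = r.connect(host='localhost', port=28015)

# Create the table with 32 shards (primary replicas) and a replication
# factor of 2, so each shard also gets one secondary replica.
r.db('bigchain').table_create('bigchain', shards=32, replicas=2).run(conn)

# An existing table can be resharded/re-replicated later with reconfigure.
r.db('bigchain').table('bigchain').reconfigure(shards=32, replicas=2).run(conn)
```
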
@@ -190,6 +190,6 @@ secondary replicas.

 Another fact about replication. If I have a 64 node cluster and create a table
 with 32 shards, 32 nodes will be holding primary replicas and the other nodes do
-not hold any data. If I create another table with 32 shards rethinkdb will
+not hold any data. If I create another table with 32 shards RethinkDB will
 create the shards in the nodes that where not holding any data, evenly
 distributing the data.