Merge 6d46ca43bc192138d8fabd7c4ad81aeb565fde12 into 67598872ef07b8d580d6938e6d4002f2c1bd4b5d

Alberto Granzotto 2016-02-10 13:59:32 +00:00
commit 0ee62c8f0e
7 changed files with 23 additions and 23 deletions

View File

@@ -19,7 +19,7 @@ tx_hash = hashlib.sha3_256(data).hexdigest()
## Keys
For signing and veryfing signatures we are using the ECDSA with 192bit key lengths and
For signing and verifying signatures we are using the ECDSA with 192bit key lengths and
[python-ecdsa](https://github.com/warner/python-ecdsa) as the python implementation.
The public key, or verification key, is converted to a string and hex-encoded before being stored in the blockchain. For example:
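The example itself falls outside this hunk. As a hedged sketch that calls the python-ecdsa API directly (not necessarily the exact helper code used in the repository), generating a key pair and hex-encoding the verification key could look like this:

```python
import binascii

from ecdsa import SigningKey, NIST192p

# generate a 192-bit signing (private) key and derive its verifying (public) key
sk = SigningKey.generate(curve=NIST192p)
vk = sk.get_verifying_key()

# hex-encode the verification key bytes before storing them in the blockchain
vk_hex = binascii.hexlify(vk.to_string()).decode()
```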

View File

@@ -14,6 +14,6 @@ Answer 2.
**Why do we use blocks and not just create the chain with transactions?**
With distributed data stores there is no guarantees in the order in which transactions will be commited to the database. Witouth knowing what is previous transactions to be commited to the database we cannot include its hash in the current transaction to build the chain.
With distributed data stores there are no guarantees about the order in which transactions will be committed to the database. Without knowing which transaction will be committed to the database before the current one, we cannot include its hash in the current transaction to build the chain.
To solve this problem we decided to use blocks and create the chain with the blocks.
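A minimal sketch of the idea, with hypothetical field names and Python's built-in `hashlib`: once blocks are written in a known order, each new block can embed the hash of the previous one.

```python
import hashlib
import json

def block_hash(block):
    # hash of the deterministically serialized block
    return hashlib.sha3_256(json.dumps(block, sort_keys=True).encode()).hexdigest()

previous = {'transactions': ['tx1'], 'prev_block': None}
current = {'transactions': ['tx2'], 'prev_block': block_hash(previous)}  # chained via the previous block's hash
```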

View File

@@ -7,7 +7,7 @@ BigchainDB Documentation
Table of Contents
-----------------
Note to reviewers of this documentation: For now, *all* documentation files (.rst and .md) are at the same level in the Table of Contents heirarchy. Later, we can organize them into a more sensible heirarchy.
Note to reviewers of this documentation: For now, *all* documentation files (.rst and .md) are at the same level in the Table of Contents hierarchy. Later, we can organize them into a more sensible hierarchy.
.. toctree::
:maxdepth: 5

View File

@@ -2,7 +2,7 @@
We needed to clearly define how to serialize a JSON object to calculate the hash.
The serialization should produce the same byte output independently of the architecture running the software. If there are diferences in the serialization, hash validations will fail although the transaction is correct.
The serialization should produce the same byte output independently of the architecture running the software. If there are differences in the serialization, hash validations will fail although the transaction is correct.
For example, consider the following two methods of serializing `{'a': 1}`:
```python
@@ -24,7 +24,7 @@ deserialize(serialize(data)) == data
True
```
After looking at this further, we decided that the python json module is still the best bet because it complies with the RFC. We can specify the encoding, separators used and enforce it to order by the keys to make sure that we obtain maximum interopelability.
After looking at this further, we decided that the Python `json` module is still the best bet because it complies with the RFC. We can specify the encoding and the separators used, and force it to sort by keys, to make sure that we obtain maximum interoperability.
```python
import json
@@ -35,8 +35,8 @@ json.dumps(data, skipkeys=False, ensure_ascii=False,
```
- `skipkeys`: With `skipkeys` set to `False`, the serialization fails if any of the provided keys is not a string. This way we enforce all keys to be strings.
- `ensure_ascii`: The RFC recommends `utf-8` for maximum interoperability. By setting ensure_ascii to `False` we allow unicode characters and force the encoding to `utf-8`.
- `separators`: We need to define a standard separator to use in the serialization. We did not do this different implementations could use different separators for serialization resulting in a still valid transaction but with a different hash e. g. an extra whitespace introduced in the serialization would not still create a valid json object but the hash would be different.
- `ensure_ascii`: The RFC recommends `utf-8` for maximum interoperability. By setting `ensure_ascii` to `False` we allow unicode characters and force the encoding to `utf-8`.
- `separators`: We need to define a standard separator to use in the serialization. If we did not do this, different implementations could use different separators for serialization, resulting in a transaction that is still valid but has a different hash, e.g. an extra whitespace introduced in the serialization would still create a valid JSON object, but the hash would be different.
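Put together, a minimal sketch of a serialization helper built on these settings (the `separators` and `sort_keys` values shown here are the natural choices implied by the text; the exact call sits in a part of the file this hunk does not show):

```python
import json

def serialize(data):
    # deterministic output: string keys only, unicode kept as-is, fixed separators, sorted keys
    return json.dumps(data, skipkeys=False, ensure_ascii=False,
                      separators=(',', ':'), sort_keys=True)

# key order in the input no longer affects the serialized form, and therefore not the hash
assert serialize({'b': 2, 'a': 1}) == serialize({'a': 1, 'b': 2}) == '{"a":1,"b":2}'
```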
Every time we need to perform some operation on the data like calculating the hash or signing/verifying the transaction, we need to use the previous criteria to serialize the data and then use the `byte` representation of the serialized data (if we treat the data as bytes we eliminate possible encoding errors e.g. unicode characters). For example:
```python
@@ -52,4 +52,4 @@ signature = sk.sign(tx_serialized)
# verify signature
tx_serialized = bytes(serialize(tx))
vk.verify(signature, tx_serialized)
```
```
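For reference, the same flow as a self-contained, hedged sketch with placeholder transaction fields; the explicit `.encode('utf-8')` stands in for the bare `bytes(...)` call above, which in Python 3 needs an explicit encoding:

```python
import hashlib
import json

from ecdsa import SigningKey, NIST192p

tx = {'current_owner': '<hex-encoded public key>',
      'operation': 'TRANSFER',
      'data': {'hash': '<sha3 hash of the digital content>'}}

# deterministic serialization, then an explicit utf-8 encoding to get the byte representation
tx_serialized = json.dumps(tx, skipkeys=False, ensure_ascii=False,
                           separators=(',', ':'), sort_keys=True).encode('utf-8')

# hash of the transaction
tx_hash = hashlib.sha3_256(tx_serialized).hexdigest()

# sign with the private key, verify with the public key
sk = SigningKey.generate(curve=NIST192p)
vk = sk.get_verifying_key()
signature = sk.sign(tx_serialized)
assert vk.verify(signature, tx_serialized)
```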

View File

@@ -35,7 +35,7 @@ This can be changed in the future to allow multiple inputs per transaction.
- `operation`: String representation of the operation being performed (REGISTER, TRANSFER, ...); this defines how
the transaction should be validated
- `timestamp`: Time of creation of the transaction in UTC
- `data`: Json object describing the asset (digital content). It contains at least the field `hash` which is a
- `data`: JSON object describing the asset (digital content). It contains at least the field `hash` which is a
sha3 hash of the digital content.
- `signature`: ECDSA signature of the transaction with the `current_owner` private key
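Only part of the transaction schema is visible in this hunk. Purely as an illustration (placeholder values and layout are assumptions, not the definitive schema), a transaction carrying the fields listed above might look like:

```python
import time

tx = {
    'current_owner': '<hex-encoded public key of the current owner>',
    'operation': 'TRANSFER',                    # defines how the transaction is validated
    'timestamp': str(int(time.time())),         # creation time as a UTC Unix timestamp
    'data': {'hash': '<sha3 hash of the digital content>'},
    'signature': '<ECDSA signature made with the current_owner private key>',
}
```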
@@ -59,7 +59,7 @@ Still to be defined when new blocks are created (after x number of transactions,
or both).
A block contains a group of transactions and includes the hash of the previous block to build the chain.
- `id`: sha3 hash of the current block. This is also a rethinkdb primary key, this way we make sure that all blocks are unique.
- `id`: sha3 hash of the current block. This is also a RethinkDB primary key, this way we make sure that all blocks are unique.
- `block`: The actual block
- `timestamp`: timestamp when the block was created
- `transactions`: the list of transactions included in the block
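A hedged sketch of how such a block could be assembled, assuming `timestamp`, `transactions` and the previous-block hash live inside the inner `block` object (field names beyond the ones listed above are hypothetical):

```python
import hashlib
import json
import time

block_contents = {
    'timestamp': str(int(time.time())),
    'transactions': ['<tx1>', '<tx2>'],
    'prev_block': '<hash of the previous block>',  # links the chain
}

block = {
    # sha3 hash of the serialized contents; doubles as the RethinkDB primary key
    'id': hashlib.sha3_256(json.dumps(block_contents, sort_keys=True).encode()).hexdigest(),
    'block': block_contents,
}
```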

View File

@@ -1,3 +1,3 @@
# Release Notes
This section has the release notes for each version of BigChainDB.
This section has the release notes for each version of BigchainDB.

View File

@@ -8,7 +8,7 @@ The goal was to test RethinkDB scalability properties, to understand its limits,
### Settings
To test the writing performance of rethinkdb we have a process that inserts a
To test the writing performance of RethinkDB we have a process that inserts a
block in the database in an infinite loop
The block is a valid block with small transactions (transactions without any
@@ -23,7 +23,7 @@ In `hard` durability mode, writes are committed to disk before acknowledgments
are sent; in `soft` mode, writes are acknowledged immediately after being stored
in memory.
This means that the insert will block until rethinkdb acknowledges that the data
This means that the insert will block until RethinkDB acknowledges that the data
was cached. On each server we can start multiple processes.
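A hedged sketch of such a writer process with the RethinkDB Python driver; host, database and table names, and the block itself, are placeholders rather than the actual benchmark script:

```python
import rethinkdb as r

conn = r.connect(host='localhost', port=28015)
block = {'block': {'transactions': []}}  # placeholder for a valid block with small transactions

while True:
    # 'soft' durability: acknowledged once the write is in memory; 'hard' (the default) waits for disk
    r.db('bigchain').table('bigchain').insert(block, durability='soft').run(conn)
```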
### Write units
@@ -35,10 +35,10 @@ easier to compare different tests.
### Sharding
Sharding in distributed datastores means partitioning a table so that the data
can be evenly distributed between all nodes in the cluster. In rethinkdb and
can be evenly distributed between all nodes in the cluster. In RethinkDB and
most distributed datastores there is a maximum limit of 32 shards per table.
In rethinkdb a `shard` is also called a `primary replica`, since by default the
In RethinkDB a `shard` is also called a `primary replica`, since by default the
replication factor is 1. Increasing the replication factor produces `secondary
replicas` that are used for data redundancy (if a node holding a primary replica
goes down another node holding a secondary replica of the same data can step up
@@ -48,7 +48,7 @@ For these tests we are using 32 core ec2 instances with SSD storage and 10Gbps
network connections (`c3.8xlarge`). For the tests we used either 32 or 64 node
clusters all running in the same AWS region.
These tests show rethinkdb performance and what we can expect from the database.
These tests show RethinkDB performance and what we can expect from the database.
This does not show the performance of the bigchain
## Tests
@@ -100,10 +100,10 @@ the machine can handle.
- **output**: stable 1.4K writes per second
These tests produce results similar to the previous one. The reason why we don't
reach the expected output may be because rethinkdb needs time to cache results
reach the expected output may be because RethinkDB needs time to cache results
and at some point increasing the number of write units will not result in a
higher output. Another problem is that as the rethinkdb cache fills (because the
rethinkdb is not able to flush the data to disk fast enough due to IO
higher output. Another problem is that as the RethinkDB cache fills (because the
RethinkDB is not able to flush the data to disk fast enough due to IO
limitations) the performance will decrease because the processes will take more
time inserting blocks.
@@ -118,7 +118,7 @@ time inserting blocks.
In this case we are increasing the number of nodes in the cluster by 2x. This
won't have an impact on the write performance because the maximum number of
shards per table in rethinkdb is 32 (rethinkdb will probably increase this limit
shards per table in RethinkDB is 32 (RethinkDB will probably increase this limit
in the future). What this provides is more CPU power (and storage for replicas,
more about replication in the next section). We just halved the number of write
units per node while maintaining the same output. The IO in the nodes holding the
@@ -158,7 +158,7 @@ is another advantage of adding more nodes beyond 32.
## Testing replication
Replication is used for data redundancy. In rethinkdb we are able to specify the
Replication is used for data redundancy. In RethinkDB we are able to specify the
number of shards and replicas per table. Data in secondary replicas is not
directly used; it's just a mirror of a primary replica, used in case the node
holding the primary replica fails.
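For reference, a minimal sketch of setting the number of shards and replicas per table with the RethinkDB Python driver (database and table names assumed):

```python
import rethinkdb as r

conn = r.connect(host='localhost', port=28015)

# 32 shards (primary replicas) with a replication factor of 2
r.db('bigchain').table('bigchain').reconfigure(shards=32, replicas=2).run(conn)
```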
@@ -177,7 +177,7 @@ shards (primary replicas)
With a replication factor of 2 we will have 64 replicas (32 primary replicas and
32 secondary replicas). Since we already have 32 nodes holding the 32
shards/primary replicas rethinkdb uses the other 32 nodes to hold the secondary
shards/primary replicas RethinkDB uses the other 32 nodes to hold the secondary
replicas. So in a 64 node cluster with 32 shards and a replication factor of 2,
32 nodes will be holding the primary replicas and the other 32 nodes will be holding
the secondary replicas.
@@ -190,6 +190,6 @@ secondary replicas.
Another fact about replication. If I have a 64 node cluster and create a table
with 32 shards, 32 nodes will be holding primary replicas and the other nodes do
not hold any data. If I create another table with 32 shards rethinkdb will
not hold any data. If I create another table with 32 shards RethinkDB will
create the shards in the nodes that were not holding any data, evenly
distributing the data.