Problem: Docs say little re/ privacy & handling private data (#2437)

* Problem: Docs should say more about how to handle privacy & private data
  Solution: Write a new docs page about that
* Problem: Server docs say little re/ prod. node security & privacy
  Solution: Create a new docs page, "Production Node Security & Privacy"
* Address initial comments from @gautamdhameja
2024-10-13 13:34:05 +00:00 · 2018-08-06 10:15:09 +02:00 · 2018-08-06 10:15:09 +02:00 · 8090a35676
commit 8090a35676
parent d25d806cd8
6 changed files with 115 additions and 17 deletions
--- a/docs/root/source/index.rst
+++ b/docs/root/source/index.rst
@@ -91,4 +91,5 @@ More About BigchainDB
   transaction-concepts
   store-files
   permissions
+   private-data
   Data Models <https://docs.bigchaindb.com/projects/server/en/latest/data-models/index.html>
--- a/docs/root/source/permissions.rst
+++ b/docs/root/source/permissions.rst
@@ -53,20 +53,7 @@ You could do more elaborate things too. As one example, each time someone writes
 Read Permissions
 ================

-All the data stored in a BigchainDB network can be read by anyone with access to that network. One *can* store encrypted data, but if the decryption key ever leaks out, then the encrypted data can be read, decrypted, and leak out too. (Deleting the encrypted data is :doc:`not an option <immutable>`.)
-
-The permission to read some specific information (e.g. a music file) can be thought of as an *asset*. (In many countries, that permission or "right" is a kind of intellectual property.)
-BigchainDB can be used to register that asset and transfer it from owner to owner.
-Today, BigchainDB does not have a way to restrict read access of data stored in a BigchainDB network, but many third-party services do offer that (e.g. Google Docs, Dropbox). 
-In principle, a third party service could ask a BigchainDB network to determine if a particular user has permission to read some particular data. Indeed they could use BigchainDB to keep track of *all* the rights a user has for some data (not just the right to read it).
-That third party could also use BigchainDB to store audit logs, i.e. records of every read, write or other operation on stored data.
-
-BigchainDB can be used in other ways to help parties exchange private data:
-
- It can be used to publicly disclose the *availability* of some private data (stored elsewhere). For example, there might be a description of the data and a price.
- It can be used to record the TLS handshakes which two parties sent to each other to establish an encrypted and authenticated TLS connection, which they could use to exchange private data with each other. (The stored handshake information wouldn't be enough, by itself, to decrypt the data.) It would be a "proof of TLS handshake."
- See the BigchainDB `Privacy Protocols repository <https://github.com/bigchaindb/privacy-protocols>`_ for more techniques.
-
+See the page titled :doc:`BigchainDB, Privacy and Private Data <private-data>`.

 Role-Based Access Control (RBAC)
 ================================
--- a/docs/root/source/private-data.rst
+++ b/docs/root/source/private-data.rst
@@ -0,0 +1,100 @@
+BigchainDB, Privacy and Private Data
+------------------------------------
+
+Basic Facts
+===========
+
+#. One can store arbitrary data (including encrypted data) in a BigchainDB network, within limits: there’s a maximum transaction size. Every transaction has a ``metadata`` section which can store almost any Unicode string (up to some maximum length). Similarly, every CREATE transaction has an ``asset.data`` section which can store almost any Unicode string.
+#. The data stored in certain BigchainDB transaction fields must not be encrypted, e.g. public keys and amounts. BigchainDB doesn’t offer private transactions akin to Zcoin.
+#. Once data has been stored in a BigchainDB network, it’s best to assume it can’t be changed or deleted.
+#. Every node in a BigchainDB network has a full copy of all the stored data.
+#. Every node in a BigchainDB network can read all the stored data.
+#. Everyone with full access to a BigchainDB node (e.g. the sysadmin of a node) can read all the data stored on that node.
+#. Everyone given access to a node via the BigchainDB HTTP API can find and read all the data stored by BigchainDB. The list of people with access might be quite short.
+#. If the connection between an external user and a BigchainDB node isn’t encrypted (using HTTPS, for example), then a wiretapper can read all HTTP requests and responses in transit.
+#. If someone gets access to plaintext (regardless of where they got it), then they can (in principle) share it with the whole world. One can make it difficult for them to do that, e.g. if it is a lot of data and they only get access inside a secure room where they are searched as they leave the room.
+
+Storing Private Data Off-Chain
+==============================
+
+A system could store data off-chain, e.g. in a third-party database, document store, or content management system (CMS), and it could use BigchainDB to:
+
+- Keep track of who has read permissions (or other permissions) in a third-party system. An example of how this could be done is described below.
+- Keep a permanent record of all requests made to the third-party system.
+- Store hashes of documents-stored-elsewhere, so that a change in any document can be detected.
+- Record all handshake-establishing requests and responses between two off-chain parties (e.g. a Diffie-Hellman key exchange), so as to prove that they established an encrypted tunnel (without giving readers access to that tunnel). There are more details about this idea in `the BigchainDB Privacy Protocols repository <https://github.com/bigchaindb/privacy-protocols>`_.
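
As a minimal sketch of the "store hashes of documents-stored-elsewhere" idea, the snippet below computes a digest that could be placed in ``asset.data`` and later compared against the off-chain document. The choice of SHA-256, the field names ``doc_id`` and ``sha256``, and the helper names are assumptions made for illustration, not part of BigchainDB.

```python
import hashlib


def document_digest(content: bytes) -> str:
    """Return the hex SHA-256 digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def build_asset_data(doc_id: str, content: bytes) -> dict:
    """Build a hypothetical asset.data payload that records only the hash."""
    return {"doc_id": doc_id, "sha256": document_digest(content)}


def is_unchanged(recorded: dict, current_content: bytes) -> bool:
    """Compare the current document against the digest recorded on-chain."""
    return recorded["sha256"] == document_digest(current_content)
```

Only the digest goes on-chain, so the document itself stays in the third-party system, and any later change to it makes ``is_unchanged`` return ``False``.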
+
+A simple way to record who has read permission on a particular document would be for the third-party system (“DocPile”) to store a CREATE transaction in a BigchainDB network for every document+user pair, to indicate that that user has read permissions for that document. The transaction could be signed by DocPile (or maybe by a document owner, as a variation). The asset data field would contain 1) the unique ID of the user and 2) the unique ID of the document. The one output on the CREATE transaction would only be transferable/spendable by DocPile (or, again, a document owner).
+
+To revoke the read permission, DocPile could create a TRANSFER transaction, to spend the one output on the original CREATE transaction, with a metadata field to say that the user in question no longer has read permission on that document.
+
+This can be carried on indefinitely, i.e. another TRANSFER transaction could be created by DocPile to indicate that the user now has read permissions again.
+
+DocPile can figure out if a given user has read permissions on a given document by reading the last transaction in the CREATE → TRANSFER → TRANSFER → etc. chain for that user+document pair.
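
The lookup described above can be sketched in a few lines. The transactions here are simplified stand-in dictionaries rather than real BigchainDB transactions, and the ``metadata["read"]`` flag is a hypothetical convention that DocPile would have to define for itself.

```python
from typing import Dict, List


def has_read_permission(chain: List[Dict]) -> bool:
    """Decide read permission from the last transaction in one chain.

    ``chain`` is the CREATE -> TRANSFER -> ... sequence for a single
    user+document pair, oldest first. The CREATE itself is taken to
    grant access; each TRANSFER states the new status in its metadata.
    """
    if not chain:
        return False  # no CREATE was ever written for this pair
    last = chain[-1]
    if last["operation"] == "CREATE":
        return True
    return bool((last.get("metadata") or {}).get("read", False))


# Simplified stand-ins for transactions (real ones carry many more fields).
chain = [
    {"operation": "CREATE", "asset": {"data": {"user_id": "u1", "doc_id": "d1"}}},
    {"operation": "TRANSFER", "metadata": {"read": False}},  # revoked
    {"operation": "TRANSFER", "metadata": {"read": True}},   # granted again
]
```

Because only the last transaction matters, the full history stays available for auditing without complicating the permission check.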
+
+There are other ways to accomplish the same thing. The above is just one example.
+
+You might have noticed that the above example didn’t treat the “read permission” as an asset owned (controlled) by a user. That’s because if the permission asset were given to (transferred to or created by) the user, then DocPile could not control it any further until the user transferred it back to DocPile. Moreover, the user could transfer the asset to someone else, which might be problematic.
+
+Storing Private Data On-Chain, Encrypted
+========================================
+
+There are many ways to store private data on-chain, encrypted. Every use case has its own objectives and constraints, and the best solution depends on the use case. `The BigchainDB consulting team <https://www.bigchaindb.com/services/>`_, along with our partners, can help you design the best solution for your use case.
+
+Below we describe some example system setups, using various crypto primitives, to give a sense of what’s possible.
+
+Please note:
+
+- Ed25519 keypairs are designed for signing and verifying cryptographic signatures, `not for encrypting and decrypting messages <https://crypto.stackexchange.com/questions/27866/why-curve25519-for-encryption-but-ed25519-for-signatures>`_. For encryption, you should use keypairs designed for encryption, such as X25519.
+- If someone (or some group) publishes how to decrypt some encrypted data on-chain, then anyone with access to that encrypted data will be able to get the plaintext. The data can’t be deleted.
+- Encrypted data can’t be indexed or searched by MongoDB. (It can index and search the ciphertext, but that’s not very useful.) One might use homomorphic encryption to index and search encrypted data, but MongoDB doesn’t have any plans to support that any time soon. If there is indexing or keyword search needed, then some fields of the ``asset.data`` or ``metadata`` objects can be left as plain text and the sensitive information can be stored in an encrypted child-object.
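
One possible layout for the last point, sketched under assumptions: searchable fields stay in plaintext, and the sensitive fields are serialized and encrypted into a single child value. The ``encrypted`` key name and the helper are hypothetical, and ``encrypt`` is supplied by the caller (ideally an authenticated cipher from a vetted library), since the sketch itself performs no cryptography.

```python
import base64
import json
from typing import Callable, Dict


def build_partially_encrypted(public: Dict, sensitive: Dict,
                              encrypt: Callable[[bytes], bytes]) -> Dict:
    """Keep searchable fields in plaintext; put the rest in one encrypted child.

    ``encrypt`` takes plaintext bytes and returns ciphertext bytes.
    """
    plaintext = json.dumps(sensitive, sort_keys=True).encode("utf-8")
    ciphertext = encrypt(plaintext)
    payload = dict(public)  # copy so the caller's dict is not modified
    # Base64 keeps the ciphertext representable as a JSON string.
    payload["encrypted"] = base64.b64encode(ciphertext).decode("ascii")
    return payload
```

The returned dictionary could then be used as ``asset.data`` or ``metadata``; MongoDB can index the plaintext keys, while the ``encrypted`` value is opaque to it.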
+
+System Example 1
+~~~~~~~~~~~~~~~~
+
+Encrypt the data with a symmetric key and store the ciphertext on-chain (in ``metadata`` or ``asset.data``). To communicate the key to a third party, use their public key to encrypt the symmetric key and send them that. They can decrypt the symmetric key with their private key, and then use that symmetric key to decrypt the on-chain ciphertext.
+
+The reason for using a symmetric key along with public/private keypairs is so the ciphertext only has to be stored once.
+
+System Example 2
+~~~~~~~~~~~~~~~~
+
+This example uses `proxy re-encryption <https://en.wikipedia.org/wiki/Proxy_re-encryption>`_:
+
+#. MegaCorp encrypts some data using its own public key, then stores that encrypted data (ciphertext 1) in a BigchainDB network.
+#. MegaCorp wants to let others read that encrypted data, but without ever sharing their private key and without having to re-encrypt the data themselves for every new recipient. Instead, they find a “proxy” named Moxie, to provide proxy re-encryption services.
+#. Zorban contacts MegaCorp and asks for permission to read the data.
+#. MegaCorp asks Zorban for his public key.
+#. MegaCorp generates a “re-encryption key” and sends it to their proxy, Moxie.
+#. Moxie (the proxy) uses the re-encryption key to re-encrypt ciphertext 1, creating ciphertext 2.
+#. Moxie sends ciphertext 2 to Zorban (or to MegaCorp who forwards it to Zorban).
+#. Zorban uses his private key to decrypt ciphertext 2, getting the original un-encrypted data.
+
+Note:
+
+- The proxy only ever sees ciphertext. They never see any un-encrypted data.
+- Zorban never got the ability to decrypt ciphertext 1, i.e. the on-chain data.
+- There are variations on the above flow.
+
+System Example 3
+~~~~~~~~~~~~~~~~
+
+This example uses `erasure coding <https://en.wikipedia.org/wiki/Erasure_code>`_:
+
+#. Erasure-code the data into n pieces.
+#. Encrypt each of the n pieces with a different encryption key.
+#. Store the n encrypted pieces on-chain, e.g. in n separate transactions.
+#. Share each of the n decryption keys with a different party.
+
+If k of the key-holders (where k < n) get and decrypt k of the pieces, they can reconstruct the original plaintext. Fewer than k pieces would not be enough.
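
To make the "any k of n pieces" property concrete, here is a minimal sketch using the simplest erasure code: k data pieces plus one XOR parity piece, so n = k + 1. This is an illustration only. A real deployment would more likely use a Reed-Solomon code from a library so that n can exceed k by more than one, and the per-piece encryption step is left out here. The function names are hypothetical.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal pieces and append one XOR parity piece."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    pieces.append(reduce(xor_bytes, pieces))
    return pieces


def decode(available: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the data from any k of the n = k + 1 pieces, keyed by index."""
    if len(available) < k:
        raise ValueError("fewer than k pieces: cannot reconstruct")
    missing = [i for i in range(k) if i not in available]
    pieces = dict(available)
    if missing:
        # Exactly one data piece is missing; XOR of all the others recovers it.
        pieces[missing[0]] = reduce(xor_bytes, available.values())
    return b"".join(pieces[i] for i in range(k))[:length]
```

For example, ``encode(data, 3)`` yields four pieces, and ``decode`` succeeds with any three of them. Note that each data piece in this simple code is a plain slice of the original, which is one reason the pieces are encrypted before being stored on-chain.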
+
+System Example 4
+~~~~~~~~~~~~~~~~
+
+This setup could be used in an enterprise blockchain scenario where a special node should be able to see parts of the data, but the others should not.
+
+- The special node generates an X25519 keypair (or similar asymmetric *encryption* keypair).
+- A BigchainDB end user finds out the X25519 public key (encryption key) of the special node.
+- The end user creates a valid BigchainDB transaction, with either the ``asset.data`` or the ``metadata`` (or both) encrypted using the above-mentioned public key.
+- This is only done for transactions where the contents of ``asset.data`` or ``metadata`` don't matter for validation, so all node operators can validate the transaction.
+- The special node is able to decrypt the encrypted data, but the other node operators can't, and nor can any other end user.
--- a/docs/server/source/production-nodes/index.rst
+++ b/docs/server/source/production-nodes/index.rst
@@ -4,7 +4,8 @@ Production Nodes
 .. toctree::
   :maxdepth: 1

+   node-requirements
   node-assumptions
   node-components
-   node-requirements
+   node-security-and-privacy
   reverse-proxy-notes
--- a/docs/server/source/production-nodes/node-assumptions.md
+++ b/docs/server/source/production-nodes/node-assumptions.md
@@ -10,5 +10,3 @@ We make some assumptions about production nodes:
 1. Production nodes use MongoDB (not RethinkDB, PostgreSQL, Couchbase or whatever).
 1. Each production node is set up and managed by an experienced professional system administrator or a team of them.
 1. Each production node in a cluster is managed by a different person or team.
-
-We don't provide a detailed cookbook explaining how to secure a server, or other things that a sysadmin should know. We do provide some templates, but those are just starting points.
--- a/docs/server/source/production-nodes/node-security-and-privacy.md
+++ b/docs/server/source/production-nodes/node-security-and-privacy.md
@@ -0,0 +1,11 @@
+# Production Node Security & Privacy
+
+Here are some references about how to secure an Ubuntu 18.04 server:
+
+- [Ubuntu 18.04 - Ubuntu Server Guide - Security](https://help.ubuntu.com/lts/serverguide/security.html.en)
+- [Ubuntu Blog: National Cyber Security Centre publish Ubuntu 18.04 LTS Security Guide](https://blog.ubuntu.com/2018/07/30/national-cyber-security-centre-publish-ubuntu-18-04-lts-security-guide)
+
+Also, here are some recommendations a node operator can follow to enhance the privacy of the data coming to, stored on, and leaving their node:
+
+- Ensure that all data stored on a node is encrypted at rest, e.g. using full disk encryption. This can be provided as a service by the operating system, transparently to BigchainDB, MongoDB and Tendermint.
+- Ensure that all data is encrypted in transit, i.e. enforce using HTTPS for the HTTP API and the Websocket API. This can be done using NGINX or similar, as we do with the BigchainDB Testnet.