Problem: No docs explaining use of MongoDB for querying (#2193)

* Problem: No docs explaining use of MongoDB for querying

  Solution: Start a new root docs page explaining how a node operator can use the full power of MongoDB's query engine, and can expose as much of that as they like to end users.

* Finished first draft of new docs page 'Querying BigchainDB'
2024-10-13 13:34:05 +00:00 · 2018-04-23 19:30:45 +02:00
commit d066bfe132
parent 99d46605ae
3 changed files with 79 additions and 0 deletions
--- a/docs/root/source/_static/schemaDB.png
+++ b/docs/root/source/_static/schemaDB.png
--- a/docs/root/source/index.rst
+++ b/docs/root/source/index.rst
@@ -85,6 +85,7 @@ More About BigchainDB
   diversity
   immutable
   bft
+   query
   assets
   smart-contracts
   transaction-concepts
--- a/docs/root/source/query.rst
+++ b/docs/root/source/query.rst
@@ -0,0 +1,78 @@
+Querying BigchainDB
+===================
+
+A node operator can use the full power of MongoDB's query engine to search and query all stored data, including all transactions, assets and metadata.
+The node operator can decide for themselves how much of that query power they expose to external users.
+
+How to Query
+------------
+
+A BigchainDB node operator has full access to their local MongoDB instance, so they can use any of MongoDB's APIs for running queries, including:
+
+- `the mongo Shell <https://docs.mongodb.com/manual/mongo/>`_,
+- one of `the MongoDB drivers <https://docs.mongodb.com/ecosystem/drivers/>`_, such as `PyMongo <https://api.mongodb.com/python/current/>`_, or
+- a third-party tool or driver for doing MongoDB queries, such as RazorSQL.
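
As a minimal sketch of the PyMongo route, the snippet below builds a filter for the assets collection and runs it. It assumes MongoDB is listening on ``localhost:27017``, that the database is named ``bigchain``, and that each asset document keeps its payload under a ``data`` key. The function names and the ``color`` field are hypothetical.

```python
def asset_filter(field, value):
    """Build a MongoDB filter matching assets whose data.<field> equals value."""
    return {"data." + field: value}


def find_assets(db, field, value, limit=10):
    """Run the filter on the assets collection and return up to `limit` documents."""
    cursor = db["assets"].find(asset_filter(field, value)).limit(limit)
    return list(cursor)


def connect(uri="mongodb://localhost:27017", db_name="bigchain"):
    """Open the node's local database (URI and database name are assumptions)."""
    from pymongo import MongoClient  # third-party package: pip install pymongo
    return MongoClient(uri)[db_name]
```

With a running node, ``find_assets(connect(), "color", "red")`` would return the matching asset documents as a list of dictionaries.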
+
+What Can Be Queried?
+--------------------
+
+BigchainDB Server creates several `MongoDB collections <https://docs.mongodb.com/manual/core/databases-and-collections/>`_ in the node's local MongoDB database.
+You can see the list of collections by looking at the ``create_tables`` method in the BigchainDB Server file ``bigchaindb/backend/localmongodb/schema.py``. The most interesting collections are:
+
+- transactions
+- assets
+- metadata
+- blocks
+
+We don't detail what's in each collection here, but the collection names are fairly self-explanatory. You can explore their contents using MongoDB queries. A few things worth noting are:
+
+1. The transactions collection doesn't include any ``asset.data`` or ``metadata`` values (JSON documents). Those are all removed and stored separately in the assets and metadata collections, respectively.
+2. The JSON documents stored in the blocks collection are *not* `Tendermint blocks <https://github.com/tendermint/tendermint/blob/master/types/block.go>`_; they are `BigchainDB blocks <https://docs.bigchaindb.com/projects/server/en/latest/data-models/block-model.html>`_.
+3. Votes aren't currently stored in any MongoDB collection. They are all handled and stored by Tendermint in its own (LevelDB) database.
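
Because ``asset.data`` and ``metadata`` live in their own collections, rebuilding a complete transaction takes two extra lookups. The sketch below shows one way to do that. It assumes that asset and metadata documents carry an ``id`` field equal to the transaction id, with payloads under ``data`` and ``metadata``; check those names against your own node before relying on them. The function name is hypothetical.

```python
def reassemble(tx, assets, metadata):
    """Return a copy of tx with asset.data and metadata put back, when available.

    `assets` and `metadata` are collection objects offering find_one(),
    for example PyMongo collections.
    """
    full = dict(tx)  # shallow copy, so the caller's document is not modified

    # Assumption: the asset document is keyed by the id of the transaction that created it.
    asset_doc = assets.find_one({"id": tx["id"]})
    if asset_doc is not None:
        full["asset"] = {"data": asset_doc.get("data")}

    # Assumption: the metadata document is keyed by the transaction id.
    meta_doc = metadata.find_one({"id": tx["id"]})
    full["metadata"] = meta_doc.get("metadata") if meta_doc else None
    return full
```

If no asset document matches, the transaction's existing ``asset`` value is left as it is.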
+
+What a Node Operator Can Expose to External Users
+-------------------------------------------------
+
+Each node operator can decide how they let external users get information from their local MongoDB database. They could expose:
+
+- their local MongoDB database itself to queries from external users, maybe as a MongoDB user with a role that has limited privileges, e.g. read-only.
+- a limited HTTP API, allowing a restricted set of predefined queries, such as `the HTTP API provided by BigchainDB Server <http://bigchaindb.com/http-api>`_, or a custom HTTP API implemented using Django, Express, Ruby on Rails, or ASP.NET.
+- some other API, such as a GraphQL API. They could do that using custom code or code from a third party.
+
+Each node operator can expose a different level or type of access to their local MongoDB database.
+For example, one node operator might decide to specialize in offering optimized `geospatial queries <https://docs.mongodb.com/manual/reference/operator/query-geospatial/>`_.
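
One way to offer "a restricted set of predefined queries" is an allow-list that maps public query names to fixed filter templates, so external users never send raw MongoDB filters. The following is only a sketch: the query names, the ``ALLOWED_QUERIES`` table and ``resolve_query`` are hypothetical, and the ``id`` and ``operation`` fields are assumed to exist in the stored documents.

```python
# Each entry maps a public query name to (collection name, filter builder).
# User-supplied values are coerced to str so that a value such as
# {"$ne": None} cannot be smuggled in as a MongoDB operator.
ALLOWED_QUERIES = {
    "asset_by_id": ("assets", lambda p: {"id": str(p["id"])}),
    "transactions_by_operation": (
        "transactions",
        lambda p: {"operation": str(p["operation"])},
    ),
}


def resolve_query(name, params):
    """Translate a public query name into (collection, filter), or raise ValueError."""
    if name not in ALLOWED_QUERIES:
        raise ValueError("unknown query: " + repr(name))
    collection, build = ALLOWED_QUERIES[name]
    return collection, build(params)
```

An HTTP handler in any framework could call ``resolve_query`` and pass the result to the database driver, returning an error response when ``ValueError`` is raised.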
+
+Security Considerations
+-----------------------
+
+In BigchainDB version 1.3.0 and earlier, there was one logical MongoDB database shared by all nodes, so exposing that database to external users was very risky, and was not recommended.
+A single "drop database" command would delete that one shared MongoDB database.
+
+In BigchainDB version 2.0.0 and later, each node has its own isolated local MongoDB database.
+Inter-node communications are done using Tendermint protocols, not MongoDB protocols, as illustrated in Figure 1 below.
+If a node's local MongoDB database gets compromised, none of the other MongoDB databases (in the other nodes) will be affected.
+
+.. figure:: _static/schemaDB.png
+   :alt: Diagram of a four-node BigchainDB 2.0 network
+   :align: center
+   
+   Figure 1: A Four-Node BigchainDB 2.0 Network
+
+.. raw:: html
+
+   <br>
+   <br>
+   <br>
+
+Performance and Cost Considerations
+-----------------------------------
+
+Query processing can be quite resource-intensive, so it's a good idea to have MongoDB running on a separate machine from those running BigchainDB Server and Tendermint Core.
+
+A node operator might want to measure the resources used by a query, so they can charge whoever requested the query accordingly.
+
+Some queries can take too long or use too many resources. A node operator should put upper bounds on the resources that a query can use, and halt (or prevent) any query that goes over.
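
A rough sketch of both ideas (bounding a query and measuring it) using PyMongo's ``find`` options follows. The default limits are arbitrary, the function name is hypothetical, and wall-clock time is only a crude proxy for the resources a query actually consumes.

```python
import time


def bounded_find(collection, query, max_time_ms=2000, max_docs=100):
    """Run a query with a server-side time limit and a cap on returned documents.

    Returns (documents, elapsed_seconds). The elapsed time is measured on the
    client, so it includes network and deserialization time.
    """
    start = time.perf_counter()
    cursor = collection.find(query, limit=max_docs, max_time_ms=max_time_ms)
    docs = list(cursor)  # the query actually runs while the cursor is consumed
    elapsed = time.perf_counter() - start
    return docs, elapsed
```

When the time limit is exceeded, PyMongo raises ``pymongo.errors.ExecutionTimeout``, which the caller can catch and report back to whoever requested the query.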
+
+To make MongoDB queries more efficient, one can create `indexes <https://docs.mongodb.com/manual/indexes/>`_. Those indexes might be created by the node operator or by some external users (if the node operator allows that). It's worth noting that indexes aren't free: whenever new data is appended to a collection, the corresponding indexes must be updated. The node operator might want to pass those costs on to whoever created the index. Moreover, in MongoDB, `a single collection can have no more than 64 indexes <https://docs.mongodb.com/manual/reference/limits/#Number-of-Indexes-per-Collection>`_.
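
As an illustration, a node operator could wrap index creation in a small guard that respects the per-collection limit. This is a sketch only: ``create_index_if_room`` is a hypothetical helper, and ``collection`` is assumed to be a PyMongo collection (or any object with the same ``index_information`` and ``create_index`` methods).

```python
MAX_INDEXES_PER_COLLECTION = 64  # MongoDB's documented per-collection limit


def create_index_if_room(collection, field):
    """Create an ascending single-field index unless the collection is at the limit."""
    existing = collection.index_information()  # dict: index name -> index details
    if len(existing) >= MAX_INDEXES_PER_COLLECTION:
        raise RuntimeError("collection already has the maximum number of indexes")
    # 1 means ascending order; create_index returns the name of the index.
    return collection.create_index([(field, 1)])
```

For example, ``create_index_if_room(db["assets"], "data.color")`` would index a hypothetical ``color`` field inside each asset's data.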
+
+One can create a follower node: a node with Tendermint voting power 0. It would still have a copy of all the data, so it could be used as a read-only node. A follower node could offer specialized queries as a service without affecting the workload on the voting validators (which can also write). There could even be followers of followers.