
.. Copyright © 2020 Interplanetary Database Association e.V.,
   Planetmint and IPDB software contributors.
   SPDX-License-Identifier: (Apache-2.0 AND CC-BY-4.0)
   Code is Apache-2.0 and docs are CC-BY-4.0

.. _cluster-troubleshooting:

Cluster Troubleshooting
=======================

This page describes some basic issues we have faced while deploying and
operating the cluster.

1. MongoDB Restarts
-------------------

We define the following in the ``mongo-ss.yaml`` file:

.. code:: yaml

   resources:
     limits:
       cpu: 200m
       memory: 5G

When the MongoDB cache occupies more than 5GB of memory, the container is
terminated by the ``kubelet``.
This can usually be verified by logging in to the worker node running the
MongoDB container and looking at the syslog (the ``journalctl`` command should
usually work).
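
For example, a minimal check might look like the sketch below. It assumes you
can SSH into the worker node and that the kubelet runs as a systemd unit; the
exact log wording varies with the kernel and kubelet version, so treat the
grep patterns as starting points.

.. code:: bash

   # Search the kernel log for OOM-killer activity:
   journalctl -k | grep -i "out of memory"

   # Check the kubelet's own log for messages about the MongoDB container:
   journalctl -u kubelet | grep -i mongo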

This issue is resolved in
`PR #1757 <https://github.com/planetmint/planetmint/pull/1757>`_.

2. 502 Bad Gateway Error on Runscope Tests
------------------------------------------

This means that NGINX could not find an appropriate backend to forward the
requests to. This typically happens when:

#. MongoDB goes down (as described above) and Planetmint, after trying
   ``PLANETMINT_DATABASE_MAXTRIES`` times, gives up. The Kubernetes Planetmint
   Deployment then restarts the Planetmint pod.

#. Planetmint crashes for some reason. We have seen this happen when updating
   Planetmint from one version to the next. This usually means that older
   connections to the service get disconnected; retrying the request one more
   time forwards the connection to the new instance and succeeds (see the
   sketch after this list).
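
A minimal retry from the command line might look like this; the endpoint path
is illustrative and depends on how your cluster exposes the Planetmint API.

.. code:: bash

   # curl treats HTTP 502 as a transient error when --retry is given, so a
   # single flag covers the "retry once more" advice above:
   curl --retry 3 --retry-delay 2 https://<your-cluster-host>/api/v1/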

3. Service Unreachable
----------------------

Communication between Kubernetes Services and Deployments fails in
v1.6.6 and earlier due to a trivial key lookup error for non-existent services
in the ``kubelet``.
This error can be reproduced by restarting any public-facing Kubernetes
service (that is, one using the cloud load balancer) and watching
``kube-proxy`` fail in its logs.
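
For instance, a reproduction along these lines (the service name and manifest
are placeholders, and deleting/recreating is just one way to restart a
service) should trigger the failure on affected versions:

.. code:: bash

   # Recreate a public-facing service, then watch kube-proxy's logs on the
   # node backing it:
   kubectl --context <context-name> delete svc <public-service>
   kubectl --context <context-name> apply -f <public-service-manifest>.yaml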

The solution to this problem is to restart ``kube-proxy`` on the affected
worker/agent node. Log in to the worker node and run:

.. code:: bash

   # Stop the kube-proxy container; the kubelet restarts it, and following
   # the logs shows whether the new instance comes up cleanly:
   docker stop `docker ps | grep k8s_kube-proxy | cut -d" " -f1`
   docker logs -f `docker ps | grep k8s_kube-proxy | cut -d" " -f1`

`This issue <https://github.com/kubernetes/kubernetes/issues/48705>`_ is
`fixed in Kubernetes v1.7 <https://github.com/kubernetes/kubernetes/commit/41c4e965c353187889f9b86c3e541b775656dc18>`_.

4. Single Disk Attached to Multiple Mountpoints in a Container
--------------------------------------------------------------

This issue is currently affecting one of the clusters and is being debugged by
the support team at Microsoft.

The issue was first seen on August 29, 2017 on the Test Network and has been
logged in the `Azure/acs-engine repo on GitHub <https://github.com/Azure/acs-engine/issues/1364>`_.

This is apparently fixed in Kubernetes v1.7.2, which includes a new disk
driver, but we have yet to test it.

5. MongoDB Monitoring Agent throws a dial error while connecting to MongoDB
----------------------------------------------------------------------------

You might see something similar to this in the MongoDB Monitoring Agent logs:

.. code:: bash

   Failure dialing host without auth. Err: `no reachable servers`
       at monitoring-agent/components/dialing.go:278
       at monitoring-agent/components/dialing.go:116
       at monitoring-agent/components/dialing.go:213
       at src/runtime/asm_amd64.s:2086

The first thing to check is whether the networking is set up correctly: verify
that the MongoDB instance is reachable from within the cluster (maybe using
the ``toolbox`` container).
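
A quick reachability probe might look like the following sketch. The pod and
service names are placeholders, and it assumes ``ping`` is available in the
container; substitute whatever your deployment actually uses.

.. code:: bash

   # Resolve and ping the MongoDB service from a pod inside the cluster:
   kubectl --context <context-name> exec -it <toolbox-pod> -- \
       ping -c 3 <mongodb-service-hostname>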

If everything looks fine, it might be a problem with the ``Preferred
Hostnames`` setting in MongoDB Cloud Manager. If you do need to change the
regular expression, ensure that it is correct and saved properly (maybe try
refreshing the MongoDB Cloud Manager web page to see if the setting sticks).

Once you update the regular expression, you will need to remove the deployment
and add it again for the Monitoring Agent to discover and connect to the
MongoDB instance correctly.

More information about this configuration is provided in
:doc:`this document <cloud-manager>`.

6. Create a Persistent Volume from an Existing Azure Disk Storage Resource
---------------------------------------------------------------------------

When deleting a k8s cluster, all dynamically-created PVs are deleted, along
with the underlying Azure storage disks, so those disks cannot be used in a
new cluster. The workflow below preserves the Azure storage disks while
deleting the k8s cluster and re-uses the same disks on a new cluster for
MongoDB persistent storage, without losing any data.

The template to create two PVs for the MongoDB Stateful Set (one for the
MongoDB data store and the other for the MongoDB config store) is located at
``mongodb/mongo-pv.yaml``.
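
For orientation, a PV defined from an existing Azure disk generally follows
this shape. This is an illustrative sketch using the standard Kubernetes
``azureDisk`` volume source, not a verbatim copy of ``mongodb/mongo-pv.yaml``;
the name and size are placeholders.

.. code:: yaml

   apiVersion: v1
   kind: PersistentVolume
   metadata:
     name: mongo-data-pv        # placeholder name
   spec:
     capacity:
       storage: 50Gi            # match the size of the existing disk
     accessModes:
       - ReadWriteOnce
     azureDisk:
       diskName: <diskName>     # NAME of the .vhd blob (see below)
       diskURI: <diskURI>       # URL of the .vhd blob (see below)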

You need to configure ``diskName`` and ``diskURI`` in the
``mongodb/mongo-pv.yaml`` file. You can get these values by logging in to the
Azure portal, going to ``Resource Groups``, and clicking on the relevant
resource group. From the list of resources, click on the storage account
resource, then click the container (usually named ``vhds``) that contains the
storage disk blobs available for PVs. Click on the storage disk file that you
wish to use for your PV; the ``NAME`` and ``URL`` parameters shown there are
the values to use for ``diskName`` and ``diskURI`` respectively in your
template. Then run the following command to create the PVs:

.. code:: bash

   $ kubectl --context <context-name> apply -f mongodb/mongo-pv.yaml

.. note::

   Please make sure the storage disks you are using are not already being used
   by any other PVs. To check the existing PVs in your cluster, run the
   following command to get the mapping between PVs and storage disk files.

   .. code:: bash

      $ kubectl --context <context-name> get pv --output yaml