When no address is pined, and balancer ignores the addr Up due to
its current unhealthy state, balancer will be unresponsive forever.
This PR fixes it by doing a full reset when there is no pined addr,
thus re-trigger the Up call.
Previous behavior is when server returns errors, retry
wrapper does not do anything, while passively expecting
balancer to gray-list the isolated endpoint. This is
problematic when multiple endpoints are passed, and
network partition happens.
This patch adds 'endpointError' method to 'balancer' interface
to actively(possibly even before health-check API gets called)
handle RPC errors and gray-list endpoints for the time being,
thus speeding up the endpoint switch.
This is safe in a single-endpoint case, because balancer will
retry no matter what in such case.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
If the balancer update notification loop starts with a downed
connection and endpoints are switched while the old connection is up,
the balancer can potentially wait forever for an up connection without
refreshing the connections to reflect the current endpoints.
Instead, fetch upc/downc together, only caring about a single transition
either from down->up or up->down for each iteration
Simple way to reproduce failures: add time.Sleep(time.Second) to the
beginning of the update notification loop.
The simpleBalancer.Get() blocks grpc.Invoke() even when the Invoke() is called
with the FailFast option. Therefore currently any requests with the
FastFail option actually doesn't fail fast. They get blocked when there is
no endpoints available.
Get() method needs to respect the BlockingWait option when
picks up an endpoint address from the list and fail immediately when the option is
enabled and no endpoint is available.