Mirror of https://github.com/etcd-io/etcd.git (synced 2024-09-27 06:25:44 +00:00)

# KV API guarantees

etcd is a consistent and durable key value store with [mini-transaction][txn] support. The key value store is exposed through the KV APIs. etcd tries to ensure the strongest consistency and durability guarantees for a distributed system. This specification enumerates the KV API guarantees made by etcd.

### APIs to consider

* Read APIs
    * range
    * watch
* Write APIs
    * put
    * delete
* Combination (read-modify-write) APIs
    * txn

### etcd specific definitions

#### Operation completed

An etcd operation is considered complete when it is committed through consensus, and therefore “executed” (permanently stored) by the etcd storage engine. The client knows an operation is completed when it receives a response from the etcd server. Note that the client may be uncertain about the status of an operation if it times out, or if there is a network disruption between the client and the etcd member. etcd may also abort operations when there is a leader election. etcd does not send `abort` responses to clients’ outstanding requests in this event.

#### Revision

An etcd operation that modifies the key value store is assigned a single, monotonically increasing revision. A transaction operation might modify the key value store multiple times, but only one revision is assigned. The revision attribute of a key value pair that was modified by the operation has the same value as the revision of the operation. The revision can be used as a logical clock for the key value store. A key value pair that has a larger revision was modified after a key value pair with a smaller revision. Two key value pairs that have the same revision were modified "concurrently" by a single operation.

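The revision rules above can be illustrated with a small in-memory model. This is a sketch only: `ToyStore`, `KeyValue`, and `modified_after` are hypothetical names invented for this example, and the model says nothing about how etcd's storage engine actually assigns revisions internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KeyValue:
    value: str
    mod_revision: int  # revision of the operation that last modified this key


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for the key value store (not etcd code)."""
    revision: int = 0
    data: Dict[str, KeyValue] = field(default_factory=dict)

    def put(self, key: str, value: str) -> int:
        # A single put is one modifying operation, so it gets one revision.
        return self.txn([(key, value)])

    def txn(self, puts: List[Tuple[str, str]]) -> int:
        # An operation that modifies nothing does not consume a revision.
        if not puts:
            return self.revision
        # One revision for the whole transaction, however many keys it touches.
        self.revision += 1
        for key, value in puts:
            self.data[key] = KeyValue(value, self.revision)
        return self.revision


def modified_after(a: KeyValue, b: KeyValue) -> bool:
    """Use the revision as a logical clock: larger means modified later."""
    return a.mod_revision > b.mod_revision
```

In this model, a `put` followed by a two-key `txn` yields two revisions in total, and both keys written by the transaction carry the same `mod_revision`.
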
### Guarantees provided

#### Atomicity

All API requests are atomic; an operation either completes entirely or not at all. For watch requests, all events generated by one operation will be in one watch response. Watch never observes partial events for a single operation.

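As a rough illustration of these two properties, the toy model below stages a multi-key operation before committing it and hands watchers a single response containing every event from that operation. `ToyWatchedStore` and `Event` are hypothetical names for this sketch; etcd's real watch machinery is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    key: str
    value: str
    revision: int


@dataclass
class ToyWatchedStore:
    """Hypothetical model of atomic operations and watch delivery (not etcd code)."""
    revision: int = 0
    data: Dict[str, str] = field(default_factory=dict)
    watchers: List[Callable[[List[Event]], None]] = field(default_factory=list)

    def watch(self, callback: Callable[[List[Event]], None]) -> None:
        self.watchers.append(callback)

    def txn(self, puts: List[Tuple[str, str]]) -> int:
        if not puts:
            return self.revision
        # Stage every change on a copy so a failure leaves the store untouched.
        staged = dict(self.data)
        new_revision = self.revision + 1
        events: List[Event] = []
        for key, value in puts:
            if not isinstance(value, str):
                raise TypeError(f"value for {key!r} must be a string")
            staged[key] = value
            events.append(Event(key, value, new_revision))
        # Commit point: all changes become visible together.
        self.data = staged
        self.revision = new_revision
        # Each watcher receives one response holding all events of the operation.
        for callback in self.watchers:
            callback(list(events))
        return new_revision
```

A transaction that fails partway through (here, because of a non-string value) changes neither the data nor the revision, and produces no watch response at all.
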
#### Consistency

All API calls ensure [sequential consistency][seq_consistency], the strongest consistency guarantee available from distributed systems. No matter which etcd member server a client makes requests to, a client reads the same events in the same order. If two members complete the same number of operations, the state of the two members is consistent.

For watch operations, etcd guarantees to return the same value for the same key across all members for the same revision. For range operations, etcd has a similar guarantee for [linearized][Linearizability] access; serialized access may be behind the quorum state, so that the later revision is not yet available.

As with all distributed systems, it is impossible for etcd to ensure [strict consistency][strict_consistency]. etcd does not guarantee that a read returns the “most recent” value (as measured by a wall clock when the request completes) available on any cluster member.

#### Isolation

etcd ensures [serializable isolation][serializable_isolation], which is the highest isolation level available in distributed systems. Read operations will never observe any intermediate data.

#### Durability

Any completed operations are durable. All accessible data is also durable data. A read will never return data that has not been made durable.

#### Linearizability

Linearizability (also known as Atomic Consistency or External Consistency) is a consistency level between strict consistency and sequential consistency.

For linearizability, suppose each operation receives a timestamp from a loosely synchronized global clock. Operations are linearized if and only if they always complete as though they were executed in a sequential order and each operation appears to complete in the order specified by the program. Likewise, if an operation’s timestamp precedes another, that operation must also precede the other operation in the sequence.

For example, consider a client completing a write at time point 1 (*t1*). A client issuing a read at *t2* (for *t2* > *t1*) should receive a value at least as recent as the previous write, completed at *t1*. However, the read might actually complete only by *t3*, and the returned value, current at *t2* when the read began, might be "stale" by *t3*.

etcd does not ensure linearizability for watch operations. Users are expected to verify the revision of watch responses to ensure correct ordering.

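One way a client might perform that verification is sketched below. It assumes each watch response can be reduced to a list of `(key, mod_revision)` pairs; that shape, along with the names `RevisionChecker` and `OrderingError`, is an assumption made for this example and not part of any etcd client library. The check relies on the atomicity guarantee: since all events of one operation arrive in one response, a later response should only carry strictly newer revisions.

```python
from typing import List, Tuple

# Hypothetical event shape for this sketch: (key, mod_revision).
Event = Tuple[str, int]


class OrderingError(Exception):
    """Raised when a watch response does not follow revision order."""


class RevisionChecker:
    def __init__(self, start_revision: int = 0) -> None:
        # Highest revision seen in any previously accepted response.
        self.last_revision = start_revision

    def accept(self, events: List[Event]) -> None:
        highest = self.last_revision
        for key, rev in events:
            # Events must be newer than everything in earlier responses.
            if rev <= self.last_revision:
                raise OrderingError(f"{key!r}: revision {rev} already seen")
            # Within one response, revisions must not go backwards.
            if rev < highest:
                raise OrderingError(f"{key!r}: revision {rev} out of order")
            highest = rev
        # Only advance the clock once the whole response has been validated.
        self.last_revision = highest
```

A client would call `accept` on every response before applying its events, and treat an `OrderingError` as a signal to stop and re-establish the watch from a known revision.
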
etcd ensures linearizability for all other operations by default. Linearizability comes with a cost, however, because linearized requests must go through the Raft consensus process. To obtain lower latencies and higher throughput for read requests, clients can configure a request’s consistency mode to `serializable`, which may access stale data with respect to quorum, but removes the performance penalty of linearized accesses' reliance on live consensus.

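The trade-off between the two read modes can be pictured with a deliberately simplified two-member model. Everything here is hypothetical (`ToyCluster`, `ToyMember`, `read_from_follower`), and the "catch up" step merely stands in for the consensus round that a linearized read depends on; it is not how Raft or etcd implements reads.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyMember:
    revision: int = 0
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyCluster:
    """Hypothetical leader plus one lagging follower (not etcd code)."""
    leader: ToyMember = field(default_factory=ToyMember)
    follower: ToyMember = field(default_factory=ToyMember)

    def put(self, key: str, value: str) -> int:
        # The write is committed on the leader; the follower has not applied it yet.
        self.leader.revision += 1
        self.leader.data[key] = value
        return self.leader.revision

    def _catch_up(self) -> None:
        # Stand-in for the consensus step a linearized read must wait for.
        self.follower.data = dict(self.leader.data)
        self.follower.revision = self.leader.revision

    def read_from_follower(
        self, key: str, serializable: bool = False
    ) -> Tuple[Optional[str], int]:
        if not serializable:
            # Linearized (default): pay the cost of confirming the latest state.
            self._catch_up()
        # Serializable: answer from local state, which may lag behind quorum.
        return self.follower.data.get(key), self.follower.revision
```

In this model a serializable read issued right after a write can return the older value together with an older revision, while the default linearized read always reflects the completed write.
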
[seq_consistency]: https://en.wikipedia.org/wiki/Consistency_model#Sequential_consistency
[strict_consistency]: https://en.wikipedia.org/wiki/Consistency_model#Strict_consistency
[serializable_isolation]: https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable
[Linearizability]: #Linearizability
[txn]: api.md#transactions
