gun/web/notes.txt

HOOKS:
 - TRANSPORT:
	Has to basically re-implement set/get/key/all over some transport.
	REMEMBER that you have to subscribe on all set/get/key/all (especially key).


Notes:

Messaging
Every message has who/what/when/where/why/how metadata.
Who is information that identifies the potential keys or locations a reply should be made to, such that a user can receive it no matter where they are
What is the message/delta itself
When is when the message was sent, this may have no relation to a timestamp on a delta transmission
Where is the destination that the message is trying to get to, and the paths it has taken to get there
Why is any arbitrary comment
How is the transformation that caused the message, and potentially the expected transformation to happen

Data
Because data can be in more than one place at a time, its identifier should not be mixed with any whereabouts even if you are planning on only using it within your walled garden.
Instead, the data itself should express its particular locality - information about a statue in Rome should not be identified a being Roman, but express it is Roman.
Therefore, preferrably the identifier for a thing should be globally unique, in case it is ever shared by multiple services - that way conflict won't happen.
This is important because that way any service is forced to do a scan on its own dataset to match with this service's own identifier and link them together - or start fresh if no match exists.
Information is not deleted, mostly just dereferenced - which may eventually cause cold storage, and loss of data as every system has physical limitations.
Arrays cannot be supported because cardinality is actually an expression of the data, not an identifier - for if it were, it would cause concurrency problems resulting in divergence.

Identifiers
Reality seems to be lacking any true identifiers because we perceive things in incredibly mutable states.
Often times attributes that may appear to be static are in fact not - take for instance the color of a car.
One generally assumes that the color will not change, since it is not observed much - however the lifetime of a program may outlive the developer. Even the sun will go dark one day.
This is the power of graphs, particularly in a functional setting - they are searchable, potentially timelessly.
As such identifiers should only be created in response to an attempt to match incoming unfound data with data stored.
If successful, the service is free and should make as many extra identifiers to reference and link that data again as to avoid doing slow scans again.
The most likely candidate for that identifier is the identifier that was given to the service, as there is high probability that the requester will make such requests again.
Throught his means, an identifier can spread like a virus through multiple services, where if it exists it can be retrieved immediately.
Thus, no one identifier should potentially conflict with any other identifier, as that would produce an erroneous lookup and a faulty response.

Do identifiers point to subsets or nodes, or both? Where do identifiers live? Ever in the graph itself? A developer of course would want an identifier to point to both.
However this is a little tricky, because a single node by itself is fairly useless (hermeneutics) because it has no context, especially if it is composed of many parts.
So an identifier referencing a "single node" should technically pull in its relations as well, into the subset. But what then do we return to the developer? The subset or the node?
If you assume the node, then identifiers "point" to different types of data - unless we allow the node the same methods as the subset, where map returns itself.
If we return the subset, it is inconvenient that you then have to scan for the thing you were already identifying, unless the map method also accepts traversal strings.
Identifiers are suppose to intentionally allow humans to think in a document based perspective, however there is no guarantee that the pieces in their system will remain that way.
Take for example a car, you'd assume that the wheel is part of the car and should be referenced as such. But what happens when the tire pops? It is removed and then used as a swing.
Now tell me, the "belongingness" of the tire is now in the hierarchy of the tree that it was hung on. Hierarchy is imposed by directional relations, not as static structures.
So there must be a way to evolve the system, for if we assume the identifier now references the subset, giving context to the original node, we need all of them to be uniquely referenced.
The unique references are the system's internal IDs, not the human friendly identifiers, these must be system-wide guaranteed to be unique.
But if everything has one, no matter how small, then that increases the odds quite substantially of conflicts. Meaning larger IDs need to be generated, which is never pretty.
This also then causes scanning issues, cause now countless tiny nodes are floating around in the system that will get mapped over, that is unpleasant for the developer.
Unless of course there are really good tools that allow them to map with string traversals (known paths) and value traversal (matching structures), in addition to the callback.
Meaning everything should be its own object, but really good search tools should be available to make this easy.
If this is the case, enforce the many-solution, that you alway have a subset and never a single node - a single node is just a subset of one.
Or is it? Or is it the subset of the things that compose the node? Do you hide these from the developer? Expose them?
It seems like subset of one works fine when you already have the larger subset, but not when you are using the initial identifier.
So can we mitigate that by providing really good priority loading tools?

Special Characters
These are reserved for gun, to include meta-data on the data itself.
/ is used as a delimiter in service specific identifiers
# is used to define the transition from a graph identifier, into the nodes themselves
. represents walking the path through a node's scope
_ represents non-human data in the graph that the amnesia machine uses to process stuff
> and < are dedicated to comparing timestamp states, timelessly to avoid malicious abuse (this should always come last, because requests will be rejected without it)
~ is who
* relates to key prefixes
*> lexical constraint
*< lexical constraint
@ is where it should be delivered to.
@> is where it has been or will be rebroadcasted through.
% is for byte constraints.
: is time
$ is value
' is for escaping characters contained within '

{
	who: {}
	,what: {'#": 'email/hi@example.com'}
	,when: {'>': 1406184438192 }
	,where: {}
	,how: '*'
}
{
	_: {
		#: 'email/hi@example.com'
		>: 1406184438192
	}
}

Stream
If we receive a message with data at a symbolic scope that doesn't exist we should still save it.
However if we attempt to read it, we should be unable to access it because the symbols haven't been connected.
That way idempotently, when the symbols are connected, it'll then be readable if they haven't already been overwritten.
Grouping of symbols always inherit and merge, if all of them need to be dereferenced you must explicitly dereference it.
How do we do memory management / clean up then of old dead bad data?

Init
You initialize with a node(s) you want to connect to.

DISK
We might want to compact everything into a journal, with this type of format:
#'ASDF'.'hello'>12743921$"world"
Over the wire though it would include more information, like the ordering of where it has/will-be traversed through.
@>'random','gunjs.herokuapp.com','127.0.0.1:8080':7894657#'ASDF'.'hello'>12743921$"world"
What about ACKs?
@'random':7894657#'ASDF'.'hello'>12743921

There is a limited amount of space on a machine, where we assume it is considered ephemeral as a user viewing device.
Such machines are the default requiring no configuration, machines that are permanent nodes can configure caps.
For instance, the browser would be a looping ephemeral device, where the 5mb localStorage gets recycled.
The localhost server of that device would be permanent and not loop, but halt when storage runs out.
By thus virtue of this, clients should be able to throttle (backpressure) their data intake.
The size of primitive values is fundamentally outside our control and left to the developer or multiparting.
If they decide to have a string that exceeds memory or local disk then their app will always fail regardless of multiparting.
However it is our duty to make sure that the rest of the structured data is streamable, including JSON.
Therefore networks and snapshotting should throttle to a size limit specified by the requesting agent.
Resuming these requests will be based soley on a lexical cursor included in the delivery, to avoid remembering state.
The asynchronicity of these lexical cursor could pose potential concurrency issues, and have to be considered ad hoc.
That is to say, if you are dealing with a large data set, you should always subscribe to updates it and never merely "get" it.
For the edge case of the singleton "get", the lexical cursor will progress until it receives the same or no cursor back from the server and terminate.

total over a range of time: {"alice", "bob", "carl", "david", "ed", "fred", "gary", "harry", "ian", "jake", "kim"}
client requests "users" at limit of 30 characters, server replies with (rounded down) {"alice", "bob", "carl"}.
after a while the client has processed this and goes for the next step, requesting /users?*>=carl&%=30 and {"david", "ed", "fred", "gary"} is returned.
then again /user?*>gary&%=30 with {"harry", "ian", "jake", "kim"} response.
then again /user?*>kim&%=30 with {} response. No subsequent cursor is possible, end.
Note: I do not know yet if the lexical cursor should be open-closed boundary, open-open, closed-closed, or closed-open - decide later.