diff --git a/documentation/markdown/README.md b/documentation/markdown/README.md index fb07060d4..f03786daf 100644 --- a/documentation/markdown/README.md +++ b/documentation/markdown/README.md @@ -38,7 +38,7 @@ the [changelog](https://github.com/CommunitySolidServer/CommunitySolidServer/blo ## What the internals look like * [How the server uses dependency injection](architecture/dependency-injection.md) - * [What the architecture looks like](architecture/architecture.md) + * [What the architecture looks like](architecture/overview.md) ## Making changes diff --git a/documentation/markdown/architecture/architecture.md b/documentation/markdown/architecture/core.md similarity index 53% rename from documentation/markdown/architecture/architecture.md rename to documentation/markdown/architecture/core.md index 0aa956e8b..07da6b32d 100644 --- a/documentation/markdown/architecture/architecture.md +++ b/documentation/markdown/architecture/core.md @@ -1,15 +1,7 @@ -# Architecture overview +# Core building blocks -The initial architecture document the project was started from can be found [here](https://rubenverborgh.github.io/solid-server-architecture/solid-architecture-v1-3-0.pdf). -Many things have been added since the original inception of the project, -but the core ideas within that document are still valid. - -As can be seen from the architecture, an important idea is the modularity of all components. -No actual implementations are defined there, only their interfaces. -Making all the components independent of each other in such a way provides us with an enormous flexibility: -they can all be replaced by a different implementation, without impacting anything else. -This is how we can provide many different configurations for the server, -and why it is impossible to provide ready solutions for all possible combinations. +There are several core building blocks used in many places of the server. +These are described here. 
## Handlers A very important building block that gets reused in many places is the `AsyncHandler`. @@ -48,26 +40,3 @@ Internally this means we are mostly handling data as `Readable` objects. We actually use `Guarded` which is an internal format we created to help us with error handling. Such streams can be created using utility functions such as `guardStream` and `guardedStreamFrom`. Similarly, we have a `pipeSafely` to pipe streams in such a way that also helps with errors. - -## Example request -In this section we will give a high level overview of all the components -a request passes through when it enters the server. -This is specifically an LDP request, e.g. a POST request to create a new resource. - -1. The correct `HttpHandler` gets found, responsible for LDP requests. -2. The HTTP request gets parsed into a manageable format, both body and metadata such as headers. -3. The identification credentials of the request, if any, are extracted and parsed to authenticate the calling agent. -4. The request gets authorized or rejected, based on the credentials from step 3 - and the authorization rules of the target resource. -5. Based on the HTTP method, the corresponding method from the `ResourceStore` gets called, - which in the case of a POST request will return the location of the newly created error. -6. The returned data and metadata get converted to an HTTP response and sent back in the `ResponseWriter`. - -In case any of the steps above error, an error will be thrown. -The `ErrorHandler` will convert the error to an HTTP response to be returned. - -Below are sections that go deeper into the specific steps. -Not all steps are covered yet and will be added in the future. 
- -* [How authentication and authorization work](features/authorization.md) -* [What the `ResourceStore` looks like](features/resource-store.md) diff --git a/documentation/markdown/architecture/features/authorization.md b/documentation/markdown/architecture/features/authorization.md deleted file mode 100644 index 127dfb85e..000000000 --- a/documentation/markdown/architecture/features/authorization.md +++ /dev/null @@ -1,58 +0,0 @@ -# Authorization - -Authorization is usually handled by the `AuthorizingHttpHandler`, -and goes in the following steps: - - 1. Identify the credentials of the agent making the call. - 2. Extract which access modes are needed for which resources. - 3. Reading the permissions the agent has. - 4. Compare the above results to see if the request is allowed. - -## Authentication -There are multiple `CredentialsExtractor`s that each determine identity in a different way. -Potentially multiple extractors can apply, -making a requesting agent have multiple credentials. -The `DPoPWebIdExtractor` is most relevant for the [Solid-OIDC specification](https://solid.github.io/solid-oidc/), -as it parses the access token generated by a Solid Identity Provider. -Besides that there are always the public credentials, which everyone has. -There are also some debug extractors that can be used to simulate credentials, -which can be enabled as different options through the `config/ldp/authentication` imports. - -If successful, a `CredentialsExtractor` will return a key/value map -linking the type of credentials to their specific values. - -## Modes extraction -Access modes are a predefined list of `read`, `write`, `append`, `create` and `delete`. -The `ModesExtractor`s determine which modes will be necessary and for which resources, -based on the request contents. -The `MethodModesExtractor` determines modes based on the HTTP method. -A GET request will always need the `read` mode for example. 
-Specifically for PATCH requests there are extractors for each supported PATCH type, -such as the `N3PatchModesExtractor`, -which parses the N3 Patch body to know if it will add new data or only delete data. - -## Permission reading -`PermissionReaders` take the input of the above to determine which permissions are available for which credentials. -The modes from the previous step are not yet needed, -but can be used as optimization as we only need to know if we have permission on those modes. -Each reader returns all the information it can find based on the resources and modes it receives. -Those results then get combined in the `UnionPermissionReader`. -In the default configuration the following readers are combined. - -* `PathBasedReader` rejects all permissions for certain paths, to prevent access to internal data. -* `OwnerPermissionReader` grants control permissions to agents that are trying to access data in a pod that they own. -* `AuxiliaryReader` handles all permissions for auxiliary resources by requesting those of the subject resource if necessary. -* `ParentContainerReader` checks the necessary permissions on a parent container when creating or deleting a resource. -* `WebAclAuxiliaryReader` determines permissions on ACL resources by requesting if the subject resource has control permissions. -* `WebAclReader` reads out the relevant ACL resource to read out the defined permissions. - -All of the above is if you have WebACL enabled. -It is also possible to always grant all permissions for debugging reasons -by changing the authorization import to `config/ldp/authorization/allow-all.json`. - -## Authorization -All the results of the previous steps then get combined to either allow or reject a request. -If no permissions are found for a requested mode, -or they are explicitly forbidden, -a 401/403 will be returned, -depending on if the agent was logged in or not. 
diff --git a/documentation/markdown/architecture/features/cli.md b/documentation/markdown/architecture/features/cli.md new file mode 100644 index 000000000..bcc871a5e --- /dev/null +++ b/documentation/markdown/architecture/features/cli.md @@ -0,0 +1,52 @@ +# Parsing command line arguments + +When starting the server, the application actually uses Components.js twice to instantiate components. +The first instantiation is used to parse the command line arguments. +These then get converted into Components.js variables and are used to instantiate the actual server. + +## Architecture + +```mermaid +flowchart TD + CliResolver("CliResolver
CliResolver") + CliResolver --> CliResolverArgs + + subgraph CliResolverArgs[" "] + CliExtractor("CliExtractor
YargsCliExtractor") + ShorthandResolver("ShorthandResolver
CombinedShorthandResolver") + end + + ShorthandResolver --> ShorthandResolverArgs + subgraph ShorthandResolverArgs[" "] + BaseUrlExtractor("
BaseUrlExtractor") + KeyExtractor("
KeyExtractor") + AssetPathExtractor("
AssetPathExtractor") + end +``` + +The `CliResolver` (`urn:solid-server-app-setup:default:CliResolver`) is simply a way +to combine both the `CliExtractor` (`urn:solid-server-app-setup:default:CliExtractor`) +and `ShorthandResolver` (`urn:solid-server-app-setup:default:ShorthandResolver`) +into a single object and has no other function. + +Which arguments are supported and which Components.js variables are generated +can depend on the configuration that is being used. +For example, for an HTTPS server, additional arguments will be needed to specify the necessary key/cert files. + +## CliExtractor +The `CliExtractor` converts the incoming string of arguments into a key/value object. +By default, a `YargsCliExtractor` is used, which makes use of the `yargs` library and is configured similarly. + +## ShorthandResolver +The `ShorthandResolver` uses the key/value object generated above to create Components.js variable bindings. +A `CombinedShorthandResolver` combines the results of multiple `ShorthandExtractor`s +by mapping their values to specific variables. +For example, a `BaseUrlExtractor` will be used to extract the value for `baseUrl`, +or `port` if no `baseUrl` value is provided, +and use it to generate the value for the variable `urn:solid-server:default:variable:baseUrl`. + +These extractors are also where the default values for the server are defined. +For example, the `BaseUrlExtractor` will be instantiated with a default port of `3000` +which will be used if no port is provided. + +The variables generated here will be used to [initialize the server](initialization.md). diff --git a/documentation/markdown/architecture/features/http-handler.md b/documentation/markdown/architecture/features/http-handler.md new file mode 100644 index 000000000..ef463167d --- /dev/null +++ b/documentation/markdown/architecture/features/http-handler.md @@ -0,0 +1,86 @@ +# Handling HTTP requests +The direction of the arrows was changed slightly here to make the graph readable.
+```mermaid +flowchart LR + HttpHandler("HttpHandler
SequenceHandler") + HttpHandler --> HttpHandlerArgs + + subgraph HttpHandlerArgs[" "] + direction LR + Middleware("Middleware
HttpHandler") + WaterfallHandler("
WaterfallHandler") + end + + Middleware --> WaterfallHandler + WaterfallHandler --> WaterfallHandlerArgs + + subgraph WaterfallHandlerArgs[" "] + direction TB + StaticAssetHandler("StaticAssetHandler
StaticAssetHandler") + SetupHandler("SetupHandler
HttpHandler") + OidcHandler("OidcHandler
HttpHandler") + AuthResourceHttpHandler("AuthResourceHttpHandler
HttpHandler") + IdentityProviderHttpHandler("IdentityProviderHttpHandler
HttpHandler") + LdpHandler("LdpHandler
HttpHandler") + end + + StaticAssetHandler --> SetupHandler + SetupHandler --> OidcHandler + OidcHandler --> AuthResourceHttpHandler + AuthResourceHttpHandler --> IdentityProviderHttpHandler + IdentityProviderHttpHandler --> LdpHandler +``` + +The `HttpHandler` is responsible for handling an incoming HTTP request. +The request will always first go through the `Middleware`, +where certain required headers, such as CORS headers, will be added. + +After that, it will go through the list in the `WaterfallHandler` +to find the first handler that understands the request, +with the `LdpHandler` at the bottom being the catch-all default. + +## StaticAssetHandler +The `urn:solid-server:default:StaticAssetHandler` matches exact URLs to static assets which require no further logic. +An example of this is the favicon, where the `/favicon.ico` URL +is directed to the favicon file at `/templates/images/favicon.ico`. +It can also map entire folders to a specific path, such as `/.well-known/css/styles/` which contains all stylesheets. + +## SetupHandler +The `urn:solid-server:default:SetupHandler` is responsible +for redirecting all requests to `/setup` until setup is finished, +which ensures that setup is completed before anything else can be done on the server. +It also handles the actual setup request that is sent to `/setup`. +Once setup is finished, this handler will reject all requests and thus is no longer relevant. + +If the server is configured to not have setup enabled, +the corresponding identifier will point to a handler that always rejects all requests. + +## OidcHandler +The `urn:solid-server:default:OidcHandler` handles all requests related +to the Solid-OIDC [specification](https://solid.github.io/solid-oidc/). +The OIDC component is configured to work on the `/.oidc/` subpath, +so this handler catches all those requests and sends them to the internal OIDC library that is used.
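To illustrate the routing described on this page, the sketch below models a waterfall of handlers: each handler is asked in turn whether it can handle a request, the first one that accepts it does the work, and a catch-all sits at the end. The subpath check mirrors how the `OidcHandler` (`/.oidc/`) and `IdentityProviderHttpHandler` (`/idp/`) are described. All names and types here (`SimpleRequest`, `SimpleHandler`, `SubpathHandler`, `CatchAllHandler`, `SimpleWaterfallHandler`) are hypothetical simplifications, not the server's actual `AsyncHandler`, `WaterfallHandler` or `HttpHandler` classes, which work on full HTTP requests and responses.

```typescript
// Hypothetical, simplified request: only the URL path matters for routing here.
interface SimpleRequest {
  path: string;
}

// Simplified canHandle/handle contract: canHandle throws if the input is not supported.
interface SimpleHandler {
  canHandle(input: SimpleRequest): Promise<void>;
  handle(input: SimpleRequest): Promise<string>;
}

// Accepts only requests on a given subpath, such as `/.oidc/` or `/idp/`.
class SubpathHandler implements SimpleHandler {
  public constructor(private readonly prefix: string, private readonly name: string) {}

  public async canHandle(input: SimpleRequest): Promise<void> {
    if (!input.path.startsWith(this.prefix)) {
      throw new Error(`${this.name} only handles ${this.prefix}`);
    }
  }

  public async handle(input: SimpleRequest): Promise<string> {
    return `${this.name} handled ${input.path}`;
  }
}

// Catch-all handler, in the position the LdpHandler has at the bottom of the list.
class CatchAllHandler implements SimpleHandler {
  public async canHandle(): Promise<void> {
    // Accepts every request.
  }

  public async handle(input: SimpleRequest): Promise<string> {
    return `LdpHandler handled ${input.path}`;
  }
}

// Tries each handler in order and uses the first one whose canHandle does not throw.
class SimpleWaterfallHandler implements SimpleHandler {
  public constructor(private readonly handlers: SimpleHandler[]) {}

  private async findHandler(input: SimpleRequest): Promise<SimpleHandler> {
    for (const handler of this.handlers) {
      try {
        await handler.canHandle(input);
        return handler;
      } catch {
        // This handler does not support the input: try the next one.
      }
    }
    throw new Error('No handler supports the given input');
  }

  public async canHandle(input: SimpleRequest): Promise<void> {
    await this.findHandler(input);
  }

  public async handle(input: SimpleRequest): Promise<string> {
    const handler = await this.findHandler(input);
    return handler.handle(input);
  }
}

// Usage: the `/idp/` request skips the OIDC handler and is picked up by the second handler.
const router = new SimpleWaterfallHandler([
  new SubpathHandler('/.oidc/', 'OidcHandler'),
  new SubpathHandler('/idp/', 'IdentityProviderHttpHandler'),
  new CatchAllHandler(),
]);
router.handle({ path: '/idp/login/' }).then((result): void => console.log(result));
// Logs: IdentityProviderHttpHandler handled /idp/login/
```

Because the catch-all is last, the order of the list is significant: moving it to the front would make every other handler unreachable.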
+ +## AuthResourceHttpHandler +The `urn:solid-server:default:AuthResourceHttpHandler` is identical +to the `urn:solid-server:default:LdpHandler` which will be discussed below, +but only handles resources relevant for authorization. + +In practice, this means that if your server is configured +to use [Web Access Control](https://solidproject.org/TR/wac) for authorization, +this handler will catch all requests targeting `.acl` resources. + +These resources already need to be handled here so they can also be used +to authorize requests on the following handler(s). +More on this can be found in the [identity provider](../../../usage/identity-provider/#access) documentation. + +## IdentityProviderHttpHandler +The `urn:solid-server:default:IdentityProviderHttpHandler` handles everything +related to our custom identity provider API, such as registering, logging in, returning the relevant HTML pages, etc. +All these requests are identified by being on the `/idp/` subpath. +More information on the API can be found in the [identity provider](../../../usage/identity-provider) documentation. + +## LdpHandler +Once a request reaches the `urn:solid-server:default:LdpHandler`, +the server assumes this is a standard Solid request according to the Solid protocol. +A detailed description of what happens then can be found [here](protocol/overview.md). diff --git a/documentation/markdown/architecture/features/initialization.md b/documentation/markdown/architecture/features/initialization.md new file mode 100644 index 000000000..9309037f0 --- /dev/null +++ b/documentation/markdown/architecture/features/initialization.md @@ -0,0 +1,124 @@ +# Server initialization + +When starting the server, multiple Initializers trigger to set up everything correctly, +the last one of which starts listening to the specified port. +Similarly, when stopping the server, several Finalizers trigger to clean up where necessary, +although the latter only happens when starting the server through code.
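The start and stop flow described above can be sketched as follows, assuming hypothetical minimal `Initializer` and `Finalizer` interfaces with a single async method each. `SequenceInitializer` runs its children strictly in order, and `SimpleApp` plays the role of the `App` component described below by triggering them on `start` and `stop`. None of these names or signatures are the server's actual API; in the server these roles are filled by classes configured through Components.js.

```typescript
// Hypothetical minimal shapes, not the server's actual interfaces.
interface Initializer {
  initialize(): Promise<void>;
}

interface Finalizer {
  finalize(): Promise<void>;
}

// Runs its initializers strictly in order, so later steps can rely on earlier ones.
class SequenceInitializer implements Initializer {
  public constructor(private readonly initializers: Initializer[]) {}

  public async initialize(): Promise<void> {
    for (const initializer of this.initializers) {
      await initializer.initialize();
    }
  }
}

// Simplified App: `start` triggers the initializer, `stop` triggers the finalizer.
class SimpleApp {
  public constructor(
    private readonly initializer: Initializer,
    private readonly finalizer: Finalizer,
  ) {}

  public async start(): Promise<void> {
    await this.initializer.initialize();
  }

  public async stop(): Promise<void> {
    await this.finalizer.finalize();
  }
}

// Usage setup: each hypothetical step records its name so the execution order is visible.
const log: string[] = [];
const step = (name: string): Initializer => ({
  initialize: async (): Promise<void> => {
    log.push(name);
  },
});

const app = new SimpleApp(
  new SequenceInitializer([step('logger'), step('cleanup'), step('listen')]),
  { finalize: async (): Promise<void> => { log.push('finalize'); } },
);
```

Awaiting `app.start()` records `logger`, `cleanup` and `listen` in that order, and a subsequent `app.stop()` appends `finalize`. Running the steps sequentially matters because, as described below, the logger has to be initialized before other classes can log.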
+ +## App +```mermaid +flowchart TD + App("App
App") + App --> AppArgs + + subgraph AppArgs[" "] + Initializer("Initializer
Initializer") + AppFinalizer("Finalizer
Finalizer") + end +``` + +`App` (`urn:solid-server:default:App`) is the main component that gets instantiated by Components.js. +Every other component should be able to trace an instantiation path back to it if it also wants to be instantiated. + +Its only function is to contain an `Initializer` and `Finalizer` +which are triggered by calling `start`/`stop` respectively. + +## Initializer +```mermaid +flowchart TD + Initializer("Initializer
SequenceHandler") + Initializer --> InitializerArgs + + subgraph InitializerArgs[" "] + direction LR + LoggerInitializer("LoggerInitializer
LoggerInitializer") + PrimaryInitializer("PrimaryInitializer
ProcessHandler") + WorkerInitializer("WorkerInitializer
ProcessHandler") + end + + LoggerInitializer --> PrimaryInitializer + PrimaryInitializer --> WorkerInitializer +``` + +The very first thing that needs to happen is initializing the logger. +Before this, other classes will be unable to use logging. + +The `PrimaryInitializer` will only trigger once, in the primary worker thread, +while the `WorkerInitializer` will trigger for every worker thread. +If your server setup is single-threaded, which is the default, +there is no relevant difference between those two. + +### PrimaryInitializer +```mermaid +flowchart TD + PrimaryInitializer("PrimaryInitializer
ProcessHandler") + PrimaryInitializer --> PrimarySequenceInitializer("PrimarySequenceInitializer
SequenceHandler") + PrimarySequenceInitializer --> PrimarySequenceInitializerArgs + + subgraph PrimarySequenceInitializerArgs[" "] + direction LR + CleanupInitializer("CleanupInitializer
SequenceHandler") + PrimaryParallelInitializer("PrimaryParallelInitializer
ParallelHandler") + WorkerManager("WorkerManager
WorkerManager") + end + + CleanupInitializer --> PrimaryParallelInitializer + PrimaryParallelInitializer --> WorkerManager +``` +The above is a simplification of all the initializers that are present in the `PrimaryInitializer`, +as there are several smaller initializers that also trigger but are less relevant here. + +The `CleanupInitializer` is an initializer that cleans up anything +that might have remained from a previous server start +and could impact behaviour. +Relevant components in other parts of the configuration are responsible for adding themselves to this array if needed. +An example of this is file-based locking components which might need to remove any dangling locking files. + +The `PrimaryParallelInitializer` can be used to add any initializers that have to run in the primary process. +This makes it easy for users to add their own initializers by appending them to its handlers. + +The `WorkerManager` is responsible for setting up the worker threads, if any. + +### WorkerInitializer +```mermaid +flowchart TD + WorkerInitializer("WorkerInitializer
ProcessHandler") + WorkerInitializer --> WorkerSequenceInitializer("WorkerSequenceInitializer
SequenceHandler") + WorkerSequenceInitializer --> WorkerSequenceInitializerArgs + + subgraph WorkerSequenceInitializerArgs[" "] + direction LR + WorkerParallelInitializer("WorkerParallelInitializer
ParallelHandler") + ServerInitializer("ServerInitializer
ServerInitializer") + end + + WorkerParallelInitializer --> ServerInitializer +``` +The `WorkerInitializer` is quite similar to the `PrimaryInitializer` but triggers once per worker thread. +Like the `PrimaryParallelInitializer`, the `WorkerParallelInitializer` can be used +to add any custom initializers that need to run. + +### ServerInitializer +The `ServerInitializer` is the initializer that finally starts up the server by listening to the relevant port, +once all the initialization described above is finished. +This is an example of a component that differs based on some of the choices made during configuration. +```mermaid +flowchart TD + ServerInitializer("ServerInitializer
ServerInitializer") + ServerInitializer --> WebSocketServerFactory("ServerFactory
WebSocketServerFactory") + WebSocketServerFactory --> BaseHttpServerFactory("
BaseHttpServerFactory") + BaseHttpServerFactory --> HttpHandler("HttpHandler
HttpHandler") + + ServerInitializer2("ServerInitializer
ServerInitializer") + ServerInitializer2 ---> BaseHttpServerFactory2("ServerFactory
BaseHttpServerFactory") + BaseHttpServerFactory2 --> HttpHandler2("HttpHandler
HttpHandler") +``` + +Depending on whether the configurations necessary for websockets are imported, +the `urn:solid-server:default:ServerFactory` identifier will point to a different resource. +There will always be a `BaseHttpServerFactory` that starts the HTTP(S) server, +but there might also be a `WebSocketServerFactory` wrapped around it to handle websocket support. +Although not indicated here, the parameters for initializing the `BaseHttpServerFactory` +might also differ in case an HTTPS configuration is imported. + +The `HttpHandler` it takes as input is responsible for how [HTTP requests get resolved](http-handler.md). diff --git a/documentation/markdown/architecture/features/protocol/authorization.md b/documentation/markdown/architecture/features/protocol/authorization.md new file mode 100644 index 000000000..b908b4ba5 --- /dev/null +++ b/documentation/markdown/architecture/features/protocol/authorization.md @@ -0,0 +1,163 @@ +# Authorization + +```mermaid +flowchart TD + AuthorizingHttpHandler("
AuthorizingHttpHandler") + AuthorizingHttpHandler --> AuthorizingHttpHandlerArgs + + subgraph AuthorizingHttpHandlerArgs[" "] + CredentialsExtractor("CredentialsExtractor
CredentialsExtractor") + ModesExtractor("ModesExtractor
ModesExtractor") + PermissionReader("PermissionReader
PermissionReader") + Authorizer("Authorizer
PermissionBasedAuthorizer") + OperationHttpHandler("
OperationHttpHandler") + end +``` + +Authorization is usually handled by the `AuthorizingHttpHandler`, +which receives a parsed HTTP request in the form of an `Operation`. +It goes through the following steps: + +1. A `CredentialsExtractor` identifies the credentials of the agent making the call. +2. A `ModesExtractor` finds which access modes are needed for which resources. +3. A `PermissionReader` determines the permissions the agent has on the targeted resources. +4. The above results are compared in an `Authorizer`. +5. If the request is allowed, the `OperationHttpHandler` is called; otherwise an error is thrown. + +## Authentication +There are multiple `CredentialsExtractor`s that each determine identity in a different way. +Multiple extractors can apply to the same request, +so a requesting agent can have multiple credentials. + +The diagram below shows the default configuration if authentication is enabled. + +```mermaid +flowchart TD + CredentialsExtractor("CredentialsExtractor
UnionCredentialsExtractor") + CredentialsExtractor --> CredentialsExtractorArgs + + subgraph CredentialsExtractorArgs[" "] + WaterfallHandler("
WaterfallHandler") + PublicCredentialsExtractor("
PublicCredentialsExtractor") + end + + WaterfallHandler --> WaterfallHandlerArgs + subgraph WaterfallHandlerArgs[" "] + direction LR + DPoPWebIdExtractor("
DPoPWebIdExtractor") --> BearerWebIdExtractor("
BearerWebIdExtractor") + end +``` + +Both of the WebID extractors make use of +the [`access-token-verifier`](https://github.com/CommunitySolidServer/access-token-verifier) library +to parse incoming tokens based on the [Solid-OIDC specification](https://solid.github.io/solid-oidc/). +Besides those, there are always the public credentials, which everyone has. +All these credentials then get combined into a single union object. + +If successful, a `CredentialsExtractor` will return a key/value map +linking the type of credentials to their specific values. + +There are also debug configuration options available that can be used to simulate credentials. +These can be enabled as different options through the `config/ldp/authentication` imports. + +## Modes extraction +Access modes are a predefined list of `read`, `write`, `append`, `create` and `delete`. +The `ModesExtractor` determines which modes will be necessary and for which resources, +based on the request contents. + +```mermaid +flowchart TD + ModesExtractor("ModesExtractor
IntermediateCreateExtractor") + ModesExtractor --> HttpModesExtractor("HttpModesExtractor
WaterfallHandler") + + HttpModesExtractor --> HttpModesExtractorArgs + + subgraph HttpModesExtractorArgs[" "] + direction LR + PatchModesExtractor("PatchModesExtractor
ModesExtractor") --> MethodModesExtractor("
MethodModesExtractor") + end +``` + +The `IntermediateCreateExtractor` handles requests that try to create intermediate containers with a single request. +E.g., a PUT request to `/foo/bar/baz` should create both the `/foo/` and `/foo/bar/` containers in case they do not exist yet. +This extractor makes sure that `create` permissions are also checked on those containers. + +Modes can usually be determined based on just the HTTP method, +which is what the `MethodModesExtractor` does. +For example, a GET request will always need the `read` mode. + +The only exceptions are PATCH requests, +where the necessary modes depend on the body and the PATCH type. + +```mermaid +flowchart TD + PatchModesExtractor("PatchModesExtractor
WaterfallHandler") --> PatchModesExtractorArgs + subgraph PatchModesExtractorArgs[" "] + N3PatchModesExtractor("
N3PatchModesExtractor") + SparqlUpdateModesExtractor("
SparqlUpdateModesExtractor") + end +``` + +The server supports both N3 Patch and SPARQL Update PATCH requests. +In both cases it will parse the body to determine the impact of the request and which modes it requires. + +## Permission reading +`PermissionReader`s take the input of the above to determine which permissions are available for which credentials. +The modes from the previous step are not strictly needed, +but can be used as an optimization, as we only need to know if we have permission on those modes. +Each reader returns all the information it can find based on the resources and modes it receives. +In the default configuration the following readers are combined when WebACL is enabled as the authorization method. +In case authorization is disabled by changing the authorization import to `config/ldp/authorization/allow-all.json`, +the diagram below is replaced by a single class that always returns all permissions. + +```mermaid +flowchart TD + PermissionReader("PermissionReader
AuxiliaryReader") + PermissionReader --> UnionPermissionReader("
UnionPermissionReader") + UnionPermissionReader --> UnionPermissionReaderArgs + + subgraph UnionPermissionReaderArgs[" "] + PathBasedReader("PathBasedReader
PathBasedReader") + OwnerPermissionReader("OwnerPermissionReader
OwnerPermissionReader") + WrappedWebAclReader("WrappedWebAclReader
ParentContainerReader") + end + + WrappedWebAclReader --> WebAclAuxiliaryReader("WebAclAuxiliaryReader
WebAclAuxiliaryReader") + WebAclAuxiliaryReader --> WebAclReader("WebAclReader
WebAclReader") +``` + +First, if the target is an auxiliary resource that uses the authorization of its subject resource, +the `AuxiliaryReader` uses the identifier of that subject resource instead. +An example of this is a request that targets the metadata of a resource. + +The `UnionPermissionReader` then combines the results of its readers into a single permission object. +If one reader rejects a specific mode and another allows it, the rejection takes priority. + +The `PathBasedReader` rejects all permissions for certain paths. +This is used to prevent access to the internal data of the server. + +The `OwnerPermissionReader` makes sure owners always have control access +to the [pods they created on the server](../../../../usage/identity-provider/#pod). +Users will always be able to modify the ACL resources in their pod, +even if they accidentally removed their own access. + +The final readers are specifically relevant for the WebACL algorithm. +The `ParentContainerReader` checks the permissions on a parent resource if required: +creating a resource requires `append` permissions on the parent container, +while deleting a resource requires `write` permissions there. + +In case the target is an ACL resource, `control` permissions need to be checked, +no matter what mode was generated by the `ModesExtractor`. +The `WebAclAuxiliaryReader` makes sure this conversion happens. + +Finally, the `WebAclReader` implements +the [effective ACL resource algorithm](https://solidproject.org/TR/2021/wac-20210711#effective-acl-resource) +and returns the permissions it finds in that resource. +If no ACL resource is found, this indicates a configuration error and no permissions will be granted. + +## Authorization +All the results of the previous steps then get combined in the `PermissionBasedAuthorizer` to either allow or reject a request.
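As a rough illustration of two rules on this page, the sketch below merges reader results so that a rejection takes priority over an approval, as the `UnionPermissionReader` is described to do, and then compares the required modes against the merged permissions, yielding a 401 or 403 depending on whether the agent is logged in, as stated just below. The types and the `unionPermissions`/`authorize` functions are hypothetical simplifications: the real readers and the `PermissionBasedAuthorizer` keep permissions per resource and per set of credentials, and the authorizer rejects a request by throwing an error rather than returning a status code.

```typescript
// Hypothetical simplified types; the real server tracks permissions per resource and credentials.
type AccessMode = 'read' | 'append' | 'write' | 'create' | 'delete' | 'control';

// true = allowed, false = explicitly forbidden, absent = no information.
type PermissionSet = Partial<Record<AccessMode, boolean>>;

// 'allowed' means the request may continue; otherwise the status code to return.
type AuthResult = 'allowed' | 401 | 403;

// Merges reader results: an explicit `false` from any reader always wins over `true`.
function unionPermissions(results: PermissionSet[]): PermissionSet {
  const merged: PermissionSet = {};
  for (const result of results) {
    for (const mode of Object.keys(result) as AccessMode[]) {
      const allowed = result[mode];
      if (allowed === undefined) {
        continue;
      }
      // Keep an earlier rejection, otherwise take this reader's value.
      merged[mode] = merged[mode] === false ? false : allowed;
    }
  }
  return merged;
}

// Every required mode must be explicitly allowed; missing or forbidden modes reject the request.
function authorize(requiredModes: AccessMode[], permissions: PermissionSet, loggedIn: boolean): AuthResult {
  const allowed = requiredModes.every((mode): boolean => permissions[mode] === true);
  if (allowed) {
    return 'allowed';
  }
  return loggedIn ? 403 : 401;
}
```

Treating a missing mode the same as a forbidden one keeps the default safe: a mode that no reader says anything about is never granted.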
+If no permissions are found for a requested mode, +or they are explicitly forbidden, +a 401 or 403 will be returned, +depending on whether the agent was logged in. diff --git a/documentation/markdown/architecture/features/protocol/overview.md b/documentation/markdown/architecture/features/protocol/overview.md new file mode 100644 index 000000000..07305063f --- /dev/null +++ b/documentation/markdown/architecture/features/protocol/overview.md @@ -0,0 +1,30 @@ +# Solid protocol +The `LdpHandler`, named as a reference to the Linked Data Platform specification, +chains several handlers together, each with its own specific purpose, to fully resolve the HTTP request. +It specifically handles Solid requests as described +in the protocol [specification](https://solidproject.org/TR/protocol), +e.g. a POST request to create a new resource. + +Below is a simplified view of how these handlers are linked. + +```mermaid +flowchart LR + LdpHandler("LdpHandler
ParsingHttpHandler") + LdpHandler --> AuthorizingHttpHandler("
AuthorizingHttpHandler") + AuthorizingHttpHandler --> OperationHandler("OperationHandler
OperationHandler") + OperationHandler --> ResourceStore("ResourceStore
ResourceStore") +``` + +A standard request would go through the following steps: + +1. The `ParsingHttpHandler` parses the HTTP request into a manageable format, both body and metadata such as headers. +2. The `AuthorizingHttpHandler` verifies whether the request is authorized to access the targeted resource. +3. The `OperationHandler` determines which action is required based on the HTTP method. +4. The `ResourceStore` does all the relevant data work. +5. The `ParsingHttpHandler` eventually receives the response data, or an error, and handles the output. + +Below are sections that go deeper into the specific steps. + +* [How input gets parsed and output gets returned](parsing.md) +* [How authentication and authorization work](authorization.md) +* [What the `ResourceStore` looks like](resource-store.md) diff --git a/documentation/markdown/architecture/features/protocol/parsing.md b/documentation/markdown/architecture/features/protocol/parsing.md new file mode 100644 index 000000000..26ae63ade --- /dev/null +++ b/documentation/markdown/architecture/features/protocol/parsing.md @@ -0,0 +1,102 @@ +# Parsing and responding to HTTP requests +```mermaid +flowchart TD + ParsingHttphandler("
ParsingHttpHandler") + ParsingHttphandler --> ParsingHttphandlerArgs + + subgraph ParsingHttphandlerArgs[" "] + RequestParser("RequestParser
BasicRequestParser") + AuthorizingHttpHandler("
AuthorizingHttpHandler") + ErrorHandler("ErrorHandler
ErrorHandler") + ResponseWriter("ResponseWriter
BasicResponseWriter") + end +``` + +A `ParsingHttpHandler` handles both the parsing of the input data and the serialization of the output data. +It follows these 3 steps: + +1. Use the `RequestParser` to convert the incoming data into an `Operation`. +2. Send the `Operation` to the `AuthorizingHttpHandler` to receive either a `Representation` if the operation was a success, + or an `Error` in case something went wrong. + * In case of an error, the `ErrorHandler` will convert the `Error` into a `ResponseDescription`. +3. Use the `ResponseWriter` to output the `ResponseDescription` as an HTTP response. + +## Parsing the request +```mermaid +flowchart TD + RequestParser("RequestParser
BasicRequestParser") --> RequestParserArgs + subgraph RequestParserArgs[" "] + TargetExtractor("TargetExtractor
OriginalUrlExtractor") + PreferenceParser("PreferenceParser
AcceptPreferenceParser") + MetadataParser("MetadataParser
MetadataParser") + BodyParser("
BodyParser") + Conditions("
BasicConditionsParser") + end + + TargetExtractor --> IdentifierStrategy("IdentifierStrategy
IdentifierStrategy") +``` +The `BasicRequestParser` is mostly an aggregator of multiple smaller parsers that each handle a very specific part. + +### URL +This is a single class, the `OriginalUrlExtractor`, but it fulfills the very important role +of making sure input URLs are handled consistently. + +The query parameters will always be completely removed from the URL. + +There is also an algorithm to make sure all URLs have a "canonical" version since, for example, both `&` and `%26` +can be interpreted in the same way. +Specifically, all special characters will be converted to their percent encoding. + +The `IdentifierStrategy` it gets as input is used to determine if the resulting URL is within the scope of the server. +This can differ depending on whether the server uses subdomains. + +The resulting identifier will be stored in the `target` field of an `Operation` object. + +### Preferences +The `AcceptPreferenceParser` parses the `Accept` header and all the relevant `Accept-*` headers. +These will all be put into the `preferences` field of an `Operation` object. +These will later be used to handle the content negotiation. + +For example, when sending an `Accept: text/turtle; q=0.9` header, +this will result in the preferences object `{ type: { 'text/turtle': 0.9 } }`. + +### Headers +Several other headers can have relevant metadata, +such as the `Content-Type` header, +or the `Link: ; rel="type"` header +which is used to indicate to the server that a request intends to create a container. + +Such headers are converted to RDF triples and stored in the `RepresentationMetadata` object, +which will be part of the `body` field in the `Operation`. + +The default `MetadataParser` is a `ParallelHandler` that contains several smaller parsers, +each looking at a specific header. + +### Body +For most requests, the input data stream is used directly in the `body` field of the `Operation`, +with a few minor checks to make sure the HTTP specification is being followed.
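Returning to the `Accept` example from the Preferences section above, the following is a minimal sketch of turning such a header into a preferences object of the same shape. It is an illustration under simplifying assumptions, not the `AcceptPreferenceParser` implementation: the `parseAccept` function and `SimplePreferences` type are hypothetical, only the `q` parameter is read, syntax is not validated, media types are lowercased, and a missing `q` defaults to `1` as in HTTP content negotiation.

```typescript
// Hypothetical preferences shape, mirroring the `{ type: { 'text/turtle': 0.9 } }` example.
interface SimplePreferences {
  type: Record<string, number>;
}

// Parses an Accept header value into media type weights; a missing q-value means 1.
function parseAccept(header: string): SimplePreferences {
  const type: Record<string, number> = {};
  for (const part of header.split(',')) {
    // The first segment is the media type, the rest are `key=value` parameters.
    const [mediaType, ...params] = part.split(';').map((value): string => value.trim());
    if (!mediaType) {
      continue;
    }
    let weight = 1;
    for (const param of params) {
      const [key, value] = param.split('=').map((entry): string => entry.trim());
      if (key === 'q' && value !== undefined) {
        weight = parseFloat(value);
      }
    }
    type[mediaType.toLowerCase()] = weight;
  }
  return { type };
}
```

With this sketch, `parseAccept('text/turtle; q=0.9')` returns `{ type: { 'text/turtle': 0.9 } }`, matching the example above.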
+ + In the case of PATCH requests, however, +there are several specific body parsers that will convert the request +into a JavaScript object containing all the necessary information to execute such a PATCH. +Several validation checks will already take place there as well. + +### Conditions +The `BasicConditionsParser` parses everything related to conditional headers, +such as `if-none-match` or `if-modified-since`, +and stores the relevant information in the `conditions` field of the `Operation`. +This information will later be used to determine whether the request should be aborted. + +## Sending the response +If a request is successful, the `AuthorizingHttpHandler` will return a `ResponseDescription`; +otherwise it will throw an error. + +If an error is thrown, it will be caught by the `ErrorHandler` and converted into a `ResponseDescription`. +The request preferences are used to choose a serialization that the client prefers. + +Either way, we end up with a `ResponseDescription`, +which will be sent to the `BasicResponseWriter` to convert into output headers, data and a status code. + +To convert the metadata into headers, it uses a `MetadataWriter`, +which functions as the reverse of the `MetadataParser` mentioned above: +it has multiple writers which each convert certain metadata into a specific header. diff --git a/documentation/markdown/architecture/features/resource-store.md b/documentation/markdown/architecture/features/protocol/resource-store.md similarity index 98% rename from documentation/markdown/architecture/features/resource-store.md rename to documentation/markdown/architecture/features/protocol/resource-store.md index 5fc76985f..7736d99eb 100644 --- a/documentation/markdown/architecture/features/resource-store.md +++ b/documentation/markdown/architecture/features/protocol/resource-store.md @@ -1,6 +1,4 @@ # Resource store -Once an LDP request passes authorization, it will be passed to the `ResourceStore`.
- The interface of a `ResourceStore` is mostly a 1-to-1 mapping of the HTTP methods: * GET: `getRepresentation` diff --git a/documentation/markdown/architecture/overview.md b/documentation/markdown/architecture/overview.md new file mode 100644 index 000000000..2a6e8432b --- /dev/null +++ b/documentation/markdown/architecture/overview.md @@ -0,0 +1,62 @@ +# Architecture overview + +The initial architecture document the project was started from can be found +[here](https://rubenverborgh.github.io/solid-server-architecture/solid-architecture-v1-3-0.pdf). +Many things have been added since the inception of the project, +but the core ideas within that document are still valid. + +As can be seen from the architecture, an important idea is the modularity of all components. +No actual implementations are defined there, only their interfaces. +Making all the components independent of each other in such a way provides us with enormous flexibility: +they can all be replaced by a different implementation, without impacting anything else. +This is how we can provide many different configurations for the server, +and why it is impossible to provide ready solutions for all possible combinations. + +## Architecture diagrams + +Having a modular architecture makes it more difficult to give a complete architecture overview. +We will limit ourselves to the more commonly used default configurations we provide, +and in certain cases we might give examples of what differences there are +based on what configurations are being imported. + +To do this, we make use of architecture diagrams. +The example below explains the formatting used throughout the architecture documentation: + +```mermaid +flowchart TD + LdpHandler("LdpHandler
ParsingHttpHandler") + LdpHandler --> LdpHandlerArgs + + subgraph LdpHandlerArgs[" "] + RequestParser("RequestParser
BasicRequestParser") + Auth("
AuthorizingHttpHandler") + ErrorHandler("ErrorHandler
ErrorHandler") + ResponseWriter("ResponseWriter
BasicResponseWriter") + end +``` + +Below is a summary of how to interpret such diagrams: + +* Rounded red box: component instantiated in the Components.js [configuration](dependency-injection.md). + * First line: + * **Bold text**: shorthand of the instance identifier. If the full URI is not specified, + it can usually be found by prepending `urn:solid-server:default:` to the shorthand identifier. + * (empty): this instance has no identifier and is defined in the same place as its parent. + * Second line: + * Regular text: the class of this instance. + * _Italic text_: the interface of this instance. + Used if the actual class is not relevant for the explanation or can differ. +* Square grey box: the parameters of the linked instance. +* Arrow: links an instance to its parameters. Can also be used to indicate the order of parameters if relevant. + +For example, in the diagram above, **LdpHandler** is a shorthand for the actual identifier `urn:solid-server:default:LdpHandler` +and is an instance of `ParsingHttpHandler`. It has 4 parameters, +one of which has no identifier but is an instance of `AuthorizingHttpHandler`. + +## Features +Below are the sections that go deeper into the features of the server and how they work. + +* [How Command Line Arguments are parsed and used](features/cli.md) +* [How the server is initialized and started](features/initialization.md) +* [How HTTP requests are handled](features/http-handler.md) +* [How the server handles a standard Solid request](features/protocol/overview.md) diff --git a/documentation/markdown/usage/identity-provider.md b/documentation/markdown/usage/identity-provider.md index 95aecc279..ce9ea3d1c 100644 --- a/documentation/markdown/usage/identity-provider.md +++ b/documentation/markdown/usage/identity-provider.md @@ -14,6 +14,7 @@ The links here assume the server is hosted at `http://localhost:3000/`.
To register an account, you can go to `http://localhost:3000/idp/register/` if this feature is enabled, which it is on all configurations we provide. Currently our registration page ties 3 features together on the same page: + * Creating an account on the server. * Creating or linking a WebID to your account. * Creating a pod on the server. diff --git a/documentation/mkdocs.yml b/documentation/mkdocs.yml index 8d73187a4..4d01e22cf 100644 --- a/documentation/mkdocs.yml +++ b/documentation/mkdocs.yml @@ -54,7 +54,12 @@ markdown_extensions: - pymdownx.highlight - pymdownx.superfences - pymdownx.smartsymbols - + - pymdownx.superfences: + custom_fences: + # need to fork the theme to make changes https://github.com/squidfunk/mkdocs-material/issues/3665#issuecomment-1060019924 + - name: mermaid + class: mermaid + format: !!python/name:pymdownx.superfences.fence_code_format extra: version: @@ -79,11 +84,18 @@ nav: - Client credentials: usage/client-credentials.md - Seeding pods: usage/seeding-pods.md - Architecture: - - Architecture: architecture/architecture.md + - Overview: architecture/overview.md - Dependency injection: architecture/dependency-injection.md + - Core: architecture/core.md - Features: - - Authorization: architecture/features/authorization.md - - Resource Store: architecture/features/resource-store.md + - Command line arguments: architecture/features/cli.md + - Server initialization: architecture/features/initialization.md + - HTTP requests: architecture/features/http-handler.md + - Solid protocol: + - Overview: architecture/features/protocol/overview.md + - Parsing: architecture/features/protocol/parsing.md + - Authorization: architecture/features/protocol/authorization.md + - Resource Store: architecture/features/protocol/resource-store.md - Contributing: - Pull requests: contributing/making-changes.md - Releases: contributing/release.md @@ -91,3 +103,4 @@ nav: # To write documentation locally, execute the next line and browse to http://localhost:8000 # 
docker run --rm -it -p 8000:8000 -v ${PWD}/documentation:/docs squidfunk/mkdocs-material +# Alternatively, install `mkdocs` and `mkdocs-material` using `pip`, browse to the documentation folder and run `mkdocs serve`
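
The preferences example in the parsing section above turns `Accept: text/turtle; q=0.9` into `{ type: { 'text/turtle': 0.9 } }`. A small TypeScript sketch can illustrate that transformation. It is not the server's `AcceptPreferenceParser`. The `parseAcceptHeader` function and the `Preferences` type are hypothetical. The sketch only handles the `Accept` header, not the other `Accept-*` headers, and assumes no quoted parameter values.

```typescript
// Hypothetical types mirroring the documented example `{ type: { 'text/turtle': 0.9 } }`.
type WeightedValues = Record<string, number>;

interface Preferences {
  type?: WeightedValues;
}

// HTTP qvalue grammar: "0" with up to 3 decimals, or "1" with up to 3 zero decimals.
const QVALUE = /^(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?)$/;

/**
 * Minimal sketch: turns an `Accept` header value into a map of media ranges to weights.
 * Assumptions: no quoted parameter values, and ranges without a valid `q` get weight 1.
 */
function parseAcceptHeader(header: string): Preferences {
  const type: WeightedValues = {};
  for (const part of header.split(',')) {
    // " text/turtle; q=0.9" becomes the range "text/turtle" and the parameter "q=0.9".
    const [range, ...params] = part.split(';').map((piece): string => piece.trim());
    if (!range) {
      // Skip empty list elements, such as those caused by a trailing comma.
      continue;
    }
    let weight = 1;
    for (const param of params) {
      const [name, value] = param.split('=').map((piece): string => piece.trim());
      if (name.toLowerCase() === 'q' && value !== undefined && QVALUE.test(value)) {
        weight = Number(value);
      }
    }
    // Media types are case-insensitive, so they are normalized to lower case here.
    type[range.toLowerCase()] = weight;
  }
  return { type };
}

console.log(parseAcceptHeader('text/turtle; q=0.9'));
// Logs: { type: { 'text/turtle': 0.9 } }
```

When `q` is missing, the sketch uses the weight 1, which is the HTTP default. Treating a malformed `q` the same way is a choice made for this sketch only. The server's own parser may handle invalid weights differently.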