rippled
This document describes the mechanics of the HTTPDownloader, a class that downloads shards from remote web servers via HTTP. The downloader uses a strand (boost::asio::io_service::strand) to ensure that downloads never execute concurrently. Hence, if a download is in progress when another download is initiated, the second download is queued and invoked only after the first has completed.
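The queueing behavior described above can be illustrated with a minimal, single-threaded sketch. It does not use boost::asio; `SerialQueue` and `demoSerialOrder` are hypothetical names that only mimic the ordering guarantee a strand provides, under the assumption that a download requested while another is running must wait its turn.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a strand: tasks posted to it run one at a
// time, in the order in which they were posted.
class SerialQueue
{
public:
    // Queue a task. If no task is running, start draining the queue.
    void post(std::function<void()> task)
    {
        pending_.push_back(std::move(task));
        if (!running_)
            drain();
    }

private:
    void drain()
    {
        running_ = true;
        while (!pending_.empty())
        {
            auto task = std::move(pending_.front());
            pending_.pop_front();
            task();  // may call post(); the new task waits its turn
        }
        running_ = false;
    }

    std::deque<std::function<void()>> pending_;
    bool running_ = false;
};

// Hypothetical demo: the first "download" requests a second one while
// it is still in progress. Returns the order of events.
std::vector<std::string> demoSerialOrder()
{
    std::vector<std::string> log;
    SerialQueue queue;
    queue.post([&] {
        log.push_back("begin 1");
        queue.post([&] {
            log.push_back("begin 2");
            log.push_back("end 2");
        });
        log.push_back("end 1");
    });
    return log;
}
```

In this sketch the second task is only started after the first has returned, which is the same ordering the strand enforces for real downloads.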
In March 2020 the downloader was modified to add two key features: graceful shutdown, and the ability to resume interrupted downloads after a shutdown or crash.
This document describes those changes.
Much of the shard downloading process concerns the following classes:
HTTPDownloader
This is a generic class designed for serially executing downloads via HTTP.
ShardArchiveHandler
This class uses the HTTPDownloader to fetch shards from remote web servers. Additionally, the archive handler performs validity checks on the downloaded files and imports the validated files into the local shard store.
The ShardArchiveHandler exposes a simple public interface, centered on its add and start methods.
When a client submits a download_shard command via the RPC interface, each of the requested files is registered with the handler via the add method. After all the files have been registered, the handler's start method is invoked, which in turn creates an instance of the HTTPDownloader and begins the first download. When the download is completed, the downloader invokes the handler's complete method, which will initiate the download of the next file, or simply return if there are no more downloads to process. When complete is invoked with no remaining files to be downloaded, the handler and downloader are not destroyed automatically, but persist for the duration of the application to assist with graceful shutdowns.
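The add, start, and complete flow described above might be modeled roughly as follows. This is a simplified, synchronous sketch, not the actual class: `ArchiveHandlerSketch`, its signatures, the `requested()` and `idle()` helpers, and the rule that add is refused once downloading has started are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the handler's control flow.
// Instead of starting a real download it records the URL it would fetch.
class ArchiveHandlerSketch
{
public:
    // Register an archive. Assumption: refused once downloading has
    // started, or if the shard index is already registered.
    bool add(std::uint32_t shardIndex, std::string url)
    {
        if (started_)
            return false;
        return archives_.emplace(shardIndex, std::move(url)).second;
    }

    // Begin the first download. Returns false if nothing is registered.
    bool start()
    {
        if (started_ || archives_.empty())
            return false;
        started_ = true;
        next();
        return true;
    }

    // Invoked when the current download finishes: drop the finished
    // entry and begin the next one, or simply return if none remain.
    void complete()
    {
        if (archives_.empty())
            return;
        archives_.erase(archives_.begin());
        next();
    }

    std::vector<std::string> const& requested() const { return requested_; }
    bool idle() const { return archives_.empty(); }

private:
    void next()
    {
        if (!archives_.empty())
            requested_.push_back(archives_.begin()->second);
    }

    std::map<std::uint32_t, std::string> archives_;  // index -> URL
    std::vector<std::string> requested_;             // downloads begun
    bool started_ = false;
};
```

Note that the sketch object stays alive after the last call to complete, mirroring the statement that the handler and downloader persist for the duration of the application.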
DatabaseBody
This class defines a custom message body type, allowing an http::response_parser to write to an SQLite database rather than to a flat file. This class is discussed in further detail in the Recovery section.
This section describes in greater detail how the shutdown and recovery features of the downloader are implemented in C++ using the boost::asio framework.
The examples in this section refer to member variables of the HTTPDownloader class, such as stop_.
A graceful shutdown begins when the stop() method of the ShardArchiveHandler is invoked, which in turn calls HTTPDownloader::stop().
Inside HTTPDownloader::stop(), if a download is currently in progress, the stop_ member variable is set and the calling thread waits for the download to stop.
The graceful shutdown is realized when the thread executing the download polls stop_ after it has been set to true. Polling occurs while the file is being downloaded, in between calls to async_read_some(). The stop takes effect when the socket is closed and the handler function (do_session()) exits.
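The polling pattern can be sketched without any networking. In the hypothetical `DownloadSketch` below, each loop iteration stands in for one call to async_read_some(), and the flag is checked between iterations. The class, its `run` method, and the `chunksBeforeStop` helper are illustrative assumptions; the sketch is single-threaded and omits the wait performed by the thread that requests the stop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical model of a download loop that honors a stop request.
class DownloadSketch
{
public:
    // Request a graceful stop; takes effect at the next poll.
    void stop() { stop_ = true; }

    // "Downloads" totalChunks chunks, polling stop_ between reads.
    // afterChunk is called after each chunk and may call stop().
    // Returns the number of chunks received before exiting.
    std::size_t run(
        std::size_t totalChunks,
        std::function<void(std::size_t)> const& afterChunk)
    {
        std::size_t received = 0;
        while (received < totalChunks)
        {
            if (stop_.load())
                break;   // the real code would close the socket here
            ++received;  // stands in for one read from the socket
            if (afterChunk)
                afterChunk(received);
        }
        return received;
    }

private:
    std::atomic<bool> stop_{false};
};

// Hypothetical helper: request a stop after stopAfter chunks arrive.
std::size_t chunksBeforeStop(std::size_t totalChunks, std::size_t stopAfter)
{
    DownloadSketch download;
    return download.run(totalChunks, [&](std::size_t n) {
        if (n == stopAfter)
            download.stop();
    });
}
```

Because the flag is only examined between reads, a stop request never interrupts a chunk that is already being received.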
Persisting the current state of both the archive handler and the downloader is achieved by leveraging an SQLite database rather than flat files, as the database protects against data corruption that could result from a system crash.
Although HTTPDownloader is a generic class that could be used to download a variety of file types, currently it is used exclusively by the ShardArchiveHandler to download shards. In order to provide resilience, the ShardArchiveHandler will use an SQLite database to preserve its current state whenever there are active, paused, or queued downloads. The shard_db section in the configuration file allows users to specify the location of the database to use for this purpose.
| Index | URL |
|---|---|
| 1 | https://example.com/1.tar.lz4 |
| 2 | https://example.com/2.tar.lz4 |
| 5 | https://example.com/5.tar.lz4 |
While the archive handler maintains a list of all partial and queued downloads, the HTTPDownloader stores the raw bytes of the file currently being downloaded. The partially downloaded file is represented as one or more BLOB entries in an SQLite database. As the maximum size of a BLOB entry is currently limited to roughly 2.1 GB, a 5 GB shard file, for instance, will occupy three database entries upon completion.
Since downloads execute serially by design, the entries in this table always correspond to the contents of a single file.
| Bytes | Size | Part |
|---|---|---|
| 0x... | 2147483647 | 0 |
| 0x... | 2147483647 | 1 |
| 0x... | 705032706 | 2 |
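The part sizes in the table above follow from splitting the file into entries of at most the maximum BLOB size. A minimal sketch of that arithmetic, assuming a cap of 2,147,483,647 bytes (the largest size shown in the table) and using the hypothetical names `kMaxBlobBytes` and `blobPartSizes`:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed per-entry cap, taken from the largest size in the table.
constexpr std::uint64_t kMaxBlobBytes = 2147483647;

// Hypothetical helper: sizes of the BLOB parts needed to hold a file
// of fileBytes bytes, in part order (part 0 first).
std::vector<std::uint64_t> blobPartSizes(
    std::uint64_t fileBytes,
    std::uint64_t maxPart = kMaxBlobBytes)
{
    std::vector<std::uint64_t> parts;
    if (maxPart == 0)
        return parts;  // nothing can be stored with a zero cap
    while (fileBytes > 0)
    {
        std::uint64_t const n = fileBytes < maxPart ? fileBytes : maxPart;
        parts.push_back(n);
        fileBytes -= n;
    }
    return parts;
}
```

For a file of exactly 5,000,000,000 bytes this yields two full parts and a final part of 705,032,706 bytes, matching the three rows shown.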
The download_path field of the shard_db entry is used to determine where to store the recovery database. If this field is omitted, the path field will be used instead.
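The fallback rule can be expressed in a few lines. In this sketch the configuration section is modeled as a plain string map; the `Section` alias and `recoveryDbDir` are hypothetical, and treating an empty download_path as omitted is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a parsed configuration section.
using Section = std::map<std::string, std::string>;

// Returns the directory for the recovery database: download_path when
// present (and, by assumption, non-empty), otherwise path.
std::string recoveryDbDir(Section const& shardDb)
{
    auto it = shardDb.find("download_path");
    if (it != shardDb.end() && !it->second.empty())
        return it->second;
    it = shardDb.find("path");
    return it != shardDb.end() ? it->second : std::string{};
}
```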
When resuming downloads after a shutdown, crash, or other interruption, the HTTPDownloader uses the Range header field of the HTTP request to download only the remainder of the partially downloaded file.
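As an illustration, the value of such a header can be derived from the number of bytes already stored. Byte ranges are zero-indexed, so the first missing byte has an offset equal to the stored byte count. The helper names `storedBytes` and `rangeHeaderValue` are hypothetical, and the sketch assumes the stored part sizes are available as a list.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: total bytes already persisted, given the sizes
// of the stored parts.
std::uint64_t storedBytes(std::vector<std::uint64_t> const& partSizes)
{
    return std::accumulate(
        partSizes.begin(), partSizes.end(), std::uint64_t{0});
}

// Value for a Range request header that asks for everything from the
// first missing byte to the end of the file.
std::string rangeHeaderValue(std::uint64_t bytesAlreadyStored)
{
    return "bytes=" + std::to_string(bytesAlreadyStored) + "-";
}
```

A server that honors the range answers with status 206 (Partial Content); a client should be prepared for a server that ignores the header and returns the whole file with status 200.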
Previously, the HTTPDownloader leveraged an http::response_parser instantiated with an http::file_body. The file_body class declares a nested type, reader, which writes HTTP message payloads (constituting a requested file) to the filesystem. In order for the http::response_parser to interface with the database, we implement a custom body type that declares a nested reader type which has been outfitted to persist octets received from the remote host to a local SQLite database. The customization points available to user-defined body types are the nested value_type and reader types.
Note that the DatabaseBody class is specifically designed to work with asio and follows asio conventions.
The method invoked to write data to the filesystem (or SQLite database in our case) is the reader's put method, which receives a buffer sequence and an error code and returns the number of bytes consumed.
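The shape of such a body type can be sketched using only the standard library. This is not the actual DatabaseBody: `DatabaseBodySketch` stores bytes in an in-memory string instead of SQLite, uses std::error_code and a raw pointer plus length in place of the library's error code and buffer sequence types, passes the optional content length as a nullable pointer, and simplifies the reader's constructor. The member names init, put, and finish follow the reader conventions of Boost.Beast body types; `storeChunks` is a hypothetical driver playing the role of the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <system_error>

struct DatabaseBodySketch
{
    // What the message carries. Here an in-memory string stands in for
    // the SQLite database (an assumption made to keep this runnable).
    struct value_type
    {
        std::string stored;
    };

    class reader
    {
    public:
        explicit reader(value_type& body) : body_(body) {}

        // Called once before any payload arrives. contentLength is
        // null when the length is unknown.
        void init(std::uint64_t const* contentLength, std::error_code& ec)
        {
            expected_ = contentLength ? *contentLength : 0;
            ec.clear();
        }

        // Called for each received chunk. Returns bytes consumed.
        std::size_t put(char const* data, std::size_t size, std::error_code& ec)
        {
            ec.clear();
            body_.stored.append(data, size);
            return size;
        }

        // Called once after the final chunk.
        void finish(std::error_code& ec) { ec.clear(); }

    private:
        value_type& body_;
        std::uint64_t expected_ = 0;
    };
};

// Hypothetical driver: feeds two chunks through the reader the way a
// parser would, returning the total number of bytes consumed.
std::size_t storeChunks(
    DatabaseBodySketch::value_type& body,
    std::string const& first,
    std::string const& second)
{
    std::error_code ec;
    DatabaseBodySketch::reader r(body);
    std::uint64_t const length = first.size() + second.size();
    r.init(&length, ec);
    std::size_t n = r.put(first.data(), first.size(), ec);
    n += r.put(second.data(), second.size(), ec);
    r.finish(ec);
    return n;
}
```

Since put is called once per received chunk, a reader backed by a database can persist data incrementally, which is what makes resuming a partial download possible.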
This sequence diagram demonstrates a scenario wherein the ShardArchiveHandler leverages the state persisted in the database to recover from a crash and resume the requested downloads.

This diagram illustrates the various states of the Shard Downloader module.