<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.17"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>rippled: Shard Downloader</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">rippled
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.17 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
var searchBox = new SearchBox("searchBox", "search",false,'Search');
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
$(function() {
initMenu('',true,false,'search.php','Search');
$(document).ready(function() { init_search(); });
});
/* @license-end */</script>
<div id="main-nav"></div>
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
</div><!-- top -->
<div class="PageDoc"><div class="header">
<div class="headertitle">
<div class="title">Shard Downloader </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h1><a class="anchor" id="autotoc_md157"></a>
Overview</h1>
<p>This document describes the mechanics of the <code>HTTPDownloader</code>, a class that downloads shards from remote web servers via HTTP. The downloader uses a strand (<code>boost::asio::io_service::strand</code>) to ensure that downloads are never executed concurrently. Hence, if a download is in progress when another download is initiated, the second download is queued and invoked only when the first download has completed.</p>
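<p>The queuing behaviour described above can be illustrated without Boost. The following is a minimal sketch, not the rippled implementation: <code>SerialQueue</code> is a hypothetical class that uses a single worker thread to run posted tasks one at a time, in submission order. A real strand provides the same non-concurrency guarantee without dedicating a thread to it.</p>

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a strand: tasks posted from any thread run
// one at a time, in the order they were posted.
class SerialQueue
{
public:
    SerialQueue() : worker_([this] { run(); })
    {
    }

    // Lets the worker drain the remaining tasks, then joins it.
    ~SerialQueue()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queues a task; it starts only after all earlier tasks have finished.
    void
    post(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void
    run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return;  // done_ is set and nothing is left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs outside the lock, never concurrently
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

<p>Posting several tasks and then destroying the queue runs them in the order they were posted, because the destructor lets the worker drain the queue before joining it.</p>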
<h1><a class="anchor" id="autotoc_md158"></a>
Motivation</h1>
<p>In March 2020 the downloader was modified to include some key features:</p>
<ul>
<li>The ability to stop downloads during a graceful shutdown.</li>
<li>The ability to resume partial downloads after a crash or shutdown.</li>
</ul>
<p>This document describes those changes.</p>
<h1><a class="anchor" id="autotoc_md159"></a>
Classes</h1>
<p>Much of the shard downloading process concerns the following classes:</p>
<ul>
<li><p class="startli"><code>HTTPDownloader</code></p>
<p class="startli">This is a generic class designed for serially executing downloads via HTTP.</p>
</li>
<li><p class="startli"><code>ShardArchiveHandler</code></p>
<p class="startli">This class uses the <code>HTTPDownloader</code> to fetch shards from remote web servers. Additionally, the archive handler performs validity checks on the downloaded files and imports the validated files into the local shard store.</p>
<p class="startli">The <code>ShardArchiveHandler</code> exposes a simple public interface:</p>
</li>
</ul>
<div class="fragment">
<div class="line"> /** Add an archive to be downloaded and imported.</div>
<div class="line"> @param shardIndex the index of the shard to be imported.</div>
<div class="line"> @param url the location of the archive.</div>
<div class="line"> @return `true` if successfully added.</div>
<div class="line"> @note Returns false if called while downloading.</div>
<div class="line"> */</div>
<div class="line"> bool</div>
<div class="line"> add(std::uint32_t shardIndex, std::pair&lt;parsedURL, std::string&gt;&amp;&amp; url);</div>
<div class="line"> </div>
<div class="line"> /** Starts downloading and importing archives. */</div>
<div class="line"> bool</div>
<div class="line"> start();</div>
</div><!-- fragment --><p>When a client submits a <code>download_shard</code> command via the RPC interface, each of the requested files is registered with the handler via the <code>add</code> method. After all the files have been registered, the handler's <code>start</code> method is invoked, which in turn creates an instance of the <code>HTTPDownloader</code> and begins the first download. When the download is completed, the downloader invokes the handler's <code>complete</code> method, which will initiate the download of the next file, or simply return if there are no more downloads to process. When <code>complete</code> is invoked with no remaining files to be downloaded, the handler and downloader are not destroyed automatically, but persist for the duration of the application to assist with graceful shutdowns.</p>
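<p>The <code>add</code>, <code>start</code>, and <code>complete</code> sequence described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the rippled code: <code>MiniHandler</code> is a hypothetical class, URLs are plain strings, and <code>complete</code> is made public so that a caller can play the role of the downloader's completion callback.</p>

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the add/start/complete flow; no real downloads occur.
class MiniHandler
{
public:
    // Registers an archive. Refused once downloading has begun.
    bool
    add(std::uint32_t shardIndex, std::string url)
    {
        if (downloading_)
            return false;
        pending_.emplace_back(shardIndex, std::move(url));
        return true;
    }

    // Begins the first download. Fails if nothing is queued or already running.
    bool
    start()
    {
        if (downloading_ || pending_.empty())
            return false;
        downloading_ = true;  // the front of pending_ is now "in flight"
        return true;
    }

    // Stands in for the downloader's callback: records the finished shard
    // and reports whether another download has been started.
    bool
    complete()
    {
        if (!downloading_)
            return false;
        imported_.push_back(pending_.front().first);
        pending_.pop_front();
        downloading_ = !pending_.empty();
        return downloading_;
    }

    std::vector<std::uint32_t> const&
    imported() const
    {
        return imported_;
    }

private:
    std::deque<std::pair<std::uint32_t, std::string>> pending_;
    std::vector<std::uint32_t> imported_;
    bool downloading_ = false;
};
```

<p>With two archives registered, a call to <code>add</code> made after <code>start</code> is refused, the first call to <code>complete</code> moves on to the second archive, and the second call reports that nothing remains.</p>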
<ul>
<li><p class="startli"><code>DatabaseBody</code></p>
<p class="startli">This class defines a custom message body type, allowing an <code>http::response_parser</code> to write to an SQLite database rather than to a flat file. This class is discussed in further detail in the Recovery section.</p>
</li>
</ul>
<h1><a class="anchor" id="autotoc_md160"></a>
Graceful Shutdowns &amp; Recovery</h1>
<p>This section describes in greater detail how the shutdown and recovery features of the downloader are implemented in C++ using the <code><a class="el" href="namespaceboost_1_1asio.html">boost::asio</a></code> framework.</p>
<h4><a class="anchor" id="autotoc_md161"></a>
Member Variables:</h4>
<p>The variables shown here are members of the <code>HTTPDownloader</code> class and will be used in the following code examples.</p>
<div class="fragment">
<div class="line">std::unique_ptr&lt;HTTPStream&gt; stream_;</div>
<div class="line">std::condition_variable c_;</div>
<div class="line">std::atomic&lt;bool&gt; stop_;</div>
</div><!-- fragment --><h2><a class="anchor" id="autotoc_md162"></a>
Graceful Shutdowns</h2>
<h4><a class="anchor" id="autotoc_md163"></a>
Thread 1:</h4>
<p>A graceful shutdown begins when the <code>stop()</code> method of the <code>ShardArchiveHandler</code> is invoked:</p>
<div class="fragment">
<div class="line">void</div>
<div class="line">ShardArchiveHandler::stop()</div>
<div class="line">{</div>
<div class="line"> std::lock_guard&lt;std::mutex&gt; lock(m_);</div>
<div class="line"> </div>
<div class="line"> if (downloader_)</div>
<div class="line"> {</div>
<div class="line"> downloader_-&gt;stop();</div>
<div class="line"> downloader_.reset();</div>
<div class="line"> }</div>
<div class="line"> </div>
<div class="line"> stopped();</div>
<div class="line">}</div>
</div><!-- fragment --><p>Inside of <code>HTTPDownloader::stop()</code>, if a download is currently in progress, the <code>stop_</code> member variable is set and the thread waits for the download to stop:</p>
<div class="fragment">
<div class="line">void</div>
<div class="line">HTTPDownloader::stop()</div>
<div class="line">{</div>
<div class="line"> std::unique_lock lock(m_);</div>
<div class="line"> </div>
<div class="line"> stop_ = true;</div>
<div class="line"> </div>
<div class="line"> if(sessionActive_)</div>
<div class="line"> {</div>
<div class="line"> // Wait for the handler to exit.</div>
<div class="line"> c_.wait(lock,</div>
<div class="line"> [this]()</div>
<div class="line"> {</div>
<div class="line"> return !sessionActive_;</div>
<div class="line"> });</div>
<div class="line"> }</div>
<div class="line">}</div>
</div><!-- fragment --><h4><a class="anchor" id="autotoc_md164"></a>
Thread 2:</h4>
<p>The graceful shutdown is realized when the thread executing the download polls <code>stop_</code> after this variable has been set to <code>true</code>. Polling occurs while the file is being downloaded, in between calls to <code>async_read_some()</code>. The stop takes effect when the socket is closed and the handler function (<code>do_session()</code>) exits.</p>
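<p>The handshake between the two threads can be tried in isolation with standard-library primitives. The sketch below is an illustration under stated assumptions rather than the rippled implementation: <code>MiniDownloader</code> is a hypothetical class, the loop body stands in for one <code>async_read_some()</code> call, and the final block of <code>doSession()</code> plays the role of <code>exit()</code>.</p>

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical miniature of the stop/wait handshake.
class MiniDownloader
{
public:
    // Thread 2: "reads" chunks until a stop has been requested.
    void
    doSession()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            sessionActive_ = true;
        }

        while (!stop_.load())
        {
            ++chunks_;  // stands in for one async_read_some() call
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }

        // Stand-in for exit(): mark the session finished and wake stop().
        {
            std::lock_guard<std::mutex> lock(m_);
            sessionActive_ = false;
        }
        c_.notify_all();
    }

    // Thread 1: request the stop, then block until the session has exited.
    void
    stop()
    {
        std::unique_lock<std::mutex> lock(m_);
        stop_ = true;
        c_.wait(lock, [this] { return !sessionActive_; });
    }

    bool
    sessionActive()
    {
        std::lock_guard<std::mutex> lock(m_);
        return sessionActive_;
    }

    int
    chunks() const
    {
        return chunks_.load();
    }

private:
    std::mutex m_;
    std::condition_variable c_;
    std::atomic<bool> stop_{false};
    std::atomic<int> chunks_{0};
    bool sessionActive_ = false;  // guarded by m_
};
```

<p>Once <code>stop()</code> returns, the session is guaranteed to have left its loop. The excerpt from <code>do_session()</code> that follows shows where the corresponding check sits in the downloader itself.</p>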
<div class="fragment">
<div class="line">void HTTPDownloader::do_session()</div>
<div class="line">{</div>
<div class="line"> </div>
<div class="line"> // (Connection initialization logic) . . .</div>
<div class="line"> </div>
<div class="line"> </div>
<div class="line"> // (In between calls to async_read_some):</div>
<div class="line"> if(stop_.load())</div>
<div class="line"> {</div>
<div class="line"> close(p);</div>
<div class="line"> return exit();</div>
<div class="line"> }</div>
<div class="line"> </div>
<div class="line"> // . . . (rest of the read loop elided)</div>
<div class="line">}</div>
</div><!-- fragment --><h2><a class="anchor" id="autotoc_md165"></a>
Recovery</h2>
<p>Persisting the current state of both the archive handler and the downloader is achieved by leveraging an SQLite database rather than flat files, as the database protects against data corruption that could result from a system crash.</p>
<h4><a class="anchor" id="autotoc_md166"></a>
ShardArchiveHandler</h4>
<p>Although <code>HTTPDownloader</code> is a generic class that could be used to download a variety of file types, currently it is used exclusively by the <code>ShardArchiveHandler</code> to download shards. In order to provide resilience, the <code>ShardArchiveHandler</code> will use an SQLite database to preserve its current state whenever there are active, paused, or queued downloads. The <code>shard_db</code> section in the configuration file allows users to specify the location of the database to use for this purpose.</p>
<h5>SQLite Table Format</h5>
<table class="markdownTable">
<tr class="markdownTableHead">
<th class="markdownTableHeadCenter">Index </th><th class="markdownTableHeadCenter">URL </th></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyCenter">1 </td><td class="markdownTableBodyCenter">https://example.com/1.tar.lz4 </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyCenter">2 </td><td class="markdownTableBodyCenter">https://example.com/2.tar.lz4 </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyCenter">5 </td><td class="markdownTableBodyCenter">https://example.com/5.tar.lz4 </td></tr>
</table>
<h4><a class="anchor" id="autotoc_md167"></a>
HTTPDownloader</h4>
<p>While the archive handler maintains a list of all partial and queued downloads, the <code>HTTPDownloader</code> stores the raw bytes of the file currently being downloaded. The partially downloaded file is represented as one or more <code>BLOB</code> entries in an SQLite database. Because the maximum size of a <code>BLOB</code> entry is currently limited to roughly 2.1 GB, a 5 GB shard file, for instance, will occupy three database entries upon completion.</p>
<h5>SQLite Table Format</h5>
<p>Since downloads execute serially by design, the entries in this table always correspond to the contents of a single file.</p>
<table class="markdownTable">
<tr class="markdownTableHead">
<th class="markdownTableHeadCenter">Bytes </th><th class="markdownTableHeadCenter">Size </th><th class="markdownTableHeadCenter">Part </th></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyCenter">0x... </td><td class="markdownTableBodyCenter">2147483647 </td><td class="markdownTableBodyCenter">0 </td></tr>
<tr class="markdownTableRowEven">
<td class="markdownTableBodyCenter">0x... </td><td class="markdownTableBodyCenter">2147483647 </td><td class="markdownTableBodyCenter">1 </td></tr>
<tr class="markdownTableRowOdd">
<td class="markdownTableBodyCenter">0x... </td><td class="markdownTableBodyCenter">705032706 </td><td class="markdownTableBodyCenter">2 </td></tr>
</table>
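<p>The three-row layout in the table above follows from simple arithmetic. As a minimal sketch, assuming the per-entry limit is the 2147483647 bytes shown in the table and that "5 GB" means 5,000,000,000 bytes, the hypothetical helper <code>partSizes</code> (not part of rippled) splits a file size into per-entry sizes:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits a file of `total` bytes into entry sizes of
// at most `maxBlob` bytes each, in storage order (Part 0, Part 1, ...).
std::vector<std::uint64_t>
partSizes(std::uint64_t total, std::uint64_t maxBlob = 2147483647ULL)
{
    assert(maxBlob > 0);

    std::vector<std::uint64_t> parts;
    while (total > 0)
    {
        // Every entry is full except possibly the last one.
        std::uint64_t const n = total < maxBlob ? total : maxBlob;
        parts.push_back(n);
        total -= n;
    }
    return parts;
}
```

<p>Under those assumptions, <code>partSizes(5000000000ULL)</code> yields 2147483647, 2147483647, and 705032706, matching the rows above.</p>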
<h4><a class="anchor" id="autotoc_md168"></a>
Config File Entry</h4>
<p>The <code>download_path</code> field of the <code>shard_db</code> entry is used to determine where to store the recovery database. If this field is omitted, the <code>path</code> field will be used instead.</p>
<div class="fragment"><div class="line"># This is the persistent datastore for shards. It is important for the health</div>
<div class="line"># of the network that rippled operators shard as much as practical.</div>
<div class="line"># NuDB requires SSD storage. Helpful information can be found on</div>
<div class="line"># https://xrpl.org/history-sharding.html</div>
<div class="line">[shard_db]</div>
<div class="line">type=NuDB</div>
<div class="line">path=/var/lib/rippled/db/shards/nudb</div>
<div class="line">download_path=/var/lib/rippled/db/shards/</div>
<div class="line">max_historical_shards=50</div>
</div><!-- fragment --><h4><a class="anchor" id="autotoc_md169"></a>
Resuming Partial Downloads</h4>
<p>When resuming downloads after a shutdown, crash, or other interruption, the <code>HTTPDownloader</code> uses the HTTP <code>Range</code> request header to download only the remainder of the partially downloaded file.</p>
<div class="fragment">
<div class="line">auto downloaded = getPartialFileSize();</div>
<div class="line">auto total = getTotalFileSize();</div>
<div class="line"> </div>
<div class="line">http::request&lt;http::file_body&gt; req {http::verb::head,</div>
<div class="line"> target,</div>
<div class="line"> version};</div>
<div class="line"> </div>
<div class="line">if (downloaded &lt; total)</div>
<div class="line">{</div>
<div class="line"> // If we already downloaded 1000 bytes to the database,</div>
<div class="line"> // the range header will look like:</div>
<div class="line"> // Range: bytes=1000-</div>
<div class="line"> req.set(http::field::range, &quot;bytes=&quot; + std::to_string(downloaded) + &quot;-&quot;);</div>
<div class="line">}</div>
<div class="line">else if(downloaded == total)</div>
<div class="line">{</div>
<div class="line"> // Download is already complete. (Interruption must</div>
<div class="line"> // have occurred after file was downloaded but before</div>
<div class="line"> // the state file was updated.)</div>
<div class="line">}</div>
<div class="line">else</div>
<div class="line">{</div>
<div class="line"> // The size of the partially downloaded file exceeds</div>
<div class="line"> // the total download size. Error condition. Handle</div>
<div class="line"> // appropriately.</div>
<div class="line">}</div>
</div><!-- fragment --><h4><a class="anchor" id="autotoc_md170"></a>
DatabaseBody</h4>
<p>Previously, the <code>HTTPDownloader</code> leveraged an <code>http::response_parser</code> instantiated with an <code>http::file_body</code>. The <code>file_body</code> class declares a nested type, <code>reader</code>, which writes HTTP message payloads (constituting a requested file) to the filesystem. In order for the <code>http::response_parser</code> to interface with the database, we implement a custom body type that declares a nested <code>reader</code> type which has been outfitted to persist octets received from the remote host to a local SQLite database. The code snippet below illustrates the customization points available to user-defined body types:</p>
<div class="fragment">
<div class="line">/// Defines a Body type</div>
<div class="line">struct body</div>
<div class="line">{</div>
<div class="line"> /// This determines the return type of the `message::body` member function</div>
<div class="line"> using value_type = ...;</div>
<div class="line"> </div>
<div class="line"> /** An optional function, returns the body&#39;s payload size (which may be</div>
<div class="line"> * zero) */</div>
<div class="line"> static</div>
<div class="line"> std::uint64_t</div>
<div class="line"> size(value_type const&amp; v);</div>
<div class="line"> </div>
<div class="line"> /// The algorithm used during parsing (stores received buffers in the body)</div>
<div class="line"> class reader;</div>
<div class="line"> </div>
<div class="line"> /// The algorithm used during serialization (extracts buffers from the body)</div>
<div class="line"> class writer;</div>
<div class="line">};</div>
</div><!-- fragment --><p>Note that the <code>DatabaseBody</code> class is specifically designed to work with Boost.Beast, which is built on <code>asio</code>, and follows its conventions.</p>
<p>The method invoked to write data to the filesystem (or SQLite database in our case) has the following signature:</p>
<div class="fragment"><div class="line">template &lt;class ConstBufferSequence&gt;</div>
<div class="line">std::size_t</div>
<div class="line">body::reader::put(ConstBufferSequence const&amp; buffers, error_code&amp; ec);</div>
</div><!-- fragment --><h1><a class="anchor" id="autotoc_md171"></a>
Sequence Diagram</h1>
<p>This sequence diagram demonstrates a scenario wherein the <code>ShardArchiveHandler</code> leverages the state persisted in the database to recover from a crash and resume the requested downloads.</p>
<p><img src="./images/interrupt_sequence.png" alt="Sequence diagram: resuming downloads after an abort" title="Resuming downloads post abort" class="inline"/></p>
<h1><a class="anchor" id="autotoc_md172"></a>
State Diagram</h1>
<p>This diagram illustrates the various states of the Shard Downloader module.</p>
<p><img src="./images/states.png" alt="State diagram: Shard Downloader states" title="Shard Downloader states" class="inline"/></p>
</div></div><!-- contents -->
</div><!-- PageDoc -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.17
</small></address>
</body>
</html>