inc'ing feedback

This commit is contained in:
Jennifer Hasegawa
2018-11-29 08:57:33 -08:00
parent 74aca12c9c
commit be8e9c5808


@@ -17,39 +17,85 @@ To meet your `rippled` capacity requirements, you must address these technical f
## Configuration Settings
Ripple recommends using these configuration guidelines to optimize performance of your `rippled` server. You can set the following parameters in the `rippled.cfg` file.
Ripple recommends using these configuration guidelines to optimize performance of your `rippled` server.
You can set the following parameters in the `rippled.cfg` file used for your `rippled` server. You can access an example configuration file, `rippled-example.cfg`, in the [`cfg` directory](https://github.com/ripple/rippled/blob/develop/cfg/rippled-example.cfg) in the `rippled` GitHub repo.
### Node Size
The `node_size` parameter determines the size of database caches and how much RAM you want the `rippled` server to be able to use at once. ***TODO: Edit stated correctly? What does "use at once" mean? The amount of RAM it can use to make a single read or write call? Do we need to say "at once"? For a single rippled server, are there multiple database caches - or just a single database cache? When we talk about Node Size, we are talking about configuring memory, and not storage, correct? Just clarifying for myself bc the two seem to get a little intertwined for me in this doc. For example, when we talk about a database cache - this is stored in RAM? But when we talk about the ledger store - that is stored in "storage," is that right? What is in the database cache vs the ledger store? The available RAM configured below has nothing to do with hardware storage, correct?*** Larger database caches decrease disk I/O requirements at a cost of higher memory requirements. ***TODO: Does it matter that SSDs don't have disks? Throughout, should we just refer to "storage" instead of "disks" or "disk storage"? For example, here we would refer to "storage I/O" instead? Does it matter?*** Ripple recommends you always use the largest database cache your available memory can support. See the following table for recommended settings.
Set the `node_size` parameter to define your `rippled` server's database cache size, as well as how much RAM you want the server to be able to use. Pick the appropriate size based on your server's expected load and available memory. ***TODO: correctly stated? The `tiny`, `huge`, etc values are setting the database cache size. When we list an Available RAM value in the table below - are we saying that this is how much RAM rippled will use? Or are we saying that if you select HUGE, you need to have at least 32GB available RAM for rippled?***
The `node_size` you define determines how large an object to treat as one "node" in the ledger's underlying tree structure.
It doesn't directly configure storage or RAM, but it affects how much RAM your `rippled` server uses and how fast it can fetch things from disk. For example, a larger node size decreases disk I/O requirements, but increases memory requirements. ***TODO: Thank you for your patient explanations. I am so close to understanding this. But one more question - and I'm sorry if I'm making this more complicated than it is - but by "node," we usually mean an instance of a rippled server. In this case, are we talking about "node" as a unit of information or data to be retrieved or stored, the size of which we are defining based on this setting? The current doc says that this value sets the size of the database cache.***
Ripple recommends you always use the largest database cache your available memory can support. See the following table for recommended settings.
#### Recommendation
| Available RAM for `rippled` | `node_size` value | Notes |
|:----------------------------|:------------------|:---------------------------|
| < 8GB | `tiny` | If not specified, this is the default value. Not recommended. The delivered `rippled-example.cfg` has this value set to `medium`. ***TODO: In rippled-example.cfg, we recommend starting with the default value (tiny) and working upward, if necessary. Should we remove this "Not recommended." text here because we are recommending `tiny` as a starting value? I added the info about the delivered rippled-example.cfg file setting the configuration to `medium` so folks understand that if they don't change anything - they will get `medium`. Usually, when you hear that something is a default value, you assume that is what you'll get if you don't change anything. In this case, you'll need to change the delivered value (medium) to get the default value (tiny).*** |
| 8GB | `small` | Recommended for test servers. |
| 16GB | `medium` | |
| 32GB | `large` or `huge` | Recommended for production servers. ***TODO: Do we want to say anything about the difference between `large` and `huge`? Either can be set with an available RAM of 32GB -- how are they different? Later in the doc we say we recommend `huge` for production servers.*** |
| < 8GB | `tiny` | Not recommended. If not specified in `rippled.cfg`, `tiny` is the default value. |
| 8GB | `small` | Recommended for test servers. |
| 16GB | `medium` | The example `rippled-example.cfg` has its `node_size` value set to `medium`. |
| 32GB | `large` | Ripple recommends using `huge` instead. `large` uses less memory than `huge`, but increases disk access requirements, which decreases performance. ***TODO: Included this row because `large` is an option in `rippled-example.cfg` and I think we need to acknowledge it here. Is the Available RAM value correct? We say that `large` uses less memory than `huge`, but the RAM values are the same.*** |
| 32GB | `huge` | Recommended for production servers. |
For `node_size` troubleshooting, see [Bad node_size value](server-wont-start.html#bad-node-size-value).
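For reference, `node_size` is set in its own stanza of `rippled.cfg`, with the value on the line after the stanza name. Here is a minimal sketch that follows the production recommendation above (use a smaller value, such as `medium`, if less RAM is available to `rippled`):
```
# Sets the node_size described above. `huge` is the recommended
# value for production servers with 32GB of RAM available to rippled.
[node_size]
huge
```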
### Node DB Type
The `type` field in the `[node_db]` stanza of the `rippled.cfg` file sets the type of key-value database (or key-value store) that `rippled` uses to persist the XRP Ledger in the ledger store. ***TODO: what do we mean by persisting "the XRP Ledger in the ledger store"? By calling it the XRP Ledger it makes it sound like you are storing a full-history, which is not always the case. This content doesn't explicitly mention that this configuration also applies to the database cache (memory) - should it? Sorry if this is a dumb question - but this DB selection impacts storage, as well as RAM (memory), is that right?*** You can set the value to either `RocksDB` or `NuDB`. ***TODO: Updated the example values to the values I see in rippled-example.cfg - just for consistency and faster recognition - though the config does say these are case-insensitive.***
The `type` field in the `[node_db]` stanza of the `rippled.cfg` file sets the type of key-value store that `rippled` uses to hold ledger data on disk. This setting does not directly configure RAM settings, but the choice of key-value store has important implications for RAM usage because of the different ways these technologies cache and index data for fast lookup.
`rippled` offers a history sharding feature that allows you to store a randomized range of ledgers in a separate shard store. You may want to configure the shard store to use a different type of key-value database than the one you defined for the ledger store using the `[node_db]` stanza. For more information about how to use this feature, see [History Sharding](https://ripple.com/build/history-sharding/#shard-store-configuration).
You can set the value to either `RocksDB` or `NuDB`.
#### RocksDB vs NuDB
The default backend data store is RocksDB, which is optimized for spinning disks. ***TODO: I need to test this, but by "default" do we mean that if you don't set a node_db value, rippled uses RocksDB? Or by default do we mean that this is the value set in rippled-example.cfg? Just making sure.*** RocksDB requires approximately one-third less disk storage than NuDB and provides better I/O latency. However, the better I/O latency comes as a result of the large amount of RAM RocksDB requires to store data indexes. ***TODO: I edited this section to clear up some ambiguity that was confusing to me. Is what I've come up with still accurate? Can we change "default backend data store" to "default key-value database type"? I just want to correlate the node_db stanza name with how we refer to it here in the doc. When we talk about "data indexes" - is this akin to the database cache? Meaning, while storage is used for the ledger store -- memory is used for the database cache (aka data indexes?)***
In the example `rippled-example.cfg` file, the `type` field in the `[node_db]` stanza is set to `RocksDB`, which is optimized for spinning disks. RocksDB requires approximately one-third less disk storage than NuDB and provides better I/O latency. However, the better I/O latency comes as a result of the large amount of RAM RocksDB requires to store data indexes. ***TODO: In this case, if you don't set a `type` value, `rippled` will not run. There is no default value set in the code.***
NuDB, on the other hand, has nearly constant performance and memory footprint regardless of the amount of data being [stored](#storage). NuDB _requires_ a solid-state drive, but uses much less RAM than RocksDB to access a large database.
Ripple recommends using RocksDB for validators. `rippled` servers that operate as validators should keep only a few days' worth of [historical data](#historical-data) or less. For all other uses, Ripple recommends using NuDB.
Validators should be configured to use RocksDB and to store no more than about 300,000 ledgers (approximately two weeks' worth of [historical data](#historical-data)) in the ledger store. For other production servers, Ripple recommends using NuDB, with an amount of historical data configured based on business needs. Machines with only spinning disks (not recommended) must use RocksDB.
***TODO: When we talk about "data indexes" - is this akin to the database cache? Meaning, while storage is used for the ledger store -- memory is used for the database cache (aka data indexes?)***
***TODO: Based on my confusion above about the difference between storage and memory -- it would be great to standardize how we talk about what is being stored. Here we call what we are storing a "few days' worth of data", as well as something called the "ledger store". In the rippled.cfg, we refer to the "persistent datastore for rippled." In other areas of this doc and inputs to this doc, we talk about "data indexes," "persisting the XRP Ledger in the ledger store", a "database cache," and a "backend data store." Perhaps when we are talking about storage we can talk about the ledger store (or a few days' worth of ledger store data) and when we are talking about memory we can talk about the database cache? Any thoughts?***
***TODO: from rome:
I'm not entirely sure what you're advocating regarding the terminology, but I suspect your confusion may come from trying to separate the persistent stores (on disk) from the transient storage (in RAM) that they require for indexes and cache. "Persist" is specifically referring to storing data for the long term (that is, on disk). The ledger store handles persisting, caching, and indexing the data to whatever extent the chosen tech is designed to do. To a large extent, the amount of RAM you need is directly correlated to how much you have stored on disk. (The size of your phone book depends on how many people are listed.)
I agree that there are a lot of almost interchangeable terms in use here like "ledger store", "key-value store", "node backend", "ledger database", etc. but I think untangling them and reducing the number of unique terms used may be beyond our control since those terms are also used as config options and API methods, and the names used in those places are really not that distinct or consistent. I'll try to give my breakdown of terms though (my preferred terms in bold):
ledger store / node store / node DB / ledger DB - The (mandatory) thing that stores current and historical ledgers (well, their contents) in a continuous history as they're produced.
shard store / shard DB - A place to store chunks of historical ledgers
key-value store / node backend / key-value database - The technology that actually stores the data in the ledger store or shard store. The two current choices are RocksDB and NuDB. (In the past, rippled supported other storage technologies too.)***
***TODO: from ryan: Saying "ledgers from the last few days" would probably add clarity.
I think the basic breakdown is:
"ledger store" -- Stores hashes of ledgers
"persistent datastore for rippled" -- Data that's stored for a medium to long-term period. As opposed to ledger storage, which can be deleted periodically in the short-term (depending on use-cases)
"data indexes" -- How RocksDB stores hashes
"database cache" -- same thing as ledger store?
"backend data store" -- same thing as ledger store?
Perhaps when we are talking about storage we can talk about the ledger store (or a few days' worth of ledger store data) and when we are talking about memory we can talk about the database cache? Any thoughts?
I think this makes sense. Instead of database cache, we might want to say "key-value store", but I might be confused as to the difference there.***
***TODO: from rome: Database cache and indexes are different things. Both are involved in fetching records quickly.
Index: A list of all the records you have, and possibly some details about them. For example, if you frequently have to look up people with a specific hair color, you might keep an index of people with brown hair in your yellow pages, so you can do that without calling every person in the book and asking them for their hair color. The problem is, the more indexes you keep, the more huge yellow books (indexes) you have occupying your office (RAM). Depending on how many indexes of how many records you keep, at some point you may have so many books in your office that you can't even fit enough clients in your office to hold actual meetings. That's what happens if you use RocksDB with a big database and a small amount of RAM.
Cache: A convenient place where you keep a subset of things you expect to access soon. There are many different kinds of caches but for these key-value stores, it's most likely stored in RAM. So cache may contain entire records/blocks/objects/whatever that you expect to need to fetch soon, and is probably an order of magnitude faster than going all the way to disk to fetch those things. The trick is correctly predicting which things you're going to access again soon: stuff you saw recently, stuff that's sequentially after or just nearby stuff you recently looked up, etc. Going back to our yellow pages analogy, cache is almost like a waiting room space right next to your office. Cache also takes up space (as I mentioned, probably in RAM), so you don't want to dedicate so many square feet to the waiting room that you can't actually invite many people into the office proper.
RocksDB has a pretty thorough and efficient system for managing both of them. NuDB takes a different approach entirely: it says, "look, we have fast elevators and everyone lives pretty close to my office, so I'll just call them as I need them and keep as much office space as I can for actual work". If your elevators are actually slow (spinning disks) then you're going to spend a lot of time waiting around for people to show up in your office.***
RocksDB has performance-related configuration options you can modify to achieve maximum transaction processing throughput. (NuDB does not have performance-related configuration options.) Here is an example of the recommended configuration for a `rippled` server using RocksDB:
```
@@ -63,11 +109,16 @@ cache_mb=512
path={path_to_ledger_store}
```
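By comparison, a `[node_db]` stanza that uses NuDB needs no performance tuning fields. The following is a minimal sketch; `{path_to_ledger_store}` is a placeholder for the storage location, as in the RocksDB example above:
```
[node_db]
# NuDB requires a solid-state drive but has no performance-related
# tuning options.
type=NuDB
path={path_to_ledger_store}
```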
#### History Sharding
`rippled` offers a history sharding feature that allows you to store a randomized range of ledgers in a separate shard store. You can use the `[shard_db]` stanza to configure the shard store to use a different type of key-value store than the one you defined for the ledger store using the `[node_db]` stanza. For more information about how to use this feature, see [History Sharding](history-sharding.html).
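As a rough illustration of that layout, the sketch below pairs a `[node_db]` stanza with a `[shard_db]` stanza. The paths `{path_to_ledger_store}` and `{path_to_shard_store}` are placeholders, and the full set of `[shard_db]` fields is documented on the History Sharding page:
```
# Ledger store (continuous recent history).
[node_db]
type=NuDB
path={path_to_ledger_store}

# Separate shard store for randomized ranges of historical ledgers.
[shard_db]
type=NuDB
path={path_to_shard_store}
```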
### Historical Data
The amount of historical data that a `rippled` server keeps online is a major contributor to required storage space. At the time of writing (2018-10-29), a `rippled` server stores about 12GB of data per day and requires 8.4TB to store the full history of the XRP Ledger. You can expect this amount to grow as transaction volume increases across the XRP Ledger network. You can control how much data you keep with the `online_delete` and `advisory_delete` fields.
Online deletion enables pruning of `rippled` ledgers from databases without any disruption of service. It removes only records that are not part of the current ledgers. Without online deletion, those databases grow without bounds. Freeing disk space requires stopping the process and manually removing database files. For more information, see [`[node_db]`: `online_delete`](https://github.com/ripple/rippled/blob/develop/cfg/rippled-example.cfg#L832).
Online deletion enables purging of `rippled` ledgers from databases without any disruption of service. It removes only records that are not part of the current ledgers. ***TODO: what do we mean by "current ledgers"?*** Without online deletion, those databases grow without bounds. Freeing disk space requires stopping the process and manually removing database files. For more information, see [`[node_db]`: `online_delete`](https://github.com/ripple/rippled/blob/develop/cfg/rippled-example.cfg#L832).
<!-- {# ***TODO***: Add link to online_delete section, when complete, per https://ripplelabs.atlassian.net/browse/DOC-1313 #} -->
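To illustrate, `online_delete` and `advisory_delete` are fields of the `[node_db]` stanza. The sketch below keeps roughly the 2,000 most recent ledgers online (the count is only an example value); setting `advisory_delete` to `0` lets the server delete older ledgers automatically rather than waiting for an administrative command:
```
[node_db]
type=NuDB
path={path_to_ledger_store}
# Keep at least this many of the most recent ledgers online;
# older ledgers become candidates for online deletion.
online_delete=2000
# 0 = delete automatically; 1 = wait for an admin command before deleting.
advisory_delete=0
```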
@@ -81,7 +132,7 @@ The default `rippled.cfg` file sets the logging verbosity to `warning` in the `[
## Network and Hardware
Each `rippled` server in the XRP Ledger network performs all of the transaction processing work of the network. ***TODO: Is this true? What do we mean by "transaction processing" -- do we mean validation? Do we mean storing transaction history? I ask because I don't think "each rippled server in the XRP Ledger network" participates in the same way. Depending on the answer - aren't there some servers on the network that don't participate in "transaction processing"?*** It is unknown when volumes will approach maximum network capacity. ***TODO: This is true - but do we need to say it?*** Therefore, the baseline hardware for production `rippled` servers should be similar to that used in Ripple's [performance testing](https://ripple.com/dev-blog/demonstrably-scalable-blockchain/). ***TODO: Not sure about the word "Therefore" here - it's saying that because we don't know when volumes will approach max network capacity - you could use this configuration. I'm not sure if that make sense. I may be misunderstanding what we mean.***
Each `rippled` server in the XRP Ledger network performs all of the transaction processing work of the network. Therefore, the baseline hardware for production `rippled` servers should be similar to that used in Ripple's [performance testing](https://ripple.com/dev-blog/demonstrably-scalable-blockchain/).
### Recommendation
@@ -90,17 +141,15 @@ For best performance in enterprise production environments, Ripple recommends ru
- Operating System: Ubuntu 16.04+
- CPU: Intel Xeon 3+ GHz processor with 4 cores and hyperthreading enabled
- Disk: SSD ***TODO: instead of "Disk" - refer to "Storage" instead?***
- Disk: SSD (7000+ writes/second, 10,000+ reads/second)
- RAM:
- For testing: 8GB+
- For production: 32GB
- Network: Enterprise data center network with a gigabit network interface on the host
***TODO: reordered the sections below to match the order of the bullets above, just for ease of tracking for the writer and the reader.***
#### CPU Utilization and Virtualization
Ripple performance engineering has determined that bare metal servers achieve maximum throughput. However, it is likely that hypervisors cause minimal degradation in performance. ***TODO: What are we saying about hypervisors here? Are we saying that you can run `rippled` on a virtual machine managed by a hypervisor with minimal degradation?***
You'll get the best performance on bare metal, but virtual machines can perform nearly as well as long as the host hardware has high enough specs.
#### Storage
@@ -113,19 +162,17 @@ SSD storage should support several thousand of both read and write IOPS. The max
##### Amazon Web Services
***TODO: I moved this to the Storage section so you see this info about AWS in the most relevant context. Is this okay?***
Amazon Web Services (AWS) is a popular virtualized hosting environment. You can run `rippled` in AWS, but Ripple does not recommend using Elastic Block Storage (EBS). Elastic Block Storage's maximum number of IOPS (5,000) is insufficient for `rippled`'s heaviest loads, despite being very expensive.
Amazon Web Services (AWS) is a popular virtualized hosting environment. You can run `rippled` in AWS, but Ripple does not recommend using Elastic Block Storage (EBS). ***TODO: When we say "run rippled in AWS" - do we mean AWS EC2?*** Elastic Block Storage's maximum number of IOPS (5,000) is insufficient for `rippled`'s heaviest loads, despite being very expensive.
AWS instance stores (`ephemeral` storage) do not have these constraints. Therefore, Ripple recommends deploying `rippled` servers with host types such as `M3` that have instance storage. The `database_path` and `node_db` path should each reside on instance storage.
AWS instance stores (`ephemeral` storage) do not have these constraints. ***TODO: AWS EC2 instance stores?*** Therefore, Ripple recommends deploying `rippled` servers with host types such as `M3` that have instance storage. The `database_path` and `node_db` path should each reside on instance storage.
**Caution:** AWS instance storage is not guaranteed to provide durability in the event of hard drive failure. Further, data is lost when the instance stops and restarts (but not when it is simply rebooted). This loss can be acceptable for a `rippled` server because an individual server can usually re-acquire that data from its peer servers.
**Caution:** AWS instance storage is not guaranteed to provide durability in the event of hard drive failure. You also lose the data stored there when you stop and start the instance (though not when you simply reboot it). This data loss can be acceptable for a `rippled` server because an individual server can usually re-acquire the lost data from its peer servers.
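As a sketch, if the instance store were mounted at `/mnt/instance-store` (a hypothetical mount point), the relevant `rippled.cfg` paths could both point there:
```
# Hypothetical mount point for the AWS instance store.
[database_path]
/mnt/instance-store/rippled/db

[node_db]
type=NuDB
path=/mnt/instance-store/rippled/db/nudb
```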
#### RAM/Memory
Memory requirements are mainly a function of the `node_size` configuration setting and the amount of client traffic retrieving historical data. As mentioned, production servers should maximize performance and set this parameter to `huge`. ***TODO: In Node Size - Recommendation section above, we recommend `large` or `huge` for production. Here we say that production servers should use huge.***
Memory requirements are mainly a function of the `node_size` configuration setting and the amount of client traffic retrieving historical data. As mentioned, production servers should maximize performance and set this parameter to `huge`.
You can set the `node_size` parameter lower to use less memory, but you should only do this for testing. With a `node_size` of `medium`, a `rippled` server can be reasonably stable in a test Linux system with as little as 8GB of RAM. ***TODO: In the Node Size - Recommendation section above, we match a node size of `medium` with 16GB of RAM. Here we match `medium` with 8GB of RAM. But the text here seems to be suggesting something more like a node size of `low` with 16GB of RAM, where you have more memory, but are using the node_size to use less memory than you have. How can we resolve what seem to me to be inconsistencies?***
You can set the `node_size` parameter lower to use less memory, but you should only do this for testing. With a `node_size` of `medium`, a `rippled` server can be reasonably stable in a test Linux system with 16GB of RAM.
#### Network