diff --git a/content/tutorials/manage-the-rippled-server/troubleshooting/diagnosing-problems.md b/content/tutorials/manage-the-rippled-server/troubleshooting/diagnosing-problems.md index 46fff9d274..f49183809a 100644 --- a/content/tutorials/manage-the-rippled-server/troubleshooting/diagnosing-problems.md +++ b/content/tutorials/manage-the-rippled-server/troubleshooting/diagnosing-problems.md @@ -14,14 +14,17 @@ You can use the commandline to get server status information from the local `rip rippled server_info ``` -The response to this command has a lot of information, which is documented along with the [server info method][]. +The response to this command has a lot of information, which is documented along with the [server_info method][]. For troubleshooting purposes, the most important fields are (from most commonly used to least): -- **`server_state`** - Most of the time, this field should show either `full` or `proposing` depending on whether it is [configured as a validator](run-rippled-as-a-validator.html). The value `connected` means that the server can communicate with the rest of the peer-to-peer network, but it does not yet have enough data to track progress of the shared ledger state. Normally, syncing to the state of the rest of the ledger takes about 5-15 minutes. - - If your server remains in the `connected` state for hours after starting, or returns to the `connected` state after being in the `full` or `proposing` states, that usually indicates that your server cannot keep up with the rest of the network. The most common bottlenecks are disk I/O and network bandwidth. +- **`server_state`** - Most of the time, this field should show either `full` or `proposing` depending on whether it is [configured as a validator](run-rippled-as-a-validator.html). The value `connected` means that the server can communicate with the rest of the peer-to-peer network, but it does not yet have enough data to track progress of the shared ledger state. Normally, syncing to the state of the rest of the ledger takes about 5-15 minutes after starting. + + - If your server remains in the `connected` state for hours, or returns to the `connected` state after being in the `full` or `proposing` states, that usually indicates that your server cannot keep up with the rest of the network. The most common bottlenecks are disk I/O and network bandwidth. - **`complete_ledgers`** - This field shows which [ledger indexes](basic-data-types.html#ledger-index) your server has complete ledger data for. Healthy servers usually have a single range of recent ledgers, such as `"12133424-12133858"`. + - If you have a disjoint set of complete ledgers such as `"11845721-12133420,12133424-12133858"`, that could indicate that your server has had intermittent outages or has temporarily fallen out of sync with the rest of the network. The most common causes for this are insufficient disk I/O or network bandwidth. + - Normally, a `rippled` server downloads recent ledger history from its peers. If gaps in your ledger history persist for more than a few hours, you may not be connected to any peers who have the missing data. If this occurs, you can force your server to try and peer with one of Ripple's full-history public servers by adding the following stanza to your config file and restarting: [ips_fixed] @@ -30,6 +33,7 @@ For troubleshooting purposes, the most important fields are (from most commonly - **`amendment_blocked`** - This field is normally omitted from the `server_info` response. If this field appears with the value `true`, then the network has approved an [amendment](amendments.html) for which your server doesn't have an implementation. Most likely, you can fix this by [updating rippled](update-rippled.html) to the latest version. You can also use the [feature method][] to see what amendment IDs are currently enabled and which one(s) your server does and does not support. - **`peers`** - This field indicates how many other servers in the XRP Ledger peer-to-peer network your server is connected to. Healthy servers typically show between 5 and 50 peers, unless explicitly configured to connect only to certain peers. + - If you have 0 peers, your server may be unable to contact the network, or your system clock may be wrong. (Ripple recommends running an [NTP](http://www.ntp.org/) daemon on all servers to keep their clocks synced.) ### No Response from Server @@ -48,8 +52,8 @@ The following message indicates that the `rippled` executable wasn't able to con This generally indicates one of several problems: - The `rippled` server is just starting up, or is not running at all. Check the status of the service; if it is running, wait a few seconds and try again. -- You need to pass different parameters to the `rippled` commandline client to connect to your server. -- The `rippled` server is configured not to accept JSON-RPC connections. +- You may need to pass different [parameters to the `rippled` commandline client](commandline-usage.html#client-mode-options) to connect to your server. +- The `rippled` server may not be configured not to accept JSON-RPC connections. ## Check the server log @@ -58,9 +62,13 @@ While running, `rippled` servers write information to a debug log. The location You can control the verbosity of the debug log with the [log_level method][]. The default config file sets the `log_level` to severity "warning" for all categories of log messages. (See the `[rpc_startup]` stanza of the config file for settings.) +It is normal for a `rippled` the server to print many warning-level (`WRN`) messages during startup and a few warning-level messages from time to time later on. You can **safely ignore** most warnings in the first 5 to 15 minutes of server startup. + +The following sections describe some of the most common types of log messages and how to interpret them. + ### Crashes -Messages in the log that indicate runtime errors can indicate that the server crashed. These messages usually start with a message such as one of the following examples: +Messages in the log that mention runtime errors can indicate that the server crashed. These messages usually start with a message such as one of the following examples: ``` Throw @@ -74,27 +82,42 @@ If your server always crashes on startup, see [Server Won't Start](server-wont-s If your server crashes randomly during operation or as a result of particular commands, make sure you are [updated](updating-rippled.html) to the latest `rippled` version. If you are on the latest version and your server is still crashing, check the following: -- Is your server running out of memory? On some systems, `rippled` may be terminated by the Out Of Memory (OOM) Killer or another monitor process -- If your server is running in a shared environment, are other users or administrators causing the machine or service to be restarted? +- Is your server running out of memory? On some systems, `rippled` may be terminated by the Out Of Memory (OOM) Killer or another monitor process. +- If your server is running in a shared environment, are other users or administrators causing the machine or service to be restarted? For example, some hosted providers automatically kill any service that uses a large amount of a shared machine's resources for an extended period of time. - Does your server meet the [minimum requirements](install-rippled.html#minimum-system-requirements) to run `rippled`? What about the [recommendations for production servers](capacity-planning.html#recommendation-1)? If none of the above apply, please report the issue to Ripple as a security-sensitive bug. If Ripple can reproduce the crash, you may be eligible for a bounty. See for details. -### Benign Warnings +### Connection reset by peer -During server startup, it is normal for the server to print many warning-level (`WRN`) messages. During server operation, it is normal to print warning-level messages occasionally. - -You can **safely ignore** messages such as the following: +The following log message indicates that a peer `rippled` server closed a connection: ```text 2018-Aug-28 22:55:41.738765510 Peer:WRN [012] onReadMessage: Connection reset by peer -2018-Aug-28 22:55:58.316094260 Validations:WRN Val for 2137ACEFC0D137EFA1D84C2524A39032802E4B74F93C130A289CD87C9C565011 trusted/full from nHUeUNSn3zce2xQZWNghQvd9WRH6FWEnCBKYVJu2vAizMxnXegfJ signing key n9KcRZYHLU9rhGVwB9e4wEMYsxXvUfgFxtmX25pc1QPNgweqzQf5 already validated sequence at or past 12133663 src=1 +``` + +Losing connections from time to time is normal for any peer-to-peer network. **Occasional messages of this kind do not indicate a problem.** + +A large number of these messages around the same time may indicate a problem, such as: + +- Your internet connection to one or more specific peers was cut off +- Your server may have been overloading the peer with requests, causing it to drop your server + + +### No hash for fetch pack + +```text 2018-Aug-28 22:56:21.397076850 LedgerMaster:ERR No hash for fetch pack. Missing Index 7159808 -2018-Aug-28 22:56:22.256065549 Validations:WRN Unable to determine hash of ancestor seq=3 from ledger hash=00B1E512EF558F2FD9A0A6C263B3D922297F26A55AEB56A009341A22895B516E seq=12133675 -2018-Aug-28 22:56:22.368460130 LedgerConsensus:WRN View of consensus changed during open status=open, mode=proposing -2018-Aug-28 22:56:22.368468202 LedgerConsensus:WRN 96A8DF9ECF5E9D087BAE9DDDE38C197D3C1C6FB842C7BB770F8929E56CC71661 to 00B1E512EF558F2FD9A0A6C263B3D922297F26A55AEB56A009341A22895B516E -2018-Aug-28 22:56:22.368499966 LedgerConsensus:WRN {"accepted":true,"account_hash":"89A821400087101F1BF2D2B912C6A9F2788CC715590E8FA5710F2D10BF5E3C03","close_flags":0,"close_time":588812130,"close_time_human":"2018-Aug-28 22:55:30.000000000","close_time_resolution":30,"closed":true,"hash":"96A8DF9ECF5E9D087BAE9DDDE38C197D3C1C6FB842C7BB770F8929E56CC71661","ledger_hash":"96A8DF9ECF5E9D087BAE9DDDE38C197D3C1C6FB842C7BB770F8929E56CC71661","ledger_index":"3","parent_close_time":588812070,"parent_hash":"5F5CB224644F080BC8E1CC10E126D62E9D7F9BE1C64AD0565881E99E3F64688A","seqNum":"3","totalCoins":"100000000000000000","total_coins":"100000000000000000","transaction_hash":"0000000000000000000000000000000000000000000000000000000000000000"} +``` + +***TODO: how serious is this?*** https://github.com/ripple/rippled/blob/8a02903fa5eda4daa10972800d2598b9542b02d2/src/ripple/app/ledger/impl/LedgerMaster.cpp#L526 + +(If not serious, may be worth dropping down one or two severity levels.) + +### LoadMonitor Job + +```text 2018-Aug-28 22:56:36.180827973 LoadMonitor:WRN Job: gotFetchPack run: 11566ms wait: 0ms 2018-Aug-28 22:56:36.180970431 LoadMonitor:WRN Job: processLedgerData run: 0ms wait: 11566ms 2018-Aug-28 22:56:36.181053831 LoadMonitor:WRN Job: AcquisitionDone run: 0ms wait: 11566ms @@ -102,7 +125,73 @@ You can **safely ignore** messages such as the following: 2018-Aug-28 22:56:36.181169931 LoadMonitor:WRN Job: AcquisitionDone run: 0ms wait: 11566ms ``` -***(TODO: waiting for the C++ team to verify that all of the above messages are indeed benign.)*** +***TODO: how serious is this?*** + + +### View of consensus changed during open + +Log messages such as the following occur when a server is not in sync with the rest of the network: + +```text +2018-Aug-28 22:56:22.368460130 LedgerConsensus:WRN View of consensus changed during open status=open, mode=proposing +2018-Aug-28 22:56:22.368468202 LedgerConsensus:WRN 96A8DF9ECF5E9D087BAE9DDDE38C197D3C1C6FB842C7BB770F8929E56CC71661 to 00B1E512EF558F2FD9A0A6C263B3D922297F26A55AEB56A009341A22895B516E +2018-Aug-28 22:56:22.368499966 LedgerConsensus:WRN {"accepted":true,"account_hash":"89A821400087101F1BF2D2B912C6A9F2788CC715590E8FA5710F2D10BF5E3C03","close_flags":0,"close_time":588812130,"close_time_human":"2018-Aug-28 22:55:30.000000000","close_time_resolution":30,"closed":true,"hash":"96A8DF9ECF5E9D087BAE9DDDE38C197D3C1C6FB842C7BB770F8929E56CC71661","ledger_hash":"96A8DF9ECF5E9D087BAE9DDDE38C197D3C1C6FB842C7BB770F8929E56CC71661","ledger_index":"3","parent_close_time":588812070,"parent_hash":"5F5CB224644F080BC8E1CC10E126D62E9D7F9BE1C64AD0565881E99E3F64688A","seqNum":"3","totalCoins":"100000000000000000","total_coins":"100000000000000000","transaction_hash":"0000000000000000000000000000000000000000000000000000000000000000"} +``` + +During the first 5 to 15 minutes after the server starts up, it is normal for it to be out of sync with the rest of the network and print messages such as these. If the server writes these messages long after starting up, it could indicate a problem. Common causes include unreliable network connections and insufficient hardware specs. + + +### Already validated sequence at or past + +Log messages such as the following indicate that a server received validations for different ledger sequences out of order. + +```text +2018-Aug-28 22:55:58.316094260 Validations:WRN Val for 2137ACEFC0D137EFA1D84C2524A39032802E4B74F93C130A289CD87C9C565011 trusted/full from nHUeUNSn3zce2xQZWNghQvd9WRH6FWEnCBKYVJu2vAizMxnXegfJ signing key n9KcRZYHLU9rhGVwB9e4wEMYsxXvUfgFxtmX25pc1QPNgweqzQf5 already validated sequence at or past 12133663 src=1 +``` + +Occasional messages of this type do not usually indicate a problem. If this type of message occurs frequently with the same sending validator, it could indicate a problem, including any of the following (roughly in order of most to least likely): + +- The server writing the message is having network issues +- The validator described in the message is having network issues +- The validator described in the message is behaving maliciously + + +### Unable to determine hash of ancestor + +Log messages such as the following occur when the server sees a validation message from a peer and it does not know the parent ledger version that server is building on. This is normal when a server is syncing to the network. + +```text +2018-Aug-28 22:56:22.256065549 Validations:WRN Unable to determine hash of ancestor seq=3 from ledger hash=00B1E512EF558F2FD9A0A6C263B3D922297F26A55AEB56A009341A22895B516E seq=12133675 +``` + +If this message occurs frequently outside of the first 5 to 15 minutes after starting the server, it could indicate a problem. + + +### InboundLedger Want hash + +Log messages such as the following indicate that the server is requesting ledger data from other servers: + +```text +InboundLedger:WRN Want: 5AE53B5E39E6388DBACD0959E5F5A0FCAF0E0DCBA45D9AB15120E8CDD21E019B +``` + +This is normal if your server is syncing, backfilling, or downloading [history shards](history-sharding.html). + + +### InboundLedger 11 timeouts for ledger + +```text +InboundLedger:WRN 11 timeouts for ledger 8265938 +``` + +This indicates that your server is having trouble requesting specific ledger data from its peers. If the [ledger index](basic-data-types.html#ledger-index) is much lower than the most recent validated ledger's index as reported by the [server_info method][], this probably indicates that your server is downloading a [history shard](history-sharding.html). + +This is not strictly a problem, but if you want to acquire ledger history faster, you can configure `rippled` to connect to peers with full history by adding or editing the `[ips_fixed]` config stanza and restarting the server. For example, to always try to connect to one of Ripple's full-history servers: + +``` +[ips_fixed] +s2.ripple.com 51235 +``` diff --git a/dactyl-config.yml b/dactyl-config.yml index fb458e585f..4564fd98b7 100644 --- a/dactyl-config.yml +++ b/dactyl-config.yml @@ -962,6 +962,7 @@ pages: doc_type: Tutorials category: Manage the rippled Server subcategory: Troubleshooting rippled + blurb: Collect information to identify the cause of problems. targets: - local @@ -971,6 +972,7 @@ pages: doc_type: Tutorials category: Manage the rippled Server subcategory: Troubleshooting rippled + blurb: A collection of problems that would cause a rippled server not to start, and how to fix them. targets: - local