mirror of
https://github.com/XRPLF/clio.git
synced 2025-12-06 17:27:58 +00:00
109 lines
5.5 KiB
Markdown

Reporting mode is a special operating mode of rippled, designed to handle RPCs
for validated data. A server running in reporting mode does not connect to the
p2p network, but rather extracts validated data from a node that is connected
to the p2p network. To run rippled in reporting mode, you must also run a
separate rippled node in p2p mode, to use as an ETL source. Multiple reporting
nodes can share access to the same network-accessible databases (Postgres and
Cassandra); at any given time, only one reporting node will be performing ETL
and writing to the databases, while the others simply read from the databases.
A server running in reporting mode will forward any requests that require access
to the p2p network to a p2p node.

# Reporting ETL
A single reporting node has one or more ETL sources, specified in the config
file. A reporting node will subscribe to the "ledgers" stream of each of the ETL
sources. This stream sends a message whenever a new ledger is validated. Upon
receiving a message on the stream, reporting will then fetch the data associated
with the newly validated ledger from one of the ETL sources. The fetch is
performed via a gRPC request ("GetLedger"). This request returns the ledger
header, the transaction and metadata blobs, and every ledger object
added/modified/deleted as part of this ledger. ETL then writes all of this data
to the databases, and moves on to the next ledger. ETL does not apply
transactions, but rather extracts the already computed results of those
transactions (all of the added/modified/deleted SHAMap leaf nodes of the state
tree). The new SHAMap inner nodes are computed by the ETL writer; this
computation mainly involves manipulating child pointers and recomputing hashes,
logic that lives inside SHAMap.

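The per-ledger flow above can be pictured with a minimal Python sketch. It is an illustration, not the reporting-mode code: `LedgerDiff`, `InMemoryDB`, `run_etl` and the `fetch_ledger` callable are hypothetical names, `fetch_ledger` stands in for the "GetLedger" gRPC call, and plain dictionaries stand in for Postgres and Cassandra. SHAMap inner-node computation is deliberately omitted; the point is that ETL copies already-computed leaf changes and never re-executes a transaction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class LedgerDiff:
    """Hypothetical, simplified shape of a GetLedger response."""
    sequence: int
    header: bytes
    transactions: List[Tuple[bytes, bytes]]   # (transaction blob, metadata blob)
    objects: Dict[bytes, Optional[bytes]]     # key -> new blob; None means deleted


@dataclass
class InMemoryDB:
    """Hypothetical stand-in for the shared Postgres/Cassandra databases."""
    headers: Dict[int, bytes] = field(default_factory=dict)
    transactions: Dict[int, List[Tuple[bytes, bytes]]] = field(default_factory=dict)
    state: Dict[bytes, bytes] = field(default_factory=dict)

    def write(self, diff: LedgerDiff) -> None:
        self.headers[diff.sequence] = diff.header
        self.transactions[diff.sequence] = list(diff.transactions)
        # Copy the already-computed leaf changes; nothing is re-executed here.
        for key, blob in diff.objects.items():
            if blob is None:
                self.state.pop(key, None)
            else:
                self.state[key] = blob


def run_etl(validated: Iterable[int],
            fetch_ledger: Callable[[int], LedgerDiff],
            db: InMemoryDB) -> InMemoryDB:
    """For each sequence announced on the "ledgers" stream, fetch its diff and write it."""
    for seq in validated:
        db.write(fetch_ledger(seq))
    return db
```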
If the database is entirely empty, ETL must download an entire ledger in full
(as opposed to just the diff, as described above). This download is done via the
"GetLedgerData" gRPC request. "GetLedgerData" allows clients to page through an
entire ledger over several RPC calls. ETL pages through the entire ledger and
writes each object to the database.

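A minimal sketch of that paging loop, assuming a hypothetical `fetch_page(seq, marker)` callable that wraps "GetLedgerData" and returns a batch of `(key, blob)` pairs plus an opaque marker, with `None` signalling the last page. The marker-based shape and the names `download_full_ledger` and `write_object` are assumptions for illustration; the loop simply writes every object until the source reports that no pages remain.

```python
from typing import Callable, List, Optional, Tuple

# One page: a batch of (key, blob) pairs and the marker for the next page (None = done).
Page = Tuple[List[Tuple[bytes, bytes]], Optional[bytes]]


def download_full_ledger(seq: int,
                         fetch_page: Callable[[int, Optional[bytes]], Page],
                         write_object: Callable[[bytes, bytes], None]) -> int:
    """Page through every object in ledger `seq`; return how many were written."""
    marker: Optional[bytes] = None   # None on the first call: start at the beginning
    written = 0
    while True:
        objects, marker = fetch_page(seq, marker)
        for key, blob in objects:
            write_object(key, blob)
            written += 1
        if marker is None:           # the source has no more pages
            return written
```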
If the database is not empty, the reporting node will first come up in a "soft"
read-only mode. In read-only mode, the server does not perform ETL and simply
publishes new ledgers as they are written to the database. If no new ledger is
written to the database within a certain time period (currently hard coded at
20 seconds), the reporting node will begin the ETL process and start writing to
the database. Postgres will report an error when trying to write a record with a
key that already exists. ETL uses this error to determine that another process
is writing to the database, and subsequently falls back to soft read-only mode.
Reporting nodes can also operate in strict read-only mode, in which case they
will never write to the database.

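The soft/strict read-only behavior can be pictured as a small state machine. This is a hedged sketch, not the actual logic: `ReportingNode`, `on_ledger_published`, `tick` and `DuplicateKeyError` are hypothetical names, time is passed in explicitly so the behavior is easy to test, and `DuplicateKeyError` stands in for whatever error Postgres raises on a duplicate key. Only the 20-second timeout comes from the text above.

```python
from typing import Callable


class DuplicateKeyError(Exception):
    """Stands in for the Postgres error raised when a key already exists."""


class ReportingNode:
    WRITER_TIMEOUT = 20.0  # seconds; hard coded according to the text above

    def __init__(self, strict_read_only: bool = False, now: float = 0.0):
        self.strict_read_only = strict_read_only
        self.writing = False          # False means soft read-only
        self.last_db_update = now

    def on_ledger_published(self, now: float) -> None:
        """Called when a soft read-only node sees a new ledger in the database."""
        self.last_db_update = now

    def tick(self, now: float, write_next_ledger: Callable[[], None]) -> bool:
        """Decide whether to write; return True if this node wrote a ledger."""
        if self.strict_read_only:
            return False              # strict read-only never writes
        if not self.writing and now - self.last_db_update < self.WRITER_TIMEOUT:
            return False              # someone else appears to be writing
        try:
            write_next_ledger()
        except DuplicateKeyError:
            self.writing = False      # another writer exists: back to soft read-only
            return False
        self.writing = True
        self.last_db_update = now
        return True
```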
# Database Nuances
The database schema for reporting mode does not allow any history gaps.
Attempting to write a ledger to a non-empty database where the previous ledger
does not exist will return an error.

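The no-gap rule amounts to a single precondition on every write. A minimal sketch, with a plain dictionary standing in for the ledgers table and a hypothetical `HistoryGapError` and `write_ledger`; note that an empty database accepts any starting ledger, which is what lets a fresh database begin ETL at a ledger of your choosing.

```python
from typing import Dict


class HistoryGapError(Exception):
    """Raised when a write would leave a hole in the ledger history."""


def write_ledger(ledgers: Dict[int, bytes], seq: int, header: bytes) -> None:
    # An empty database accepts any starting ledger; otherwise the predecessor must exist.
    if ledgers and (seq - 1) not in ledgers:
        raise HistoryGapError(f"ledger {seq - 1} is missing; refusing to write {seq}")
    ledgers[seq] = header
```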
The databases must be set up prior to running reporting mode. This requires
creating the Postgres database and setting up the Cassandra keyspace. Reporting
mode will create the objects table in Cassandra if the table does not yet exist.

Creating the Postgres database:
```
$ psql -h [host] -U [user]
postgres=# create database [database];
```
Creating the keyspace:
```
$ cqlsh [host] [port]
> CREATE KEYSPACE rippled WITH REPLICATION =
  {'class' : 'SimpleStrategy', 'replication_factor' : 3 };
```
A replication factor of 3 is recommended. However, when running locally, only a
replication factor of 1 is supported.

Online delete is not supported by reporting mode and must be done manually. The
easiest way to do this is to set up a second Cassandra keyspace and Postgres
database, bring up a single reporting mode instance that uses those databases,
and start ETL at a ledger of your choosing (via --startReporting on the command
line). Once this node is caught up, the other databases can be deleted.

To delete:
```
$ psql -h [host] -U [user] -d [database]
[database]=# truncate table ledgers cascade;
```
```
$ cqlsh [host] [port]
> use [keyspace];
> truncate table objects;
```
# Proxy
RPCs that require access to the p2p network and/or the open ledger are forwarded
from the reporting node to one of the ETL sources. The request is not processed
prior to forwarding, and the response is delivered as-is to the client.
Reporting will forward any requests that always require p2p/open ledger access
(fee and submit, for instance). In addition, any request that explicitly
requests data from the open or closed ledger (by setting
"ledger_index":"current" or "ledger_index":"closed") will be forwarded to a
p2p node.

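A sketch of the forwarding decision described above. The request is assumed to be a WebSocket-style JSON object with a `command` field; `should_forward` and `ALWAYS_FORWARD` are hypothetical names, and the set holds only the two methods the text gives as examples (fee and submit), not a complete list.

```python
from typing import Any, Dict

# Only the examples named in the text ("fee and submit, for instance").
ALWAYS_FORWARD = {"fee", "submit"}


def should_forward(request: Dict[str, Any]) -> bool:
    """Return True if the request must be proxied, unmodified, to a p2p node."""
    if request.get("command") in ALWAYS_FORWARD:
        return True
    # Explicit requests for the open ("current") or closed ledger need a p2p node.
    return request.get("ledger_index") in ("current", "closed")
```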
For the stream "transactions_proposed" (AKA "rt_transactions"), reporting
subscribes to the "transactions_proposed" stream of each ETL source, and then
forwards those messages to any clients subscribed to the same stream on the
reporting node. A reporting node will subscribe to the stream on each ETL
source, but will only forward the messages from one of the streams at any given
time (to avoid sending the same message more than once to the same client).

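One way to get the once-per-client guarantee is to latch onto whichever source delivers first and drop the identical feeds from the others. The sketch below assumes that policy, plus a simple failover when the active source disconnects; `ProposedTxForwarder` and its methods are hypothetical names, and the actual source-selection rule may differ.

```python
from typing import Callable, List, Optional


class ProposedTxForwarder:
    """Relay "transactions_proposed" messages from exactly one ETL source at a time."""

    def __init__(self) -> None:
        self.active_source: Optional[str] = None
        self.subscribers: List[Callable[[dict], None]] = []

    def on_message(self, source: str, message: dict) -> None:
        if self.active_source is None:
            self.active_source = source       # first source heard from becomes active
        if source != self.active_source:
            return                            # same stream from another source: drop
        for deliver in self.subscribers:
            deliver(message)

    def on_source_disconnected(self, source: str) -> None:
        if source == self.active_source:
            self.active_source = None         # next source to send a message takes over
```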
# API changes
A reporting node defaults to only returning validated data. If a ledger is not
specified, the most recently validated ledger is used. This is in contrast to
the normal rippled behavior, where the open ledger is used by default.

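A minimal sketch of the default-ledger rule, using a hypothetical `resolve_ledger_index` helper: a missing `ledger_index` resolves to the most recently validated ledger rather than the open one. Requests naming "current" or "closed" are assumed to have been forwarded already (as described under Proxy), so here they are treated as errors.

```python
from typing import Any, Dict


def resolve_ledger_index(request: Dict[str, Any], latest_validated: int) -> int:
    """Pick the ledger a reporting node should answer from."""
    index = request.get("ledger_index")
    if index is None or index == "validated":
        return latest_validated               # default: newest validated ledger
    if isinstance(index, int):
        return index
    if isinstance(index, str) and index.isdigit():
        return int(index)
    # "current"/"closed" should never get here; they are proxied to a p2p node.
    raise ValueError(f"unsupported ledger_index: {index!r}")
```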
Reporting will reject all subscribe requests for streams "server", "manifests",
"validations", "peer_status" and "consensus".

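The subscribe restriction can be expressed as a set intersection. A sketch, assuming a hypothetical `rejected_streams` helper that reports which requested streams a reporting node refuses; a non-empty result means the subscribe request is rejected.

```python
from typing import Any, Dict, List

# Streams the text says a reporting node will not serve.
REJECTED_STREAMS = frozenset(
    {"server", "manifests", "validations", "peer_status", "consensus"}
)


def rejected_streams(request: Dict[str, Any]) -> List[str]:
    """Return the requested streams that must be rejected (empty if none)."""
    return sorted(set(request.get("streams", [])) & REJECTED_STREAMS)
```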