mirror of
https://github.com/XRPLF/clio.git
synced 2025-11-19 11:15:50 +00:00
Update backend README
@@ -1,108 +1,26 @@
Reporting mode is a special operating mode of rippled, designed to handle RPCs
for validated data. A server running in reporting mode does not connect to the
p2p network, but rather extracts validated data from a node that is connected
to the p2p network. To run rippled in reporting mode, you must also run a
separate rippled node in p2p mode, to use as an ETL source. Multiple reporting
nodes can share access to the same network accessible databases (Postgres and
Cassandra); at any given time, only one reporting node will be performing ETL
and writing to the databases, while the others simply read from the databases.
A server running in reporting mode will forward any requests that require access
to the p2p network to a p2p node.

The backend is clio's view into the database. The database could be either PostgreSQL or Cassandra.
Multiple clio servers can share access to the same database.

# Reporting ETL
A single reporting node has one or more ETL sources, specified in the config
file. A reporting node will subscribe to the "ledgers" stream of each of the ETL
sources. This stream sends a message whenever a new ledger is validated. Upon
receiving a message on the stream, reporting will then fetch the data associated
with the newly validated ledger from one of the ETL sources. The fetch is
performed via a gRPC request ("GetLedger"). This request returns the ledger
header, transactions+metadata blobs, and every ledger object
added/modified/deleted as part of this ledger. ETL then writes all of this data
to the databases, and moves on to the next ledger. ETL does not apply
transactions, but rather extracts the already computed results of those
transactions (all of the added/modified/deleted SHAMap leaf nodes of the state
tree). The new SHAMap inner nodes are computed by the ETL writer; this computation mainly
involves manipulating child pointers and recomputing hashes, logic which is
buried inside of SHAMap.

`BackendInterface` and its derived classes store very little state. The read methods go directly to the database,
and generally don't access any internal data structures. Nearly all of the methods are const.

If the database is entirely empty, ETL must download an entire ledger in full
(as opposed to just the diff, as described above). This download is done via the
"GetLedgerData" gRPC request. "GetLedgerData" allows clients to page through an
entire ledger over several RPC calls. ETL will page through an entire ledger,
and write each object to the database.

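The paging loop described above can be sketched as follows. This is only an illustration: `fetch_page`, `make_fake_source`, the marker convention, and the page size are all hypothetical stand-ins for the real "GetLedgerData" gRPC call, whose actual request and response fields are not shown here.

```python
def make_fake_source(objects, page_size):
    """Hypothetical stand-in for an ETL source that serves a ledger in pages."""
    keys = sorted(objects)
    calls = []

    def fetch_page(marker):
        # marker is None on the first call, otherwise the key to resume from
        calls.append(marker)
        start = 0 if marker is None else keys.index(marker)
        end = start + page_size
        page = [(k, objects[k]) for k in keys[start:end]]
        next_marker = keys[end] if end < len(keys) else None
        return page, next_marker

    return fetch_page, calls


def download_full_ledger(fetch_page, write_object):
    """Page through an entire ledger and write every object that is returned."""
    marker = None
    count = 0
    while True:
        page, marker = fetch_page(marker)
        for key, blob in page:
            write_object(key, blob)
            count += 1
        if marker is None:  # no more pages
            return count


ledger = {b"a": b"1", b"b": b"2", b"c": b"3", b"d": b"4", b"e": b"5"}
fetch_page, calls = make_fake_source(ledger, page_size=2)
database = {}
written = download_full_ledger(fetch_page, database.__setitem__)
```

With five objects and a page size of two, the loop issues three calls before the source reports that no pages remain.
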
The data model used by clio is called the flat map data model. The flat map data model does not store any
SHAMap inner nodes, and instead only stores the raw ledger objects contained in the leaf node. Ledger objects
are stored in the database with a compound key of `(object_id, ledger_sequence)`, where `ledger_sequence` is the
ledger in which the object was created or modified. Objects are then fetched using an inequality operation,
such as `SELECT * FROM objects WHERE object_id = id AND ledger_sequence <= seq`, where `seq` is the ledger
in which we are trying to look up the object. When an object is deleted, we write an empty blob.

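A minimal sketch of the flat map lookup, using Python's built-in `sqlite3` in place of PostgreSQL or Cassandra. The table layout, the `object` column name, and the helper functions are assumptions made for illustration, not clio's actual schema. The `ORDER BY ... DESC LIMIT 1` clause is added here so that the inequality returns only the most recent version at or before `seq`.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE objects ("
        " object_id BLOB,"
        " ledger_sequence INTEGER,"
        " object BLOB,"
        " PRIMARY KEY (object_id, ledger_sequence))"
    )
    return db


def write_object(db, object_id, seq, blob):
    # One row per (object, ledger in which it was created or modified)
    db.execute("INSERT INTO objects VALUES (?, ?, ?)", (object_id, seq, blob))


def fetch_object(db, object_id, seq):
    """Return the object as of ledger `seq`, or None if absent or deleted."""
    row = db.execute(
        "SELECT object FROM objects"
        " WHERE object_id = ? AND ledger_sequence <= ?"
        " ORDER BY ledger_sequence DESC LIMIT 1",
        (object_id, seq),
    ).fetchone()
    if row is None or not row[0]:  # an empty blob marks a deletion
        return None
    return row[0]


db = make_db()
key = b"\x01" * 32
write_object(db, key, 10, b"v1")  # created in ledger 10
write_object(db, key, 15, b"v2")  # modified in ledger 15
write_object(db, key, 20, b"")    # deleted in ledger 20
```

Here `fetch_object(db, key, 14)` sees the version written in ledger 10, while any lookup at ledger 20 or later finds the empty blob and reports the object as deleted.
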
If the database is not empty, the reporting node will first come up in a "soft"
read-only mode. In read-only mode, the server does not perform ETL and simply
publishes new ledgers as they are written to the database.
If the database is not updated within a certain time period
(currently hard coded at 20 seconds), the reporting node will begin the ETL
process and start writing to the database. Postgres will report an error when
trying to write a record with a key that already exists. ETL uses this error to
determine that another process is writing to the database, and subsequently
falls back to a soft read-only mode. Reporting nodes can also operate in strict
read-only mode, in which case they will never write to the database.

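The writer fallback described above can be sketched like this. The sketch uses `sqlite3` as a stand-in for Postgres, and the `Node` class, its method names, and the table layout are hypothetical. The 20-second timer is not modeled; `on_database_stale` is simply called when that timeout would have fired.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ledgers (ledger_sequence INTEGER PRIMARY KEY, header BLOB)"
    )
    return db


def try_write_ledger(db, seq, header):
    """Return True if the row was written, False if the key already exists."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO ledgers VALUES (?, ?)", (seq, header))
        return True
    except sqlite3.IntegrityError:
        return False


class Node:
    def __init__(self, db):
        self.db = db
        self.writing = False  # start in soft read-only mode

    def on_database_stale(self):
        # No new ledger appeared within the timeout: begin ETL
        self.writing = True

    def on_ledger(self, seq, header):
        if not self.writing:
            return
        if not try_write_ledger(self.db, seq, header):
            # Another process wrote this ledger first: fall back to read-only
            self.writing = False


shared = make_db()
first, second = Node(shared), Node(shared)
first.on_database_stale()
second.on_database_stale()
first.on_ledger(1, b"header-1")
second.on_ledger(1, b"header-1")  # key conflict, so this node stops writing
```
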
Transactions are stored in a separate table, where the key is the hash.

# Database Nuances
The database schema for reporting mode does not allow any history gaps.
Attempting to write a ledger to a non-empty database where the previous ledger
does not exist will return an error.

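The no-gaps rule can be illustrated with a short sketch. In the real system this check is enforced by the database schema; here a plain dictionary stands in for the ledgers table, and `write_ledger` and `LedgerGapError` are hypothetical names.

```python
class LedgerGapError(Exception):
    """Raised when a write would leave a gap in ledger history."""


def write_ledger(ledgers, seq, header):
    # An empty database accepts any starting ledger; afterwards the
    # previous ledger must already be present.
    if ledgers and (seq - 1) not in ledgers:
        raise LedgerGapError(f"ledger {seq - 1} is missing")
    ledgers[seq] = header


ledgers = {}
write_ledger(ledgers, 100, b"header-100")  # first write into an empty database
write_ledger(ledgers, 101, b"header-101")  # previous ledger exists
try:
    write_ledger(ledgers, 103, b"header-103")  # ledger 102 was never written
    gap_rejected = False
except LedgerGapError:
    gap_rejected = True
```
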
Ledger headers are stored in their own table.

The databases must be set up prior to running reporting mode. This requires
creating the Postgres database, and setting up the Cassandra keyspace. Reporting
mode will create the objects table in Cassandra if the table does not yet exist.

The account_tx table maps accounts to a list of transactions that affect them.

Creating the Postgres database:
```
$ psql -h [host] -U [user]
postgres=# create database [database];
```
Creating the keyspace:
```
$ cqlsh [host] [port]
> CREATE KEYSPACE rippled WITH REPLICATION =
{'class' : 'SimpleStrategy', 'replication_factor' : 3 };
```
A replication factor of 3 is recommended. However, when running locally, only a
replication factor of 1 is supported.

Online delete is not supported by reporting mode and must be done manually. The
easiest way to do this would be to set up a second Cassandra keyspace and
Postgres database, bring up a single reporting mode instance that uses those
databases, and start ETL at a ledger of your choosing (via --startReporting on
the command line). Once this node is caught up, the other databases can be
deleted.

To delete:
```
$ psql -h [host] -U [user] -d [database]
reporting=$ truncate table ledgers cascade;
```
```
$ cqlsh [host] [port]
> truncate table objects;
```
# Proxy
RPCs that require access to the p2p network and/or the open ledger are forwarded
from the reporting node to one of the ETL sources. The request is not processed
prior to forwarding, and the response is delivered as-is to the client.
Reporting will forward any requests that always require p2p/open ledger access
(fee and submit, for instance). In addition, any request that explicitly
requests data from the open or closed ledger (via setting
"ledger_index":"current" or "ledger_index":"closed"), will be forwarded to a
p2p node.

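The forwarding rule above can be expressed as a small predicate. This is a sketch under assumptions: requests are modeled as flat dictionaries with a `command` field, and the set of always-forwarded commands contains only the two examples named in the text, not a complete list.

```python
# Only the two examples given in the text; the real list is longer.
ALWAYS_FORWARDED = {"fee", "submit"}
NON_VALIDATED_LEDGERS = {"current", "closed"}


def should_forward(request):
    """Decide whether a request must be proxied to a p2p node."""
    if request.get("command") in ALWAYS_FORWARDED:
        return True
    # Explicit requests for the open or closed ledger are also forwarded
    return request.get("ledger_index") in NON_VALIDATED_LEDGERS
```

For example, `should_forward({"command": "account_info", "ledger_index": "current"})` is true, while the same command with `"ledger_index": "validated"`, or with no ledger specified, is served locally.
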
For the stream "transactions_proposed" (AKA "rt_transactions"), reporting
subscribes to the "transactions_proposed" streams of each ETL source, and then
forwards those messages to any clients subscribed to the same stream on the
reporting node. A reporting node will subscribe to the stream on each ETL
source, but will only forward the messages from one of the streams at any given
time (to avoid sending the same message more than once to the same client).

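One way to picture the single-stream forwarding is the sketch below. The class and method names are hypothetical, and the failover behavior in `on_source_down` is an assumption: the text only states that messages from one stream are forwarded at any given time, not how that stream is chosen.

```python
class ProposedStreamForwarder:
    """Receives messages from every ETL source, forwards from only one."""

    def __init__(self, sources):
        self.sources = list(sources)
        self.active = self.sources[0] if self.sources else None
        self.clients = []

    def subscribe(self, callback):
        self.clients.append(callback)

    def on_message(self, source, message):
        # Messages from non-active sources are dropped to avoid duplicates
        if source != self.active:
            return
        for callback in self.clients:
            callback(message)

    def on_source_down(self, source):
        # Assumed behavior: switch to the next remaining source
        self.sources.remove(source)
        if source == self.active:
            self.active = self.sources[0] if self.sources else None


received = []
forwarder = ProposedStreamForwarder(["source-a", "source-b"])
forwarder.subscribe(received.append)
forwarder.on_message("source-a", "tx1")
forwarder.on_message("source-b", "tx1")  # duplicate, dropped
forwarder.on_source_down("source-a")
forwarder.on_message("source-b", "tx2")  # now forwarded
```
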
# API changes
A reporting node defaults to only returning validated data. If a ledger is not
specified, the most recently validated ledger is used. This is in contrast to
the normal rippled behavior, where the open ledger is used by default.

Reporting will reject all subscribe requests for streams "server", "manifests",
"validations", "peer_status" and "consensus".

### Backend Indexer
With the elimination of SHAMap inner nodes, iterating across a ledger becomes difficult. In order to iterate,
a keys table is maintained, which keeps a collection of all keys in a ledger. This table has one record for every
million ledgers, where each record has all of the keys in that ledger, as well as all of the keys that were deleted
between that ledger and the prior ledger written to the keys table. Most of this logic is contained in `BackendIndexer`.

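A rough sketch of how iteration could work with such a keys table. Several things here are assumptions rather than clio's actual logic: the record chosen for a ledger is the first one at or after that ledger, the interval is 10 ledgers instead of a million to keep the example small, and in-memory dictionaries stand in for the database tables.

```python
from bisect import bisect_left


def fetch_object(objects, key, seq):
    """Latest version of `key` at or before `seq`; None if absent or deleted."""
    versions = [(s, blob) for s, blob in objects.get(key, []) if s <= seq]
    if not versions:
        return None
    blob = max(versions, key=lambda v: v[0])[1]
    return blob or None  # an empty blob marks a deletion


def iterate_ledger(keys_table, objects, seq):
    """Yield (key, blob) for every object that exists in ledger `seq`."""
    recorded = sorted(keys_table)
    i = bisect_left(recorded, seq)  # first keys record at or after seq
    if i == len(recorded):
        raise LookupError(f"no keys record covers ledger {seq}")
    for key in sorted(keys_table[recorded[i]]):
        blob = fetch_object(objects, key, seq)
        if blob is not None:
            yield key, blob


# Versions are (ledger_sequence, blob) pairs.
objects = {
    b"a": [(1, b"a1")],
    b"b": [(1, b"b1"), (5, b"")],  # deleted in ledger 5
    b"c": [(7, b"c1")],            # created in ledger 7
}
# The record at ledger 10 holds the live keys (a, c) plus b, which was
# deleted since the previous record.
keys_table = {10: {b"a", b"b", b"c"}}

at_3 = list(iterate_ledger(keys_table, objects, 3))
at_6 = list(iterate_ledger(keys_table, objects, 6))
at_10 = list(iterate_ledger(keys_table, objects, 10))
```

Keeping deleted keys in the record is what lets ledger 3 still find `b`, while keys created later (such as `c`) are simply skipped because no version exists at or before the requested ledger.
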