From 1583108a51ea7dc6da827cfa8f4076415a893c4b Mon Sep 17 00:00:00 2001
From: CJ Cobb <46455409+cjcobb23@users.noreply.github.com>
Date: Wed, 23 Jun 2021 11:19:04 -0400
Subject: [PATCH] Create etl folder README

---
 src/etl/README.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 src/etl/README.md

diff --git a/src/etl/README.md b/src/etl/README.md
new file mode 100644
index 00000000..fe314ebc
--- /dev/null
+++ b/src/etl/README.md
@@ -0,0 +1,29 @@
+A single clio node has one or more ETL sources, specified in the config
+file. clio will subscribe to the `ledgers` stream of each of the ETL
+sources. This stream sends a message whenever a new ledger is validated. Upon
+receiving a message on the stream, clio will then fetch the data associated
+with the newly validated ledger from one of the ETL sources. The fetch is
+performed via a gRPC request (`GetLedger`). This request returns the ledger
+header, transactions+metadata blobs, and every ledger object
+added/modified/deleted as part of this ledger. ETL then writes all of this data
+to the databases, and moves on to the next ledger. ETL does not apply
+transactions, but rather extracts the already computed results of those
+transactions (all of the added/modified/deleted SHAMap leaf nodes of the state
+tree).
+
+If the database is entirely empty, ETL must download an entire ledger in full
+(as opposed to just the diff, as described above). This download is done via the
+`GetLedgerData` gRPC request. `GetLedgerData` allows clients to page through an
+entire ledger over several RPC calls. ETL will page through an entire ledger,
+and write each object to the database.
+
+If the database is not empty, clio will first come up in a "soft"
+read-only mode. In read-only mode, the server does not perform ETL and simply
+publishes new ledgers as they are written to the database.
+If the database is not updated within a certain time period
+(currently hard-coded at 20 seconds), clio will begin the ETL
+process and start writing to the database. Postgres will report an error when
+trying to write a record with a key that already exists. ETL uses this error to
+determine that another process is writing to the database, and subsequently
+falls back to a soft read-only mode. clio can also operate in strict
+read-only mode, in which case it will never write to the database.
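The mode transitions described in the last paragraph of the README can be sketched as a small state machine. This is only an illustrative model, not clio's actual code: the class `EtlModeTracker`, the `Mode` enum, the constant `DB_STALE_TIMEOUT_SECONDS`, and all method names are hypothetical. The sketch also assumes that a duplicate-key error counts as evidence of a fresh database update, which the README does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SOFT_READ_ONLY = "soft_read_only"
    WRITING = "writing"
    STRICT_READ_ONLY = "strict_read_only"


# The README gives 20 seconds as the hard-coded value; the name is hypothetical.
DB_STALE_TIMEOUT_SECONDS = 20.0


@dataclass
class EtlModeTracker:
    """Hypothetical model of the read-only/writer transitions."""

    strict_read_only: bool = False
    last_db_update: float = 0.0
    mode: Mode = Mode.SOFT_READ_ONLY

    def __post_init__(self) -> None:
        # A strict read-only node never leaves this mode.
        if self.strict_read_only:
            self.mode = Mode.STRICT_READ_ONLY

    def on_db_updated(self, now: float) -> None:
        """Record that some process wrote a new ledger to the database."""
        self.last_db_update = now

    def on_tick(self, now: float) -> Mode:
        """Begin ETL if the database has gone stale while in soft read-only mode."""
        stale = now - self.last_db_update >= DB_STALE_TIMEOUT_SECONDS
        if self.mode is Mode.SOFT_READ_ONLY and stale:
            self.mode = Mode.WRITING
        return self.mode

    def on_write_conflict(self, now: float) -> Mode:
        """A duplicate-key error implies another writer; fall back to soft read-only."""
        if self.mode is Mode.WRITING:
            self.mode = Mode.SOFT_READ_ONLY
            # Assumption: the conflicting write means the database was just updated.
            self.last_db_update = now
        return self.mode
```

Timestamps are passed in explicitly so the transitions can be exercised without waiting on a real clock. A tracker created with `strict_read_only=True` stays in `Mode.STRICT_READ_ONLY` regardless of which events arrive.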