Merge branch 'master' into verify-new2

This commit is contained in:
Shawn Xie
2023-05-05 12:53:19 -04:00
committed by GitHub
6 changed files with 185 additions and 84 deletions

View File

@@ -12,6 +12,9 @@ if(VERBOSE)
set(FETCHCONTENT_QUIET FALSE CACHE STRING "Verbose FetchContent()")
endif()
# C++20 removed std::result_of, but Boost 1.75 still uses it.
add_definitions(-DBOOST_ASIO_HAS_STD_INVOKE_RESULT=1)
add_library(clio)
target_compile_features(clio PUBLIC cxx_std_20)
target_include_directories(clio PUBLIC src)
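
For context on the define above: C++20 removed `std::result_of`, and defining `BOOST_ASIO_HAS_STD_INVOKE_RESULT` asks Boost.Asio to use `std::invoke_result` instead. A minimal sketch of the equivalent type deduction (the `Callable` type here is purely illustrative):

```
#include <type_traits>

// Purely illustrative callable type.
struct Callable
{
    int operator()(int&) const;
};

// Pre-C++20 spelling, removed from the standard library in C++20:
//   using R = typename std::result_of<Callable(int&)>::type;
// Replacement available since C++17, which the BOOST_ASIO_HAS_STD_INVOKE_RESULT
// define asks Boost.Asio to use internally:
using R = std::invoke_result_t<Callable, int&>;

static_assert(std::is_same_v<R, int>, "same deduced result type");
```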

View File

@@ -1,30 +1,81 @@
# CLIO MIGRATOR (ONE OFF!)
This tool is a (really) hacky way of migrating some data from
This tool allows you to backfill data from
[clio](https://github.com/XRPLF/clio) due to the [specific pull request
313](https://github.com/XRPLF/clio/pull/313) in that repo.
Specifically, it is meant to migrate NFT data such that:
* The new `nf_token_uris` table is populated with all URIs for all NFTs known
* The new `issuer_nf_tokens_v2` table is populated with all NFTs known
* The old `issuer_nf_tokens` table is dropped. This table was never used prior
to the above-referenced PR, so it is very safe to drop.
- The new `nf_token_uris` table is populated with all URIs for all NFTs known
- The new `issuer_nf_tokens_v2` table is populated with all NFTs known
- The old `issuer_nf_tokens` table is dropped. This table was never used prior
to the above-referenced PR, so it is very safe to drop.
## How to use
This tool should be used as follows, with regard to the above update:
1) Stop serving requests from your clio
2) Stop your clio and upgrade it to the version after the after PR
3) Start your clio
4) Now, your clio is writing new data correctly. This tool will update your
old data, while your new clio is running.
5) Run this tool, using the _exact_ same config as what you are using for your
production clio.
6) Once this tool terminates successfully, you can resume serving requests
from your clio.
1. __Compile or download the new version of `clio`__, but don't run it just yet.
2. __Stop serving requests from your existing `clio`__. If you need to achieve zero downtime, you have two options:
- Temporarily point your traffic to someone else's `clio` that has already performed this
migration. The XRPL Foundation should have performed this on their servers before this
release. Ask in our Discord what server to point traffic to.
- Create a new temporary `clio` instance running _the prior release_ and make sure
that its config.json specifies `read_only: true`. You can safely serve data
from this separate instance.
3. __Stop your `clio` and restart it, running the new version__. Now, your `clio` is writing new data correctly. This tool will update your
old data, while your upgraded `clio` is running and writing new ledgers.
4. __Run this tool__, using the _exact_ same config as what you are using for your
production `clio`.
5. __Once this tool terminates successfully__, you can resume serving requests
from your `clio`.
## Compiling
## Notes on timing
The amount of time that this migration takes depends greatly on what your data
looks like. The migration runs in three steps:
1. __Transaction loading__
- Pull all successful transactions that relate to NFTs.
The hashes of these transactions are stored in the `nf_token_transactions` table.
- For each of these transactions, discard any that were posted after the
migration started
- For each of these transactions, discard any that are not NFTokenMint
transactions
- For any remaining transactions, pull the associated NFT data from them and
write them to the database.
2. __Initial ledger loading__. We also need to scan all objects in the initial
ledger, looking for any NFTokenPage objects that would not have an associated
transaction recorded.
- Pull all objects from the initial ledger
- For each object, if it is not an NFTokenPage, discard it.
- Otherwise, load all NFTs stored in the NFTokenPage
3. __Drop the old (and unused) `issuer_nf_tokens` table__. This step is completely
safe, since this table is not used for anything in clio. It was meant to drive
a clio-only API called `nfts_by_issuer`, which is still in development.
However, we decided that for performance reasons its schema needed to change
to the schema we have in `issuer_nf_tokens_v2`. Since the API in question is
not yet part of clio, removing this table will not affect anything.
Step 1 is highly performance optimized. If you have a full-history clio
setup, this migration may take only a few minutes. We tested it on a
full-history server and it completed in about 9 minutes.
However, Step 2 is not well-optimized and unfortunately cannot be. If you have a
clio server whose `start_sequence` is relatively recent (even if the
`start_sequence` indicates a ledger prior to NFTs being enabled on your
network), the migration will take longer. We tested it on a clio with a
`start_sequence` of about 1 week prior to testing and it completed in about 6
hours.
As a result, we recommend _assuming_ the worst case: that this migration will take about 8
hours.
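
Both loading steps buffer decoded NFT rows and flush them to Cassandra in batches rather than writing row by row; a minimal sketch of that accumulate-and-flush pattern, using simplified stand-ins for the `doNFTWrite`/`maybeDoNFTWrite` helpers and the `NFT_WRITE_BATCH_SIZE` constant that appear in the migrator source further down:

```
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only.
struct NFTsData
{
};
struct Backend
{
    void
    writeNFTs(std::vector<NFTsData>&&)
    {
        // issue the (asynchronous) batch write
    }
    void
    sync()
    {
        // block until all in-flight writes complete
    }
};

static std::size_t const kBatchSize = 10000;  // mirrors NFT_WRITE_BATCH_SIZE

// Flush the buffer and hand back an empty one for the caller to reuse.
static std::vector<NFTsData>
flush(std::vector<NFTsData>& buffer, Backend& backend)
{
    if (buffer.empty())
        return buffer;
    backend.writeNFTs(std::move(buffer));
    backend.sync();
    return {};
}

// Only flush once enough rows have accumulated.
static std::vector<NFTsData>
maybeFlush(std::vector<NFTsData>& buffer, Backend& backend)
{
    if (buffer.size() < kBatchSize)
        return buffer;
    return flush(buffer, backend);
}

int
main()
{
    Backend backend;
    std::vector<NFTsData> toWrite;
    for (int page = 0; page < 3; ++page)
    {
        toWrite.push_back(NFTsData{});  // ...rows decoded from this page...
        toWrite = maybeFlush(toWrite, backend);
    }
    flush(toWrite, backend);  // write whatever is left in the final partial batch
}
```

Returning an empty vector from a flush lets each step keep reusing a single `toWrite` buffer, which is why `doNFTWrite` in the diff below returns `{}`.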
## Compiling and running
Git-clone this project to your server. Then from the top-level directory:
```
@@ -44,3 +95,4 @@ This migration will take a few hours to complete. After this completes, it is op
```
./clio_verifier <config path>
```

View File

@@ -111,7 +111,7 @@ synchronous(F&& f)
* R is the currently executing coroutine that is about to get passed.
* If coroutine types do not match, the current one's type is stored.
*/
using R = typename std::result_of<F(boost::asio::yield_context&)>::type;
using R = typename boost::result_of<F(boost::asio::yield_context&)>::type;
if constexpr (!std::is_same<R, void>::value)
{
/**

View File

@@ -550,7 +550,7 @@ CassandraBackend::fetchTransactions(
std::vector<TransactionAndMetadata> results{numHashes};
std::vector<std::shared_ptr<ReadCallbackData<result_type>>> cbs;
cbs.reserve(numHashes);
auto timeDiff = util::timed([&]() {
[[maybe_unused]] auto timeDiff = util::timed([&]() {
for (std::size_t i = 0; i < hashes.size(); ++i)
{
CassandraStatement statement{selectTransaction_};
@@ -580,9 +580,9 @@ CassandraBackend::fetchTransactions(
throw DatabaseTimeout();
}
log_.debug() << "Fetched " << numHashes
<< " transactions from Cassandra in " << timeDiff
<< " milliseconds";
// log_.debug() << "Fetched " << numHashes
// << " transactions from Cassandra in " << timeDiff
// << " milliseconds";
return results;
}
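
With the debug line that consumed `timeDiff` commented out, the variable would otherwise produce an unused-variable warning (fatal under `-Werror`); `[[maybe_unused]]` suppresses it while keeping the timing call in place. A tiny standalone illustration, unrelated to the clio code:

```
#include <chrono>

int
main()
{
    // Kept around for possible future logging; the attribute tells the
    // compiler not to warn even though nothing reads the value right now.
    [[maybe_unused]] auto const start = std::chrono::steady_clock::now();
    return 0;
}
```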

View File

@@ -12,49 +12,63 @@
static std::uint32_t const MAX_RETRIES = 5;
static std::chrono::seconds const WAIT_TIME = std::chrono::seconds(60);
static std::uint32_t const NFT_WRITE_BATCH_SIZE = 10000;
static void
wait(boost::asio::steady_timer& timer, std::string const reason)
wait(boost::asio::steady_timer& timer, std::string const& reason)
{
BOOST_LOG_TRIVIAL(info) << reason << ". Waiting";
BOOST_LOG_TRIVIAL(info) << reason << ". Waiting then retrying";
timer.expires_after(WAIT_TIME);
timer.wait();
BOOST_LOG_TRIVIAL(info) << "Done";
BOOST_LOG_TRIVIAL(info) << "Done waiting";
}
static void
static std::vector<NFTsData>
doNFTWrite(
std::vector<NFTsData>& nfts,
Backend::CassandraBackend& backend,
std::string const tag)
std::string const& tag)
{
if (nfts.size() <= 0)
return;
if (nfts.size() == 0)
return nfts;
auto const size = nfts.size();
backend.writeNFTs(std::move(nfts));
backend.sync();
BOOST_LOG_TRIVIAL(info) << tag << ": Wrote " << size << " records";
return {};
}
static std::optional<Backend::TransactionAndMetadata>
doTryFetchTransaction(
static std::vector<NFTsData>
maybeDoNFTWrite(
std::vector<NFTsData>& nfts,
Backend::CassandraBackend& backend,
std::string const& tag)
{
if (nfts.size() < NFT_WRITE_BATCH_SIZE)
return nfts;
return doNFTWrite(nfts, backend, tag);
}
static std::vector<Backend::TransactionAndMetadata>
doTryFetchTransactions(
boost::asio::steady_timer& timer,
Backend::CassandraBackend& backend,
ripple::uint256 const& hash,
std::vector<ripple::uint256> const& hashes,
boost::asio::yield_context& yield,
std::uint32_t const attempts = 0)
{
try
{
return backend.fetchTransaction(hash, yield);
return backend.fetchTransactions(hashes, yield);
}
catch (Backend::DatabaseTimeout const& e)
{
if (attempts >= MAX_RETRIES)
throw e;
wait(timer, "Transaction read error");
return doTryFetchTransaction(timer, backend, hash, yield, attempts + 1);
wait(timer, "Transactions read error");
return doTryFetchTransactions(
timer, backend, hashes, yield, attempts + 1);
}
}
@@ -69,7 +83,7 @@ doTryFetchLedgerPage(
{
try
{
return backend.fetchLedgerPage(cursor, sequence, 2000, false, yield);
return backend.fetchLedgerPage(cursor, sequence, 10000, false, yield);
}
catch (Backend::DatabaseTimeout const& e)
{
@@ -104,24 +118,12 @@ doTryGetTxPageResult(
}
static void
doMigration(
doMigrationStepOne(
Backend::CassandraBackend& backend,
boost::asio::steady_timer& timer,
boost::asio::yield_context& yield)
boost::asio::yield_context& yield,
Backend::LedgerRange const& ledgerRange)
{
BOOST_LOG_TRIVIAL(info) << "Beginning migration";
auto const ledgerRange = backend.hardFetchLedgerRangeNoThrow(yield);
/*
* Step 0 - If we haven't downloaded the initial ledger yet, just short
* circuit.
*/
if (!ledgerRange)
{
BOOST_LOG_TRIVIAL(info) << "There is no data to migrate";
return;
}
/*
* Step 1 - Look at all NFT transactions recorded in
* `nf_token_transactions` and reload any NFTokenMint transactions. These
@@ -130,6 +132,9 @@ doMigration(
* the tokens in `nf_tokens` because we also want to cover the extreme
* edge case of a token that is re-minted with a different URI.
*/
std::string const stepTag = "Step 1 - transaction loading";
std::vector<NFTsData> toWrite;
std::stringstream query;
query << "SELECT hash FROM " << backend.tablePrefix()
<< "nf_token_transactions";
@@ -140,11 +145,11 @@ doMigration(
// For all NFT txs, paginated in groups of 1000...
while (morePages)
{
std::vector<NFTsData> toWrite;
CassResult const* result =
doTryGetTxPageResult(nftTxQuery, timer, backend);
std::vector<ripple::uint256> txHashes;
// For each tx in page...
CassIterator* txPageIterator = cass_iterator_from_result(result);
while (cass_iterator_next(txPageIterator))
@@ -165,37 +170,29 @@ doMigration(
"Could not retrieve hash from nf_token_transactions");
}
auto const txHash = ripple::uint256::fromVoid(buf);
auto const tx =
doTryFetchTransaction(timer, backend, txHash, yield);
if (!tx)
{
cass_iterator_free(txPageIterator);
cass_result_free(result);
cass_statement_free(nftTxQuery);
std::stringstream ss;
ss << "Could not fetch tx with hash "
<< ripple::to_string(txHash);
throw std::runtime_error(ss.str());
txHashes.push_back(ripple::uint256::fromVoid(buf));
}
// Not really sure how cassandra paging works, but we want to skip
// any transactions that were loaded since the migration started
if (tx->ledgerSequence > ledgerRange->maxSequence)
auto const txs =
doTryFetchTransactions(timer, backend, txHashes, yield);
for (auto const& tx : txs)
{
if (tx.ledgerSequence > ledgerRange.maxSequence)
continue;
ripple::STTx const sttx{ripple::SerialIter{
tx->transaction.data(), tx->transaction.size()}};
tx.transaction.data(), tx.transaction.size()}};
if (sttx.getTxnType() != ripple::TxType::ttNFTOKEN_MINT)
continue;
ripple::TxMeta const txMeta{
sttx.getTransactionID(), tx->ledgerSequence, tx->metadata};
sttx.getTransactionID(), tx.ledgerSequence, tx.metadata};
toWrite.push_back(
std::get<1>(getNFTDataFromTx(txMeta, sttx)).value());
}
doNFTWrite(toWrite, backend, "TX");
toWrite = maybeDoNFTWrite(toWrite, backend, stepTag);
morePages = cass_result_has_more_pages(result);
if (morePages)
@@ -205,8 +202,16 @@ doMigration(
}
cass_statement_free(nftTxQuery);
BOOST_LOG_TRIVIAL(info) << "\nDone with transaction loading!\n";
doNFTWrite(toWrite, backend, stepTag);
}
static void
doMigrationStepTwo(
Backend::CassandraBackend& backend,
boost::asio::steady_timer& timer,
boost::asio::yield_context& yield,
Backend::LedgerRange const& ledgerRange)
{
/*
* Step 2 - Pull every object from our initial ledger and load all NFTs
* found in any NFTokenPage object. Prior to this migration, we were not
@@ -214,32 +219,43 @@ doMigration(
* missed. This will also record the URI of any NFTs minted prior to the
* start sequence.
*/
std::string const stepTag = "Step 2 - initial ledger loading";
std::vector<NFTsData> toWrite;
std::optional<ripple::uint256> cursor;
// For each object page in initial ledger
do
{
auto const page = doTryFetchLedgerPage(
timer, backend, cursor, ledgerRange->minSequence, yield);
timer, backend, cursor, ledgerRange.minSequence, yield);
// For each object in page
for (auto const& object : page.objects)
{
std::vector<NFTsData> toWrite = getNFTDataFromObj(
ledgerRange->minSequence,
ripple::to_string(object.key),
auto const objectNFTs = getNFTDataFromObj(
ledgerRange.minSequence,
std::string(object.key.begin(), object.key.end()),
std::string(object.blob.begin(), object.blob.end()));
doNFTWrite(toWrite, backend, "OBJ");
toWrite.insert(toWrite.end(), objectNFTs.begin(), objectNFTs.end());
}
toWrite = maybeDoNFTWrite(toWrite, backend, stepTag);
cursor = page.cursor;
} while (cursor.has_value());
BOOST_LOG_TRIVIAL(info) << "\nDone with object loading!\n";
doNFTWrite(toWrite, backend, stepTag);
}
static bool
doMigrationStepThree(Backend::CassandraBackend& backend)
{
/*
* Step 3 - Drop the old `issuer_nf_tokens` table, which is replaced by
* `issuer_nf_tokens_v2`. Normally, we should probably not drop old tables
* in migrations, but here it is safe since the old table wasn't yet being
* used to serve any data anyway.
*/
query.str("");
std::stringstream query;
query << "DROP TABLE " << backend.tablePrefix() << "issuer_nf_tokens";
CassStatement* issuerDropTableQuery =
cass_statement_new(query.str().c_str(), 0);
@@ -249,12 +265,42 @@ doMigration(
cass_future_free(fut);
cass_statement_free(issuerDropTableQuery);
backend.sync();
if (rc != CASS_OK)
BOOST_LOG_TRIVIAL(warning) << "\nCould not drop old issuer_nf_tokens "
return rc == CASS_OK;
}
static void
doMigration(
Backend::CassandraBackend& backend,
boost::asio::steady_timer& timer,
boost::asio::yield_context& yield)
{
BOOST_LOG_TRIVIAL(info) << "Beginning migration";
auto const ledgerRange = backend.hardFetchLedgerRangeNoThrow(yield);
/*
* Step 0 - If we haven't downloaded the initial ledger yet, just short
* circuit.
*/
if (!ledgerRange)
{
BOOST_LOG_TRIVIAL(info) << "There is no data to migrate";
return;
}
doMigrationStepOne(backend, timer, yield, *ledgerRange);
BOOST_LOG_TRIVIAL(info) << "\nStep 1 done!\n";
doMigrationStepTwo(backend, timer, yield, *ledgerRange);
BOOST_LOG_TRIVIAL(info) << "\nStep 2 done!\n";
auto const stepThreeResult = doMigrationStepThree(backend);
BOOST_LOG_TRIVIAL(info) << "\nStep 3 done!";
if (stepThreeResult)
BOOST_LOG_TRIVIAL(info) << "Dropped old 'issuer_nf_tokens' table!\n";
else
BOOST_LOG_TRIVIAL(warning) << "Could not drop old issuer_nf_tokens "
"table. If it still exists, "
"you should drop it yourself\n";
else
BOOST_LOG_TRIVIAL(info) << "\nDropped old 'issuer_nf_tokens' table!\n";
BOOST_LOG_TRIVIAL(info)
<< "\nCompleted migration from " << ledgerRange->minSequence << " to "

View File

@@ -178,7 +178,7 @@ public:
{
for (auto const& handler : handlers)
{
handlerMap_[handler.method] = move(handler);
handlerMap_[handler.method] = std::move(handler);
}
}
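
The switch from unqualified `move(handler)` to `std::move(handler)` is a small hygiene fix: the unqualified form only resolves via a using-directive or argument-dependent lookup, and newer clang versions warn about it (`-Wunqualified-std-cast-call`). A minimal illustration of the qualified form (the `Handler` type here is purely illustrative):

```
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Purely illustrative handler type.
struct Handler
{
    std::string method;
    std::string payload;
};

int
main()
{
    std::vector<Handler> handlers = {{"ping", "..."}, {"fee", "..."}};
    std::unordered_map<std::string, Handler> handlerMap;

    for (auto& handler : handlers)
    {
        // std::move makes the cast-to-rvalue explicit instead of relying on a
        // using-directive or ADL to find an unqualified `move`.
        handlerMap[handler.method] = std::move(handler);
    }
    return handlerMap.size() == 2 ? 0 : 1;
}
```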