mirror of https://github.com/XRPLF/rippled.git
synced 2025-11-20 11:05:54 +00:00

Add new RocksDBQuickFactory for benchmarking:

This new factory is intended for benchmarking against the existing RocksDBFactory and has the following differences:

* Does not use BatchWriter
* Disables WAL for writes to memtable
* Uses a hash index in blocks
* Uses RocksDB OptimizeFor… functions

See Benchmarks.md for further discussion of some of the issues raised by investigation of RocksDB performance.
committed by Vinnie Falco
parent 6540804571
commit a1f46e84b8
@@ -2520,6 +2520,11 @@
    </ClCompile>
    <ClInclude Include="..\..\src\ripple\nodestore\backend\RocksDBFactory.h">
    </ClInclude>
    <ClCompile Include="..\..\src\ripple\nodestore\backend\RocksDBQuickFactory.cpp">
      <ExcludedFromBuild>True</ExcludedFromBuild>
    </ClCompile>
    <ClInclude Include="..\..\src\ripple\nodestore\backend\RocksDBQuickFactory.h">
    </ClInclude>
    <ClInclude Include="..\..\src\ripple\nodestore\Database.h">
    </ClInclude>
    <ClInclude Include="..\..\src\ripple\nodestore\DummyScheduler.h">

@@ -3561,6 +3561,12 @@
    <ClInclude Include="..\..\src\ripple\nodestore\backend\RocksDBFactory.h">
      <Filter>ripple\nodestore\backend</Filter>
    </ClInclude>
    <ClCompile Include="..\..\src\ripple\nodestore\backend\RocksDBQuickFactory.cpp">
      <Filter>ripple\nodestore\backend</Filter>
    </ClCompile>
    <ClInclude Include="..\..\src\ripple\nodestore\backend\RocksDBQuickFactory.h">
      <Filter>ripple\nodestore\backend</Filter>
    </ClInclude>
    <ClInclude Include="..\..\src\ripple\nodestore\Database.h">
      <Filter>ripple\nodestore</Filter>
    </ClInclude>
45 src/ripple/nodestore/Benchmarks.md Normal file
@@ -0,0 +1,45 @@
# Benchmarks

```
$rippled --unittest=NodeStoreTiming --unittest-arg="type=rocksdbquick,style=level,num_objects=2000000"

ripple.bench.NodeStoreTiming repeatableObject
Batch Insert   Fetch 50/50   Fetch Missing   Fetch Random   Inserts   Ordered Fetch
       59.53         12.67            6.04          11.33     25.55           52.15   type=rocksdbquick,style=level,num_objects=2000000

$rippled --unittest=NodeStoreTiming --unittest-arg="type=rocksdbquick,style=level,num_objects=2000000"

ripple.bench.NodeStoreTiming repeatableObject
Batch Insert   Fetch 50/50   Fetch Missing   Fetch Random   Inserts   Ordered Fetch
       44.29         27.45            5.95          20.47     23.58           53.60   type=rocksdbquick,style=level,num_objects=2000000
```

```
$rippled --unittest=NodeStoreTiming --unittest-arg="type=rocksdb,num_objects=2000000,open_files=2000,filter_bits=12,cache_mb=256,file_size_mb=8,file_size_mult=2"

ripple.bench.NodeStoreTiming repeatableObject
Batch Insert   Fetch 50/50   Fetch Missing   Fetch Random   Inserts   Ordered Fetch
      377.61         30.62           10.05          17.41    201.73           64.46   type=rocksdb,num_objects=2000000,open_files=2000,filter_bits=12,cache_mb=256,file_size_mb=8,file_size_mult=2

$rippled --unittest=NodeStoreTiming --unittest-arg="type=rocksdb,num_objects=2000000,open_files=2000,filter_bits=12,cache_mb=256,file_size_mb=8,file_size_mult=2"

ripple.bench.NodeStoreTiming repeatableObject
Batch Insert   Fetch 50/50   Fetch Missing   Fetch Random   Inserts   Ordered Fetch
      405.83         29.48           11.29          25.81    209.05           55.75   type=rocksdb,num_objects=2000000,open_files=2000,filter_bits=12,cache_mb=256,file_size_mb=8,file_size_mult=2
```

## Discussion

RocksDBQuickFactory is intended to provide a testbed for comparing potential rocksdb performance with the existing recommended configuration in rippled.cfg. Based on various executions and profiling runs, some conclusions are presented below.

* If the write ahead log is enabled, insert speed soon clogs up under load. The BatchWriter class intends to stop this from blocking the main threads by queuing up writes and running them in a separate thread. However, rocksdb already has separate threads dedicated to flushing the memtable to disk, and the memtable is itself an in-memory queue. The result is two queues with a guarantee of durability in between. If instead the memtable were used as the sole queue and rocksdb::Flush() were triggered manually at opportune moments, possibly just after ledger close, that would provide similar but more predictable guarantees (see the sketch after this list). It would also remove an unneeded thread and unnecessary memory usage. An alternative point of view is that because there will always be many other rippled instances running, there is no need for such guarantees: the nodes will always be available from another peer.

* Lookup in a block previously used binary search. With rippled's use case it is highly unlikely that two adjacent key/values will ever be requested one after the other, so hash indexing of blocks makes much more sense. Rocksdb has a number of options for hash indexing both memtables and blocks, and these need more testing to find the best choice.

* The current Database implementation has two forms of caching, so the LRU cache of blocks at Factory level does not make any sense. However, if the hash indexing and potentially the new [bloom filter](http://rocksdb.org/blog/1427/new-bloom-filter-format/) can provide faster lookup for non-existent keys, then the caching could potentially live at Factory level.

* Multiple runs of the benchmarks can yield surprisingly different results. This can perhaps be attributed to the asynchronous nature of rocksdb's compaction process. The benchmarks are artificial and create a highly unlikely write load to build the dataset used to measure the different read access patterns, so multiple runs are required to get a feel for the effectiveness of the changes. This contrasts sharply with the keyvadb benchmarking, where highly repeatable timings were observed. Realistically sized datasets are also required to get a correct insight; 2,000,000 key/values (actually 4,000,000 after the two insert benchmarks complete) is too low to give a full picture.

* An interesting side effect of running the benchmarks in a profiler was that a clear pattern of what RocksDB does under the hood was observable. This led to the decision to trial hash indexing and also to the discovery that the native CRC32 instruction was not being used.

* An important point to note is that if this factory is tested with an existing set of sst files, none of the old sst files will benefit from the indexing changes until they are compacted at some future point in time.
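
The sketch below pulls together the write-ahead-log, hash-index and bloom-filter ideas from the list above. It is a minimal illustration only, assuming a RocksDB release that provides `WriteOptions::disableWAL`, `DB::Flush()` and the two-argument `NewBloomFilterPolicy()`; the helper functions are hypothetical and not part of rippled or RocksDBQuickFactory.

```
// Sketch only: combines the options discussed above.
// Helper names are hypothetical, not part of rippled.
#include <rocksdb/db.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/options.h>
#include <rocksdb/slice_transform.h>
#include <rocksdb/table.h>
#include <rocksdb/write_batch.h>

// Hash-indexed blocks plus a bloom filter for cheap negative lookups.
rocksdb::Options makeQuickOptions ()
{
    rocksdb::Options options;
    options.create_if_missing = true;
    options.prefix_extractor.reset (rocksdb::NewNoopTransform ());

    rocksdb::BlockBasedTableOptions table_options;
    table_options.index_type = rocksdb::BlockBasedTableOptions::kHashSearch;
    // Second argument selects the newer "full" filter format (assumed available).
    table_options.filter_policy.reset (rocksdb::NewBloomFilterPolicy (10, false));
    options.table_factory.reset (
        rocksdb::NewBlockBasedTableFactory (table_options));
    return options;
}

// Write with the WAL disabled, so the memtable is the only queue...
rocksdb::Status writeWithoutWal (rocksdb::DB& db, rocksdb::WriteBatch& batch)
{
    rocksdb::WriteOptions options;
    options.disableWAL = true;
    return db.Write (options, &batch);
}

// ...and flush it explicitly at an opportune moment instead of relying
// on a separate BatchWriter queue.
rocksdb::Status flushAfterLedgerClose (rocksdb::DB& db)
{
    rocksdb::FlushOptions flush_options;
    flush_options.wait = false; // do not block the caller
    return db.Flush (flush_options);
}
```

In rippled the flush call would most naturally be triggered just after ledger close, making the memtable the single write queue discussed in the first bullet.
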
356 src/ripple/nodestore/backend/RocksDBQuickFactory.cpp Normal file
@@ -0,0 +1,356 @@
//------------------------------------------------------------------------------
/*
    This file is part of rippled: https://github.com/ripple/rippled
    Copyright (c) 2012, 2013 Ripple Labs Inc.

    Permission to use, copy, modify, and/or distribute this software for any
    purpose with or without fee is hereby granted, provided that the above
    copyright notice and this permission notice appear in all copies.

    THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
    WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
    MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
    ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
    WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
    ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
    OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================

#if RIPPLE_ROCKSDB_AVAILABLE

#include <ripple/core/Config.h>
#include <beast/threads/Thread.h>
#include <atomic>

namespace ripple {
namespace NodeStore {

class RockDBQuickEnv : public rocksdb::EnvWrapper
{
public:
    RockDBQuickEnv ()
        : EnvWrapper (rocksdb::Env::Default())
    {
    }

    struct ThreadParams
    {
        ThreadParams (void (*f_)(void*), void* a_)
            : f (f_)
            , a (a_)
        {
        }

        void (*f)(void*);
        void* a;
    };

    static
    void
    thread_entry (void* ptr)
    {
        ThreadParams* const p (reinterpret_cast <ThreadParams*> (ptr));
        void (*f)(void*) = p->f;
        void* a (p->a);
        delete p;

        static std::atomic <std::size_t> n;
        std::size_t const id (++n);
        std::stringstream ss;
        ss << "rocksdb #" << id;
        beast::Thread::setCurrentThreadName (ss.str());

        (*f)(a);
    }

    void
    StartThread (void (*f)(void*), void* a)
    {
        ThreadParams* const p (new ThreadParams (f, a));
        EnvWrapper::StartThread (&RockDBQuickEnv::thread_entry, p);
    }
};

//------------------------------------------------------------------------------

class RocksDBQuickBackend
    : public Backend
    , public beast::LeakChecked <RocksDBQuickBackend>
{
public:
    beast::Journal m_journal;
    size_t const m_keyBytes;
    std::string m_name;
    std::unique_ptr <rocksdb::DB> m_db;

    RocksDBQuickBackend (int keyBytes, Parameters const& keyValues,
        Scheduler& scheduler, beast::Journal journal, RockDBQuickEnv* env)
        : m_journal (journal)
        , m_keyBytes (keyBytes)
        , m_name (keyValues ["path"].toStdString ())
    {
        if (m_name.empty())
            throw std::runtime_error ("Missing path in RocksDBFactory backend");

        // Defaults
        std::uint64_t budget = 512 * 1024 * 1024; // 512MB
        std::string style("level");
        std::uint64_t threads=4;

        if (!keyValues["budget"].isEmpty())
            budget = keyValues["budget"].getIntValue();

        if (!keyValues["style"].isEmpty())
            style = keyValues["style"].toStdString();

        if (!keyValues["threads"].isEmpty())
            threads = keyValues["threads"].getIntValue();

        // Set options
        rocksdb::Options options;
        options.create_if_missing = true;
        options.env = env;

        if (style == "level")
            options.OptimizeLevelStyleCompaction(budget);

        if (style == "universal")
            options.OptimizeUniversalStyleCompaction(budget);

        if (style == "point")
            options.OptimizeForPointLookup(budget / 1024 / 1024); // In MB

        options.IncreaseParallelism(threads);

        // Allows hash indexes in blocks
        options.prefix_extractor.reset(rocksdb::NewNoopTransform());

        // Override OptimizeLevelStyleCompaction
        options.min_write_buffer_number_to_merge = 1;

        rocksdb::BlockBasedTableOptions table_options;
        // Use hash index
        table_options.index_type =
            rocksdb::BlockBasedTableOptions::kHashSearch;
        table_options.filter_policy.reset(
            rocksdb::NewBloomFilterPolicy(10));
        options.table_factory.reset(
            NewBlockBasedTableFactory(table_options));

        // Higher values make reads slower
        // table_options.block_size = 4096;

        // No point when DatabaseImp has a cache
        // table_options.block_cache =
        //     rocksdb::NewLRUCache(64 * 1024 * 1024);

        options.memtable_factory.reset(rocksdb::NewHashSkipListRepFactory());
        // Alternative:
        // options.memtable_factory.reset(
        //     rocksdb::NewHashCuckooRepFactory(options.write_buffer_size));

        rocksdb::DB* db = nullptr;

        rocksdb::Status status = rocksdb::DB::Open (options, m_name, &db);
        if (!status.ok () || !db)
            throw std::runtime_error (std::string("Unable to open/create RocksDB: ") + status.ToString());

        m_db.reset (db);
    }

    ~RocksDBQuickBackend ()
    {
    }

    std::string
    getName()
    {
        return m_name;
    }

    //--------------------------------------------------------------------------

    Status
    fetch (void const* key, NodeObject::Ptr* pObject)
    {
        pObject->reset ();

        Status status (ok);

        rocksdb::ReadOptions const options;
        rocksdb::Slice const slice (static_cast <char const*> (key), m_keyBytes);

        std::string string;

        rocksdb::Status getStatus = m_db->Get (options, slice, &string);

        if (getStatus.ok ())
        {
            DecodedBlob decoded (key, string.data (), string.size ());

            if (decoded.wasOk ())
            {
                *pObject = decoded.createObject ();
            }
            else
            {
                // Decoding failed, probably corrupted!
                //
                status = dataCorrupt;
            }
        }
        else
        {
            if (getStatus.IsCorruption ())
            {
                status = dataCorrupt;
            }
            else if (getStatus.IsNotFound ())
            {
                status = notFound;
            }
            else
            {
                status = Status (customCode + getStatus.code());

                m_journal.error << getStatus.ToString ();
            }
        }

        return status;
    }

    void
    store (NodeObject::ref object)
    {
        storeBatch(Batch{object});
    }

    void
    storeBatch (Batch const& batch)
    {
        rocksdb::WriteBatch wb;

        EncodedBlob encoded;

        for (auto const& e : batch)
        {
            encoded.prepare (e);

            wb.Put(
                rocksdb::Slice(reinterpret_cast<char const*>(encoded.getKey()),
                    m_keyBytes),
                rocksdb::Slice(reinterpret_cast<char const*>(encoded.getData()),
                    encoded.getSize()));
        }

        rocksdb::WriteOptions options;

        // Crucial to ensure good write speed and non-blocking writes to memtable
        options.disableWAL = true;

        auto ret = m_db->Write (options, &wb);

        if (!ret.ok ())
            throw std::runtime_error ("storeBatch failed: " + ret.ToString());
    }

    void
    for_each (std::function <void(NodeObject::Ptr)> f)
    {
        rocksdb::ReadOptions const options;

        std::unique_ptr <rocksdb::Iterator> it (m_db->NewIterator (options));

        for (it->SeekToFirst (); it->Valid (); it->Next ())
        {
            if (it->key ().size () == m_keyBytes)
            {
                DecodedBlob decoded (it->key ().data (),
                    it->value ().data (),
                    it->value ().size ());

                if (decoded.wasOk ())
                {
                    f (decoded.createObject ());
                }
                else
                {
                    // Uh oh, corrupted data!
                    if (m_journal.fatal) m_journal.fatal <<
                        "Corrupt NodeObject #" << uint256 (it->key ().data ());
                }
            }
            else
            {
                // VFALCO NOTE What does it mean to find an
                //             incorrectly sized key? Corruption?
                if (m_journal.fatal) m_journal.fatal <<
                    "Bad key size = " << it->key ().size ();
            }
        }
    }

    int
    getWriteLoad ()
    {
        return 0;
    }

    //--------------------------------------------------------------------------

    void
    writeBatch (Batch const& batch)
    {
        storeBatch (batch);
    }
};

//------------------------------------------------------------------------------

class RocksDBQuickFactory : public Factory
{
public:
    std::shared_ptr <rocksdb::Cache> m_lruCache;
    RockDBQuickEnv m_env;

    RocksDBQuickFactory ()
    {
    }

    ~RocksDBQuickFactory ()
    {
    }

    std::string
    getName () const
    {
        return "RocksDBQuick";
    }

    std::unique_ptr <Backend>
    createInstance (
        size_t keyBytes,
        Parameters const& keyValues,
        Scheduler& scheduler,
        beast::Journal journal)
    {
        return std::make_unique <RocksDBQuickBackend> (
            keyBytes, keyValues, scheduler, journal, &m_env);
    }
};

//------------------------------------------------------------------------------

std::unique_ptr <Factory>
make_RocksDBQuickFactory ()
{
    return std::make_unique <RocksDBQuickFactory> ();
}

}
}

#endif
40 src/ripple/nodestore/backend/RocksDBQuickFactory.h Normal file
@@ -0,0 +1,40 @@
//------------------------------------------------------------------------------
/*
    This file is part of rippled: https://github.com/ripple/rippled
    Copyright (c) 2012, 2013 Ripple Labs Inc.

    Permission to use, copy, modify, and/or distribute this software for any
    purpose with or without fee is hereby granted, provided that the above
    copyright notice and this permission notice appear in all copies.

    THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
    WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
    MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
    ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
    WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
    ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
    OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================

#ifndef RIPPLE_NODESTORE_ROCKSDBQUICKFACTORY_H_INCLUDED
#define RIPPLE_NODESTORE_ROCKSDBQUICKFACTORY_H_INCLUDED

#if RIPPLE_ROCKSDB_AVAILABLE

#include <ripple/nodestore/Factory.h>

namespace ripple {
namespace NodeStore {

/** Factory to produce experimental RocksDB backends for the NodeStore.

    @see Database
*/
std::unique_ptr <Factory> make_RocksDBQuickFactory ();

}
}

#endif

#endif
@@ -63,6 +63,7 @@ public:

#if RIPPLE_ROCKSDB_AVAILABLE
        add_factory (make_RocksDBFactory ());
        add_factory (make_RocksDBQuickFactory ());
#endif
    }

@@ -45,6 +45,8 @@
#include <ripple/nodestore/backend/NullFactory.cpp>
#include <ripple/nodestore/backend/RocksDBFactory.h>
#include <ripple/nodestore/backend/RocksDBFactory.cpp>
#include <ripple/nodestore/backend/RocksDBQuickFactory.h>
#include <ripple/nodestore/backend/RocksDBQuickFactory.cpp>

#include <ripple/nodestore/impl/Backend.cpp>
#include <ripple/nodestore/impl/BatchWriter.cpp>