The memory allocation patterns of `Json::Value` benefit greatly
from slab allocation. This commit adds a global slab allocator
dedicated to `Json::Value`.
Real-world data indicates that only 2% of allocation requests
exceed 72 bytes. The remaining 98% of allocations fall into the
following 3 buckets, measured across 9,500,000,000 allocation
calls:
[ 1, 32]: 17% of all allocations
[33, 48]: 27% of all allocations
[49, 72]: 57% of all allocations
This commit should improve performance and reduce memory
fragmentation for servers with JSON-heavy workloads, typically
those servicing RPC and WebSocket traffic.
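For illustration only, a class can route its allocations through a
dedicated pool by overloading `operator new` and `operator delete`.
This is a minimal sketch of the general technique, not the
implementation in this commit; `SlabPool` is a hypothetical stand-in
that simply forwards to the heap so the example is self-contained:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a slab allocator; forwards to
// malloc/free so the sketch compiles on its own. Error handling
// is elided.
struct SlabPool
{
    static void* allocate(std::size_t n) { return std::malloc(n); }
    static void deallocate(void* p, std::size_t) { std::free(p); }
};

struct Value  // stand-in for Json::Value
{
    static void* operator new(std::size_t n)
    {
        // Per the measurements above, requests of at most 72 bytes
        // (~98% of them) are serviced by the pool; anything larger
        // falls back to the general-purpose heap.
        if (n <= 72)
            return SlabPool::allocate(n);
        return ::operator new(n);
    }

    static void operator delete(void* p, std::size_t n) noexcept
    {
        if (n <= 72)
            SlabPool::deallocate(p, n);
        else
            ::operator delete(p);
    }

    double payload[4]{};  // arbitrary contents for the sketch
};

int main()
{
    auto* v = new Value;  // serviced by SlabPool
    delete v;
}
```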
The primary motivation for this commit was simplification,
although it may not always succeed at that.
The serialization and deserialization interfaces are split into
separate classes, hopefully providing interfaces that are leaner,
cleaner, and less error-prone.
The deserializer is now effectively zero-copy, even for types like
variable-length blobs.
The serializer now comes in two flavors: one that includes a large
stack-based buffer built into the serializer, and a second variant
that writes directly to an external buffer. If used properly, this
can avoid unnecessary copying and memory allocations.
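As a sketch of the two-flavor pattern (the class names and
interfaces below are illustrative, not the ones introduced by this
commit):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Flavor 1: carries a large stack-based buffer, so serializing
// typical payloads performs no heap allocation at all.
class BufferedWriter
{
    std::array<std::uint8_t, 2048> buf_;
    std::size_t used_ = 0;

public:
    void write(void const* data, std::size_t n)
    {
        // Overflow handling elided; real code would grow or fail.
        std::memcpy(buf_.data() + used_, data, n);
        used_ += n;
    }

    std::uint8_t const* data() const { return buf_.data(); }
    std::size_t size() const { return used_; }
};

// Flavor 2: writes directly into caller-provided storage, avoiding
// an intermediate copy when the destination already exists.
class ExternalWriter
{
    std::uint8_t* out_;
    std::size_t capacity_;
    std::size_t used_ = 0;

public:
    ExternalWriter(std::uint8_t* out, std::size_t capacity)
        : out_(out), capacity_(capacity) {}

    bool write(void const* data, std::size_t n)
    {
        if (used_ + n > capacity_)
            return false;  // caller decides how to recover
        std::memcpy(out_ + used_, data, n);
        used_ += n;
        return true;
    }

    std::size_t size() const { return used_; }
};

int main()
{
    BufferedWriter bw;
    std::uint32_t seq = 7;
    bw.write(&seq, sizeof(seq));  // lands in the stack buffer

    std::vector<std::uint8_t> out(64);
    ExternalWriter ew(out.data(), out.size());
    ew.write(bw.data(), bw.size());  // straight into caller storage
}
```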
Lastly, multiple ST* constructors have been consolidated, allowing
for more uniform and readable construction of objects, simplifying
code, reducing duplication, and (hopefully) avoiding complexity
and unnecessary copying of data.
The primary change introduced in this commit is the partitioning
of the `TaggedCache`, with each partition independent of the
others, making it possible to perform multiple cache operations
in parallel. In particular, the `sweep` operation is now
parallelized by default on systems with at least four cores.
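In outline, the partitioning technique looks like the following
minimal sketch (hypothetical types; the actual `TaggedCache` tracks
expiration, hit rates, and more):

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <iterator>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Each partition owns its entries and its lock, so operations that
// touch different partitions never contend, and a sweep can run
// one worker per partition.
template <class Key, class Value, std::size_t Partitions = 4>
class PartitionedCache
{
    struct Partition
    {
        std::mutex mutex;
        std::map<Key, Value> entries;
    };

    std::array<Partition, Partitions> partitions_;

    Partition& pick(Key const& k)
    {
        return partitions_[std::hash<Key>{}(k) % Partitions];
    }

public:
    void insert(Key const& k, Value v)
    {
        auto& p = pick(k);
        std::lock_guard lock(p.mutex);
        p.entries.insert_or_assign(k, std::move(v));
    }

    // Sweep all partitions in parallel; `expired` decides what
    // to drop.
    void sweep(std::function<bool(Value const&)> const& expired)
    {
        std::vector<std::thread> workers;
        for (auto& p : partitions_)
            workers.emplace_back([&p, &expired] {
                std::lock_guard lock(p.mutex);
                for (auto it = p.entries.begin();
                     it != p.entries.end();)
                    it = expired(it->second) ? p.entries.erase(it)
                                             : std::next(it);
            });
        for (auto& t : workers)
            t.join();
    }
};

int main()
{
    PartitionedCache<int, int> cache;
    for (int i = 0; i != 1000; ++i)
        cache.insert(i, 2 * i);
    cache.sweep([](int v) { return v % 4 == 0; });  // parallel sweep
}
```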
The `TaggedCache` could also be instantiated in 'key-only' mode,
which complicated the interface significantly but was only used
by a single consumer (the `FullBelowCache`). This commit splits
the 'key-only' functionality out of `TaggedCache` and incorporates
it directly into `FullBelowCache`, resulting in simpler and
cleaner interfaces for both `TaggedCache` and `FullBelowCache`,
but at a cost: some code duplication.
Lastly, this commit includes a medley of changes, including the
restructuring of `Transaction`, reducing its size by 48 bytes.
Even with TLS-encrypted connections, it is possible for a determined
attacker to mount certain types of relatively easy man-in-the-middle
attacks that, if successful, could allow them to tamper with
messages exchanged between the endpoints.
The risk can be mitigated if each side has a certificate issued by a
CA that the other side trusts. In the context of a decentralized and
permissionless network, this is neither reasonable nor desirable.
To prevent this attack, the two endpoints, A and B, need only be
able to independently verify that they are connected to each other
over a single end-to-end TLS session, instead of separate TLS
sessions which the attacker bridges.
The protocol-level handshake implements this security check by using
digital signatures: each endpoint derives a fingerprint from the TLS
session, which it signs with the private key associated with its own
node identity. This strongly binds the TLS session to the identities
of the two endpoints of the session.
This commit introduces a new fingerprint derivation that uses the
modern, standardized TLS exporter functionality (instead of the
non-standard OpenSSL APIs used by the existing derivation) and
derives distinct "incoming" and "outgoing" security cookies.
Lastly, this commit refines the "self-connection" check to allow for
the detection of accidental instances of node identity sharing. This
check was first introduced with #4195 but was partially reverted in
#4438 because of a bug. Because incoming and outgoing connections
now use distinct security cookies, an attacker is no longer able to
claim the identity of its peer by echoing its security cookie.
The change is backwards compatible: servers with this commit will
still generate and verify old-style fingerprints, in addition to
the new-style fingerprints.
For a fuller discussion on this topic, please see:
https://github.com/openssl/openssl/issues/5509
https://github.com/ripple/rippled/issues/2413
This commit was previously introduced as #3929, which was closed. If
merged, it also fixes #2413 (which had been closed as a 'WONTFIX').
The `Ledger` class contains two `SHAMap` instances: the state and
transaction maps. Previously, the maps were dynamically allocated using
`std::make_shared` despite the fact that they did not require lifetime
management separate from the lifetime of the `Ledger` instance to which
they belong.
The two `SHAMap` instances are now regular member variables. Some
smart pointers and dynamic memory allocations were avoided by using
stack-based alternatives.
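In outline, the change follows this pattern (types simplified):

```cpp
#include <memory>

struct SHAMap { /* ... */ };  // simplified stand-in

// Before: each map separately allocated and reference-counted.
struct LedgerBefore
{
    std::shared_ptr<SHAMap> stateMap = std::make_shared<SHAMap>();
    std::shared_ptr<SHAMap> txMap = std::make_shared<SHAMap>();
};

// After: the maps live inside the Ledger itself; their lifetime is
// the Ledger's, with no extra allocations or control blocks.
struct LedgerAfter
{
    SHAMap stateMap;
    SHAMap txMap;
};
```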
Commit 3 of 3 in #4218.
The existing slab allocator has significant performance advantages over
normal dynamic memory allocation codepaths (e.g. via `new` or `malloc`)
but sacrifices the ability to release memory back to the system.
As a result, otherwise transient spikes in memory usage can leave
the process with elevated memory usage for its entire lifetime.
This commit retains the lock-free management of individual slabs,
while also making it possible to reclaim memory from slabs that are
empty, and improves the slab selection strategy to favor fuller
slabs.
The commit also adjusts the sizes of slabs used to back `SHAMapItem` to
better match the characteristics of the XRP Ledger "mainnet."
The `SHAMapItem` class contains a variable-sized buffer that
holds the serialized data associated with a particular item
inside a `SHAMap`.
Prior to this commit, the buffer for the serialized data was
allocated separately. Coupled with the fact that most instances
of `SHAMapItem` were wrapped in a `std::shared_ptr`, this meant
that a single instantiation might result in up to three separate
memory allocations.
This commit switches away from `std::shared_ptr` for `SHAMapItem`
and uses `boost::intrusive_ptr` instead, allowing the reference
count for an instance to live inside the instance itself. Coupled
with using a slab-based allocator to optimize memory allocation
for the most commonly sized buffers, the net result is significant
memory savings. In testing, the reduction in memory usage hovers
between 400MB and 650MB. Other scenarios might result in larger
savings.
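A minimal sketch of the intrusive reference-counting pattern (not
the actual `SHAMapItem` definition):

```cpp
#include <boost/intrusive_ptr.hpp>

#include <atomic>
#include <cstdint>

// The reference count lives inside the object, so a
// boost::intrusive_ptr needs no separately allocated control block
// (unlike a std::shared_ptr wrapped around an existing pointer).
class Item
{
    mutable std::atomic<std::uint32_t> refcount_{0};

    friend void intrusive_ptr_add_ref(Item const* x)
    {
        x->refcount_.fetch_add(1, std::memory_order_relaxed);
    }

    friend void intrusive_ptr_release(Item const* x)
    {
        if (x->refcount_.fetch_sub(1, std::memory_order_acq_rel) ==
            1)
            delete x;
    }
};

int main()
{
    boost::intrusive_ptr<Item> p(new Item);  // one allocation total
    auto q = p;  // bumps the embedded count; no extra allocation
}
```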
In performance testing with NFTs, this commit reduces memory usage
by about 15%, sustained over long durations.
Commit 2 of 3 in #4218.
When instantiating a large number of fixed-sized objects on the heap,
the overhead imposed by dynamic memory allocation APIs quickly
becomes significant.
In some cases, allocating a large block of memory at once and using
a slabbing allocator to carve that block into fixed-sized units used
to service requests for memory can significantly reduce memory
fragmentation and, potentially, improve overall performance.
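A minimal sketch of the slabbing technique itself (not the
`SlabAllocator<>` introduced below, which is lock-free and
considerably more careful):

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Carve one large block into fixed-size chunks and thread the free
// chunks into an intrusive free list: allocation and deallocation
// become pointer swaps rather than general-purpose heap calls.
// `itemSize` must be at least sizeof(void*) and a multiple of
// alignof(void*); this sketch is not thread-safe.
class Slab
{
    std::unique_ptr<std::uint8_t[]> block_;
    void* freeList_ = nullptr;

public:
    Slab(std::size_t itemSize, std::size_t count)
        : block_(std::make_unique<std::uint8_t[]>(itemSize * count))
    {
        // Link every chunk into the free list.
        for (std::size_t i = 0; i != count; ++i)
        {
            void* chunk = block_.get() + i * itemSize;
            *static_cast<void**>(chunk) = freeList_;
            freeList_ = chunk;
        }
    }

    void* allocate()
    {
        if (!freeList_)
            return nullptr;  // slab exhausted; caller falls back
        void* chunk = freeList_;
        freeList_ = *static_cast<void**>(chunk);
        return chunk;
    }

    void deallocate(void* chunk)
    {
        *static_cast<void**>(chunk) = freeList_;
        freeList_ = chunk;
    }
};

int main()
{
    Slab slab(64, 128);  // 128 chunks of 64 bytes each
    void* a = slab.allocate();
    void* b = slab.allocate();
    slab.deallocate(a);
    slab.deallocate(b);
}
```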
This commit introduces a new `SlabAllocator<>` class that exposes an
API _similar_ to the C++ `Allocator` concept, but it is not meant to
be a general-purpose allocator.
It should not be used unless profiling and analysis of specific
memory allocation patterns indicate that the additional complexity
will improve overall system performance, and subsequent profiling
confirms that it does.
A helper class, `SlabAllocatorSet<>`, simplifies the handling of
variably sized objects that benefit from slab allocation.
This commit incorporates improvements suggested by Greg Popovitch
(@greg7mdp).
Commit 1 of 3 in #4218.
This refactor was primarily aimed at reducing the size of
objects derived from `TimeoutCounter` by improving the packing
of structures. Other potential improvements also surfaced
during this process and were implemented.
The existing code attempted to restrict the instantiation of `Coro`
to only a subset of helper functions, by using the `Coro_create_t`
helper structure. But the structure was public, which limited the
effectiveness of this approach.
This commit uses a private type, fixing the issue.
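The pattern, in outline (simplified; `create` stands in for the
real factory functions):

```cpp
#include <memory>

class Coro
{
    // Private tag: only code that can name this type can construct
    // a Coro, yet std::make_shared still works because the
    // constructor itself stays public.
    struct PrivateTag
    {
        explicit PrivateTag() = default;
    };

public:
    explicit Coro(PrivateTag) {}

    // Only members (and friends) can mint the tag.
    static std::shared_ptr<Coro> create()
    {
        return std::make_shared<Coro>(PrivateTag{});
    }
};

int main()
{
    auto coro = Coro::create();  // OK
    // Coro direct{Coro::PrivateTag{}};  // error: tag is private
}
```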
This commit cleans up and modernizes the `JobQueue` but does not
change the queueing logic. It focuses on simplifying the code by
eliminating awkward constructs, like "invalid jobs" and the need
for default constructors.
It leverages modern C++ to initialize tables and data structures at
compile time and replaces `std::map` instances with directly indexed
arrays.
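For example, a lookup table that previously had to be populated in
a `std::map` at startup can instead be a directly indexed
`constexpr` array; the types below are illustrative, not the actual
`JobQueue` definitions:

```cpp
#include <array>
#include <cstddef>

// Illustrative job types; the real JobQueue defines many more.
enum JobType : std::size_t { jtRPC, jtLedger, jtPeer, jtCount };

struct JobTypeInfo
{
    JobType type;
    int priority;
    char const* name;
};

// Built entirely at compile time and indexed directly by JobType,
// replacing a std::map populated at startup.
constexpr std::array<JobTypeInfo, jtCount> jobTypes{{
    {jtRPC, 1, "RPC"},
    {jtLedger, 2, "ledger"},
    {jtPeer, 3, "peer"},
}};

static_assert(jobTypes[jtLedger].priority == 2);

int main()
{
    return jobTypes[jtRPC].priority;  // direct index, no map lookup
}
```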
Lastly, it restructures the load tracking infrastructure and reduces
the need for dynamic memory allocations by supporting move semantics
and value types.
The existing thread pool code uses several layers of indirection,
relies on a custom lock-free stack, and offers functionality that
is never used (e.g. the ability to dynamically adjust the number
of threads in the pool).
This refactoring aims to simplify the code, making it easier to
reason about what is happening (although lock-free multi-threaded
code is always tricky), and to reduce the latency of the thread
pool internals.
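In outline, a fixed-size pool can be as small as the following
mutex-based sketch; this illustrates the simplified shape, not the
actual implementation, which keeps a lock-free fast path:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool: no dynamic resizing, one queue,
// straightforward shutdown.
class ThreadPool
{
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> work_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;

public:
    explicit ThreadPool(unsigned count)
    {
        for (unsigned i = 0; i != count; ++i)
            threads_.emplace_back([this] {
                while (true)
                {
                    std::unique_lock lock(mutex_);
                    cv_.wait(lock, [this] {
                        return stopping_ || !work_.empty();
                    });
                    if (stopping_ && work_.empty())
                        return;
                    auto task = std::move(work_.front());
                    work_.pop();
                    lock.unlock();
                    task();
                }
            });
    }

    void post(std::function<void()> task)
    {
        {
            std::lock_guard lock(mutex_);
            work_.push(std::move(task));
        }
        cv_.notify_one();
    }

    ~ThreadPool()
    {
        {
            std::lock_guard lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_)
            t.join();
    }
};

int main()
{
    ThreadPool pool(4);
    for (int i = 0; i != 8; ++i)
        pool.post([i] { /* do work for task i */ });
}   // destructor drains the queue and joins the workers
```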