Compare commits

88 Commits

Author SHA1 Message Date
JCW
af315c0c0a Test blake3 2025-06-26 13:27:55 +01:00
JCW
ecfbe28837 Add blake 3 and skip some unit tests 2025-06-25 15:01:44 +01:00
JCW
033b8cc9e5 Fix a PR comment 2025-06-25 11:30:02 +01:00
JCW
5319edffb0 Fix comments 2025-06-25 11:28:04 +01:00
JCW
4cd5273b44 Add XRPL_ABANDON 2025-06-25 10:38:28 +01:00
Jingchen
e9d46f0bfc Remove OwnerPaysFee as it's never fully supported (#5435)
The OwnerPaysFee amendment was never fully supported, and this change removes the feature to the extent possible.
2025-06-24 18:56:58 +00:00
Bart
42fd74b77b Removes release notes from codebase (#5508) 2025-06-24 13:10:00 -04:00
tequ
c55ea56c5e Add nftoken_id, nftoken_ids, offer_id to meta for transaction stream (#5230) 2025-06-24 09:02:22 -04:00
Michael Legleux
1e01cd34f7 Set version to 2.5.0 2025-06-23 10:13:01 -07:00
Alex Kremer
e2fa5c1b7c chore: Change libXRPL check conan remote to dev (#5482)
This change aligns the Conan remote used by the libXRPL Clio compatibility check workflow with the recent changes applied to Clio.
2025-06-20 17:02:16 +00:00
Ed Hennis
fc0984d286 Require a message on "Application::signalStop" (#5255)
This change adds a message parameter to Application::signalStop for extra context.
2025-06-20 16:24:34 +00:00
Valentin Balaschenko
8b3dcd41f7 refactor: Change getNodeFat Missing Node State Tree error into warning (#5455) 2025-06-20 15:44:42 +00:00
Denis Angell
8f2f5310e2 Fix: Improve error handling in Batch RPC response (#5503) 2025-06-18 17:46:45 -04:00
Michael Legleux
edb4f0342c Set version to 2.5.0-rc2 2025-06-11 17:10:45 -07:00
yinyiqian1
ea17abb92a fix: Ensure delegate tests do not silently fail with batch (#5476)
The tests that ensure `tfInnerBatchTxn` won't block delegated transactions silently fail in `Delegate_test.cpp`. This change removes these cases from that file and adds them to `Batch_test.cpp` instead where they do not silently fail, because there the batch delegate results are explicitly checked. Moving them to that file further avoids refactoring many helper functions.
2025-06-11 13:21:24 +08:00
Mayukha Vadari
35a40a8e62 fix: Improve multi-sign usage of simulate (#5479)
This change allows users to submit simulate requests from a multi-sign account without needing to specify the accounts that are doing the multi-signing, and fixes an error with simulate that allowed double-"signed" (both single-sign and multi-sign public keys are provided) transactions.
2025-06-10 14:47:27 +08:00
Ed Hennis
d494bf45b2 refactor: Collapse some split log messages into one (#5347)
Multi-line log messages are hard to work with. Writing this handful of related messages as one message should make the log a tiny bit easier to manage.
2025-06-06 16:01:02 +00:00
Vlad
8bf4a5cbff chore: Remove external project build cores division (#5475)
The CMake statements that make it seem as if the number of cores used to build external project dependencies is halved don't actually do anything. This change removes these statements.
2025-06-05 13:37:30 +00:00
Denis Angell
58c2c82a30 fix: Amendment-guard TokenEscrow preclaim and expand tests (#5473)
This change amendment-guards the preclaim for `TokenEscrow`, as well as expands tests to increase code coverage.
2025-06-05 12:54:45 +00:00
Michael Legleux
11edaa441d Set version to 2.5.0-rc1 (#5472) 2025-06-04 17:55:23 +00:00
yinyiqian1
a5e953b191 fix: Add tecNO_DELEGATE_PERMISSION and fix flags (#5465)
* Adds `tecNO_DELEGATE_PERMISSION` for unauthorized transactions sent by a delegated account.
* Returns `tecNO_TARGET` instead of `terNO_ACCOUNT` for the `DelegateSet` transaction if the delegated account does not exist.
* Fixes `tfFullyCanonicalSig` and `tfInnerBatchTxn` blocking transactions issue by adding `tfUniversal` in the permission related masks in `txFlags.h`
2025-06-03 22:20:29 +00:00
Mark Travis
506ae12a8c Increase network i/o capacity (#5464)
The change increases the default network I/O worker thread pool size from 2 to 6. This will improve stability, as worker thread saturation correlates to desyncs, particularly on high-traffic peers, such as hubs.
2025-06-03 21:33:09 +00:00
Ayaz Salikhov
0310c5cbe0 fix: Specify transitive_headers when building with Conan 2 (#5462)
To be able to consume `rippled` in Conan 2, the recipe should specify transitive_headers for external libraries that are present in the exported header files. This change retains compatibility with Conan 1, where this flag was not present.
2025-06-03 17:33:32 +00:00
Denis Angell
053e1af7ff Add support for XLS-85 Token Escrow (#5185)
- Specification: https://github.com/XRPLF/XRPL-Standards/pull/272
- Amendment: `TokenEscrow`
- Enables escrowing of IOU and MPT tokens in addition to native XRP.
- Allows accounts to lock issued tokens (IOU/MPT) in escrow objects, with support for freeze, authorization, and transfer rates.
- Adds new ledger fields (`sfLockedAmount`, `sfIssuerNode`, etc.) to track locked balances for IOU and MPT escrows.
- Updates EscrowCreate, EscrowFinish, and EscrowCancel transaction logic to support IOU and MPT assets, including proper handling of trustlines and MPT authorization, transfer rates, and locked balances.
- Enforces invariant checks for escrowed IOU/MPT amounts.
- Extends GatewayBalances RPC to report locked (escrowed) balances.
2025-06-03 12:51:55 -04:00
Vlad
7e24adbdd0 fix: Address NFT interactions with trustlines (#5297)
The changes are focused on fixing NFT transactions bypassing the trustline authorization requirement and potential invariant violation when interacting with deep frozen trustlines.
2025-06-02 16:13:20 +00:00
Gregory Tsipenyuk
621df422a7 fix: Add AMMv1_3 amendment (#5203)
* Add AMM bid/create/deposit/swap/withdraw/vote invariants:
  - Deposit, Withdrawal invariants: `sqrt(asset1Balance * asset2Balance) >= LPTokens`.
  - Bid: `sqrt(asset1Balance * asset2Balance) > LPTokens` and the pool balances don't change.
  - Create: `sqrt(asset1Balance * asset2Balance) == LPTokens`.
  - Swap: `asset1BalanceAfter * asset2BalanceAfter >= asset1BalanceBefore * asset2BalanceBefore`
     and `LPTokens` don't change.
  - Vote: `LPTokens` and pool balances don't change.
  - All AMM and swap transactions: amounts and tokens are greater than zero, except on withdrawal if all tokens
    are withdrawn.
* Add AMM deposit and withdraw rounding to ensure AMM invariant:
  - On deposit, tokens out are rounded downward and deposit amount is rounded upward.
  - On withdrawal, tokens in are rounded upward and withdrawal amount is rounded downward.
* Add Order Book Offer invariant to verify consumed amounts. Consumed amounts are less than the offer.
* Fix Bid validation. `AuthAccount` can't have duplicate accounts or the submitter account.
2025-06-02 09:52:10 -04:00
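The deposit/withdrawal and swap invariants listed in the AMMv1_3 commit above can be illustrated with a minimal sketch. It uses plain `double` values for readability, which is an assumption made here and not how rippled represents amounts; the function names are hypothetical.

```cpp
#include <cmath>

// Deposit / withdrawal invariant from the commit message:
//   sqrt(asset1Balance * asset2Balance) >= LPTokens
inline bool
depositOrWithdrawHolds(
    double asset1Balance,
    double asset2Balance,
    double lpTokens)
{
    return std::sqrt(asset1Balance * asset2Balance) >= lpTokens;
}

// Swap invariant from the commit message: the product of the pool
// balances must not decrease, and LPTokens must not change.
inline bool
swapHolds(
    double asset1Before,
    double asset2Before,
    double asset1After,
    double asset2After,
    double lpTokensBefore,
    double lpTokensAfter)
{
    return asset1After * asset2After >= asset1Before * asset2Before &&
        lpTokensBefore == lpTokensAfter;
}
```

Rounding tokens out downward and deposit amounts upward, as the commit describes, moves a result toward the side on which these checks hold.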
Shawn Xie
0a34b5c691 Add support for XLS-81 Permissioned DEX (#5404)
Modified transactions:
- OfferCreate
- Payment

Modified RPCs:
- book_changes
- subscribe
- book_offers
- ripple_path_find
- path_find

Spec: https://github.com/XRPLF/XRPL-Standards/pull/281
2025-05-30 13:24:48 -04:00
Matt Mankins
e0bc3dd51f docs: update example keyserver host in SECURITY.md (#5460) 2025-05-30 08:46:08 -04:00
Bronek Kozicki
dacecd24ba Fix unit build error (#5459)
This change fixes the issue that there is a `using namespace` statement inside a namespace scope.
2025-05-29 20:53:31 +00:00
Mayukha Vadari
05105743e9 chore[tests]: improve env.meta usage (#5457)
This commit makes the ledger close in env.meta conditional on whether the ledger has already been closed (i.e. the current ledger doesn't have any transactions in it). This makes env.meta a bit easier to use, as it still works if you close the ledger outside of it. Previously, if you accidentally closed the ledger outside of the meta function, it would segfault, which was incredibly difficult to debug.
2025-05-29 16:28:09 +00:00
Bronek Kozicki
9e1fe9a85e Fix: Improve handling of expired credentials in VaultDeposit (#5452)
This change returns `tecEXPIRED` from VaultDeposit to allow the Transactor to remove the expired credentials.
2025-05-28 10:28:18 -04:00
Vito Tumas
d71ce51901 feat: improve squelching configuration (#5438)
This commit introduces the following changes:
* Renames the `vp_enable` config option to `vp_base_squelch_enable`, which enables squelching for validators.
* Removes `vp_squelch` config option which was used to configure whether to send squelch messages to peers or not. With this flag removed, if squelching is enabled, squelch messages will be sent. This was an option used for debugging.
* Introduces a temporary `vp_base_squelch_max_trusted_peers` config option to change the max number of peers who are selected as validator message sources. This is a temporary option, which will be removed once a good value is found.
* Adds a traffic counter to count the number of times peers ignored squelch messages and kept sending messages for squelched validators.
* Moves the decision whether squelching is enabled and ready into Slot.h.
2025-05-28 06:30:03 -04:00
Michael Legleux
be668ee26d chore: Update CPP ref source (#5453) 2025-05-27 20:46:25 +00:00
Bart
cae5294b4e chore: Rename docs job (#5398) 2025-05-27 20:03:23 +00:00
Elliot.
cd777f79ef docs: add -j $(nproc) to BUILD.md (#5288)
This improves build times.
2025-05-27 19:11:13 +00:00
Valentin Balaschenko
8b9e21e3f5 docs: Update build instructions for Ubuntu 22.04+ (#5292) 2025-05-27 18:32:25 +00:00
Denis Angell
2a61aee562 Add Batch feature (XLS-56) (#5060)
- Specification: [XRPLF/XRPL-Standards 56](https://github.com/XRPLF/XRPL-Standards/blob/master/XLS-0056d-batch/README.md)
- Amendment: `Batch`
- Implements execution of multiple transactions within a single batch transaction with four execution modes: `tfAllOrNothing`, `tfOnlyOne`, `tfUntilFailure`, and `tfIndependent`.
- Enables atomic multi-party transactions where multiple accounts can participate in a single batch, with up to 8 inner transactions and 8 batch signers per batch transaction.
- Inner transactions use `tfInnerBatchTxn` flag with zero fees, no signature, and empty signing public key.
- Inner transactions are applied after the outer batch succeeds via the `applyBatchTransactions` function in apply.cpp.
- Network layer prevents relay of transactions with `tfInnerBatchTxn` flag - each peer applies inner transactions locally from the batch.
- Batch transactions are excluded from AccountDelegate permissions but inner transactions retain full delegation support.
- Metadata includes `ParentBatchID` linking inner transactions to their containing batch for traceability and auditing.
- Extended STTx with batch-specific signature verification methods and added protocol structures (`sfRawTransactions`, `sfBatchSigners`).
2025-05-23 19:53:53 +00:00
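The structural limits stated in the Batch commit above (up to 8 inner transactions, up to 8 batch signers, and inner transactions that carry the `tfInnerBatchTxn` flag with zero fee, no signature, and an empty signing public key) could be checked roughly as follows. This is a sketch under assumptions: the struct, the function names, and the flag's bit value are hypothetical and are not taken from rippled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Placeholder bit for illustration only; not necessarily rippled's value.
constexpr std::uint32_t kInnerBatchFlag = 0x40000000u;
constexpr std::size_t kMaxInnerTransactions = 8;
constexpr std::size_t kMaxBatchSigners = 8;

// Hypothetical, heavily reduced view of an inner transaction.
struct InnerTx
{
    std::uint32_t flags = 0;
    std::uint64_t fee = 0;
    std::string signature;
    std::string signingPubKey;
};

// An inner transaction carries the inner-batch flag, pays no fee,
// and has neither a signature nor a signing public key.
inline bool
isWellFormedInner(InnerTx const& tx)
{
    return (tx.flags & kInnerBatchFlag) != 0 && tx.fee == 0 &&
        tx.signature.empty() && tx.signingPubKey.empty();
}

// A batch respects both size limits and contains only well-formed
// inner transactions.
inline bool
isWellFormedBatch(std::vector<InnerTx> const& inner, std::size_t signerCount)
{
    if (inner.size() > kMaxInnerTransactions || signerCount > kMaxBatchSigners)
        return false;
    for (auto const& tx : inner)
        if (!isWellFormedInner(tx))
            return false;
    return true;
}
```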
Bronek Kozicki
40ce8a8833 fix: Fix pseudo-account ID calculation (#5447)
Before #5224, the pseudoaccount ID was calculated using prefix expressed in `std::uint16_t`. The refactoring to move the pseudoaccount ID calculation to View.cpp had accidentally changed the prefix type to `int` (derived from `auto i = 0`) which in turn changed the length of the input to `sha512Half` from 2 bytes to 4, altering the result.

This resulted in a different ID of the pseudoaccount calculated from the function after the refactoring, breaking the ledger. This impacts AMMCreate, even when the `SingleAssetVault` amendment is not active. This change restores the prefix type to `std::uint16_t`.
2025-05-23 14:05:36 +00:00
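The type change described in the pseudo-account fix above is easy to reproduce in isolation. The sketch below only shows how `auto i = 0` deduces `int` and how that changes the number of bytes fed to a hash; `hashedBytes` is a hypothetical helper, and the 4-byte `int` is a platform assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Number of bytes a value contributes when its raw representation
// is appended to a hash input.
template <class T>
constexpr std::size_t
hashedBytes(T)
{
    return sizeof(T);
}

// The literal 0 has type int, so `auto i = 0` declares an int,
// not a std::uint16_t.
static_assert(std::is_same_v<decltype(0), int>);

// With an explicit std::uint16_t prefix the hash input is 2 bytes long.
static_assert(hashedBytes(std::uint16_t{0}) == 2);
```

On a platform where `int` is 4 bytes, `hashedBytes(0)` is 4, which matches the 2-to-4 byte change the commit message reports.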
Bronek Kozicki
7713ff8c5c Add codecov badge, raise .codecov.yml thresholds (#5428) 2025-05-22 14:43:41 +00:00
Olek
70371a4344 Fix initializer list initialization for GCC-15 (#5443) 2025-05-21 13:28:18 -04:00
Bronek Kozicki
e514de76ed Add single asset vault (XLS-65d) (#5224)
- Specification: XRPLF/XRPL-Standards#239
- Amendment: `SingleAssetVault`
- Implements a vault feature used to store a fungible asset (XRP, IOU, or MPT, but not NFT) and to receive shares in the vault (an MPT) in exchange.
- A vault can be private or public.
- A private vault can use permissioned domains, subject to the `PermissionedDomains` amendment.
- Shares can be exchanged back into asset with `VaultWithdraw`.
- Permissions on the asset in the vault are transitively applied on shares in the vault.
- Issuer of the asset in the vault can clawback with `VaultClawback`.
- Extended `MPTokenIssuance` with `DomainID`, used by the permissioned domain on the vault shares.

Co-authored-by: John Freeman <jfreeman08@gmail.com>
2025-05-20 14:06:41 -04:00
Bart
dd62cfcc22 fix: Update path in CODEOWNERS (#5440) 2025-05-20 15:24:07 +00:00
Michael Legleux
09690f1b38 Set version to 2.5.0-b1 2025-05-18 20:39:18 +01:00
Valentin Balaschenko
380ba9f1c1 Fix: Resolve slow test on macOS pipeline (#5392)
Using std::barrier performs extremely poorly (~1 hour vs ~1 minute to run the test suite) in certain macOS environments.
To unblock our macOS CI pipeline, std::barrier has been replaced with a custom mutex-based barrier (Barrier) that significantly improves performance without compromising correctness.
2025-05-16 10:31:51 +00:00
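A mutex-based barrier of the kind the macOS fix above describes might look like the sketch below. It is not the `Barrier` class from the commit; the member names and the `generation()` accessor are assumptions made for illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Reusable barrier built from a mutex and a condition variable.
class Barrier
{
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t const threshold_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;

public:
    explicit Barrier(std::size_t count) : threshold_(count)
    {
    }

    // Blocks until `count` threads have arrived, then releases them all.
    void
    arrive_and_wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        auto const generation = generation_;
        if (++waiting_ == threshold_)
        {
            // Last arrival: reset for reuse and wake the waiters.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
            return;
        }
        // The generation counter guards against spurious wakeups.
        cv_.wait(lock, [&] { return generation != generation_; });
    }

    // Number of times the barrier has been released so far.
    std::size_t
    generation()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return generation_;
    }
};
```

The generation counter is what makes the barrier reusable: a thread from a later phase cannot be mistaken for a straggler from the previous one.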
brettmollin
c3e9380fb4 fix: Update validators-example.txt fix xrplf example URL (#5384) 2025-05-16 09:49:14 +00:00
Jingchen
e3ebc253fa fix: Ensure that coverage file generation is atomic. (#5426)
When unit tests run in parallel, multiple threads can write to the same coverage file and corrupt it, after which gcovr cannot parse the file. This change adds -fprofile-update=atomic, as instructed by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68080.
2025-05-12 14:54:01 +00:00
Bart
c6c7c84355 Configure CODEOWNERS for changes to RPC code (#5266)
To ensure changes to any RPC-related code are compatible with other services, such as Clio, the RPC team will be required to review them.
2025-05-12 12:42:03 +00:00
yinyiqian1
28f50cb7cf fix: enable LedgerStateFix for delegation (#5427) 2025-05-10 10:36:11 -04:00
Vito Tumas
3e152fec74 refactor: use east const convention (#5409)
This change refactors the codebase to use the "east const convention", and adds a clang-format rule to follow this convention.
2025-05-08 11:00:42 +00:00
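For readers unfamiliar with the term used in the commit above, "east const" places `const` to the right of the type it qualifies. The short example below shows that both spellings name the same type; the alias and variable names are illustrative only.

```cpp
#include <type_traits>

// "West const" and "east const" spellings name the same type.
using West = const int*;
using East = int const*;
static_assert(std::is_same_v<West, East>);

// East const reads right to left:
// "pointer is a const pointer to a const int".
int const value = 42;
int const* const pointer = &value;
```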
yinyiqian1
2db2791805 Add PermissionDelegation feature (#5354)
This change implements the account permission delegation described in XLS-75d, see https://github.com/XRPLF/XRPL-Standards/pull/257.

* Introduces transaction-level and granular permissions that can be delegated to other accounts.
* Adds `DelegateSet` transaction to grant specified permissions to another account.
* Adds `ltDelegate` ledger object to maintain the permission list for delegating/delegated account pair.
* Adds an optional `Delegate` field in common fields, allowing a delegated account to send transactions on behalf of the delegating account within the granted permission scope. The `Account` field remains the delegating account; the `Delegate` field specifies the delegated account. The transaction is signed by the delegated account.
2025-05-08 06:14:02 -04:00
Vito Tumas
9ec2d7f8ff Enable passive squelching (#5358)
This change updates the squelching logic to accept squelch messages for untrusted validators. As a result, servers will also squelch untrusted validator messages reducing duplicate traffic they generate.

In particular:
* Updates squelch message handling logic to squelch messages for all validators, not only trusted ones.
* Updates the logic to send squelch messages to peers that don't squelch themselves
* Increases the threshold for the number of messages that a peer has to deliver to consider it as a candidate for validator messages.
2025-05-02 11:01:45 -04:00
Ed Hennis
4a084ce34c Improve transaction relay logic (#4985)
Combines four related changes:
1. "Decrease `shouldRelay` limit to 30s." Pretty self-explanatory. Currently, the limit is 5 minutes, by which point the `HashRouter` entry could have expired, making this transaction look brand new (and thus causing it to be relayed back to peers which have sent it to us recently).
2.  "Give a transaction more chances to be retried." Will put a transaction into `LedgerMaster`'s held transactions if the transaction gets a `ter`, `tel`, or `tef` result. Old behavior was just `ter`.
     * Additionally, to prevent a transaction from being repeatedly held indefinitely, it must meet some extra conditions. (Documented in a comment in the code.)
3. "Pop all transactions with sequential sequences, or tickets." When a transaction is processed successfully, currently, one held transaction for the same account (if any) will be popped out of the held transactions list, and queued up for the next transaction batch. This change pops all transactions for the account, but only if they have sequential sequences (for non-ticket transactions) or use a ticket. This issue was identified from interactions with @mtrippled's #4504, which was merged, but unfortunately reverted later by #4852. When the batches were spaced out, it could potentially take a very long time for a large number of held transactions for an account to get processed through. However, whether batched or not, this change will help get held transactions cleared out, particularly if a missing earlier transaction is what held them up.
4. "Process held transactions through existing NetworkOPs batching." In the current processing, at the end of each consensus round, all held transactions are directly applied to the open ledger, then the held list is reset. This bypasses all of the logic in `NetworkOPs::apply` which, among other things, broadcasts successful transactions to peers. This means that the transaction may not get broadcast to peers for a really long time (5 minutes in the current implementation, or 30 seconds with this first commit). If the node is a bottleneck (either due to network configuration, or because the transaction was submitted locally), the transaction may not be seen by any other nodes or validators before it expires or causes other problems.
2025-05-01 13:58:18 -04:00
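Item 3 of the commit above ("Pop all transactions with sequential sequences, or tickets") can be sketched as a small selection routine. This is one reading of the commit message, not rippled's implementation: the `HeldTx` type, the `popReady` name, and the assumption that the held list is sorted by sequence are all hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical held transaction for a single account.
struct HeldTx
{
    // Sequence number, or std::nullopt when the transaction uses a ticket.
    std::optional<std::uint32_t> sequence;
};

// Removes from `held` (assumed sorted by sequence) every transaction that
// can be retried now and returns them. Ticket transactions are always
// popped; sequence transactions are popped only while they continue the
// consecutive run that starts at `nextSequence`.
inline std::vector<HeldTx>
popReady(std::vector<HeldTx>& held, std::uint32_t nextSequence)
{
    std::vector<HeldTx> ready;
    std::vector<HeldTx> remaining;
    for (auto const& tx : held)
    {
        if (!tx.sequence)
        {
            ready.push_back(tx);
        }
        else if (*tx.sequence == nextSequence)
        {
            ready.push_back(tx);
            ++nextSequence;
        }
        else
        {
            remaining.push_back(tx);
        }
    }
    held = std::move(remaining);
    return ready;
}
```

A transaction behind a gap in the sequence stays in the held list until the missing earlier transaction arrives.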
Vito Tumas
3502df2174 fix: Replaces random endpoint resolution with sequential (#5365)
This change addresses an issue where `rippled` attempts to connect to an IPv6 address, even when the local network lacks IPv6 support, resulting in a "Network is unreachable" error.

The fix replaces the custom endpoint selection logic with `boost::async_connect`, which sequentially attempts to connect to available endpoints until one succeeds or all fail.
2025-04-28 15:38:55 -04:00
Vlad
fa1e25abef chore: Small clarification to lsfDefaultRipple comment (#5410) 2025-04-25 15:21:27 +00:00
Denis Angell
217ba8dd4d fix: CTID to use correct ledger_index (#5408) 2025-04-24 10:24:10 -04:00
Ed Hennis
405f4613d8 chore: Run CI on PRs that are Ready or have the "DraftRunCI" label (#5400)
- Avoids costly overhead for idle PRs where the CI results don't add any
  value.
2025-04-11 22:20:59 +00:00
Mayukha Vadari
cba512068b refactor: Clean up test logging to make it easier to search (#5396)
This PR replaces the word `failed` with `failure` in any test names and renames some test files to fix MSVC warnings, so that it is easier to search through the test output to find tests that failed.
2025-04-11 09:07:42 +00:00
Valentin Balaschenko
1c99ea23d1 Temporarily disable automatic triggering of the macOS pipeline (#5397)
We temporarily disable running unit tests on macOS on the CI pipeline while we are investigating the delays.
2025-04-10 21:58:29 +02:00
Denis Angell
c4308b216f fix: Adds CTID to RPC tx and updates error (#4738)
This change fixes a number of issues involved with CTID:
* CTID is not present on all RPC tx transactions.
* rpcWRONG_NETWORK is missing in the ErrorCodes.cpp
2025-04-10 12:38:52 +00:00
Wietse Wind
aafd2d8525 Fix: admin RPC webhook queue limit removal and timeout reduction (#5163)
When using subscribe at admin RPC port to send webhooks for the transaction stream to a backend, on large(r) ledgers the endpoint receives fewer HTTP POSTs with TX information than the amount of transactions in a ledger. This change removes the hardcoded queue length to avoid dropping TX notifications for the admin-only command. In addition, the per-request TTL for outgoing RPC HTTP calls has been reduced from 10 minutes to 30 seconds.
2025-04-10 06:37:24 +00:00
Denis Angell
a574ec6023 fix: fixPayChanV1 (#4717)
This change introduces a new fix amendment (`fixPayChanV1`) that prevents a `PaymentChannelCreate` transaction from creating a channel whose `CancelAfter` time is earlier than the current ledger time. It builds on fix1571.

Once the amendment is activated, creating a new `PaymentChannel` will require that if you specify the `CancelAfter` time/value, that value must be greater than or equal to the current ledger time.

Currently users can create a payment channel where the `CancelAfter` time is before the current ledger time. This results in the payment channel being immediately closed on the next PaymentChannel transaction.
2025-04-09 22:08:44 +00:00
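The rule that `fixPayChanV1` introduces, as described above, reduces to a single comparison. The sketch below states it with plain integers; the function name and the use of `std::optional` for an absent `CancelAfter` are assumptions, not rippled code.

```cpp
#include <cstdint>
#include <optional>

// With the amendment active, a CancelAfter value is accepted only if it
// is greater than or equal to the current ledger time. Omitting
// CancelAfter is always allowed.
constexpr bool
cancelAfterAllowed(
    std::optional<std::uint32_t> cancelAfter,
    std::uint32_t ledgerTime)
{
    return !cancelAfter || *cancelAfter >= ledgerTime;
}
```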
Mayukha Vadari
e429455f4d refactor(trivial): reorganize ledger entry tests and helper functions (#5376)
This PR splits out `ledger_entry` tests into its own file (`LedgerEntry_test.cpp`) and alphabetizes the helper functions in `LedgerEntry.cpp`. These commits were split out of #5237 to make that PR a little more manageable, since these basic trivial changes are most of the diff. There is no code change, just moving code around.
2025-04-09 17:02:03 +00:00
Vito Tumas
7692eeb9a0 Instrument proposal, validation and transaction messages (#5348)
Adds metric counters for the following P2P message types:

* Untrusted proposal and validation messages
* Duplicate proposal, validation and transaction messages
2025-04-09 15:33:17 +02:00
Bronek Kozicki
a099f5a804 Remove UNREACHABLE from NetworkOPsImp::processTrustedProposal (#5387)
It’s possible for this to happen legitimately if a set of peers, including a validator, are connected in a cycle, and the latency and message processing time between those peers is significantly less than the latency between the validator and the last peer. It’s unlikely in the real world, but obviously easy to simulate with Antithesis.
2025-04-08 14:43:34 +00:00
Michael Legleux
ca0bc767fe fix: Use the build image from ghcr.io (#5390)
The ci pipelines are constantly hitting Docker Hub's public rate limiting since increasing the number of jobs we're running. This change switches over to images hosted in GitHub's registry.
2025-04-05 02:24:31 +00:00
Mayukha Vadari
4ba9288935 fix: disable channel_authorize when signing_support is disabled (#5385) 2025-04-05 01:08:34 +00:00
Valentin Balaschenko
e923ec6d36 Fix to correct memory ordering for compare_exchange_weak and wait in the intrusive reference counting logic (#5381)
This change addresses a memory ordering assertion failure observed on one of the Windows test machines during the IntrusiveShared_test suite.
2025-04-04 18:21:17 +00:00
Vlad
851d99d99e fix: uint128 ambiguousness breaking macos unity build (#5386) 2025-04-04 08:28:33 -04:00
Bart
f608e653ca Fix undefined uint128_t type on Windows non-unity builds (#5377)
As part of import optimization, a transitive include had been removed that defined `BOOST_COMP_MSVC` on Windows. In unity builds, this definition was pulled in, but in non-unity builds it was not, causing a compilation error. An inspection of the Boost code revealed that we can just gate the statements by `_MSC_VER` instead. A `#pragma message` is added to verify that the statement is only printed on Windows builds.
2025-04-01 11:21:59 -04:00
Vlad
72e076b694 test: enable compile time param to change reference fee value (#5159)
Adds an extra CI pipeline to perform unit tests using different values for fees.
2025-03-27 23:40:36 +00:00
Bart
6cf37c4abe refactor: Move integration tests from 'examples/' into 'tests/' (#5367)
This change moves `examples/example` into `tests/conan` to make it clear it is an integration test, and adjusts the `conan` CI job accordingly
2025-03-27 14:49:09 +00:00
Valentin Balaschenko
fc204773d6 Intrusive SHAMap smart pointers for efficient memory use and lock-free synchronization (#5152)
The main goal of this optimisation is memory reduction in SHAMapTreeNodes by introducing intrusive pointers instead of standard std::shared_ptr and std::weak_ptr.
2025-03-25 18:40:25 +00:00
Vlad
2bc5cb240f test: enable unit tests to work with variable reference fee (#5145)
Fix remaining unit tests to be able to process reference fee values other than 10.
2025-03-25 10:31:25 -04:00
Vlad
67028d6ea6 test: enable TxQ unit tests work with variable reference fee (#5118)
In preparation for a potential reference fee change we would like to verify that fee change works as expected. The first step is to fix all unit tests to be able to work with different reference fee values.
2025-03-24 14:56:19 -04:00
Ed Hennis
d22a5057b9 Prevent consensus from getting stuck in the establish phase (#5277)
- Detects if the consensus process is "stalled". If it is, then we can declare a 
  consensus and end successfully even if we do not have 80% agreement on
  our proposal.
  - "Stalled" is defined as:
    - We have a close time consensus
    - Each disputed transaction is individually stalled:
      - It has been in the final "stuck" 95% requirement for at least 2
        (avMIN_ROUNDS) "inner rounds" of phaseEstablish,
      - and either all of the other trusted proposers or this validator, if proposing,
        have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner
        rounds", and at least 80% of the validators (including this one, if
        appropriate) agree about the vote (whether yes or no).
- If we have been in the establish phase for more than 10x the previous
  consensus establish phase's time, then consensus is considered "expired",
  and we will leave the round, which sends a partial validation (indicating
  that the node is moving on without validating). Two restrictions avoid
  prematurely exiting, or having an extended exit in extreme situations.
  - The 10x time is clamped to be within a range of 15s
    (ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS).
  - If consensus has not had an opportunity to walk through all avalanche
    states (defined as not going through 8 "inner rounds" of phaseEstablish),
    then ConsensusState::Expired is treated as ConsensusState::No.
- When enough nodes leave the round, any remaining nodes will see they've
  fallen behind, and move on, too, generally before hitting the timeout. Any
  validations or partial validations sent during this time will help the
  consensus process bring the nodes back together.
2025-03-20 12:41:44 -04:00
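The "expired" rule in the commit above combines a clamped timeout with a minimum number of inner rounds. The sketch below encodes only those two conditions as described in the message; the constant and function names are hypothetical stand-ins for `ledgerMAX_CONSENSUS`, `ledgerABANDON_CONSENSUS`, and the surrounding consensus logic.

```cpp
#include <algorithm>
#include <chrono>

// Bounds named in the commit message (15s and 120s).
constexpr std::chrono::milliseconds kLedgerMaxConsensus{15000};
constexpr std::chrono::milliseconds kLedgerAbandonConsensus{120000};
// Inner rounds needed to walk through all avalanche states.
constexpr int kMinRoundsForExpiry = 8;

// 10x the previous establish phase, clamped to [15s, 120s].
constexpr std::chrono::milliseconds
expiryThreshold(std::chrono::milliseconds previousEstablish)
{
    return std::clamp(
        previousEstablish * 10, kLedgerMaxConsensus, kLedgerAbandonConsensus);
}

// Consensus counts as expired only after the clamped timeout has passed
// and enough inner rounds have run; otherwise it is treated as "No".
constexpr bool
isExpired(
    std::chrono::milliseconds elapsed,
    std::chrono::milliseconds previousEstablish,
    int innerRounds)
{
    return elapsed > expiryThreshold(previousEstablish) &&
        innerRounds >= kMinRoundsForExpiry;
}
```

For example, a previous establish phase of 3s gives a 30s threshold, while a 1s phase is raised to the 15s floor.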
Alex Kremer
75a20194c5 chore: Update link to ripple-binary-codec (#5355)
The link to ripple-binary-codec's definitions.json appears to be outdated. The updated link is also documented here: https://xrpl.org/docs/references/protocol/binary-format#definitions-file
2025-03-19 17:33:23 -04:00
Alex Kremer
7fe81fe62e chore: Add PR number to payload (#5310)
This PR adds one more payload field to the libXRPL compatibility check workflow - the PR number itself.
2025-03-18 17:26:08 +00:00
Bronek Kozicki
345ddc7234 fix: Remove null pointer deref, just do abort (#5338)
This change removes the existing undefined behavior from `LogicError`, so we can be certain that there will be always a stacktrace.

De-referencing a null pointer is an old trick to generate `SIGSEGV`, which would typically also create a stacktrace. However it is also an undefined behaviour and compilers can do something else. A more robust way to create a stacktrace while crashing the program is to use `std::abort`, which we have also used in this location for a long time. If we combine the two, we might not get the expected behaviour - namely, the nullpointer deref followed by `std::abort`, as handled in certain compiler versions may not immediately cause a crash. We have observed stacktrace being wiped instead, and thread put in indeterminate state, then stacktrace created without any useful information.
2025-03-18 12:45:25 -04:00
Bart
d167d4864f refactor: Updates Conan dependencies: RocksDB (#5335)
Updates RocksDB to version 9.7.3, the latest version supported in Conan 1.x. A patch for 9.7.4 that fixes a memory leak is included.
2025-03-18 11:25:48 -04:00
Vlad
bf504912a4 fix: trust line RPC no ripple flag (#5345)
The Trustline RPC `no_ripple` flag gets set depending on `lsfDefaultRipple` flag, which is not a flag of a trustline but of the account root. The `lsfDefaultRipple` flag does not provide any insight if this particular trust line has `lsfLowNoRipple` or `lsfHighNoRipple` flag set, so it should not be used here at all. This change simplifies the logic.
2025-03-18 09:03:03 -04:00
cyan317
a7fb8ae915 fix: Handle invalid marker parameter in grpc call (#5317)
The `end_marker` is used to limit the range of ledger entries to fetch. If `end_marker` is less than `marker`, a crash can occur. This change adds an additional check.
2025-03-18 08:21:33 -04:00
Sergey Kuznetsov
d9b7a2688f fix: Error message for ledger_entry rpc (#5344)
Changes the error to `malformedAddress` for `permissioned_domain` in the `ledger_entry` rpc, when the account is not a string. This change makes it more clear to a user what is wrong with their request.
2025-03-17 09:14:49 -04:00
Darius Tumas
c0299dba88 Adds hub.xrpl-commons.org as a new Bootstrap Cluster (#5263) 2025-03-17 07:04:46 -04:00
Bronek Kozicki
c3ecdb4746 Rename "deadlock" to "stall" in LoadManager (#5341)
What the LoadManager class does is stall detection, which is not the same as deadlock detection. In the condition of severe CPU starvation, LoadManager will currently intentionally crash rippled reporting `LogicError: Deadlock detected`. This error message is misleading as the condition being detected is not a deadlock. This change fixes and refactors the code in response.
2025-03-14 16:15:09 -04:00
Ed Hennis
c17676a9be refactor: Improve ordering of headers with clang-format (#5343)
Removes all manual header groupings from source and header files by leveraging clang-format options.
2025-03-12 18:33:21 -04:00
Ed Hennis
ed8e32cc92 refactor: Calculate numFeatures automatically (#5324)
Requiring manual updates of numFeatures is an annoying manual process that is easily forgotten, and leads to frequent merge conflicts. This change takes advantage of the `XRPL_FEATURE` and `XRPL_FIX` macros, and adds a new `XRPL_RETIRE` macro to automatically set `numFeatures`.
2025-03-12 17:34:06 -04:00
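One common way to derive a count from a list of macro invocations, as the commit above does for `numFeatures`, is the X-macro pattern. The sketch below shows the idea only; the `EXAMPLE_*` macros are hypothetical and simpler than rippled's `XRPL_FEATURE`, `XRPL_FIX`, and `XRPL_RETIRE`, whose exact form is not shown in this log.

```cpp
#include <cstddef>

// Hypothetical feature list; each entry is passed through the macro X.
#define EXAMPLE_FEATURES(X) \
    X(Batch)                \
    X(TokenEscrow)          \
    X(SingleAssetVault)

// Each entry expands to "+1", so the list expands to 0 +1 +1 +1.
#define EXAMPLE_COUNT(name) +1
constexpr std::size_t numFeatures = 0 EXAMPLE_FEATURES(EXAMPLE_COUNT);
#undef EXAMPLE_COUNT
```

Adding a line to the list changes the count automatically, which removes the manual update that the commit message describes as easily forgotten.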
Bart
2406b28e64 refactor: Remove unused and add missing includes (#5293)
The codebase is filled with includes that are unused, and which thus can be removed. At the same time, the files often do not include all headers that contain the definitions used in those files. This change uses clang-format and clang-tidy to clean up the includes, with minor manual intervention to ensure the code compiles on all platforms.
2025-03-11 14:16:45 -04:00
Michael Legleux
2216e5a13f Set version to 2.4.0 2025-03-06 10:41:58 -08:00
1173 changed files with 84035 additions and 15293 deletions

@@ -44,6 +44,7 @@ DerivePointerAlignment: false
 DisableFormat: false
 ExperimentalAutoDetectBinPacking: false
 ForEachMacros: [ Q_FOREACH, BOOST_FOREACH ]
+IncludeBlocks: Regroup
 IncludeCategories:
   - Regex: '^<(test)/'
     Priority: 0
@@ -53,8 +54,12 @@ IncludeCategories:
     Priority: 2
   - Regex: '^<(boost)/'
     Priority: 3
-  - Regex: '.*'
+  - Regex: '^.*/'
     Priority: 4
+  - Regex: '^.*\.h'
+    Priority: 5
+  - Regex: '.*'
+    Priority: 6
 IncludeIsMainRegex: '$'
 IndentCaseLabels: true
 IndentFunctionDeclarationAfterType: false
@@ -89,3 +94,4 @@ SpacesInSquareBrackets: false
 Standard: Cpp11
 TabWidth: 8
 UseTab: Never
+QualifierAlignment: Right

@@ -7,13 +7,13 @@ comment:
   show_carryforward_flags: false
 coverage:
-  range: "60..80"
+  range: "70..85"
   precision: 1
   round: nearest
   status:
     project:
       default:
-        target: 60%
+        target: 75%
         threshold: 2%
     patch:
       default:

.github/CODEOWNERS (new file)

@@ -0,0 +1,8 @@
+# Allow anyone to review any change by default.
+*
+# Require the rpc-reviewers team to review changes to the rpc code.
+include/xrpl/protocol/ @xrplf/rpc-reviewers
+src/libxrpl/protocol/ @xrplf/rpc-reviewers
+src/xrpld/rpc/ @xrplf/rpc-reviewers
+src/xrpld/app/misc/ @xrplf/rpc-reviewers

View File

@@ -14,7 +14,7 @@ runs:
run: |
conan config set general.revisions_enabled=1
conan export external/snappy snappy/1.1.10@
conan export external/rocksdb rocksdb/6.29.5@
conan export external/rocksdb rocksdb/9.7.3@
conan export external/soci soci/4.0.3@
conan export external/nudb nudb/2.0.8@
- name: add Ripple Conan remote
@@ -55,7 +55,3 @@ runs:
--options xrpld=True \
--settings build_type=${{ inputs.configuration }} \
..
- name: upload dependencies to remote
if: (steps.binaries.outputs.missing != '[]') && (steps.remote.outputs.outcome == 'success')
shell: bash
run: conan upload --remote ripple '*' --all --parallel --confirm

View File

@@ -1,9 +1,13 @@
name: clang-format
on: [push, pull_request]
on:
push:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
jobs:
check:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
runs-on: ubuntu-24.04
env:
CLANG_VERSION: 18
@@ -20,7 +24,7 @@ jobs:
sudo apt-get update
sudo apt-get install clang-format-${CLANG_VERSION}
- name: Format first-party sources
run: find include src -type f \( -name '*.cpp' -o -name '*.hpp' -o -name '*.h' -o -name '*.ipp' \) -exec clang-format-${CLANG_VERSION} -i {} +
run: find include src tests -type f \( -name '*.cpp' -o -name '*.hpp' -o -name '*.h' -o -name '*.ipp' \) -exec clang-format-${CLANG_VERSION} -i {} +
- name: Check for differences
id: assert
run: |
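
The `if:` condition added to this job (and repeated in the other workflows in this compare) runs the job for pushes, for non-draft pull requests, and for draft pull requests labelled `DraftRunCI`. As a rough model of that boolean logic only, assuming the three inputs have already been extracted from the event payload (the function and parameter names are hypothetical):

```cpp
// Models the workflow expression:
//   event_name == 'push' || pull_request.draft != true
//       || labels contains 'DraftRunCI'
constexpr bool
shouldRunJob(bool isPush, bool isDraft, bool hasDraftRunCILabel)
{
    return isPush || !isDraft || hasDraftRunCILabel;
}

static_assert(shouldRunJob(true, false, false), "pushes always run");
static_assert(shouldRunJob(false, false, false), "ready PRs run");
static_assert(!shouldRunJob(false, true, false), "plain drafts are skipped");
static_assert(shouldRunJob(false, true, true), "labelled drafts run");
```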

View File

@@ -10,11 +10,11 @@ concurrency:
cancel-in-progress: true
jobs:
job:
documentation:
runs-on: ubuntu-latest
permissions:
contents: write
container: rippleci/rippled-build-ubuntu:aaf5e3e
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
steps:
- name: checkout
uses: actions/checkout@v4

View File

@@ -1,9 +1,13 @@
name: levelization
on: [push, pull_request]
on:
push:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
jobs:
check:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
runs-on: ubuntu-latest
env:
CLANG_VERSION: 10

View File

@@ -1,6 +1,6 @@
name: Check libXRPL compatibility with Clio
env:
CONAN_URL: http://18.143.149.228:8081/artifactory/api/conan/conan-non-prod
CONAN_URL: http://18.143.149.228:8081/artifactory/api/conan/dev
CONAN_LOGIN_USERNAME_RIPPLE: ${{ secrets.CONAN_USERNAME }}
CONAN_PASSWORD_RIPPLE: ${{ secrets.CONAN_TOKEN }}
on:
@@ -8,19 +8,21 @@ on:
paths:
- 'src/libxrpl/protocol/BuildInfo.cpp'
- '.github/workflows/libxrpl.yml'
types: [opened, reopened, synchronize, ready_for_review]
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
publish:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
name: Publish libXRPL
outputs:
outcome: ${{ steps.upload.outputs.outcome }}
version: ${{ steps.version.outputs.version }}
channel: ${{ steps.channel.outputs.channel }}
runs-on: [self-hosted, heavy]
container: rippleci/rippled-build-ubuntu:aaf5e3e
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
steps:
- name: Wait for essential checks to succeed
uses: lewagon/wait-on-check-action@v1.3.4
@@ -85,4 +87,5 @@ jobs:
run: |
gh api --method POST -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" \
/repos/xrplf/clio/dispatches -f "event_type=check_libxrpl" \
-F "client_payload[version]=${{ needs.publish.outputs.version }}@${{ needs.publish.outputs.channel }}"
-F "client_payload[version]=${{ needs.publish.outputs.version }}@${{ needs.publish.outputs.channel }}" \
-F "client_payload[pr]=${{ github.event.pull_request.number }}"

View File

@@ -1,6 +1,7 @@
name: macos
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
push:
# If the branches list is ever changed, be sure to change it on all
# build/test jobs (nix, macos, windows, instrumentation)
@@ -18,6 +19,7 @@ concurrency:
jobs:
test:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
strategy:
matrix:
platform:
@@ -69,11 +71,13 @@ jobs:
nproc --version
echo -n "nproc returns: "
nproc
system_profiler SPHardwareDataType
sysctl -n hw.logicalcpu
clang --version
- name: configure Conan
run : |
conan profile new default --detect || true
conan profile update settings.compiler.cppstd=20 default
conan profile update 'conf.tools.build:cxxflags+=["-DBOOST_ASIO_DISABLE_CONCEPTS"]' default
- name: build dependencies
uses: ./.github/actions/dependencies
env:

View File

@@ -1,6 +1,7 @@
name: nix
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
push:
# If the branches list is ever changed, be sure to change it on all
# build/test jobs (nix, macos, windows)
@@ -39,6 +40,7 @@ concurrency:
jobs:
dependencies:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
strategy:
fail-fast: false
matrix:
@@ -62,7 +64,7 @@ jobs:
cc: /usr/bin/clang-14
cxx: /usr/bin/clang++-14
runs-on: [self-hosted, heavy]
container: rippleci/rippled-build-ubuntu:aaf5e3e
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
env:
build_dir: .build
steps:
@@ -124,7 +126,61 @@ jobs:
- "-Dunity=ON"
needs: dependencies
runs-on: [self-hosted, heavy]
container: rippleci/rippled-build-ubuntu:aaf5e3e
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
env:
build_dir: .build
steps:
- name: upgrade conan
run: |
pip install --upgrade "conan<2"
- name: download cache
uses: actions/download-artifact@v4
with:
name: ${{ matrix.platform }}-${{ matrix.compiler }}-${{ matrix.configuration }}
- name: extract cache
run: |
mkdir -p ~/.conan
tar -xzf conan.tar -C ~/.conan
- name: check environment
run: |
env | sort
echo ${PATH} | tr ':' '\n'
conan --version
cmake --version
- name: checkout
uses: actions/checkout@v4
- name: dependencies
uses: ./.github/actions/dependencies
env:
CONAN_URL: http://18.143.149.228:8081/artifactory/api/conan/conan-non-prod
with:
configuration: ${{ matrix.configuration }}
- name: build
uses: ./.github/actions/build
with:
generator: Ninja
configuration: ${{ matrix.configuration }}
cmake-args: "-Dassert=TRUE -Dwerr=TRUE ${{ matrix.cmake-args }}"
- name: test
run: |
${build_dir}/rippled --unittest --unittest-jobs $(nproc)
reference-fee-test:
strategy:
fail-fast: false
matrix:
platform:
- linux
compiler:
- gcc
configuration:
- Debug
cmake-args:
- "-DUNIT_TEST_REFERENCE_FEE=200"
- "-DUNIT_TEST_REFERENCE_FEE=1000"
needs: dependencies
runs-on: [self-hosted, heavy]
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
env:
build_dir: .build
steps:
@@ -175,7 +231,7 @@ jobs:
- Debug
needs: dependencies
runs-on: [self-hosted, heavy]
container: rippleci/rippled-build-ubuntu:aaf5e3e
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
env:
build_dir: .build
steps:
@@ -191,7 +247,7 @@ jobs:
mkdir -p ~/.conan
tar -xzf conan.tar -C ~/.conan
- name: install gcovr
run: pip install "gcovr>=7,<8"
run: pip install "gcovr>=7,<9"
- name: check environment
run: |
echo ${PATH} | tr ':' '\n'
@@ -249,7 +305,7 @@ jobs:
conan:
needs: dependencies
runs-on: [self-hosted, heavy]
container: rippleci/rippled-build-ubuntu:aaf5e3e
container: ghcr.io/xrplf/rippled-build-ubuntu:aaf5e3e
env:
build_dir: .build
configuration: Release
@@ -288,7 +344,7 @@ jobs:
echo "reference=${reference}" >> "${GITHUB_ENV}"
- name: build
run: |
cd examples/example
cd tests/conan
mkdir ${build_dir}
cd ${build_dir}
conan install .. --output-folder . \
@@ -304,6 +360,7 @@ jobs:
# later
instrumentation-build:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
env:
CLANG_RELEASE: 16
strategy:

View File

@@ -2,6 +2,7 @@ name: windows
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
push:
# If the branches list is ever changed, be sure to change it on all
# build/test jobs (nix, macos, windows, instrumentation)
@@ -21,6 +22,7 @@ concurrency:
jobs:
test:
if: ${{ github.event_name == 'push' || github.event.pull_request.draft != true || contains(github.event.pull_request.labels.*.name, 'DraftRunCI') }}
strategy:
fail-fast: false
matrix:

View File

@@ -83,9 +83,17 @@ The [commandline](https://xrpl.org/docs/references/http-websocket-apis/api-conve
The `network_id` field was added in the `server_info` response in version 1.5.0 (2019), but it is not returned in [reporting mode](https://xrpl.org/rippled-server-modes.html#reporting-mode). However, use of reporting mode is now discouraged, in favor of using [Clio](https://github.com/XRPLF/clio) instead.
## XRP Ledger server version 2.5.0
As of 2025-04-04, version 2.5.0 is in development. You can use a pre-release version by building from source or [using the `nightly` package](https://xrpl.org/docs/infrastructure/installation/install-rippled-on-ubuntu).
### Additions and bugfixes in 2.5.0
- `channel_authorize`: If `signing_support` is not enabled in the config, the RPC is disabled.
## XRP Ledger server version 2.4.0
As of 2025-01-28, version 2.4.0 is in development. You can use a pre-release version by building from source or [using the `nightly` package](https://xrpl.org/docs/infrastructure/installation/install-rippled-on-ubuntu).
[Version 2.4.0](https://github.com/XRPLF/rippled/releases/tag/2.4.0) was released on March 4, 2025.
### Additions and bugfixes in 2.4.0

View File

@@ -178,9 +178,9 @@ It does not override paths to dependencies when building with Visual Studio.
```
# Conan 1.x
conan export external/rocksdb rocksdb/6.29.5@
conan export external/rocksdb rocksdb/9.7.3@
# Conan 2.x
conan export --version 6.29.5 external/rocksdb
conan export --version 9.7.3 external/rocksdb
```
Export our [Conan recipe for SOCI](./external/soci).
@@ -288,7 +288,7 @@ It fixes some source files to add missing `#include`s.
Single-config generators:
```
cmake --build .
cmake --build . -j $(nproc)
```
Multi-config generators:

View File

@@ -14,10 +14,10 @@ Loop: xrpld.app xrpld.net
xrpld.app > xrpld.net
Loop: xrpld.app xrpld.overlay
xrpld.overlay == xrpld.app
xrpld.overlay > xrpld.app
Loop: xrpld.app xrpld.peerfinder
xrpld.app > xrpld.peerfinder
xrpld.peerfinder ~= xrpld.app
Loop: xrpld.app xrpld.rpc
xrpld.rpc > xrpld.app

View File

@@ -6,6 +6,7 @@ libxrpl.protocol > xrpl.basics
libxrpl.protocol > xrpl.json
libxrpl.protocol > xrpl.protocol
libxrpl.resource > xrpl.basics
libxrpl.resource > xrpl.json
libxrpl.resource > xrpl.resource
libxrpl.server > xrpl.basics
libxrpl.server > xrpl.json
@@ -42,6 +43,7 @@ test.consensus > xrpl.basics
test.consensus > xrpld.app
test.consensus > xrpld.consensus
test.consensus > xrpld.ledger
test.consensus > xrpl.json
test.core > test.jtx
test.core > test.toplevel
test.core > test.unit_test
@@ -58,7 +60,6 @@ test.json > test.jtx
test.json > xrpl.json
test.jtx > xrpl.basics
test.jtx > xrpld.app
test.jtx > xrpld.consensus
test.jtx > xrpld.core
test.jtx > xrpld.ledger
test.jtx > xrpld.net
@@ -158,7 +159,6 @@ xrpld.core > xrpl.basics
xrpld.core > xrpl.json
xrpld.core > xrpl.protocol
xrpld.ledger > xrpl.basics
xrpld.ledger > xrpld.core
xrpld.ledger > xrpl.json
xrpld.ledger > xrpl.protocol
xrpld.net > xrpl.basics
@@ -183,7 +183,6 @@ xrpld.peerfinder > xrpld.core
xrpld.peerfinder > xrpl.protocol
xrpld.perflog > xrpl.basics
xrpld.perflog > xrpl.json
xrpld.perflog > xrpl.protocol
xrpld.rpc > xrpl.basics
xrpld.rpc > xrpld.core
xrpld.rpc > xrpld.ledger

View File

@@ -16,6 +16,18 @@ set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU")
# GCC-specific fixes
add_compile_options(-Wno-unknown-pragmas -Wno-subobject-linkage)
# -Wno-subobject-linkage can be removed when we upgrade GCC version to at least 13.3
elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
# Clang-specific fixes
add_compile_options(-Wno-unknown-warning-option) # Ignore unknown warning options
elseif(MSVC)
# MSVC-specific fixes
add_compile_options(/wd4068) # Ignore unknown pragmas
endif()
# make GIT_COMMIT_HASH define available to all sources
find_package(Git)
if(Git_FOUND)
@@ -82,6 +94,7 @@ add_subdirectory(external/secp256k1)
add_library(secp256k1::secp256k1 ALIAS secp256k1)
add_subdirectory(external/ed25519-donna)
add_subdirectory(external/antithesis-sdk)
add_subdirectory(external/blake3)
find_package(gRPC REQUIRED)
find_package(lz4 REQUIRED)
# Target names with :: are not allowed in a generator expression.
@@ -112,6 +125,7 @@ target_link_libraries(ripple_libs INTERFACE
secp256k1::secp256k1
soci::soci
SQLite::SQLite3
blake3
)
# Work around changes to Conan recipe for now.

View File

@@ -1,3 +1,5 @@
[![codecov](https://codecov.io/gh/XRPLF/rippled/graph/badge.svg?token=WyFr5ajq3O)](https://codecov.io/gh/XRPLF/rippled)
# The XRP Ledger
The [XRP Ledger](https://xrpl.org/) is a decentralized cryptographic ledger powered by a network of peer-to-peer nodes. The XRP Ledger uses a novel Byzantine Fault Tolerant consensus algorithm to settle and record transactions in a secure distributed database without a central operator.

File diff suppressed because it is too large

View File

@@ -83,7 +83,7 @@ To report a qualifying bug, please send a detailed report to:
|Long Key ID | `0xCD49A0AFC57929BE` |
|Fingerprint | `24E6 3B02 37E0 FA9C 5E96 8974 CD49 A0AF C579 29BE` |
The full PGP key for this address, which is also available on several key servers (e.g. on [keys.gnupg.net](https://keys.gnupg.net)), is:
The full PGP key for this address, which is also available on several key servers (e.g. on [keyserver.ubuntu.com](https://keyserver.ubuntu.com)), is:
```
-----BEGIN PGP PUBLIC KEY BLOCK-----
mQINBFUwGHYBEAC0wpGpBPkd8W1UdQjg9+cEFzeIEJRaoZoeuJD8mofwI5Ejnjdt

View File

@@ -420,6 +420,7 @@
# - r.ripple.com 51235
# - sahyadri.isrdc.in 51235
# - hubs.xrpkuwait.com 51235
# - hub.xrpl-commons.org 51235
#
# Examples:
#

View File

@@ -26,7 +26,7 @@
#
# Examples:
# https://vl.ripple.com
# https://vl.xrplf.org
# https://unl.xrplf.org
# http://127.0.0.1:8000
# file:///etc/opt/ripple/vl.txt
#

View File

@@ -98,6 +98,9 @@
# 2024-04-03, Bronek Kozicki
# - add support for output formats: jacoco, clover, lcov
#
# 2025-05-12, Jingchen Wu
# - add -fprofile-update=atomic to ensure atomic profile generation
#
# USAGE:
#
# 1. Copy this file into your cmake modules path.
@@ -200,15 +203,27 @@ set(COVERAGE_COMPILER_FLAGS "-g --coverage"
CACHE INTERNAL "")
if(CMAKE_CXX_COMPILER_ID MATCHES "(GNU|Clang)")
include(CheckCXXCompilerFlag)
include(CheckCCompilerFlag)
check_cxx_compiler_flag(-fprofile-abs-path HAVE_cxx_fprofile_abs_path)
if(HAVE_cxx_fprofile_abs_path)
set(COVERAGE_CXX_COMPILER_FLAGS "${COVERAGE_COMPILER_FLAGS} -fprofile-abs-path")
endif()
include(CheckCCompilerFlag)
check_c_compiler_flag(-fprofile-abs-path HAVE_c_fprofile_abs_path)
if(HAVE_c_fprofile_abs_path)
set(COVERAGE_C_COMPILER_FLAGS "${COVERAGE_COMPILER_FLAGS} -fprofile-abs-path")
endif()
check_cxx_compiler_flag(-fprofile-update HAVE_cxx_fprofile_update)
if(HAVE_cxx_fprofile_update)
set(COVERAGE_CXX_COMPILER_FLAGS "${COVERAGE_COMPILER_FLAGS} -fprofile-update=atomic")
endif()
check_c_compiler_flag(-fprofile-update HAVE_c_fprofile_update)
if(HAVE_c_fprofile_update)
set(COVERAGE_C_COMPILER_FLAGS "${COVERAGE_COMPILER_FLAGS} -fprofile-update=atomic")
endif()
endif()
set(CMAKE_Fortran_FLAGS_COVERAGE

View File

@@ -64,6 +64,7 @@ target_link_libraries(xrpl.imports.main
secp256k1::secp256k1
xrpl.libpb
xxHash::xxhash
blake3
$<$<BOOL:${voidstar}>:antithesis-sdk-cpp>
)
@@ -136,6 +137,9 @@ if(xrpld)
add_executable(rippled)
if(tests)
target_compile_definitions(rippled PUBLIC ENABLE_TESTS)
target_compile_definitions(rippled PRIVATE
UNIT_TEST_REFERENCE_FEE=${UNIT_TEST_REFERENCE_FEE}
)
endif()
target_include_directories(rippled
PRIVATE

View File

@@ -53,9 +53,9 @@ set(download_script "${CMAKE_BINARY_DIR}/docs/download-cppreference.cmake")
file(WRITE
"${download_script}"
"file(DOWNLOAD \
http://upload.cppreference.com/mwiki/images/b/b2/html_book_20190607.zip \
https://github.com/PeterFeicht/cppreference-doc/releases/download/v20250209/html-book-20250209.zip \
${CMAKE_BINARY_DIR}/docs/cppreference.zip \
EXPECTED_HASH MD5=82b3a612d7d35a83e3cb1195a63689ab \
EXPECTED_HASH MD5=bda585f72fbca4b817b29a3d5746567b \
)\n \
execute_process( \
COMMAND \"${CMAKE_COMMAND}\" -E tar -xf cppreference.zip \

View File

@@ -2,16 +2,6 @@
convenience variables and sanity checks
#]===================================================================]
include(ProcessorCount)
if (NOT ep_procs)
ProcessorCount(ep_procs)
if (ep_procs GREATER 1)
# never use more than half of cores for EP builds
math (EXPR ep_procs "${ep_procs} / 2")
message (STATUS "Using ${ep_procs} cores for ExternalProject builds.")
endif ()
endif ()
get_property(is_multiconfig GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)
set (CMAKE_CONFIGURATION_TYPES "Debug;Release" CACHE STRING "" FORCE)

View File

@@ -11,6 +11,12 @@ option(assert "Enables asserts, even in release builds" OFF)
option(xrpld "Build xrpld" ON)
option(tests "Build tests" ON)
if(tests)
# This setting allows making a separate workflow to test fees other than default 10
if(NOT UNIT_TEST_REFERENCE_FEE)
set(UNIT_TEST_REFERENCE_FEE "10" CACHE STRING "")
endif()
endif()
option(unity "Creates a build using UNITY support in cmake. This is the default" ON)
if(unity)
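
The `UNIT_TEST_REFERENCE_FEE` cache variable above defaults to 10 and, per the `target_compile_definitions` change elsewhere in this compare, is passed to `rippled` as a preprocessor definition when tests are built. A minimal sketch of how test code could consume such a definition follows; the `referenceFee` constant and `scaledFee` helper are hypothetical names, not identifiers from the rippled test suite.

```cpp
#include <cstdint>

// Fall back to the default of 10 when the build system does not pass
// -DUNIT_TEST_REFERENCE_FEE=<n> on the compiler command line.
#ifndef UNIT_TEST_REFERENCE_FEE
#define UNIT_TEST_REFERENCE_FEE 10
#endif

// Hypothetical constant holding the reference fee used by unit tests.
constexpr std::uint64_t referenceFee = UNIT_TEST_REFERENCE_FEE;

// Hypothetical helper: express a fee as a multiple of the reference fee,
// so expectations stay valid for 10, 200, 1000, or any other value.
constexpr std::uint64_t
scaledFee(std::uint64_t multiplier)
{
    return multiplier * referenceFee;
}

static_assert(scaledFee(1) == referenceFee, "identity multiple");
```

Writing expectations as multiples of the reference fee, rather than as literal drop amounts, is one way to keep a test meaningful under the `-DUNIT_TEST_REFERENCE_FEE=200` and `-DUNIT_TEST_REFERENCE_FEE=1000` configurations added to the nix workflow.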

View File

@@ -1,4 +1,4 @@
from conan import ConanFile
from conan import ConanFile, __version__ as conan_version
from conan.tools.cmake import CMake, CMakeToolchain, cmake_layout
import re
@@ -24,13 +24,11 @@ class Xrpl(ConanFile):
}
requires = [
'date/3.0.3',
'grpc/1.50.1',
'libarchive/3.7.6',
'nudb/2.0.8',
'openssl/1.1.1v',
'soci/4.0.3',
'xxhash/0.8.2',
'zlib/1.3.1',
]
@@ -99,14 +97,18 @@ class Xrpl(ConanFile):
self.options['boost'].visibility = 'global'
def requirements(self):
self.requires('boost/1.83.0', force=True)
# Conan 2 requires transitive headers to be specified
transitive_headers_opt = {'transitive_headers': True} if conan_version.split('.')[0] == '2' else {}
self.requires('boost/1.83.0', force=True, **transitive_headers_opt)
self.requires('date/3.0.3', **transitive_headers_opt)
self.requires('lz4/1.10.0', force=True)
self.requires('protobuf/3.21.9', force=True)
self.requires('sqlite3/3.47.0', force=True)
if self.options.jemalloc:
self.requires('jemalloc/5.3.0')
if self.options.rocksdb:
self.requires('rocksdb/6.29.5')
self.requires('rocksdb/9.7.3')
self.requires('xxhash/0.8.2', **transitive_headers_opt)
exports_sources = (
'CMakeLists.txt',

View File

@@ -23,7 +23,7 @@ direction.
```
apt update
apt install --yes curl git libssl-dev python3.10-dev python3-pip make g++-11 libprotobuf-dev protobuf-compiler
apt install --yes curl git libssl-dev pipx python3.10-dev python3-pip make g++-11 libprotobuf-dev protobuf-compiler
curl --location --remote-name \
"https://github.com/Kitware/CMake/releases/download/v3.25.1/cmake-3.25.1.tar.gz"
@@ -35,7 +35,8 @@ make --jobs $(nproc)
make install
cd ..
pip3 install 'conan<2'
pipx install 'conan<2'
pipx ensurepath
```
[1]: https://github.com/thejohnfreeman/rippled-docker/blob/master/ubuntu-22.04/install.sh

View File

@@ -558,7 +558,7 @@ struct ConsensusResult
ConsensusTimer roundTime;
// Indicates state in which consensus ended. Once in the accept phase
// will be either Yes or MovedOn
// will be one of Yes, MovedOn, or Expired
ConsensusState state = ConsensusState::No;
};

383
external/blake3/CMakeLists.txt vendored Normal file
View File

@@ -0,0 +1,383 @@
cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
# respect C_EXTENSIONS OFF without explicitly setting C_STANDARD
if (POLICY CMP0128)
cmake_policy(SET CMP0128 NEW)
endif()
# mark_as_advanced does not implicitly create UNINITIALIZED cache entries
if (POLICY CMP0102)
cmake_policy(SET CMP0102 NEW)
endif()
project(libblake3
VERSION 1.8.2
DESCRIPTION "BLAKE3 C implementation"
LANGUAGES C CXX ASM
)
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
option(BLAKE3_USE_TBB "Enable oneTBB parallelism" OFF)
option(BLAKE3_FETCH_TBB "Allow fetching oneTBB from GitHub if not found on system" OFF)
include(CTest)
include(FeatureSummary)
include(GNUInstallDirs)
add_subdirectory(dependencies)
# architecture lists for which to enable assembly / SIMD sources
set(BLAKE3_AMD64_NAMES amd64 AMD64 x86_64)
set(BLAKE3_X86_NAMES i686 x86 X86)
set(BLAKE3_ARMv8_NAMES aarch64 AArch64 arm64 ARM64 armv8 armv8a)
# default SIMD compiler flag configuration (can be overridden by toolchains or CLI)
if(MSVC)
set(BLAKE3_CFLAGS_SSE2 "/arch:SSE2" CACHE STRING "the compiler flags to enable SSE2")
# MSVC has no dedicated sse4.1 flag (see https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?view=msvc-170)
set(BLAKE3_CFLAGS_SSE4.1 "/arch:AVX" CACHE STRING "the compiler flags to enable SSE4.1")
set(BLAKE3_CFLAGS_AVX2 "/arch:AVX2" CACHE STRING "the compiler flags to enable AVX2")
set(BLAKE3_CFLAGS_AVX512 "/arch:AVX512" CACHE STRING "the compiler flags to enable AVX512")
set(BLAKE3_AMD64_ASM_SOURCES
blake3_avx2_x86-64_windows_msvc.asm
blake3_avx512_x86-64_windows_msvc.asm
blake3_sse2_x86-64_windows_msvc.asm
blake3_sse41_x86-64_windows_msvc.asm
)
elseif(CMAKE_C_COMPILER_ID STREQUAL "GNU"
OR CMAKE_C_COMPILER_ID STREQUAL "Clang"
OR CMAKE_C_COMPILER_ID STREQUAL "AppleClang")
set(BLAKE3_CFLAGS_SSE2 "-msse2" CACHE STRING "the compiler flags to enable SSE2")
set(BLAKE3_CFLAGS_SSE4.1 "-msse4.1" CACHE STRING "the compiler flags to enable SSE4.1")
set(BLAKE3_CFLAGS_AVX2 "-mavx2" CACHE STRING "the compiler flags to enable AVX2")
set(BLAKE3_CFLAGS_AVX512 "-mavx512f -mavx512vl" CACHE STRING "the compiler flags to enable AVX512")
if (WIN32 OR CYGWIN)
set(BLAKE3_AMD64_ASM_SOURCES
blake3_avx2_x86-64_windows_gnu.S
blake3_avx512_x86-64_windows_gnu.S
blake3_sse2_x86-64_windows_gnu.S
blake3_sse41_x86-64_windows_gnu.S
)
elseif(UNIX)
set(BLAKE3_AMD64_ASM_SOURCES
blake3_avx2_x86-64_unix.S
blake3_avx512_x86-64_unix.S
blake3_sse2_x86-64_unix.S
blake3_sse41_x86-64_unix.S
)
endif()
if (CMAKE_SYSTEM_PROCESSOR IN_LIST BLAKE3_ARMv8_NAMES
AND NOT CMAKE_SIZEOF_VOID_P EQUAL 8)
# 32-bit ARMv8 needs NEON to be enabled explicitly
set(BLAKE3_CFLAGS_NEON "-mfpu=neon" CACHE STRING "the compiler flags to enable NEON")
endif()
endif()
mark_as_advanced(BLAKE3_CFLAGS_SSE2 BLAKE3_CFLAGS_SSE4.1 BLAKE3_CFLAGS_AVX2 BLAKE3_CFLAGS_AVX512 BLAKE3_CFLAGS_NEON)
mark_as_advanced(BLAKE3_AMD64_ASM_SOURCES)
message(STATUS "BLAKE3 SIMD configuration: ${CMAKE_C_COMPILER_ARCHITECTURE_ID}")
if(MSVC AND DEFINED CMAKE_C_COMPILER_ARCHITECTURE_ID)
if(CMAKE_C_COMPILER_ARCHITECTURE_ID MATCHES "[Xx]86")
set(BLAKE3_SIMD_TYPE "x86-intrinsics" CACHE STRING "the SIMD acceleration type to use")
elseif(CMAKE_C_COMPILER_ARCHITECTURE_ID MATCHES "[Xx]64")
set(BLAKE3_SIMD_TYPE "amd64-asm" CACHE STRING "the SIMD acceleration type to use")
elseif(CMAKE_C_COMPILER_ARCHITECTURE_ID MATCHES "[Aa][Rr][Mm]64")
set(BLAKE3_SIMD_TYPE "neon-intrinsics" CACHE STRING "the SIMD acceleration type to use")
else()
set(BLAKE3_SIMD_TYPE "none" CACHE STRING "the SIMD acceleration type to use")
endif()
elseif(CMAKE_SYSTEM_PROCESSOR IN_LIST BLAKE3_AMD64_NAMES)
set(BLAKE3_SIMD_TYPE "amd64-asm" CACHE STRING "the SIMD acceleration type to use")
elseif(CMAKE_SYSTEM_PROCESSOR IN_LIST BLAKE3_X86_NAMES
AND DEFINED BLAKE3_CFLAGS_SSE2
AND DEFINED BLAKE3_CFLAGS_SSE4.1
AND DEFINED BLAKE3_CFLAGS_AVX2
AND DEFINED BLAKE3_CFLAGS_AVX512)
set(BLAKE3_SIMD_TYPE "x86-intrinsics" CACHE STRING "the SIMD acceleration type to use")
elseif((CMAKE_SYSTEM_PROCESSOR IN_LIST BLAKE3_ARMv8_NAMES
OR ANDROID_ABI STREQUAL "armeabi-v7a"
OR BLAKE3_USE_NEON_INTRINSICS)
AND (DEFINED BLAKE3_CFLAGS_NEON
OR CMAKE_SIZEOF_VOID_P EQUAL 8))
set(BLAKE3_SIMD_TYPE "neon-intrinsics" CACHE STRING "the SIMD acceleration type to use")
else()
set(BLAKE3_SIMD_TYPE "none" CACHE STRING "the SIMD acceleration type to use")
endif()
mark_as_advanced(BLAKE3_SIMD_TYPE)
# library target
add_library(blake3
blake3.c
blake3_dispatch.c
blake3_portable.c
)
add_library(BLAKE3::blake3 ALIAS blake3)
# library configuration
set(PKG_CONFIG_CFLAGS)
if (BUILD_SHARED_LIBS)
target_compile_definitions(blake3
PUBLIC BLAKE3_DLL
PRIVATE BLAKE3_DLL_EXPORTS
)
list(APPEND PKG_CONFIG_CFLAGS -DBLAKE3_DLL)
endif()
target_include_directories(blake3 PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)
set_target_properties(blake3 PROPERTIES
VERSION ${PROJECT_VERSION}
SOVERSION 0
C_VISIBILITY_PRESET hidden
C_EXTENSIONS OFF
)
target_compile_features(blake3 PUBLIC c_std_99)
if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.12)
target_compile_features(blake3 PUBLIC cxx_std_20)
# else: add it further below through `BLAKE3_CMAKE_CXXFLAGS_*`
endif()
# ensure C_EXTENSIONS OFF is respected without overriding CMAKE_C_STANDARD
# which may be set by the user or toolchain file
if (NOT POLICY CMP0128 AND NOT DEFINED CMAKE_C_STANDARD)
set_target_properties(blake3 PROPERTIES C_STANDARD 99)
endif()
# optional SIMD sources
if(BLAKE3_SIMD_TYPE STREQUAL "amd64-asm")
if (NOT DEFINED BLAKE3_AMD64_ASM_SOURCES)
message(FATAL_ERROR "BLAKE3_SIMD_TYPE is set to 'amd64-asm' but no assembly sources are available for the target architecture.")
endif()
set(BLAKE3_SIMD_AMD64_ASM ON)
if(MSVC)
enable_language(ASM_MASM)
endif()
target_sources(blake3 PRIVATE ${BLAKE3_AMD64_ASM_SOURCES})
elseif(BLAKE3_SIMD_TYPE STREQUAL "x86-intrinsics")
if (NOT DEFINED BLAKE3_CFLAGS_SSE2
OR NOT DEFINED BLAKE3_CFLAGS_SSE4.1
OR NOT DEFINED BLAKE3_CFLAGS_AVX2
OR NOT DEFINED BLAKE3_CFLAGS_AVX512)
message(FATAL_ERROR "BLAKE3_SIMD_TYPE is set to 'x86-intrinsics' but no compiler flags are available for the target architecture.")
endif()
set(BLAKE3_SIMD_X86_INTRINSICS ON)
target_sources(blake3 PRIVATE
blake3_avx2.c
blake3_avx512.c
blake3_sse2.c
blake3_sse41.c
)
set_source_files_properties(blake3_avx2.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_AVX2}")
set_source_files_properties(blake3_avx512.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_AVX512}")
set_source_files_properties(blake3_sse2.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_SSE2}")
set_source_files_properties(blake3_sse41.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_SSE4.1}")
elseif(BLAKE3_SIMD_TYPE STREQUAL "neon-intrinsics")
set(BLAKE3_SIMD_NEON_INTRINSICS ON)
target_sources(blake3 PRIVATE
blake3_neon.c
)
target_compile_definitions(blake3 PRIVATE
BLAKE3_USE_NEON=1
)
if (DEFINED BLAKE3_CFLAGS_NEON)
set_source_files_properties(blake3_neon.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_NEON}")
endif()
elseif(BLAKE3_SIMD_TYPE STREQUAL "none")
target_compile_definitions(blake3 PRIVATE
BLAKE3_USE_NEON=0
BLAKE3_NO_SSE2
BLAKE3_NO_SSE41
BLAKE3_NO_AVX2
BLAKE3_NO_AVX512
)
else()
message(FATAL_ERROR "BLAKE3_SIMD_TYPE is set to an unknown value: '${BLAKE3_SIMD_TYPE}'")
endif()
if(BLAKE3_USE_TBB)
find_package(TBB 2021.11.0 QUIET)
if(NOT TBB_FOUND AND NOT TARGET TBB::tbb)
message(WARNING
"oneTBB not found; disabling BLAKE3_USE_TBB\n"
"Enable BLAKE3_FETCH_TBB to automatically fetch and build oneTBB"
)
set(BLAKE3_USE_TBB OFF)
else()
target_sources(blake3
PRIVATE
blake3_tbb.cpp)
target_link_libraries(blake3
PUBLIC
# Make shared TBB a transitive dependency. The consuming program is technically not required
# to link TBB in order for libblake3 to function but we do this in order to prevent the
# possibility of multiple separate TBB runtimes being linked into a final program in case
# the consuming program also happens to already use TBB.
TBB::tbb)
target_compile_definitions(blake3
PUBLIC
BLAKE3_USE_TBB)
endif()
list(APPEND PKG_CONFIG_REQUIRES "tbb >= ${TBB_VERSION}")
list(APPEND PKG_CONFIG_CFLAGS -DBLAKE3_USE_TBB)
include(CheckCXXSymbolExists)
check_cxx_symbol_exists(_LIBCPP_VERSION "version" BLAKE3_HAVE_LIBCPP)
check_cxx_symbol_exists(__GLIBCXX__ "version" BLAKE3_HAVE_GLIBCXX)
if(BLAKE3_HAVE_GLIBCXX)
list(APPEND PKG_CONFIG_LIBS -lstdc++)
elseif(BLAKE3_HAVE_LIBCPP)
list(APPEND PKG_CONFIG_LIBS -lc++)
endif()
endif()
if(BLAKE3_USE_TBB)
# Define some scratch variables for building appropriate flags per compiler
if(CMAKE_VERSION VERSION_LESS 3.12)
list(APPEND BLAKE3_CXX_STANDARD_FLAGS_GNU -std=c++20)
list(APPEND BLAKE3_CXX_STANDARD_FLAGS_MSVC /std:c++20)
endif()
set(BLAKE3_CXXFLAGS_GNU "-fno-exceptions;-fno-rtti;${BLAKE3_CXX_STANDARD_FLAGS_GNU}" CACHE STRING "C++ flags used for compiling private BLAKE3 library components with GNU-like compiler frontends.")
set(BLAKE3_CXXFLAGS_MSVC "/EHs-c-;/GR-;${BLAKE3_CXX_STANDARD_FLAGS_MSVC}" CACHE STRING "C++ flags used for compiling private BLAKE3 library components with MSVC-like compiler frontends.")
# Get the C++ compiler name without extension
get_filename_component(BLAKE3_CMAKE_CXX_COMPILER_NAME "${CMAKE_CXX_COMPILER}" NAME_WE)
# Strip any trailing versioning from the C++ compiler name
string(REGEX MATCH "^(clang\\+\\+|clang-cl)" BLAKE3_CMAKE_CXX_COMPILER_NAME "${BLAKE3_CMAKE_CXX_COMPILER_NAME}")
# TODO: Simplify with CMAKE_CXX_COMPILER_FRONTEND_VARIANT once min CMake version is 3.14.
if(CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang")
target_compile_options(blake3 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:${BLAKE3_CXXFLAGS_GNU}>)
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
if(BLAKE3_CMAKE_CXX_COMPILER_NAME STREQUAL "clang++")
target_compile_options(blake3 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:${BLAKE3_CXXFLAGS_GNU}>)
elseif(BLAKE3_CMAKE_CXX_COMPILER_NAME STREQUAL "clang-cl")
target_compile_options(blake3 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:${BLAKE3_CXXFLAGS_MSVC}>)
endif()
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
target_compile_options(blake3 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:${BLAKE3_CXXFLAGS_GNU}>)
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
target_compile_options(blake3 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:${BLAKE3_CXXFLAGS_MSVC}>)
endif()
# Undefine scratch variables
unset(BLAKE3_CXX_STANDARD_FLAGS_GNU)
unset(BLAKE3_CXX_STANDARD_FLAGS_MSVC)
unset(BLAKE3_CMAKE_CXX_COMPILER_NAME)
unset(BLAKE3_CXXFLAGS_GNU)
unset(BLAKE3_CXXFLAGS_MSVC)
endif()
# cmake install support
install(FILES blake3.h DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}")
install(TARGETS blake3 EXPORT blake3-targets
ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}"
LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
RUNTIME DESTINATION "${CMAKE_INSTALL_BINDIR}"
)
install(EXPORT blake3-targets
NAMESPACE BLAKE3::
DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/blake3"
)
include(CMakePackageConfigHelpers)
configure_package_config_file(blake3-config.cmake.in
"${CMAKE_CURRENT_BINARY_DIR}/blake3-config.cmake"
INSTALL_DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/blake3"
)
write_basic_package_version_file(
"${CMAKE_CURRENT_BINARY_DIR}/blake3-config-version.cmake"
VERSION ${libblake3_VERSION}
COMPATIBILITY SameMajorVersion
)
install(FILES
"${CMAKE_CURRENT_BINARY_DIR}/blake3-config.cmake"
"${CMAKE_CURRENT_BINARY_DIR}/blake3-config-version.cmake"
DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/blake3"
)
# Function for joining paths known from most languages
#
# SPDX-License-Identifier: (MIT OR CC0-1.0)
# Copyright 2020 Jan Tojnar
# https://github.com/jtojnar/cmake-snips
#
# Modelled after Python's os.path.join
# https://docs.python.org/3.7/library/os.path.html#os.path.join
# Windows not supported
function(join_paths joined_path first_path_segment)
set(temp_path "${first_path_segment}")
foreach(current_segment IN LISTS ARGN)
if(NOT ("${current_segment}" STREQUAL ""))
if(IS_ABSOLUTE "${current_segment}")
set(temp_path "${current_segment}")
else()
set(temp_path "${temp_path}/${current_segment}")
endif()
endif()
endforeach()
set(${joined_path} "${temp_path}" PARENT_SCOPE)
endfunction()
# Rewrite a list variable in place, joining its elements with `sep`.
#
# TODO: Replace function with list(JOIN) when updating to CMake 3.12
function(join_pkg_config_field sep requires)
set(_requires "${${requires}}") # avoid shadowing issues, e.g. "${requires}"=len
list(LENGTH "${requires}" len)
set(idx 1)
foreach(req IN LISTS _requires)
string(APPEND acc "${req}")
if(idx LESS len)
string(APPEND acc "${sep}")
endif()
math(EXPR idx "${idx} + 1")
endforeach()
set("${requires}" "${acc}" PARENT_SCOPE)
endfunction()
# pkg-config support
join_pkg_config_field(", " PKG_CONFIG_REQUIRES)
join_pkg_config_field(" " PKG_CONFIG_LIBS)
join_pkg_config_field(" " PKG_CONFIG_CFLAGS)
join_paths(PKG_CONFIG_INSTALL_LIBDIR "\${prefix}" "${CMAKE_INSTALL_LIBDIR}")
join_paths(PKG_CONFIG_INSTALL_INCLUDEDIR "\${prefix}" "${CMAKE_INSTALL_INCLUDEDIR}")
configure_file(libblake3.pc.in libblake3.pc @ONLY)
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/libblake3.pc"
DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
# print feature summary
# add_feature_info cannot directly use the BLAKE3_SIMD_TYPE :(
add_feature_info("AMD64 assembly" BLAKE3_SIMD_AMD64_ASM "The library uses hand-written amd64 SIMD assembly.")
add_feature_info("x86 SIMD intrinsics" BLAKE3_SIMD_X86_INTRINSICS "The library uses x86 SIMD intrinsics.")
add_feature_info("NEON SIMD intrinsics" BLAKE3_SIMD_NEON_INTRINSICS "The library uses NEON SIMD intrinsics.")
add_feature_info("oneTBB parallelism" BLAKE3_USE_TBB "The library uses oneTBB parallelism.")
feature_summary(WHAT ENABLED_FEATURES)
if(BLAKE3_EXAMPLES)
include(BLAKE3/Examples)
endif()
if(BLAKE3_TESTING)
include(BLAKE3/Testing)
endif()

external/blake3/Makefile.testing vendored Normal file

@@ -0,0 +1,82 @@
# This Makefile is only for testing. C callers should follow the instructions
# in ./README.md to incorporate these C files into their existing build.
NAME=blake3
CC=gcc
CFLAGS=-O3 -Wall -Wextra -std=c11 -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -fPIE -fvisibility=hidden
LDFLAGS=-pie -Wl,-z,relro,-z,now
TARGETS=
ASM_TARGETS=
EXTRAFLAGS=-Wa,--noexecstack
ifdef BLAKE3_NO_SSE2
EXTRAFLAGS += -DBLAKE3_NO_SSE2
else
TARGETS += blake3_sse2.o
ASM_TARGETS += blake3_sse2_x86-64_unix.S
endif
ifdef BLAKE3_NO_SSE41
EXTRAFLAGS += -DBLAKE3_NO_SSE41
else
TARGETS += blake3_sse41.o
ASM_TARGETS += blake3_sse41_x86-64_unix.S
endif
ifdef BLAKE3_NO_AVX2
EXTRAFLAGS += -DBLAKE3_NO_AVX2
else
TARGETS += blake3_avx2.o
ASM_TARGETS += blake3_avx2_x86-64_unix.S
endif
ifdef BLAKE3_NO_AVX512
EXTRAFLAGS += -DBLAKE3_NO_AVX512
else
TARGETS += blake3_avx512.o
ASM_TARGETS += blake3_avx512_x86-64_unix.S
endif
ifdef BLAKE3_USE_NEON
EXTRAFLAGS += -DBLAKE3_USE_NEON=1
TARGETS += blake3_neon.o
endif
ifdef BLAKE3_NO_NEON
EXTRAFLAGS += -DBLAKE3_USE_NEON=0
endif
all: blake3.c blake3_dispatch.c blake3_portable.c main.c $(TARGETS)
$(CC) $(CFLAGS) $(EXTRAFLAGS) $^ -o $(NAME) $(LDFLAGS)
blake3_sse2.o: blake3_sse2.c
$(CC) $(CFLAGS) $(EXTRAFLAGS) -c $^ -o $@ -msse2
blake3_sse41.o: blake3_sse41.c
$(CC) $(CFLAGS) $(EXTRAFLAGS) -c $^ -o $@ -msse4.1
blake3_avx2.o: blake3_avx2.c
$(CC) $(CFLAGS) $(EXTRAFLAGS) -c $^ -o $@ -mavx2
blake3_avx512.o: blake3_avx512.c
$(CC) $(CFLAGS) $(EXTRAFLAGS) -c $^ -o $@ -mavx512f -mavx512vl
blake3_neon.o: blake3_neon.c
$(CC) $(CFLAGS) $(EXTRAFLAGS) -c $^ -o $@
test: CFLAGS += -DBLAKE3_TESTING -fsanitize=address,undefined
test: all
./test.py
asm: blake3.c blake3_dispatch.c blake3_portable.c main.c $(ASM_TARGETS)
$(CC) $(CFLAGS) $(EXTRAFLAGS) $^ -o $(NAME) $(LDFLAGS)
test_asm: CFLAGS += -DBLAKE3_TESTING -fsanitize=address,undefined
test_asm: asm
./test.py
example: example.c blake3.c blake3_dispatch.c blake3_portable.c $(ASM_TARGETS)
$(CC) $(CFLAGS) $(EXTRAFLAGS) $^ -o $@ $(LDFLAGS)
clean:
rm -f $(NAME) *.o

external/blake3/README.md vendored Normal file

@@ -0,0 +1,403 @@
The official C implementation of BLAKE3.
# Example
An example program that hashes bytes from standard input and prints the
result:
```c
#include "blake3.h"
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
// Initialize the hasher.
blake3_hasher hasher;
blake3_hasher_init(&hasher);
// Read input bytes from stdin.
unsigned char buf[65536];
while (1) {
ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
if (n > 0) {
blake3_hasher_update(&hasher, buf, n);
} else if (n == 0) {
break; // end of file
} else {
fprintf(stderr, "read failed: %s\n", strerror(errno));
return 1;
}
}
// Finalize the hash. BLAKE3_OUT_LEN is the default output length, 32 bytes.
uint8_t output[BLAKE3_OUT_LEN];
blake3_hasher_finalize(&hasher, output, BLAKE3_OUT_LEN);
// Print the hash as hexadecimal.
for (size_t i = 0; i < BLAKE3_OUT_LEN; i++) {
printf("%02x", output[i]);
}
printf("\n");
return 0;
}
```
The code above is included in this directory as `example.c`. If you're
on x86\_64 with a Unix-like OS, you can compile a working binary like
this:
```bash
gcc -O3 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c \
blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
blake3_avx512_x86-64_unix.S
```
# API
## The Struct
```c
typedef struct {
// private fields
} blake3_hasher;
```
An incremental BLAKE3 hashing state, which can accept any number of
updates. This implementation doesn't allocate any heap memory, but
`sizeof(blake3_hasher)` itself is relatively large, currently 1912 bytes
on x86-64. This size can be reduced by restricting the maximum input
length, as described in Section 5.4 of [the BLAKE3
spec](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf),
but this implementation doesn't currently support that strategy.
## Common API Functions
```c
void blake3_hasher_init(
blake3_hasher *self);
```
Initialize a `blake3_hasher` in the default hashing mode.
---
```c
void blake3_hasher_update(
blake3_hasher *self,
const void *input,
size_t input_len);
```
Add input to the hasher. This can be called any number of times. This function
is always single-threaded; for multithreading see `blake3_hasher_update_tbb`
below.
---
```c
void blake3_hasher_finalize(
const blake3_hasher *self,
uint8_t *out,
size_t out_len);
```
Finalize the hasher and return an output of any length, given in bytes.
This doesn't modify the hasher itself, and it's possible to finalize
again after adding more input. The constant `BLAKE3_OUT_LEN` provides
the default output length, 32 bytes, which is recommended for most
callers. See the [Security Notes](#security-notes) below.
## Less Common API Functions
```c
void blake3_hasher_init_keyed(
blake3_hasher *self,
const uint8_t key[BLAKE3_KEY_LEN]);
```
Initialize a `blake3_hasher` in the keyed hashing mode. The key must be
exactly 32 bytes.
---
```c
void blake3_hasher_init_derive_key(
blake3_hasher *self,
const char *context);
```
Initialize a `blake3_hasher` in the key derivation mode. The context
string is given as an initialization parameter, and afterwards input key
material should be given with `blake3_hasher_update`. The context string
is a null-terminated C string which should be **hardcoded, globally
unique, and application-specific**. The context string should not
include any dynamic input like salts, nonces, or identifiers read from a
database at runtime. A good default format for the context string is
`"[application] [commit timestamp] [purpose]"`, e.g., `"example.com
2019-12-25 16:18:03 session tokens v1"`.
This function is intended for application code written in C. For
language bindings, see `blake3_hasher_init_derive_key_raw` below.
---
```c
void blake3_hasher_init_derive_key_raw(
blake3_hasher *self,
const void *context,
size_t context_len);
```
As `blake3_hasher_init_derive_key` above, except that the context string
is given as a pointer to an array of arbitrary bytes with a provided
length. This is intended for writing language bindings, where C string
conversion would add unnecessary overhead and new error cases. Unicode
strings should be encoded as UTF-8.
Application code in C should prefer `blake3_hasher_init_derive_key`,
which takes the context as a C string. If you need to use arbitrary
bytes as a context string in application code, consider whether you're
violating the requirement that context strings should be hardcoded.
---
```c
void blake3_hasher_update_tbb(
blake3_hasher *self,
const void *input,
size_t input_len);
```
Add input to the hasher, using [oneTBB] to process large inputs using multiple
threads. This can be called any number of times. This gives the same result as
`blake3_hasher_update` above.
[oneTBB]: https://uxlfoundation.github.io/oneTBB/
NOTE: This function is only enabled when the library is compiled with CMake option `BLAKE3_USE_TBB`
and when the oneTBB library is detected on the host system. See the building instructions for
further details.
To get any performance benefit from multithreading, the input buffer needs to
be large. As a rule of thumb on x86_64, `blake3_hasher_update_tbb` is _slower_
than `blake3_hasher_update` for inputs under 128 KiB. That threshold varies
quite a lot across different processors, and it's important to benchmark your
specific use case.
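
The rule of thumb above can be written down as a small selection helper. This is only a sketch: `TBB_MIN_INPUT_LEN` and `prefer_tbb_update` are hypothetical names, not part of the BLAKE3 API, and the 128 KiB cutoff is an assumed starting point that benchmarking on the target machine should replace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cutoff taken from the x86_64 rule of thumb; tune per platform. */
#define TBB_MIN_INPUT_LEN ((size_t)128 * 1024)

/* Returns true when the multithreaded update is likely to be worth trying:
 * the library was built with oneTBB support and the input is not below the
 * cutoff. Otherwise the caller would use the single-threaded update. */
bool prefer_tbb_update(size_t input_len, bool tbb_available) {
  return tbb_available && input_len >= TBB_MIN_INPUT_LEN;
}
```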
Hashing large files with this function usually requires
[memory-mapping](https://en.wikipedia.org/wiki/Memory-mapped_file), since
reading a file into memory in a single-threaded loop takes longer than hashing
the resulting buffer. Note that hashing a memory-mapped file with this function
produces a "random" pattern of disk reads, which can be slow on spinning disks.
Again it's important to benchmark your specific use case.
This implementation doesn't require configuration of thread resources and will
use as many cores as possible by default. More fine-grained control of
resources is possible using the [oneTBB] API.
---
```c
void blake3_hasher_finalize_seek(
const blake3_hasher *self,
uint64_t seek,
uint8_t *out,
size_t out_len);
```
The same as `blake3_hasher_finalize`, but with an additional `seek`
parameter for the starting byte position in the output stream. To
efficiently stream a large output without allocating memory, call this
function in a loop, incrementing `seek` by the output length each time.
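
The relationship between `seek` and the output stream can be made concrete with a small sketch. It assumes the 64-byte output block size that `output_root_bytes` in `blake3.c` uses. `seek_plan` and `plan_seek` are hypothetical names introduced here for illustration and are not part of the library API.

```c
#include <stddef.h>
#include <stdint.h>

#define XOF_BLOCK_LEN 64 /* assumed output block size, in bytes */

typedef struct {
  uint64_t first_block;    /* index of the first output block that is read */
  size_t offset_in_block;  /* bytes skipped at the start of that block */
  uint64_t blocks_touched; /* number of output blocks the request spans */
} seek_plan;

/* Maps a (seek, out_len) request onto whole output blocks. */
seek_plan plan_seek(uint64_t seek, size_t out_len) {
  seek_plan p;
  p.first_block = seek / XOF_BLOCK_LEN;
  p.offset_in_block = (size_t)(seek % XOF_BLOCK_LEN);
  if (out_len == 0) {
    p.blocks_touched = 0;
  } else {
    /* Round the covered byte range up to a whole number of blocks. */
    p.blocks_touched =
        ((uint64_t)p.offset_in_block + (uint64_t)out_len + XOF_BLOCK_LEN - 1) /
        XOF_BLOCK_LEN;
  }
  return p;
}
```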
---
```c
void blake3_hasher_reset(
blake3_hasher *self);
```
Reset the hasher to its initial state, prior to any calls to
`blake3_hasher_update`. Currently this is no different from calling
`blake3_hasher_init` or similar again.
# Security Notes
Outputs shorter than the default length of 32 bytes (256 bits) provide less security. An N-bit
BLAKE3 output is intended to provide N bits of first and second preimage resistance and N/2
bits of collision resistance, for any N up to 256. Longer outputs don't provide any additional
security.
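
As a rough illustration of the figures above, the following sketch computes the intended security level for a given output length in bytes. The function names are hypothetical and not part of the BLAKE3 API.

```c
#include <stddef.h>

/* Intended preimage resistance in bits for an output of out_len bytes:
 * N bits for an N-bit output, capped at 256 bits (32 bytes). */
size_t preimage_security_bits(size_t out_len) {
  return out_len >= 32 ? 256 : out_len * 8;
}

/* Intended collision resistance in bits: half of the preimage figure. */
size_t collision_security_bits(size_t out_len) {
  return preimage_security_bits(out_len) / 2;
}
```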
Avoid relying on the secrecy of the output offset, that is, the `seek` argument of
`blake3_hasher_finalize_seek`. [_Block-Cipher-Based Tree Hashing_ by Aldo
Gunsing](https://eprint.iacr.org/2022/283) shows that an attacker who knows both the message
and the key (if any) can easily determine the offset of an extended output. For comparison,
AES-CTR has a similar property: if you know the key, you can decrypt a block from an unknown
position in the output stream to recover its block index. Callers with strong secret keys
aren't affected in practice, but secret offsets are a [design
smell](https://en.wikipedia.org/wiki/Design_smell) in any case.
# Building
The easiest and most complete method of compiling this library is with CMake.
This is the method described in the next section. Toward the end of the
building section there are more in-depth notes about compiling manually and
things that are useful to understand if you need to integrate this library with
another build system.
## CMake
The minimum version of CMake is 3.9. The following invocations will compile and
install `libblake3`. With recent CMake:
```bash
cmake -S c -B c/build "-DCMAKE_INSTALL_PREFIX=/usr/local"
cmake --build c/build --target install
```
With an older CMake:
```bash
cd c
mkdir build
cd build
cmake .. "-DCMAKE_INSTALL_PREFIX=/usr/local"
cmake --build . --target install
```
The following options are available when compiling with CMake:
- `BLAKE3_USE_TBB`: Enable oneTBB parallelism (Requires a C++20 capable compiler)
- `BLAKE3_FETCH_TBB`: Allow fetching oneTBB from GitHub (only if not found on system)
- `BLAKE3_EXAMPLES`: Compile and install example programs
Options can be enabled like this:
```bash
cmake -S c -B c/build "-DCMAKE_INSTALL_PREFIX=/usr/local" -DBLAKE3_USE_TBB=1 -DBLAKE3_FETCH_TBB=1
```
## Building manually
We try to keep the build simple enough that you can compile this library "by
hand", and it's expected that many callers will integrate it with their
pre-existing build systems. See the `gcc` one-liner in the "Example" section
above.
### x86
Dynamic dispatch is enabled by default on x86. The implementation will
query the CPU at runtime to detect SIMD support, and it will use the
widest instruction set available. By default, `blake3_dispatch.c`
expects to be linked with code for five different instruction sets:
portable C, SSE2, SSE4.1, AVX2, and AVX-512.
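
The idea of runtime selection can be illustrated with a minimal sketch. It is not taken from `blake3_dispatch.c`. It assumes GCC or Clang on x86 and uses the compiler builtin `__builtin_cpu_supports` in place of the library's own detection code, and the function name `widest_simd_level` is hypothetical. On other compilers or architectures the sketch simply reports the portable path.

```c
/* Hypothetical helper: names the widest instruction set the running CPU
 * reports, checked from widest (AVX-512) to narrowest (SSE2). */
const char *widest_simd_level(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init(); /* make sure the CPU feature data is initialized */
  /* The AVX-512 build in this README needs both -mavx512f and -mavx512vl,
   * so both features are checked here. */
  if (__builtin_cpu_supports("avx512f") &&
      __builtin_cpu_supports("avx512vl")) {
    return "avx512";
  }
  if (__builtin_cpu_supports("avx2")) {
    return "avx2";
  }
  if (__builtin_cpu_supports("sse4.1")) {
    return "sse41";
  }
  if (__builtin_cpu_supports("sse2")) {
    return "sse2";
  }
#endif
  return "portable";
}
```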
For each of the x86 SIMD instruction sets, four versions are available:
three flavors of assembly (Unix, Windows MSVC, and Windows GNU) and one
version using C intrinsics. The assembly versions are generally
preferred. They perform better, they perform more consistently across
different compilers, and they build more quickly. On the other hand, the
assembly versions are x86\_64-only, and you need to select the right
flavor for your target platform.
Here's an example of building a shared library on x86\_64 Linux using
the assembly implementations:
```bash
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
blake3_avx512_x86-64_unix.S
```
When building the intrinsics-based implementations, you need to build
each implementation separately, with the corresponding instruction set
explicitly enabled in the compiler. Here's the same shared library using
the intrinsics-based implementations:
```bash
gcc -c -fPIC -O3 -msse2 blake3_sse2.c -o blake3_sse2.o
gcc -c -fPIC -O3 -msse4.1 blake3_sse41.c -o blake3_sse41.o
gcc -c -fPIC -O3 -mavx2 blake3_avx2.c -o blake3_avx2.o
gcc -c -fPIC -O3 -mavx512f -mavx512vl blake3_avx512.c -o blake3_avx512.o
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
blake3_avx2.o blake3_avx512.o blake3_sse41.o blake3_sse2.o
```
Note above that building `blake3_avx512.c` requires both `-mavx512f` and
`-mavx512vl` under GCC and Clang. Under MSVC, the single `/arch:AVX512`
flag is sufficient. The MSVC equivalent of `-mavx2` is `/arch:AVX2`.
MSVC enables SSE2 and SSE4.1 by default, and it doesn't have a
corresponding flag.
If you want to omit SIMD code entirely, you need to explicitly disable
each instruction set. Here's an example of building a shared library on
x86 with only portable code:
```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 \
-DBLAKE3_NO_AVX512 blake3.c blake3_dispatch.c blake3_portable.c
```
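
How the `BLAKE3_NO_*` definitions affect a build can be sketched with ordinary preprocessor checks. The snippet below is illustrative only: the `BACKEND_*` constants and `compiled_backends` are hypothetical names, and it does not reproduce the library's own dispatch logic, which also depends on the target architecture.

```c
/* Hypothetical bit flags, one per optional x86 SIMD implementation. */
enum {
  BACKEND_SSE2 = 1 << 0,
  BACKEND_SSE41 = 1 << 1,
  BACKEND_AVX2 = 1 << 2,
  BACKEND_AVX512 = 1 << 3
};

/* Returns the set of SIMD implementations this translation unit was asked to
 * keep. Each -DBLAKE3_NO_* definition clears the corresponding bit. */
unsigned compiled_backends(void) {
  unsigned mask = 0;
#if !defined(BLAKE3_NO_SSE2)
  mask |= BACKEND_SSE2;
#endif
#if !defined(BLAKE3_NO_SSE41)
  mask |= BACKEND_SSE41;
#endif
#if !defined(BLAKE3_NO_AVX2)
  mask |= BACKEND_AVX2;
#endif
#if !defined(BLAKE3_NO_AVX512)
  mask |= BACKEND_AVX512;
#endif
  return mask;
}
```

Compiled with all four `-DBLAKE3_NO_*` definitions from the command above, `compiled_backends()` returns 0, which corresponds to a portable-only build.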
### ARM NEON
The NEON implementation is enabled by default on AArch64, but not on
other ARM targets, since not all of them support it. To enable it, set
`BLAKE3_USE_NEON=1`. Here's an example of building a shared library on
ARM Linux with NEON support:
```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_USE_NEON=1 blake3.c blake3_dispatch.c \
blake3_portable.c blake3_neon.c
```
To explicitly disable using NEON instructions on AArch64, set
`BLAKE3_USE_NEON=0`.
```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_USE_NEON=0 blake3.c blake3_dispatch.c \
blake3_portable.c
```
Note that on some targets (ARMv7 in particular), extra flags may be
required to activate NEON support in the compiler. If you see an error
like...
```
/usr/lib/gcc/armv7l-unknown-linux-gnueabihf/9.2.0/include/arm_neon.h:635:1: error: inlining failed
in call to always_inline vaddq_u32: target specific option mismatch
```
...then you may need to add something like `-mfpu=neon-vfpv4
-mfloat-abi=hard`.
### Other Platforms
The portable implementation should work on most other architectures. For
example:
```bash
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c
```
### Multithreading
Multithreading is available using [oneTBB], by compiling the optional C++
support file [`blake3_tbb.cpp`](./blake3_tbb.cpp). For an example of using
`mmap` (non-Windows) and `blake3_hasher_update_tbb` to get large-file
performance on par with [`b3sum`](../b3sum), see
[`example_tbb.c`](./example_tbb.c). You can build it like this:
```bash
g++ -c -O3 -fno-exceptions -fno-rtti -DBLAKE3_USE_TBB -o blake3_tbb.o blake3_tbb.cpp
gcc -O3 -o example_tbb -lstdc++ -ltbb -DBLAKE3_USE_TBB blake3_tbb.o example_tbb.c blake3.c \
blake3_dispatch.c blake3_portable.c blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S \
blake3_avx2_x86-64_unix.S blake3_avx512_x86-64_unix.S
```
NOTE: `-fno-exceptions` or equivalent is required to compile `blake3_tbb.cpp`,
and public API methods with external C linkage are marked `noexcept`. Compiling
that file with exceptions enabled will fail. Compiling with RTTI disabled isn't
required but is recommended for code size.

external/blake3/blake3-config.cmake.in vendored Normal file

@@ -0,0 +1,14 @@
@PACKAGE_INIT@
include(CMakeFindDependencyMacro)
# Remember TBB option state
set(BLAKE3_USE_TBB @BLAKE3_USE_TBB@)
if(BLAKE3_USE_TBB)
find_dependency(TBB @TBB_VERSION@)
endif()
include("${CMAKE_CURRENT_LIST_DIR}/blake3-targets.cmake")
check_required_components(blake3)

external/blake3/blake3.c vendored Normal file

@@ -0,0 +1,650 @@
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include "blake3.h"
#include "blake3_impl.h"
const char *blake3_version(void) { return BLAKE3_VERSION_STRING; }
INLINE void chunk_state_init(blake3_chunk_state *self, const uint32_t key[8],
uint8_t flags) {
memcpy(self->cv, key, BLAKE3_KEY_LEN);
self->chunk_counter = 0;
memset(self->buf, 0, BLAKE3_BLOCK_LEN);
self->buf_len = 0;
self->blocks_compressed = 0;
self->flags = flags;
}
INLINE void chunk_state_reset(blake3_chunk_state *self, const uint32_t key[8],
uint64_t chunk_counter) {
memcpy(self->cv, key, BLAKE3_KEY_LEN);
self->chunk_counter = chunk_counter;
self->blocks_compressed = 0;
memset(self->buf, 0, BLAKE3_BLOCK_LEN);
self->buf_len = 0;
}
INLINE size_t chunk_state_len(const blake3_chunk_state *self) {
return (BLAKE3_BLOCK_LEN * (size_t)self->blocks_compressed) +
((size_t)self->buf_len);
}
INLINE size_t chunk_state_fill_buf(blake3_chunk_state *self,
const uint8_t *input, size_t input_len) {
size_t take = BLAKE3_BLOCK_LEN - ((size_t)self->buf_len);
if (take > input_len) {
take = input_len;
}
uint8_t *dest = self->buf + ((size_t)self->buf_len);
memcpy(dest, input, take);
self->buf_len += (uint8_t)take;
return take;
}
INLINE uint8_t chunk_state_maybe_start_flag(const blake3_chunk_state *self) {
if (self->blocks_compressed == 0) {
return CHUNK_START;
} else {
return 0;
}
}
typedef struct {
uint32_t input_cv[8];
uint64_t counter;
uint8_t block[BLAKE3_BLOCK_LEN];
uint8_t block_len;
uint8_t flags;
} output_t;
INLINE output_t make_output(const uint32_t input_cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags) {
output_t ret;
memcpy(ret.input_cv, input_cv, 32);
memcpy(ret.block, block, BLAKE3_BLOCK_LEN);
ret.block_len = block_len;
ret.counter = counter;
ret.flags = flags;
return ret;
}
// Chaining values within a given chunk (specifically the compress_in_place
// interface) are represented as words. This avoids unnecessary bytes<->words
// conversion overhead in the portable implementation. However, the hash_many
// interface handles both user input and parent node blocks, so it accepts
// bytes. For that reason, chaining values in the CV stack are represented as
// bytes.
INLINE void output_chaining_value(const output_t *self, uint8_t cv[32]) {
uint32_t cv_words[8];
memcpy(cv_words, self->input_cv, 32);
blake3_compress_in_place(cv_words, self->block, self->block_len,
self->counter, self->flags);
store_cv_words(cv, cv_words);
}
INLINE void output_root_bytes(const output_t *self, uint64_t seek, uint8_t *out,
size_t out_len) {
if (out_len == 0) {
return;
}
uint64_t output_block_counter = seek / 64;
size_t offset_within_block = seek % 64;
uint8_t wide_buf[64];
if(offset_within_block) {
blake3_compress_xof(self->input_cv, self->block, self->block_len, output_block_counter, self->flags | ROOT, wide_buf);
const size_t available_bytes = 64 - offset_within_block;
const size_t bytes = out_len > available_bytes ? available_bytes : out_len;
memcpy(out, wide_buf + offset_within_block, bytes);
out += bytes;
out_len -= bytes;
output_block_counter += 1;
}
if(out_len / 64) {
blake3_xof_many(self->input_cv, self->block, self->block_len, output_block_counter, self->flags | ROOT, out, out_len / 64);
}
output_block_counter += out_len / 64;
out += out_len & -64;
out_len -= out_len & -64;
if(out_len) {
blake3_compress_xof(self->input_cv, self->block, self->block_len, output_block_counter, self->flags | ROOT, wide_buf);
memcpy(out, wide_buf, out_len);
}
}
INLINE void chunk_state_update(blake3_chunk_state *self, const uint8_t *input,
size_t input_len) {
if (self->buf_len > 0) {
size_t take = chunk_state_fill_buf(self, input, input_len);
input += take;
input_len -= take;
if (input_len > 0) {
blake3_compress_in_place(
self->cv, self->buf, BLAKE3_BLOCK_LEN, self->chunk_counter,
self->flags | chunk_state_maybe_start_flag(self));
self->blocks_compressed += 1;
self->buf_len = 0;
memset(self->buf, 0, BLAKE3_BLOCK_LEN);
}
}
while (input_len > BLAKE3_BLOCK_LEN) {
blake3_compress_in_place(self->cv, input, BLAKE3_BLOCK_LEN,
self->chunk_counter,
self->flags | chunk_state_maybe_start_flag(self));
self->blocks_compressed += 1;
input += BLAKE3_BLOCK_LEN;
input_len -= BLAKE3_BLOCK_LEN;
}
chunk_state_fill_buf(self, input, input_len);
}
INLINE output_t chunk_state_output(const blake3_chunk_state *self) {
uint8_t block_flags =
self->flags | chunk_state_maybe_start_flag(self) | CHUNK_END;
return make_output(self->cv, self->buf, self->buf_len, self->chunk_counter,
block_flags);
}
INLINE output_t parent_output(const uint8_t block[BLAKE3_BLOCK_LEN],
const uint32_t key[8], uint8_t flags) {
return make_output(key, block, BLAKE3_BLOCK_LEN, 0, flags | PARENT);
}
// Given some input larger than one chunk, return the number of bytes that
// should go in the left subtree. This is the largest power-of-2 number of
// chunks that leaves at least 1 byte for the right subtree.
INLINE size_t left_subtree_len(size_t input_len) {
// Subtract 1 to reserve at least one byte for the right side. input_len
// should always be greater than BLAKE3_CHUNK_LEN.
size_t full_chunks = (input_len - 1) / BLAKE3_CHUNK_LEN;
return round_down_to_power_of_2(full_chunks) * BLAKE3_CHUNK_LEN;
}
// Use SIMD parallelism to hash up to MAX_SIMD_DEGREE chunks at the same time
// on a single thread. Write out the chunk chaining values and return the
// number of chunks hashed. These chunks are never the root and never empty;
// those cases use a different codepath.
INLINE size_t compress_chunks_parallel(const uint8_t *input, size_t input_len,
const uint32_t key[8],
uint64_t chunk_counter, uint8_t flags,
uint8_t *out) {
#if defined(BLAKE3_TESTING)
assert(0 < input_len);
assert(input_len <= MAX_SIMD_DEGREE * BLAKE3_CHUNK_LEN);
#endif
const uint8_t *chunks_array[MAX_SIMD_DEGREE];
size_t input_position = 0;
size_t chunks_array_len = 0;
while (input_len - input_position >= BLAKE3_CHUNK_LEN) {
chunks_array[chunks_array_len] = &input[input_position];
input_position += BLAKE3_CHUNK_LEN;
chunks_array_len += 1;
}
blake3_hash_many(chunks_array, chunks_array_len,
BLAKE3_CHUNK_LEN / BLAKE3_BLOCK_LEN, key, chunk_counter,
true, flags, CHUNK_START, CHUNK_END, out);
// Hash the remaining partial chunk, if there is one. Note that the empty
// chunk (meaning the empty message) is a different codepath.
if (input_len > input_position) {
uint64_t counter = chunk_counter + (uint64_t)chunks_array_len;
blake3_chunk_state chunk_state;
chunk_state_init(&chunk_state, key, flags);
chunk_state.chunk_counter = counter;
chunk_state_update(&chunk_state, &input[input_position],
input_len - input_position);
output_t output = chunk_state_output(&chunk_state);
output_chaining_value(&output, &out[chunks_array_len * BLAKE3_OUT_LEN]);
return chunks_array_len + 1;
} else {
return chunks_array_len;
}
}
// Use SIMD parallelism to hash up to MAX_SIMD_DEGREE parents at the same time
// on a single thread. Write out the parent chaining values and return the
// number of parents hashed. (If there's an odd input chaining value left over,
// return it as an additional output.) These parents are never the root and
// never empty; those cases use a different codepath.
INLINE size_t compress_parents_parallel(const uint8_t *child_chaining_values,
size_t num_chaining_values,
const uint32_t key[8], uint8_t flags,
uint8_t *out) {
#if defined(BLAKE3_TESTING)
assert(2 <= num_chaining_values);
assert(num_chaining_values <= 2 * MAX_SIMD_DEGREE_OR_2);
#endif
const uint8_t *parents_array[MAX_SIMD_DEGREE_OR_2];
size_t parents_array_len = 0;
while (num_chaining_values - (2 * parents_array_len) >= 2) {
parents_array[parents_array_len] =
&child_chaining_values[2 * parents_array_len * BLAKE3_OUT_LEN];
parents_array_len += 1;
}
blake3_hash_many(parents_array, parents_array_len, 1, key,
0, // Parents always use counter 0.
false, flags | PARENT,
0, // Parents have no start flags.
0, // Parents have no end flags.
out);
// If there's an odd child left over, it becomes an output.
if (num_chaining_values > 2 * parents_array_len) {
memcpy(&out[parents_array_len * BLAKE3_OUT_LEN],
&child_chaining_values[2 * parents_array_len * BLAKE3_OUT_LEN],
BLAKE3_OUT_LEN);
return parents_array_len + 1;
} else {
return parents_array_len;
}
}
// The wide helper function returns (writes out) an array of chaining values
// and returns the length of that array. The number of chaining values returned
// is the dynamically detected SIMD degree, at most MAX_SIMD_DEGREE. Or fewer,
// if the input is shorter than that many chunks. The reason for maintaining a
// wide array of chaining values going back up the tree, is to allow the
// implementation to hash as many parents in parallel as possible.
//
// As a special case when the SIMD degree is 1, this function will still return
// at least 2 outputs. This guarantees that this function doesn't perform the
// root compression. (If it did, it would use the wrong flags, and also we
// wouldn't be able to implement extendable output.) Note that this function is
// not used when the whole input is only 1 chunk long; that's a different
// codepath.
//
// Why not just have the caller split the input on the first update(), instead
// of implementing this special rule? Because we don't want to limit SIMD or
// multi-threading parallelism for that update().
size_t blake3_compress_subtree_wide(const uint8_t *input, size_t input_len,
const uint32_t key[8],
uint64_t chunk_counter, uint8_t flags,
uint8_t *out, bool use_tbb) {
// Note that the single chunk case does *not* bump the SIMD degree up to 2
// when it is 1. If this implementation adds multi-threading in the future,
// this gives us the option of multi-threading even the 2-chunk case, which
// can help performance on smaller platforms.
if (input_len <= blake3_simd_degree() * BLAKE3_CHUNK_LEN) {
return compress_chunks_parallel(input, input_len, key, chunk_counter, flags,
out);
}
// With more than simd_degree chunks, we need to recurse. Start by dividing
// the input into left and right subtrees. (Note that this is only optimal
// as long as the SIMD degree is a power of 2. If we ever get a SIMD degree
// of 3 or something, we'll need a more complicated strategy.)
size_t left_input_len = left_subtree_len(input_len);
size_t right_input_len = input_len - left_input_len;
const uint8_t *right_input = &input[left_input_len];
uint64_t right_chunk_counter =
chunk_counter + (uint64_t)(left_input_len / BLAKE3_CHUNK_LEN);
// Make space for the child outputs. Here we use MAX_SIMD_DEGREE_OR_2 to
// account for the special case of returning 2 outputs when the SIMD degree
// is 1.
uint8_t cv_array[2 * MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN];
size_t degree = blake3_simd_degree();
if (left_input_len > BLAKE3_CHUNK_LEN && degree == 1) {
// The special case: We always use a degree of at least two, to make
// sure there are two outputs. Except, as noted above, at the chunk
// level, where we allow degree=1. (Note that the 1-chunk-input case is
// a different codepath.)
degree = 2;
}
uint8_t *right_cvs = &cv_array[degree * BLAKE3_OUT_LEN];
// Recurse!
size_t left_n = -1;
size_t right_n = -1;
#if defined(BLAKE3_USE_TBB)
blake3_compress_subtree_wide_join_tbb(
key, flags, use_tbb,
// left-hand side
input, left_input_len, chunk_counter, cv_array, &left_n,
// right-hand side
right_input, right_input_len, right_chunk_counter, right_cvs, &right_n);
#else
left_n = blake3_compress_subtree_wide(
input, left_input_len, key, chunk_counter, flags, cv_array, use_tbb);
right_n = blake3_compress_subtree_wide(right_input, right_input_len, key,
right_chunk_counter, flags, right_cvs,
use_tbb);
#endif // BLAKE3_USE_TBB
// The special case again. If simd_degree=1, then we'll have left_n=1 and
// right_n=1. Rather than compressing them into a single output, return
// them directly, to make sure we always have at least two outputs.
if (left_n == 1) {
memcpy(out, cv_array, 2 * BLAKE3_OUT_LEN);
return 2;
}
// Otherwise, do one layer of parent node compression.
size_t num_chaining_values = left_n + right_n;
return compress_parents_parallel(cv_array, num_chaining_values, key, flags,
out);
}
// Hash a subtree with compress_subtree_wide(), and then condense the resulting
// list of chaining values down to a single parent node. Don't compress that
// last parent node, however. Instead, return its message bytes (the
// concatenated chaining values of its children). This is necessary when the
// first call to update() supplies a complete subtree, because the topmost
// parent node of that subtree could end up being the root. It's also necessary
// for extended output in the general case.
//
// As with compress_subtree_wide(), this function is not used on inputs of 1
// chunk or less. That's a different codepath.
INLINE void
compress_subtree_to_parent_node(const uint8_t *input, size_t input_len,
const uint32_t key[8], uint64_t chunk_counter,
uint8_t flags, uint8_t out[2 * BLAKE3_OUT_LEN],
bool use_tbb) {
#if defined(BLAKE3_TESTING)
assert(input_len > BLAKE3_CHUNK_LEN);
#endif
uint8_t cv_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN];
size_t num_cvs = blake3_compress_subtree_wide(input, input_len, key,
chunk_counter, flags, cv_array, use_tbb);
assert(num_cvs <= MAX_SIMD_DEGREE_OR_2);
// The following loop never executes when MAX_SIMD_DEGREE_OR_2 is 2, because
// as we just asserted, num_cvs will always be <=2 in that case. But GCC
// (particularly GCC 8.5) can't tell that it never executes, and if NDEBUG is
// set then it emits incorrect warnings here. We tried a few different
// hacks to silence these, but in the end our hacks just produced different
// warnings (see https://github.com/BLAKE3-team/BLAKE3/pull/380). Out of
// desperation, we ifdef out this entire loop when we know it's not needed.
#if MAX_SIMD_DEGREE_OR_2 > 2
// If MAX_SIMD_DEGREE_OR_2 is greater than 2 and there's enough input,
// compress_subtree_wide() returns more than 2 chaining values. Condense
// them into 2 by forming parent nodes repeatedly.
uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
while (num_cvs > 2) {
num_cvs =
compress_parents_parallel(cv_array, num_cvs, key, flags, out_array);
memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
}
#endif
memcpy(out, cv_array, 2 * BLAKE3_OUT_LEN);
}
INLINE void hasher_init_base(blake3_hasher *self, const uint32_t key[8],
uint8_t flags) {
memcpy(self->key, key, BLAKE3_KEY_LEN);
chunk_state_init(&self->chunk, key, flags);
self->cv_stack_len = 0;
}
void blake3_hasher_init(blake3_hasher *self) { hasher_init_base(self, IV, 0); }
void blake3_hasher_init_keyed(blake3_hasher *self,
const uint8_t key[BLAKE3_KEY_LEN]) {
uint32_t key_words[8];
load_key_words(key, key_words);
hasher_init_base(self, key_words, KEYED_HASH);
}
void blake3_hasher_init_derive_key_raw(blake3_hasher *self, const void *context,
size_t context_len) {
blake3_hasher context_hasher;
hasher_init_base(&context_hasher, IV, DERIVE_KEY_CONTEXT);
blake3_hasher_update(&context_hasher, context, context_len);
uint8_t context_key[BLAKE3_KEY_LEN];
blake3_hasher_finalize(&context_hasher, context_key, BLAKE3_KEY_LEN);
uint32_t context_key_words[8];
load_key_words(context_key, context_key_words);
hasher_init_base(self, context_key_words, DERIVE_KEY_MATERIAL);
}
void blake3_hasher_init_derive_key(blake3_hasher *self, const char *context) {
blake3_hasher_init_derive_key_raw(self, context, strlen(context));
}
// As described in hasher_push_cv() below, we do "lazy merging", delaying
// merges until right before the next CV is about to be added. This is
// different from the reference implementation. Another difference is that we
// aren't always merging 1 chunk at a time. Instead, each CV might represent
// any power-of-two number of chunks, as long as the smaller-above-larger stack
// order is maintained. Instead of the "count the trailing 0-bits" algorithm
// described in the spec, we use a "count the total number of 1-bits" variant
// that doesn't require us to retain the subtree size of the CV on top of the
// stack. The principle is the same: each CV that should remain in the stack is
// represented by a 1-bit in the total number of chunks (or bytes) so far.
INLINE void hasher_merge_cv_stack(blake3_hasher *self, uint64_t total_len) {
size_t post_merge_stack_len = (size_t)popcnt(total_len);
while (self->cv_stack_len > post_merge_stack_len) {
uint8_t *parent_node =
&self->cv_stack[(self->cv_stack_len - 2) * BLAKE3_OUT_LEN];
output_t output = parent_output(parent_node, self->key, self->chunk.flags);
output_chaining_value(&output, parent_node);
self->cv_stack_len -= 1;
}
}
// In reference_impl.rs, we merge the new CV with existing CVs from the stack
// before pushing it. We can do that because we know more input is coming, so
// we know none of the merges are root.
//
// This setting is different. We want to feed as much input as possible to
// compress_subtree_wide(), without setting aside anything for the chunk_state.
// If the user gives us 64 KiB, we want to parallelize over all 64 KiB at once
// as a single subtree, if at all possible.
//
// This leads to two problems:
// 1) This 64 KiB input might be the only call that ever gets made to update.
// In this case, the root node of the 64 KiB subtree would be the root node
// of the whole tree, and it would need to be ROOT finalized. We can't
// compress it until we know.
// 2) This 64 KiB input might complete a larger tree, whose root node is
// similarly going to be the root of the whole tree. For example, maybe
// we have 192 KiB (that is, 128 + 64) hashed so far. We can't compress the
// node at the root of the 256 KiB subtree until we know how to finalize it.
//
// The second problem is solved with "lazy merging". That is, when we're about
// to add a CV to the stack, we don't merge it with anything first, as the
// reference impl does. Instead we do merges using the *previous* CV that was
// added, which is sitting on top of the stack, and we put the new CV
// (unmerged) on top of the stack afterwards. This guarantees that we never
// merge the root node until finalize().
//
// Solving the first problem requires an additional tool,
// compress_subtree_to_parent_node(). That function always returns the top
// *two* chaining values of the subtree it's compressing. We then do lazy
// merging with each of them separately, so that the second CV will always
// remain unmerged. (That also helps us support extendable output when we're
// hashing an input all-at-once.)
INLINE void hasher_push_cv(blake3_hasher *self, uint8_t new_cv[BLAKE3_OUT_LEN],
uint64_t chunk_counter) {
hasher_merge_cv_stack(self, chunk_counter);
memcpy(&self->cv_stack[self->cv_stack_len * BLAKE3_OUT_LEN], new_cv,
BLAKE3_OUT_LEN);
self->cv_stack_len += 1;
}
INLINE void blake3_hasher_update_base(blake3_hasher *self, const void *input,
size_t input_len, bool use_tbb) {
// Explicitly checking for zero avoids causing UB by passing a null pointer
// to memcpy. This comes up in practice with things like:
// std::vector<uint8_t> v;
// blake3_hasher_update(&hasher, v.data(), v.size());
if (input_len == 0) {
return;
}
const uint8_t *input_bytes = (const uint8_t *)input;
// If we have some partial chunk bytes in the internal chunk_state, we need
// to finish that chunk first.
if (chunk_state_len(&self->chunk) > 0) {
size_t take = BLAKE3_CHUNK_LEN - chunk_state_len(&self->chunk);
if (take > input_len) {
take = input_len;
}
chunk_state_update(&self->chunk, input_bytes, take);
input_bytes += take;
input_len -= take;
// If we've filled the current chunk and there's more coming, finalize this
// chunk and proceed. In this case we know it's not the root.
if (input_len > 0) {
output_t output = chunk_state_output(&self->chunk);
uint8_t chunk_cv[32];
output_chaining_value(&output, chunk_cv);
hasher_push_cv(self, chunk_cv, self->chunk.chunk_counter);
chunk_state_reset(&self->chunk, self->key, self->chunk.chunk_counter + 1);
} else {
return;
}
}
// Now the chunk_state is clear, and we have more input. If there's more than
// a single chunk (so, definitely not the root chunk), hash the largest whole
// subtree we can, with the full benefits of SIMD (and maybe in the future,
// multi-threading) parallelism. Two restrictions:
// - The subtree has to be a power-of-2 number of chunks. Only subtrees along
// the right edge can be incomplete, and we don't know where the right edge
// is going to be until we get to finalize().
// - The subtree must evenly divide the total number of chunks up until this
// point (if total is not 0). If the current incomplete subtree is only
// waiting for 1 more chunk, we can't hash a subtree of 4 chunks. We have
// to complete the current subtree first.
// Because we might need to break up the input to form powers of 2, or to
// evenly divide what we already have, this part runs in a loop.
while (input_len > BLAKE3_CHUNK_LEN) {
size_t subtree_len = round_down_to_power_of_2(input_len);
uint64_t count_so_far = self->chunk.chunk_counter * BLAKE3_CHUNK_LEN;
// Shrink the subtree_len until it evenly divides the count so far. We know
// that subtree_len itself is a power of 2, so we can use a bitmasking
// trick instead of an actual remainder operation. (Note that if the caller
// consistently passes power-of-2 inputs of the same size, as is hopefully
// typical, this loop condition will always fail, and subtree_len will
// always be the full length of the input.)
//
// An aside: We don't have to shrink subtree_len quite this much. For
// example, if count_so_far is 1, we could pass 2 chunks to
// compress_subtree_to_parent_node. Since we'll get 2 CVs back, we'll still
// get the right answer in the end, and we might get to use 2-way SIMD
// parallelism. The problem with this optimization is that it gets us
// stuck always hashing 2 chunks. The total number of chunks will remain
// odd, and we'll never graduate to higher degrees of parallelism. See
// https://github.com/BLAKE3-team/BLAKE3/issues/69.
while ((((uint64_t)(subtree_len - 1)) & count_so_far) != 0) {
subtree_len /= 2;
}
// The shrunken subtree_len might now be 1 chunk long. If so, hash that one
// chunk by itself. Otherwise, compress the subtree into a pair of CVs.
uint64_t subtree_chunks = subtree_len / BLAKE3_CHUNK_LEN;
if (subtree_len <= BLAKE3_CHUNK_LEN) {
blake3_chunk_state chunk_state;
chunk_state_init(&chunk_state, self->key, self->chunk.flags);
chunk_state.chunk_counter = self->chunk.chunk_counter;
chunk_state_update(&chunk_state, input_bytes, subtree_len);
output_t output = chunk_state_output(&chunk_state);
uint8_t cv[BLAKE3_OUT_LEN];
output_chaining_value(&output, cv);
hasher_push_cv(self, cv, chunk_state.chunk_counter);
} else {
// This is the high-performance happy path, though getting here depends
// on the caller giving us a long enough input.
uint8_t cv_pair[2 * BLAKE3_OUT_LEN];
compress_subtree_to_parent_node(input_bytes, subtree_len, self->key,
self->chunk.chunk_counter,
self->chunk.flags, cv_pair, use_tbb);
hasher_push_cv(self, cv_pair, self->chunk.chunk_counter);
hasher_push_cv(self, &cv_pair[BLAKE3_OUT_LEN],
self->chunk.chunk_counter + (subtree_chunks / 2));
}
self->chunk.chunk_counter += subtree_chunks;
input_bytes += subtree_len;
input_len -= subtree_len;
}
// If there's any remaining input less than a full chunk, add it to the chunk
// state. In that case, also do a final merge loop to make sure the subtree
// stack doesn't contain any unmerged pairs. The remaining input means we
// know these merges are non-root. This merge loop isn't strictly necessary
// here, because hasher_push_cv already does its own merge loop, but it
// simplifies blake3_hasher_finalize below.
if (input_len > 0) {
chunk_state_update(&self->chunk, input_bytes, input_len);
hasher_merge_cv_stack(self, self->chunk.chunk_counter);
}
}
void blake3_hasher_update(blake3_hasher *self, const void *input,
size_t input_len) {
bool use_tbb = false;
blake3_hasher_update_base(self, input, input_len, use_tbb);
}
#if defined(BLAKE3_USE_TBB)
void blake3_hasher_update_tbb(blake3_hasher *self, const void *input,
size_t input_len) {
bool use_tbb = true;
blake3_hasher_update_base(self, input, input_len, use_tbb);
}
#endif // BLAKE3_USE_TBB
void blake3_hasher_finalize(const blake3_hasher *self, uint8_t *out,
size_t out_len) {
blake3_hasher_finalize_seek(self, 0, out, out_len);
}
void blake3_hasher_finalize_seek(const blake3_hasher *self, uint64_t seek,
uint8_t *out, size_t out_len) {
// Explicitly checking for zero avoids causing UB by passing a null pointer
// to memcpy. This comes up in practice with things like:
// std::vector<uint8_t> v;
// blake3_hasher_finalize(&hasher, v.data(), v.size());
if (out_len == 0) {
return;
}
// If the subtree stack is empty, then the current chunk is the root.
if (self->cv_stack_len == 0) {
output_t output = chunk_state_output(&self->chunk);
output_root_bytes(&output, seek, out, out_len);
return;
}
// If there are any bytes in the chunk state, finalize that chunk and do a
// roll-up merge between that chunk hash and every subtree in the stack. In
// this case, the extra merge loop at the end of blake3_hasher_update
// guarantees that none of the subtrees in the stack need to be merged with
// each other first. Otherwise, if there are no bytes in the chunk state,
// then the top of the stack is a chunk hash, and we start the merge from
// that.
output_t output;
size_t cvs_remaining;
if (chunk_state_len(&self->chunk) > 0) {
cvs_remaining = self->cv_stack_len;
output = chunk_state_output(&self->chunk);
} else {
// There are always at least 2 CVs in the stack in this case.
cvs_remaining = self->cv_stack_len - 2;
output = parent_output(&self->cv_stack[cvs_remaining * 32], self->key,
self->chunk.flags);
}
while (cvs_remaining > 0) {
cvs_remaining -= 1;
uint8_t parent_block[BLAKE3_BLOCK_LEN];
memcpy(parent_block, &self->cv_stack[cvs_remaining * 32], 32);
output_chaining_value(&output, &parent_block[32]);
output = parent_output(parent_block, self->key, self->chunk.flags);
}
output_root_bytes(&output, seek, out, out_len);
}
void blake3_hasher_reset(blake3_hasher *self) {
chunk_state_reset(&self->chunk, self->key, 0);
self->cv_stack_len = 0;
}
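
The lazy-merge bookkeeping in `hasher_merge_cv_stack` and the subtree-sizing loop in `blake3_hasher_update_base` can be modeled in isolation. What follows is a minimal sketch and not part of the vendored sources: every `sketch_*` name is hypothetical, `SKETCH_CHUNK_LEN` simply mirrors `BLAKE3_CHUNK_LEN`, and the portable bit-counting loop stands in for whatever `popcnt()` does in `blake3_impl.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CHUNK_LEN 1024u /* mirrors BLAKE3_CHUNK_LEN */

/* Portable count of 1-bits; a stand-in for popcnt(). */
static unsigned sketch_popcount64(uint64_t x) {
  unsigned count = 0;
  while (x != 0) {
    x &= x - 1; /* clear the lowest set bit */
    count += 1;
  }
  return count;
}

/* Number of CVs left on the stack after the merge loop has run for
 * total_chunks chunks: one CV per 1-bit in the chunk count. */
static size_t sketch_post_merge_stack_len(uint64_t total_chunks) {
  return (size_t)sketch_popcount64(total_chunks);
}

/* Largest power of two that is <= x, for nonzero x. */
static uint64_t sketch_round_down_pow2(uint64_t x) {
  assert(x != 0);
  uint64_t p = 1;
  while (p <= x / 2) {
    p *= 2;
  }
  return p;
}

/* Byte length of the next subtree the update loop would hash, given
 * input_len bytes of pending input and bytes_so_far bytes already
 * hashed as whole chunks. */
static uint64_t sketch_subtree_len(uint64_t input_len, uint64_t bytes_so_far) {
  assert(input_len > SKETCH_CHUNK_LEN);
  assert(bytes_so_far % SKETCH_CHUNK_LEN == 0);
  uint64_t subtree_len = sketch_round_down_pow2(input_len);
  /* Shrink until subtree_len evenly divides bytes_so_far. Because
   * subtree_len is a power of two, a mask replaces the remainder. */
  while (((subtree_len - 1) & bytes_so_far) != 0) {
    subtree_len /= 2;
  }
  return subtree_len;
}
```

Under these assumptions, a 64 KiB update on a fresh hasher is taken as one 64 KiB subtree, while the same update arriving after a single chunk has been hashed is first cut down to a one-chunk subtree, which is the behavior the comments in `blake3_hasher_update_base` describe.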

86
external/blake3/blake3.h vendored Normal file

@@ -0,0 +1,86 @@
#ifndef BLAKE3_H
#define BLAKE3_H
#include <stddef.h>
#include <stdint.h>
#if !defined(BLAKE3_API)
# if defined(_WIN32) || defined(__CYGWIN__)
# if defined(BLAKE3_DLL)
# if defined(BLAKE3_DLL_EXPORTS)
# define BLAKE3_API __declspec(dllexport)
# else
# define BLAKE3_API __declspec(dllimport)
# endif
# define BLAKE3_PRIVATE
# else
# define BLAKE3_API
# define BLAKE3_PRIVATE
# endif
# elif __GNUC__ >= 4
# define BLAKE3_API __attribute__((visibility("default")))
# define BLAKE3_PRIVATE __attribute__((visibility("hidden")))
# else
# define BLAKE3_API
# define BLAKE3_PRIVATE
# endif
#endif
#ifdef __cplusplus
extern "C" {
#endif
#define BLAKE3_VERSION_STRING "1.8.2"
#define BLAKE3_KEY_LEN 32
#define BLAKE3_OUT_LEN 32
#define BLAKE3_BLOCK_LEN 64
#define BLAKE3_CHUNK_LEN 1024
#define BLAKE3_MAX_DEPTH 54
// This struct is a private implementation detail. It has to be here because
// it's part of blake3_hasher below.
typedef struct {
uint32_t cv[8];
uint64_t chunk_counter;
uint8_t buf[BLAKE3_BLOCK_LEN];
uint8_t buf_len;
uint8_t blocks_compressed;
uint8_t flags;
} blake3_chunk_state;
typedef struct {
uint32_t key[8];
blake3_chunk_state chunk;
uint8_t cv_stack_len;
// The stack size is MAX_DEPTH + 1 because we do lazy merging. For example,
// with 7 chunks, we have 3 entries in the stack. Adding an 8th chunk
// requires a 4th entry, rather than merging everything down to 1, because we
// don't know whether more input is coming. This is different from how the
// reference implementation does things.
uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];
} blake3_hasher;
BLAKE3_API const char *blake3_version(void);
BLAKE3_API void blake3_hasher_init(blake3_hasher *self);
BLAKE3_API void blake3_hasher_init_keyed(blake3_hasher *self,
const uint8_t key[BLAKE3_KEY_LEN]);
BLAKE3_API void blake3_hasher_init_derive_key(blake3_hasher *self, const char *context);
BLAKE3_API void blake3_hasher_init_derive_key_raw(blake3_hasher *self, const void *context,
size_t context_len);
BLAKE3_API void blake3_hasher_update(blake3_hasher *self, const void *input,
size_t input_len);
#if defined(BLAKE3_USE_TBB)
BLAKE3_API void blake3_hasher_update_tbb(blake3_hasher *self, const void *input,
size_t input_len);
#endif // BLAKE3_USE_TBB
BLAKE3_API void blake3_hasher_finalize(const blake3_hasher *self, uint8_t *out,
size_t out_len);
BLAKE3_API void blake3_hasher_finalize_seek(const blake3_hasher *self, uint64_t seek,
uint8_t *out, size_t out_len);
BLAKE3_API void blake3_hasher_reset(blake3_hasher *self);
#ifdef __cplusplus
}
#endif
#endif /* BLAKE3_H */
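
The constants in this header are related to one another. The sketch below is illustrative only: it assumes the BLAKE3 input limit of 2^64 bytes, which comes from the BLAKE3 specification and is not stated in this header, and all `sketch_*` / `SKETCH_*` names are hypothetical mirrors of the `BLAKE3_*` macros. It shows why a depth of 54 is enough for 1024-byte chunks, and how large the `cv_stack` array becomes once the extra lazy-merging slot is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OUT_LEN 32u     /* mirrors BLAKE3_OUT_LEN */
#define SKETCH_CHUNK_LEN 1024u /* mirrors BLAKE3_CHUNK_LEN */
#define SKETCH_MAX_DEPTH 54u   /* mirrors BLAKE3_MAX_DEPTH */

/* floor(log2(x)) for nonzero x. */
static unsigned sketch_log2_floor(uint64_t x) {
  assert(x != 0);
  unsigned n = 0;
  while (x > 1) {
    x >>= 1;
    n += 1;
  }
  return n;
}

/* Depth of a binary tree over 2^64 bytes split into chunk_len-byte
 * leaves: log2(2^64 / chunk_len) = 64 - log2(chunk_len).
 * Assumes chunk_len is a power of two. */
static unsigned sketch_max_depth(uint64_t chunk_len) {
  assert((chunk_len & (chunk_len - 1)) == 0);
  return 64u - sketch_log2_floor(chunk_len);
}

/* Bytes reserved for the CV stack, including the one extra entry
 * that lazy merging needs. */
static size_t sketch_cv_stack_bytes(void) {
  return (size_t)(SKETCH_MAX_DEPTH + 1u) * SKETCH_OUT_LEN;
}
```

With these values the stack occupies 1760 bytes, and its entry count (at most 55) fits comfortably in the `uint8_t cv_stack_len` field.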

326
external/blake3/blake3_avx2.c vendored Normal file

@@ -0,0 +1,326 @@
#include "blake3_impl.h"
#include <immintrin.h>
#define DEGREE 8
INLINE __m256i loadu(const uint8_t src[32]) {
return _mm256_loadu_si256((const __m256i *)src);
}
INLINE void storeu(__m256i src, uint8_t dest[32]) {
_mm256_storeu_si256((__m256i *)dest, src);
}
INLINE __m256i addv(__m256i a, __m256i b) { return _mm256_add_epi32(a, b); }
// Note that clang-format doesn't like the name "xor" for some reason.
INLINE __m256i xorv(__m256i a, __m256i b) { return _mm256_xor_si256(a, b); }
INLINE __m256i set1(uint32_t x) { return _mm256_set1_epi32((int32_t)x); }
INLINE __m256i rot16(__m256i x) {
return _mm256_shuffle_epi8(
x, _mm256_set_epi8(13, 12, 15, 14, 9, 8, 11, 10, 5, 4, 7, 6, 1, 0, 3, 2,
13, 12, 15, 14, 9, 8, 11, 10, 5, 4, 7, 6, 1, 0, 3, 2));
}
INLINE __m256i rot12(__m256i x) {
return _mm256_or_si256(_mm256_srli_epi32(x, 12), _mm256_slli_epi32(x, 32 - 12));
}
INLINE __m256i rot8(__m256i x) {
return _mm256_shuffle_epi8(
x, _mm256_set_epi8(12, 15, 14, 13, 8, 11, 10, 9, 4, 7, 6, 5, 0, 3, 2, 1,
12, 15, 14, 13, 8, 11, 10, 9, 4, 7, 6, 5, 0, 3, 2, 1));
}
INLINE __m256i rot7(__m256i x) {
return _mm256_or_si256(_mm256_srli_epi32(x, 7), _mm256_slli_epi32(x, 32 - 7));
}
INLINE void round_fn(__m256i v[16], __m256i m[16], size_t r) {
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][0]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][2]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][4]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][6]]);
v[0] = addv(v[0], v[4]);
v[1] = addv(v[1], v[5]);
v[2] = addv(v[2], v[6]);
v[3] = addv(v[3], v[7]);
v[12] = xorv(v[12], v[0]);
v[13] = xorv(v[13], v[1]);
v[14] = xorv(v[14], v[2]);
v[15] = xorv(v[15], v[3]);
v[12] = rot16(v[12]);
v[13] = rot16(v[13]);
v[14] = rot16(v[14]);
v[15] = rot16(v[15]);
v[8] = addv(v[8], v[12]);
v[9] = addv(v[9], v[13]);
v[10] = addv(v[10], v[14]);
v[11] = addv(v[11], v[15]);
v[4] = xorv(v[4], v[8]);
v[5] = xorv(v[5], v[9]);
v[6] = xorv(v[6], v[10]);
v[7] = xorv(v[7], v[11]);
v[4] = rot12(v[4]);
v[5] = rot12(v[5]);
v[6] = rot12(v[6]);
v[7] = rot12(v[7]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][1]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][3]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][5]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][7]]);
v[0] = addv(v[0], v[4]);
v[1] = addv(v[1], v[5]);
v[2] = addv(v[2], v[6]);
v[3] = addv(v[3], v[7]);
v[12] = xorv(v[12], v[0]);
v[13] = xorv(v[13], v[1]);
v[14] = xorv(v[14], v[2]);
v[15] = xorv(v[15], v[3]);
v[12] = rot8(v[12]);
v[13] = rot8(v[13]);
v[14] = rot8(v[14]);
v[15] = rot8(v[15]);
v[8] = addv(v[8], v[12]);
v[9] = addv(v[9], v[13]);
v[10] = addv(v[10], v[14]);
v[11] = addv(v[11], v[15]);
v[4] = xorv(v[4], v[8]);
v[5] = xorv(v[5], v[9]);
v[6] = xorv(v[6], v[10]);
v[7] = xorv(v[7], v[11]);
v[4] = rot7(v[4]);
v[5] = rot7(v[5]);
v[6] = rot7(v[6]);
v[7] = rot7(v[7]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][8]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][10]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][12]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][14]]);
v[0] = addv(v[0], v[5]);
v[1] = addv(v[1], v[6]);
v[2] = addv(v[2], v[7]);
v[3] = addv(v[3], v[4]);
v[15] = xorv(v[15], v[0]);
v[12] = xorv(v[12], v[1]);
v[13] = xorv(v[13], v[2]);
v[14] = xorv(v[14], v[3]);
v[15] = rot16(v[15]);
v[12] = rot16(v[12]);
v[13] = rot16(v[13]);
v[14] = rot16(v[14]);
v[10] = addv(v[10], v[15]);
v[11] = addv(v[11], v[12]);
v[8] = addv(v[8], v[13]);
v[9] = addv(v[9], v[14]);
v[5] = xorv(v[5], v[10]);
v[6] = xorv(v[6], v[11]);
v[7] = xorv(v[7], v[8]);
v[4] = xorv(v[4], v[9]);
v[5] = rot12(v[5]);
v[6] = rot12(v[6]);
v[7] = rot12(v[7]);
v[4] = rot12(v[4]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][9]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][11]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][13]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][15]]);
v[0] = addv(v[0], v[5]);
v[1] = addv(v[1], v[6]);
v[2] = addv(v[2], v[7]);
v[3] = addv(v[3], v[4]);
v[15] = xorv(v[15], v[0]);
v[12] = xorv(v[12], v[1]);
v[13] = xorv(v[13], v[2]);
v[14] = xorv(v[14], v[3]);
v[15] = rot8(v[15]);
v[12] = rot8(v[12]);
v[13] = rot8(v[13]);
v[14] = rot8(v[14]);
v[10] = addv(v[10], v[15]);
v[11] = addv(v[11], v[12]);
v[8] = addv(v[8], v[13]);
v[9] = addv(v[9], v[14]);
v[5] = xorv(v[5], v[10]);
v[6] = xorv(v[6], v[11]);
v[7] = xorv(v[7], v[8]);
v[4] = xorv(v[4], v[9]);
v[5] = rot7(v[5]);
v[6] = rot7(v[6]);
v[7] = rot7(v[7]);
v[4] = rot7(v[4]);
}
INLINE void transpose_vecs(__m256i vecs[DEGREE]) {
// Interleave 32-bit lanes. The low unpack is lanes 00/11/44/55, and the high
// is 22/33/66/77.
__m256i ab_0145 = _mm256_unpacklo_epi32(vecs[0], vecs[1]);
__m256i ab_2367 = _mm256_unpackhi_epi32(vecs[0], vecs[1]);
__m256i cd_0145 = _mm256_unpacklo_epi32(vecs[2], vecs[3]);
__m256i cd_2367 = _mm256_unpackhi_epi32(vecs[2], vecs[3]);
__m256i ef_0145 = _mm256_unpacklo_epi32(vecs[4], vecs[5]);
__m256i ef_2367 = _mm256_unpackhi_epi32(vecs[4], vecs[5]);
__m256i gh_0145 = _mm256_unpacklo_epi32(vecs[6], vecs[7]);
__m256i gh_2367 = _mm256_unpackhi_epi32(vecs[6], vecs[7]);
// Interleave 64-bit lanes. The low unpack is lanes 00/22 and the high is
// 11/33.
__m256i abcd_04 = _mm256_unpacklo_epi64(ab_0145, cd_0145);
__m256i abcd_15 = _mm256_unpackhi_epi64(ab_0145, cd_0145);
__m256i abcd_26 = _mm256_unpacklo_epi64(ab_2367, cd_2367);
__m256i abcd_37 = _mm256_unpackhi_epi64(ab_2367, cd_2367);
__m256i efgh_04 = _mm256_unpacklo_epi64(ef_0145, gh_0145);
__m256i efgh_15 = _mm256_unpackhi_epi64(ef_0145, gh_0145);
__m256i efgh_26 = _mm256_unpacklo_epi64(ef_2367, gh_2367);
__m256i efgh_37 = _mm256_unpackhi_epi64(ef_2367, gh_2367);
// Interleave 128-bit lanes.
vecs[0] = _mm256_permute2x128_si256(abcd_04, efgh_04, 0x20);
vecs[1] = _mm256_permute2x128_si256(abcd_15, efgh_15, 0x20);
vecs[2] = _mm256_permute2x128_si256(abcd_26, efgh_26, 0x20);
vecs[3] = _mm256_permute2x128_si256(abcd_37, efgh_37, 0x20);
vecs[4] = _mm256_permute2x128_si256(abcd_04, efgh_04, 0x31);
vecs[5] = _mm256_permute2x128_si256(abcd_15, efgh_15, 0x31);
vecs[6] = _mm256_permute2x128_si256(abcd_26, efgh_26, 0x31);
vecs[7] = _mm256_permute2x128_si256(abcd_37, efgh_37, 0x31);
}
INLINE void transpose_msg_vecs(const uint8_t *const *inputs,
size_t block_offset, __m256i out[16]) {
out[0] = loadu(&inputs[0][block_offset + 0 * sizeof(__m256i)]);
out[1] = loadu(&inputs[1][block_offset + 0 * sizeof(__m256i)]);
out[2] = loadu(&inputs[2][block_offset + 0 * sizeof(__m256i)]);
out[3] = loadu(&inputs[3][block_offset + 0 * sizeof(__m256i)]);
out[4] = loadu(&inputs[4][block_offset + 0 * sizeof(__m256i)]);
out[5] = loadu(&inputs[5][block_offset + 0 * sizeof(__m256i)]);
out[6] = loadu(&inputs[6][block_offset + 0 * sizeof(__m256i)]);
out[7] = loadu(&inputs[7][block_offset + 0 * sizeof(__m256i)]);
out[8] = loadu(&inputs[0][block_offset + 1 * sizeof(__m256i)]);
out[9] = loadu(&inputs[1][block_offset + 1 * sizeof(__m256i)]);
out[10] = loadu(&inputs[2][block_offset + 1 * sizeof(__m256i)]);
out[11] = loadu(&inputs[3][block_offset + 1 * sizeof(__m256i)]);
out[12] = loadu(&inputs[4][block_offset + 1 * sizeof(__m256i)]);
out[13] = loadu(&inputs[5][block_offset + 1 * sizeof(__m256i)]);
out[14] = loadu(&inputs[6][block_offset + 1 * sizeof(__m256i)]);
out[15] = loadu(&inputs[7][block_offset + 1 * sizeof(__m256i)]);
for (size_t i = 0; i < 8; ++i) {
_mm_prefetch((const void *)&inputs[i][block_offset + 256], _MM_HINT_T0);
}
transpose_vecs(&out[0]);
transpose_vecs(&out[8]);
}
INLINE void load_counters(uint64_t counter, bool increment_counter,
__m256i *out_lo, __m256i *out_hi) {
const __m256i mask = _mm256_set1_epi32(-(int32_t)increment_counter);
const __m256i add0 = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0);
const __m256i add1 = _mm256_and_si256(mask, add0);
__m256i l = _mm256_add_epi32(_mm256_set1_epi32((int32_t)counter), add1);
__m256i carry = _mm256_cmpgt_epi32(_mm256_xor_si256(add1, _mm256_set1_epi32(0x80000000)),
_mm256_xor_si256( l, _mm256_set1_epi32(0x80000000)));
__m256i h = _mm256_sub_epi32(_mm256_set1_epi32((int32_t)(counter >> 32)), carry);
*out_lo = l;
*out_hi = h;
}
static
void blake3_hash8_avx2(const uint8_t *const *inputs, size_t blocks,
const uint32_t key[8], uint64_t counter,
bool increment_counter, uint8_t flags,
uint8_t flags_start, uint8_t flags_end, uint8_t *out) {
__m256i h_vecs[8] = {
set1(key[0]), set1(key[1]), set1(key[2]), set1(key[3]),
set1(key[4]), set1(key[5]), set1(key[6]), set1(key[7]),
};
__m256i counter_low_vec, counter_high_vec;
load_counters(counter, increment_counter, &counter_low_vec,
&counter_high_vec);
uint8_t block_flags = flags | flags_start;
for (size_t block = 0; block < blocks; block++) {
if (block + 1 == blocks) {
block_flags |= flags_end;
}
__m256i block_len_vec = set1(BLAKE3_BLOCK_LEN);
__m256i block_flags_vec = set1(block_flags);
__m256i msg_vecs[16];
transpose_msg_vecs(inputs, block * BLAKE3_BLOCK_LEN, msg_vecs);
__m256i v[16] = {
h_vecs[0], h_vecs[1], h_vecs[2], h_vecs[3],
h_vecs[4], h_vecs[5], h_vecs[6], h_vecs[7],
set1(IV[0]), set1(IV[1]), set1(IV[2]), set1(IV[3]),
counter_low_vec, counter_high_vec, block_len_vec, block_flags_vec,
};
round_fn(v, msg_vecs, 0);
round_fn(v, msg_vecs, 1);
round_fn(v, msg_vecs, 2);
round_fn(v, msg_vecs, 3);
round_fn(v, msg_vecs, 4);
round_fn(v, msg_vecs, 5);
round_fn(v, msg_vecs, 6);
h_vecs[0] = xorv(v[0], v[8]);
h_vecs[1] = xorv(v[1], v[9]);
h_vecs[2] = xorv(v[2], v[10]);
h_vecs[3] = xorv(v[3], v[11]);
h_vecs[4] = xorv(v[4], v[12]);
h_vecs[5] = xorv(v[5], v[13]);
h_vecs[6] = xorv(v[6], v[14]);
h_vecs[7] = xorv(v[7], v[15]);
block_flags = flags;
}
transpose_vecs(h_vecs);
storeu(h_vecs[0], &out[0 * sizeof(__m256i)]);
storeu(h_vecs[1], &out[1 * sizeof(__m256i)]);
storeu(h_vecs[2], &out[2 * sizeof(__m256i)]);
storeu(h_vecs[3], &out[3 * sizeof(__m256i)]);
storeu(h_vecs[4], &out[4 * sizeof(__m256i)]);
storeu(h_vecs[5], &out[5 * sizeof(__m256i)]);
storeu(h_vecs[6], &out[6 * sizeof(__m256i)]);
storeu(h_vecs[7], &out[7 * sizeof(__m256i)]);
}
#if !defined(BLAKE3_NO_SSE41)
void blake3_hash_many_sse41(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#else
void blake3_hash_many_portable(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#endif
void blake3_hash_many_avx2(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out) {
while (num_inputs >= DEGREE) {
blake3_hash8_avx2(inputs, blocks, key, counter, increment_counter, flags,
flags_start, flags_end, out);
if (increment_counter) {
counter += DEGREE;
}
inputs += DEGREE;
num_inputs -= DEGREE;
out = &out[DEGREE * BLAKE3_OUT_LEN];
}
#if !defined(BLAKE3_NO_SSE41)
blake3_hash_many_sse41(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end, out);
#else
blake3_hash_many_portable(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end,
out);
#endif
}
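
The carry handling in `load_counters` relies on a sign-bit trick, because AVX2 only offers a signed 32-bit greater-than comparison. The following scalar model of a single lane is a sketch for illustration, not code from the vendored file; the `sketch_*` helpers are hypothetical names. It shows that XORing both operands with `0x80000000` turns a signed comparison into an unsigned one, and that the low word has wrapped exactly when the per-lane addend is greater than the sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a uint32_t as int32_t without relying on
 * implementation-defined integer conversion. */
static int32_t sketch_as_i32(uint32_t x) {
  int32_t r;
  memcpy(&r, &x, sizeof r);
  return r;
}

/* Unsigned a > b, computed with a signed comparison after flipping
 * the sign bit of both operands. */
static bool sketch_ugt_via_signed(uint32_t a, uint32_t b) {
  return sketch_as_i32(a ^ 0x80000000u) > sketch_as_i32(b ^ 0x80000000u);
}

/* Scalar model of one lane (0..7) of load_counters(): returns the
 * 64-bit counter that the lane's low and high words encode. */
static uint64_t sketch_lane_counter(uint64_t counter, bool increment_counter,
                                    uint32_t lane) {
  assert(lane < 8);
  /* The mask in the vector code zeroes the addend when the counter
   * is not incremented per lane. */
  uint32_t add = increment_counter ? lane : 0u;
  uint32_t lo = (uint32_t)((uint32_t)counter + add); /* wraps mod 2^32 */
  /* The addition wrapped if and only if add > lo (unsigned). */
  uint32_t carry = sketch_ugt_via_signed(add, lo) ? 1u : 0u;
  uint32_t hi = (uint32_t)((uint32_t)(counter >> 32) + carry);
  return ((uint64_t)hi << 32) | lo;
}
```

In the vector version the comparison yields all-ones (that is, -1) in lanes that wrapped, so subtracting that mask from the high word has the same effect as adding the carry here.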

1815
external/blake3/blake3_avx2_x86-64_unix.S vendored Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

1388
external/blake3/blake3_avx512.c vendored Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,32 @@
# These are Rust bindings for the C implementation of BLAKE3. As there is a
# native (and faster) Rust implementation of BLAKE3 provided in this same repo,
# these bindings are not expected to be used in production. They're intended
# for testing and benchmarking.
[package]
name = "blake3_c_rust_bindings"
version = "0.0.0"
description = "TESTING ONLY Rust bindings for the BLAKE3 C implementation"
edition = "2021"
[features]
# By default the x86-64 build uses assembly implementations. This feature makes
# the build use the C intrinsics implementations instead.
prefer_intrinsics = []
# Activate NEON bindings. We don't currently do any CPU feature detection for
# this. If this Cargo feature is on, NEON gets used.
neon = []
# Enable TBB-based multithreading.
tbb = []
[dev-dependencies]
arrayref = "0.3.5"
arrayvec = { version = "0.7.0", default-features = false }
page_size = "0.6.0"
rand = "0.9.0"
rand_chacha = "0.9.0"
reference_impl = { path = "../../reference_impl" }
[build-dependencies]
cc = "1.0.48"
ignore = "0.4.23"


@@ -0,0 +1,4 @@
These are Rust bindings for the C implementation of BLAKE3. As there is
a native Rust implementation of BLAKE3 provided in this same repo, these
bindings are not expected to be used in production. They're intended for
testing and benchmarking.


@@ -0,0 +1,477 @@
#![feature(test)]
extern crate test;
use arrayref::array_ref;
use arrayvec::ArrayVec;
use rand::prelude::*;
use test::Bencher;
const KIB: usize = 1024;
const MAX_SIMD_DEGREE: usize = 16;
const BLOCK_LEN: usize = 64;
const CHUNK_LEN: usize = 1024;
const OUT_LEN: usize = 32;
// This struct randomizes two things:
// 1. The actual bytes of input.
// 2. The page offset the input starts at.
pub struct RandomInput {
buf: Vec<u8>,
len: usize,
offsets: Vec<usize>,
offset_index: usize,
}
impl RandomInput {
pub fn new(b: &mut Bencher, len: usize) -> Self {
b.bytes += len as u64;
let page_size: usize = page_size::get();
let mut buf = vec![0u8; len + page_size];
let mut rng = rand::rng();
rng.fill_bytes(&mut buf);
let mut offsets: Vec<usize> = (0..page_size).collect();
offsets.shuffle(&mut rng);
Self {
buf,
len,
offsets,
offset_index: 0,
}
}
pub fn get(&mut self) -> &[u8] {
let offset = self.offsets[self.offset_index];
self.offset_index += 1;
if self.offset_index >= self.offsets.len() {
self.offset_index = 0;
}
&self.buf[offset..][..self.len]
}
}
type CompressInPlaceFn =
unsafe extern "C" fn(cv: *mut u32, block: *const u8, block_len: u8, counter: u64, flags: u8);
fn bench_single_compression_fn(b: &mut Bencher, f: CompressInPlaceFn) {
let mut state = [1u32; 8];
let mut r = RandomInput::new(b, 64);
let input = array_ref!(r.get(), 0, 64);
b.iter(|| unsafe { f(state.as_mut_ptr(), input.as_ptr(), 64, 0, 0) });
}
#[bench]
fn bench_single_compression_portable(b: &mut Bencher) {
bench_single_compression_fn(
b,
blake3_c_rust_bindings::ffi::blake3_compress_in_place_portable,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_single_compression_sse2(b: &mut Bencher) {
if !blake3_c_rust_bindings::sse2_detected() {
return;
}
bench_single_compression_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_compress_in_place_sse2,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_single_compression_sse41(b: &mut Bencher) {
if !blake3_c_rust_bindings::sse41_detected() {
return;
}
bench_single_compression_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_compress_in_place_sse41,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_single_compression_avx512(b: &mut Bencher) {
if !blake3_c_rust_bindings::avx512_detected() {
return;
}
bench_single_compression_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_compress_in_place_avx512,
);
}
type HashManyFn = unsafe extern "C" fn(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
fn bench_many_chunks_fn(b: &mut Bencher, f: HashManyFn, degree: usize) {
let mut inputs = Vec::new();
for _ in 0..degree {
inputs.push(RandomInput::new(b, CHUNK_LEN));
}
b.iter(|| {
let input_arrays: ArrayVec<&[u8; CHUNK_LEN], MAX_SIMD_DEGREE> = inputs
.iter_mut()
.take(degree)
.map(|i| array_ref!(i.get(), 0, CHUNK_LEN))
.collect();
let mut out = [0; MAX_SIMD_DEGREE * OUT_LEN];
unsafe {
f(
input_arrays.as_ptr() as _,
input_arrays.len(),
CHUNK_LEN / BLOCK_LEN,
[0u32; 8].as_ptr(),
0,
true,
0,
0,
0,
out.as_mut_ptr(),
)
}
});
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_chunks_sse2(b: &mut Bencher) {
if !blake3_c_rust_bindings::sse2_detected() {
return;
}
bench_many_chunks_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_sse2,
4,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_chunks_sse41(b: &mut Bencher) {
if !blake3_c_rust_bindings::sse41_detected() {
return;
}
bench_many_chunks_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_sse41,
4,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_chunks_avx2(b: &mut Bencher) {
if !blake3_c_rust_bindings::avx2_detected() {
return;
}
bench_many_chunks_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_avx2,
8,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_chunks_avx512(b: &mut Bencher) {
if !blake3_c_rust_bindings::avx512_detected() {
return;
}
bench_many_chunks_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_avx512,
16,
);
}
#[bench]
#[cfg(feature = "neon")]
fn bench_many_chunks_neon(b: &mut Bencher) {
// When "neon" is on, NEON support is assumed.
bench_many_chunks_fn(
b,
blake3_c_rust_bindings::ffi::neon::blake3_hash_many_neon,
4,
);
}
// TODO: When we get const generics we can unify this with the chunks code.
fn bench_many_parents_fn(b: &mut Bencher, f: HashManyFn, degree: usize) {
let mut inputs = Vec::new();
for _ in 0..degree {
inputs.push(RandomInput::new(b, BLOCK_LEN));
}
b.iter(|| {
let input_arrays: ArrayVec<&[u8; BLOCK_LEN], MAX_SIMD_DEGREE> = inputs
.iter_mut()
.take(degree)
.map(|i| array_ref!(i.get(), 0, BLOCK_LEN))
.collect();
let mut out = [0; MAX_SIMD_DEGREE * OUT_LEN];
unsafe {
f(
input_arrays.as_ptr() as _,
input_arrays.len(),
1,
[0u32; 8].as_ptr(),
0,
false,
0,
0,
0,
out.as_mut_ptr(),
)
}
});
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_parents_sse2(b: &mut Bencher) {
if !blake3_c_rust_bindings::sse2_detected() {
return;
}
bench_many_parents_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_sse2,
4,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_parents_sse41(b: &mut Bencher) {
if !blake3_c_rust_bindings::sse41_detected() {
return;
}
bench_many_parents_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_sse41,
4,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_parents_avx2(b: &mut Bencher) {
if !blake3_c_rust_bindings::avx2_detected() {
return;
}
bench_many_parents_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_avx2,
8,
);
}
#[bench]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn bench_many_parents_avx512(b: &mut Bencher) {
if !blake3_c_rust_bindings::avx512_detected() {
return;
}
bench_many_parents_fn(
b,
blake3_c_rust_bindings::ffi::x86::blake3_hash_many_avx512,
16,
);
}
#[bench]
#[cfg(feature = "neon")]
fn bench_many_parents_neon(b: &mut Bencher) {
// When "neon" is on, NEON support is assumed.
bench_many_parents_fn(
b,
blake3_c_rust_bindings::ffi::neon::blake3_hash_many_neon,
4,
);
}
fn bench_incremental(b: &mut Bencher, len: usize) {
let mut input = RandomInput::new(b, len);
b.iter(|| {
let mut hasher = blake3_c_rust_bindings::Hasher::new();
hasher.update(input.get());
let mut out = [0; 32];
hasher.finalize(&mut out);
out
});
}
#[bench]
fn bench_incremental_0001_block(b: &mut Bencher) {
bench_incremental(b, BLOCK_LEN);
}
#[bench]
fn bench_incremental_0001_kib(b: &mut Bencher) {
bench_incremental(b, 1 * KIB);
}
#[bench]
fn bench_incremental_0002_kib(b: &mut Bencher) {
bench_incremental(b, 2 * KIB);
}
#[bench]
fn bench_incremental_0004_kib(b: &mut Bencher) {
bench_incremental(b, 4 * KIB);
}
#[bench]
fn bench_incremental_0008_kib(b: &mut Bencher) {
bench_incremental(b, 8 * KIB);
}
#[bench]
fn bench_incremental_0016_kib(b: &mut Bencher) {
bench_incremental(b, 16 * KIB);
}
#[bench]
fn bench_incremental_0032_kib(b: &mut Bencher) {
bench_incremental(b, 32 * KIB);
}
#[bench]
fn bench_incremental_0064_kib(b: &mut Bencher) {
bench_incremental(b, 64 * KIB);
}
#[bench]
fn bench_incremental_0128_kib(b: &mut Bencher) {
bench_incremental(b, 128 * KIB);
}
#[bench]
fn bench_incremental_0256_kib(b: &mut Bencher) {
bench_incremental(b, 256 * KIB);
}
#[bench]
fn bench_incremental_0512_kib(b: &mut Bencher) {
bench_incremental(b, 512 * KIB);
}
#[bench]
fn bench_incremental_1024_kib(b: &mut Bencher) {
bench_incremental(b, 1024 * KIB);
}
#[cfg(feature = "tbb")]
fn bench_tbb(b: &mut Bencher, len: usize) {
let mut input = RandomInput::new(b, len);
b.iter(|| {
let mut hasher = blake3_c_rust_bindings::Hasher::new();
hasher.update_tbb(input.get());
let mut out = [0; 32];
hasher.finalize(&mut out);
out
});
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0001_block(b: &mut Bencher) {
bench_tbb(b, BLOCK_LEN);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0001_kib(b: &mut Bencher) {
bench_tbb(b, 1 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0002_kib(b: &mut Bencher) {
bench_tbb(b, 2 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0004_kib(b: &mut Bencher) {
bench_tbb(b, 4 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0008_kib(b: &mut Bencher) {
bench_tbb(b, 8 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0016_kib(b: &mut Bencher) {
bench_tbb(b, 16 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0032_kib(b: &mut Bencher) {
bench_tbb(b, 32 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0064_kib(b: &mut Bencher) {
bench_tbb(b, 64 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0128_kib(b: &mut Bencher) {
bench_tbb(b, 128 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0256_kib(b: &mut Bencher) {
bench_tbb(b, 256 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_0512_kib(b: &mut Bencher) {
bench_tbb(b, 512 * KIB);
}
#[bench]
#[cfg(feature = "tbb")]
fn bench_tbb_1024_kib(b: &mut Bencher) {
bench_tbb(b, 1024 * KIB);
}
// This checks that update() splits up its input in increasing powers of 2, so
// that it can recover a high degree of parallelism when the number of bytes
// hashed so far is uneven. The performance of this benchmark should be
// reasonably close to bench_incremental_0064_kib, within 80% or so. When we
// had a bug in this logic (https://github.com/BLAKE3-team/BLAKE3/issues/69),
// performance was less than half.
#[bench]
fn bench_two_updates(b: &mut Bencher) {
let len = 65536;
let mut input = RandomInput::new(b, len);
b.iter(|| {
let mut hasher = blake3_c_rust_bindings::Hasher::new();
let input = input.get();
hasher.update(&input[..1]);
hasher.update(&input[1..]);
let mut out = [0; 32];
hasher.finalize(&mut out);
out
});
}

@@ -0,0 +1,253 @@
use std::env;
fn defined(var: &str) -> bool {
env::var_os(var).is_some()
}
fn target_components() -> Vec<String> {
let target = env::var("TARGET").unwrap();
target.split("-").map(|s| s.to_string()).collect()
}
fn is_x86_64() -> bool {
target_components()[0] == "x86_64"
}
fn is_windows_target() -> bool {
env::var("CARGO_CFG_TARGET_OS").unwrap() == "windows"
}
fn use_msvc_asm() -> bool {
const MSVC_NAMES: &[&str] = &["", "cl", "cl.exe"];
let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
let target_env = env::var("CARGO_CFG_TARGET_ENV").unwrap_or_default();
let target_windows_msvc = target_os == "windows" && target_env == "msvc";
let host_triple = env::var("HOST").unwrap_or_default();
let target_triple = env::var("TARGET").unwrap_or_default();
let cross_compiling = host_triple != target_triple;
let cc = env::var("CC").unwrap_or_default().to_ascii_lowercase();
if !target_windows_msvc {
// We are not building for Windows with the MSVC toolchain.
false
} else if !cross_compiling && MSVC_NAMES.contains(&&*cc) {
// We are building on Windows with the MSVC toolchain (and not cross-compiling for another architecture or target).
true
} else {
// We are cross-compiling to Windows with the MSVC toolchain.
let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
let target_vendor = env::var("CARGO_CFG_TARGET_VENDOR").unwrap_or_default();
let cc = env::var(format!("CC_{target_arch}_{target_vendor}_windows_msvc"))
.unwrap_or_default()
.to_ascii_lowercase();
// Check if we are using the MSVC compiler.
MSVC_NAMES.contains(&&*cc)
}
}
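// A minimal sketch of the decision made by use_msvc_asm() above, with the
// environment lookups replaced by plain parameters so that each branch can be
// exercised directly. The name `use_msvc_asm_for` and its parameters are
// hypothetical and are not part of the build script.

```rust
/// Compiler names treated as MSVC. An empty (unset) CC also counts.
const MSVC_NAMES: &[&str] = &["", "cl", "cl.exe"];

/// `cc` stands in for $CC and `target_cc` for the target-specific
/// CC_<arch>_<vendor>_windows_msvc variable (both are assumptions of this sketch).
fn use_msvc_asm_for(
    target_windows_msvc: bool,
    cross_compiling: bool,
    cc: &str,
    target_cc: &str,
) -> bool {
    let cc = cc.to_ascii_lowercase();
    let target_cc = target_cc.to_ascii_lowercase();
    if !target_windows_msvc {
        // Not building for Windows with the MSVC toolchain.
        false
    } else if !cross_compiling && MSVC_NAMES.contains(&cc.as_str()) {
        // Native MSVC build with an MSVC (or unset) compiler.
        true
    } else {
        // Otherwise fall back to the target-specific compiler variable.
        MSVC_NAMES.contains(&target_cc.as_str())
    }
}
```

// Note that an unset compiler variable is treated as MSVC, so the MSVC
// assembly files are the default for windows-msvc targets.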
fn is_x86_32() -> bool {
let arch = &target_components()[0];
arch == "i386" || arch == "i586" || arch == "i686"
}
fn is_armv7() -> bool {
target_components()[0] == "armv7"
}
fn is_aarch64() -> bool {
target_components()[0] == "aarch64"
}
// Windows targets may be using the MSVC toolchain or the GNU toolchain. The
// right compiler flags to use depend on the toolchain. (And we don't want to
// use flag_if_supported, because we don't want features to be silently
// disabled by old compilers.)
fn is_windows_msvc() -> bool {
// Some targets are only two components long, so check in steps.
target_components()[1] == "pc"
&& target_components()[2] == "windows"
&& target_components()[3] == "msvc"
}
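// A small sketch of the same triple check as is_windows_msvc() above, taking
// the target string as a parameter instead of reading $TARGET. The helper name
// `is_windows_msvc_triple` is hypothetical. Unlike the indexing version, it
// uses `get`, so a triple with fewer components returns false instead of
// panicking.

```rust
/// Returns true for triples of the form `<arch>-pc-windows-msvc`.
fn is_windows_msvc_triple(target: &str) -> bool {
    let parts: Vec<&str> = target.split('-').collect();
    parts.get(1) == Some(&"pc")
        && parts.get(2) == Some(&"windows")
        && parts.get(3) == Some(&"msvc")
}
```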
fn new_build() -> cc::Build {
let mut build = cc::Build::new();
if !is_windows_msvc() {
build.flag("-std=c11");
}
build
}
fn new_cpp_build() -> cc::Build {
let mut build = cc::Build::new();
build.cpp(true);
if is_windows_msvc() {
build.flag("/std:c++20");
build.flag("/EHs-c-");
build.flag("/GR-");
} else {
build.flag("-std=c++20");
build.flag("-fno-exceptions");
build.flag("-fno-rtti");
}
build
}
fn c_dir_path(filename: &str) -> String {
// The `cross` tool doesn't support reading files in parent directories. As a hacky workaround
// in `cross_test.sh`, we move the c/ directory around and set BLAKE3_C_DIR_OVERRIDE. Regular
// building and testing doesn't require this.
if let Ok(c_dir_override) = env::var("BLAKE3_C_DIR_OVERRIDE") {
c_dir_override + "/" + filename
} else {
"../".to_string() + filename
}
}
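// For illustration, the path selection in c_dir_path() above can be written as
// a pure function. The name `c_dir_path_with` and the `override_dir` parameter
// are hypothetical; the parameter stands in for BLAKE3_C_DIR_OVERRIDE.

```rust
/// Joins `filename` onto the override directory if one is given,
/// otherwise onto the parent directory.
fn c_dir_path_with(override_dir: Option<&str>, filename: &str) -> String {
    match override_dir {
        Some(dir) => format!("{dir}/{filename}"),
        None => format!("../{filename}"),
    }
}
```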
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut base_build = new_build();
base_build.file(c_dir_path("blake3.c"));
base_build.file(c_dir_path("blake3_dispatch.c"));
base_build.file(c_dir_path("blake3_portable.c"));
if cfg!(feature = "tbb") {
base_build.define("BLAKE3_USE_TBB", "1");
}
base_build.compile("blake3_base");
if cfg!(feature = "tbb") {
let mut tbb_build = new_cpp_build();
tbb_build.define("BLAKE3_USE_TBB", "1");
tbb_build.file(c_dir_path("blake3_tbb.cpp"));
tbb_build.compile("blake3_tbb");
println!("cargo::rustc-link-lib=tbb");
}
if is_x86_64() && !defined("CARGO_FEATURE_PREFER_INTRINSICS") {
// On 64-bit, use the assembly implementations, unless the
// "prefer_intrinsics" feature is enabled.
if is_windows_target() {
if use_msvc_asm() {
let mut build = new_build();
build.file(c_dir_path("blake3_sse2_x86-64_windows_msvc.asm"));
build.file(c_dir_path("blake3_sse41_x86-64_windows_msvc.asm"));
build.file(c_dir_path("blake3_avx2_x86-64_windows_msvc.asm"));
build.file(c_dir_path("blake3_avx512_x86-64_windows_msvc.asm"));
build.compile("blake3_asm");
} else {
let mut build = new_build();
build.file(c_dir_path("blake3_sse2_x86-64_windows_gnu.S"));
build.file(c_dir_path("blake3_sse41_x86-64_windows_gnu.S"));
build.file(c_dir_path("blake3_avx2_x86-64_windows_gnu.S"));
build.file(c_dir_path("blake3_avx512_x86-64_windows_gnu.S"));
build.compile("blake3_asm");
}
} else {
// All non-Windows implementations are assumed to support
// Linux-style assembly. These files do contain a small
// explicit workaround for macOS also.
let mut build = new_build();
build.file(c_dir_path("blake3_sse2_x86-64_unix.S"));
build.file(c_dir_path("blake3_sse41_x86-64_unix.S"));
build.file(c_dir_path("blake3_avx2_x86-64_unix.S"));
build.file(c_dir_path("blake3_avx512_x86-64_unix.S"));
build.compile("blake3_asm");
}
} else if is_x86_64() || is_x86_32() {
// Assembly implementations are only for 64-bit. On 32-bit, or if
// the "prefer_intrinsics" feature is enabled, use the
// intrinsics-based C implementations. These each need to be
// compiled separately, with the corresponding instruction set
// extension explicitly enabled in the compiler.
let mut sse2_build = new_build();
sse2_build.file(c_dir_path("blake3_sse2.c"));
if is_windows_msvc() {
// /arch:SSE2 is the default on x86 and undefined on x86_64:
// https://docs.microsoft.com/en-us/cpp/build/reference/arch-x86
// It also includes SSE4.1 intrinsics:
// https://stackoverflow.com/a/32183222/823869
} else {
sse2_build.flag("-msse2");
}
sse2_build.compile("blake3_sse2");
let mut sse41_build = new_build();
sse41_build.file(c_dir_path("blake3_sse41.c"));
if is_windows_msvc() {
// /arch:SSE2 is the default on x86 and undefined on x86_64:
// https://docs.microsoft.com/en-us/cpp/build/reference/arch-x86
// It also includes SSE4.1 intrinsics:
// https://stackoverflow.com/a/32183222/823869
} else {
sse41_build.flag("-msse4.1");
}
sse41_build.compile("blake3_sse41");
let mut avx2_build = new_build();
avx2_build.file(c_dir_path("blake3_avx2.c"));
if is_windows_msvc() {
avx2_build.flag("/arch:AVX2");
} else {
avx2_build.flag("-mavx2");
}
avx2_build.compile("blake3_avx2");
let mut avx512_build = new_build();
avx512_build.file(c_dir_path("blake3_avx512.c"));
if is_windows_msvc() {
// Note that a lot of versions of MSVC don't support /arch:AVX512,
// and they'll discard it with a warning, hopefully leading to a
// build error.
avx512_build.flag("/arch:AVX512");
} else {
avx512_build.flag("-mavx512f");
avx512_build.flag("-mavx512vl");
}
avx512_build.compile("blake3_avx512");
}
// We only build NEON code here if
// 1) it's requested
// and 2) the root crate is not already building it.
// The only time this will really happen is if you build this
// crate by hand with the "neon" feature for some reason.
//
// In addition, 3) if the target is aarch64, NEON is on by default.
if defined("CARGO_FEATURE_NEON") || is_aarch64() {
let mut neon_build = new_build();
neon_build.file(c_dir_path("blake3_neon.c"));
// ARMv7 platforms that support NEON generally need the following
// flags. AArch64 supports NEON by default and does not support -mfpu.
if is_armv7() {
neon_build.flag("-mfpu=neon-vfpv4");
neon_build.flag("-mfloat-abi=hard");
}
neon_build.compile("blake3_neon");
}
// The `cc` crate does not automatically emit rerun-if directives for the
// environment variables it supports, in particular for $CC. We expect to
// do a lot of benchmarking across different compilers, so we explicitly
// add the variables that we're likely to need.
println!("cargo:rerun-if-env-changed=CC");
println!("cargo:rerun-if-env-changed=CFLAGS");
// Ditto for source files, though these shouldn't change as often. `ignore::Walk` respects
// .gitignore, so this doesn't traverse target/.
for result in ignore::Walk::new("..") {
let result = result?;
let path = result.path();
if path.is_file() {
println!("cargo:rerun-if-changed={}", path.to_str().unwrap());
}
}
// When compiling with clang-cl for Windows, it adds .asm files to the root,
// which we need to delete so that Cargo doesn't complain.
if is_windows_target() && !use_msvc_asm() {
let _ = std::fs::remove_file("blake3_avx2_x86-64_windows_gnu.asm");
let _ = std::fs::remove_file("blake3_avx512_x86-64_windows_gnu.asm");
let _ = std::fs::remove_file("blake3_sse2_x86-64_windows_gnu.asm");
let _ = std::fs::remove_file("blake3_sse41_x86-64_windows_gnu.asm");
}
Ok(())
}

@@ -0,0 +1,31 @@
#! /usr/bin/env bash
# This hacky script works around the fact that `cross test` does not support
# path dependencies. (It uses a docker shared folder to let the guest access
# project files, so parent directories aren't available.) Solve this problem by
# copying the entire project to a temp dir and rearranging paths to put "c" and
# "reference_impl" underneath "blake3_c_rust_bindings", so that everything is
# accessible. Hopefully this will just run on CI forever and no one will ever
# read this and discover my deep shame.
set -e -u -o pipefail
project_root="$(realpath "$(dirname "${BASH_SOURCE[0]}")/../..")"
tmpdir="$(mktemp -d)"
echo "Running cross tests in $tmpdir"
cd "$tmpdir"
git clone "$project_root" blake3
mv blake3/c/blake3_c_rust_bindings .
mv blake3/reference_impl blake3_c_rust_bindings
mv blake3/c blake3_c_rust_bindings
cd blake3_c_rust_bindings
sed -i 's|reference_impl = { path = "../../reference_impl" }|reference_impl = { path = "reference_impl" }|' Cargo.toml
export BLAKE3_C_DIR_OVERRIDE="./c"
cat > Cross.toml << EOF
[build.env]
passthrough = [
"BLAKE3_C_DIR_OVERRIDE",
]
EOF
cross test "$@"

@@ -0,0 +1,333 @@
//! These are Rust bindings for the C implementation of BLAKE3. As there is a
//! native (and faster) Rust implementation of BLAKE3 provided in this same
//! repo, these bindings are not expected to be used in production. They're
//! intended for testing and benchmarking.
use std::ffi::{c_void, CString};
use std::mem::MaybeUninit;
#[cfg(test)]
mod test;
pub const BLOCK_LEN: usize = 64;
pub const CHUNK_LEN: usize = 1024;
pub const OUT_LEN: usize = 32;
// Feature detection functions for tests and benchmarks. Note that the C code
// does its own feature detection in blake3_dispatch.c.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn sse2_detected() -> bool {
is_x86_feature_detected!("sse2")
}
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn sse41_detected() -> bool {
is_x86_feature_detected!("sse4.1")
}
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx2_detected() -> bool {
is_x86_feature_detected!("avx2")
}
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx512_detected() -> bool {
is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")
}
#[derive(Clone)]
pub struct Hasher(ffi::blake3_hasher);
impl Hasher {
pub fn new() -> Self {
let mut c_state = MaybeUninit::uninit();
unsafe {
ffi::blake3_hasher_init(c_state.as_mut_ptr());
Self(c_state.assume_init())
}
}
pub fn new_keyed(key: &[u8; 32]) -> Self {
let mut c_state = MaybeUninit::uninit();
unsafe {
ffi::blake3_hasher_init_keyed(c_state.as_mut_ptr(), key.as_ptr());
Self(c_state.assume_init())
}
}
pub fn new_derive_key(context: &str) -> Self {
let mut c_state = MaybeUninit::uninit();
let context_c_string = CString::new(context).expect("valid C string, no null bytes");
unsafe {
ffi::blake3_hasher_init_derive_key(c_state.as_mut_ptr(), context_c_string.as_ptr());
Self(c_state.assume_init())
}
}
pub fn new_derive_key_raw(context: &[u8]) -> Self {
let mut c_state = MaybeUninit::uninit();
unsafe {
ffi::blake3_hasher_init_derive_key_raw(
c_state.as_mut_ptr(),
context.as_ptr() as *const _,
context.len(),
);
Self(c_state.assume_init())
}
}
pub fn update(&mut self, input: &[u8]) {
unsafe {
ffi::blake3_hasher_update(&mut self.0, input.as_ptr() as *const c_void, input.len());
}
}
#[cfg(feature = "tbb")]
pub fn update_tbb(&mut self, input: &[u8]) {
unsafe {
ffi::blake3_hasher_update_tbb(
&mut self.0,
input.as_ptr() as *const c_void,
input.len(),
);
}
}
pub fn finalize(&self, output: &mut [u8]) {
unsafe {
ffi::blake3_hasher_finalize(&self.0, output.as_mut_ptr(), output.len());
}
}
pub fn finalize_seek(&self, seek: u64, output: &mut [u8]) {
unsafe {
ffi::blake3_hasher_finalize_seek(&self.0, seek, output.as_mut_ptr(), output.len());
}
}
pub fn reset(&mut self) {
unsafe {
ffi::blake3_hasher_reset(&mut self.0);
}
}
}
pub mod ffi {
#[repr(C)]
#[derive(Copy, Clone)]
pub struct blake3_chunk_state {
pub cv: [u32; 8usize],
pub chunk_counter: u64,
pub buf: [u8; 64usize],
pub buf_len: u8,
pub blocks_compressed: u8,
pub flags: u8,
}
#[repr(C)]
#[derive(Copy, Clone)]
pub struct blake3_hasher {
pub key: [u32; 8usize],
pub chunk: blake3_chunk_state,
pub cv_stack_len: u8,
pub cv_stack: [u8; 1728usize],
}
extern "C" {
// public interface
pub fn blake3_hasher_init(self_: *mut blake3_hasher);
pub fn blake3_hasher_init_keyed(self_: *mut blake3_hasher, key: *const u8);
pub fn blake3_hasher_init_derive_key(
self_: *mut blake3_hasher,
context: *const ::std::os::raw::c_char,
);
pub fn blake3_hasher_init_derive_key_raw(
self_: *mut blake3_hasher,
context: *const ::std::os::raw::c_void,
context_len: usize,
);
pub fn blake3_hasher_update(
self_: *mut blake3_hasher,
input: *const ::std::os::raw::c_void,
input_len: usize,
);
#[cfg(feature = "tbb")]
pub fn blake3_hasher_update_tbb(
self_: *mut blake3_hasher,
input: *const ::std::os::raw::c_void,
input_len: usize,
);
pub fn blake3_hasher_finalize(self_: *const blake3_hasher, out: *mut u8, out_len: usize);
pub fn blake3_hasher_finalize_seek(
self_: *const blake3_hasher,
seek: u64,
out: *mut u8,
out_len: usize,
);
pub fn blake3_hasher_reset(self_: *mut blake3_hasher);
// portable low-level functions
pub fn blake3_compress_in_place_portable(
cv: *mut u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
);
pub fn blake3_compress_xof_portable(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
);
pub fn blake3_hash_many_portable(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
}
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub mod x86 {
extern "C" {
// SSE2 low level functions
pub fn blake3_compress_in_place_sse2(
cv: *mut u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
);
pub fn blake3_compress_xof_sse2(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
);
pub fn blake3_hash_many_sse2(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
// SSE4.1 low level functions
pub fn blake3_compress_in_place_sse41(
cv: *mut u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
);
pub fn blake3_compress_xof_sse41(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
);
pub fn blake3_hash_many_sse41(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
// AVX2 low level functions
pub fn blake3_hash_many_avx2(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
// AVX-512 low level functions
pub fn blake3_compress_xof_avx512(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
);
pub fn blake3_compress_in_place_avx512(
cv: *mut u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
);
pub fn blake3_hash_many_avx512(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
#[cfg(unix)]
pub fn blake3_xof_many_avx512(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
outblocks: usize,
);
}
}
#[cfg(feature = "neon")]
pub mod neon {
extern "C" {
// NEON low level functions
pub fn blake3_hash_many_neon(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
}
}
}

@@ -0,0 +1,696 @@
// Most of this code is duplicated from the root `blake3` crate. Perhaps we
// could share more of it in the future.
use crate::{BLOCK_LEN, CHUNK_LEN, OUT_LEN};
use arrayref::{array_mut_ref, array_ref};
use arrayvec::ArrayVec;
use core::usize;
use rand::prelude::*;
const CHUNK_START: u8 = 1 << 0;
const CHUNK_END: u8 = 1 << 1;
const PARENT: u8 = 1 << 2;
const ROOT: u8 = 1 << 3;
const KEYED_HASH: u8 = 1 << 4;
// const DERIVE_KEY_CONTEXT: u8 = 1 << 5;
// const DERIVE_KEY_MATERIAL: u8 = 1 << 6;
// Interesting input lengths to run tests on.
pub const TEST_CASES: &[usize] = &[
0,
1,
2,
3,
4,
5,
6,
7,
8,
BLOCK_LEN - 1,
BLOCK_LEN,
BLOCK_LEN + 1,
2 * BLOCK_LEN - 1,
2 * BLOCK_LEN,
2 * BLOCK_LEN + 1,
CHUNK_LEN - 1,
CHUNK_LEN,
CHUNK_LEN + 1,
2 * CHUNK_LEN,
2 * CHUNK_LEN + 1,
3 * CHUNK_LEN,
3 * CHUNK_LEN + 1,
4 * CHUNK_LEN,
4 * CHUNK_LEN + 1,
5 * CHUNK_LEN,
5 * CHUNK_LEN + 1,
6 * CHUNK_LEN,
6 * CHUNK_LEN + 1,
7 * CHUNK_LEN,
7 * CHUNK_LEN + 1,
8 * CHUNK_LEN,
8 * CHUNK_LEN + 1,
16 * CHUNK_LEN, // AVX512's bandwidth
31 * CHUNK_LEN, // 16 + 8 + 4 + 2 + 1
100 * CHUNK_LEN, // subtrees larger than MAX_SIMD_DEGREE chunks
];
pub const TEST_CASES_MAX: usize = 100 * CHUNK_LEN;
// There's a test to make sure these two are equal below.
pub const TEST_KEY: [u8; 32] = *b"whats the Elvish word for friend";
pub const TEST_KEY_WORDS: [u32; 8] = [
1952540791, 1752440947, 1816469605, 1752394102, 1919907616, 1868963940, 1919295602, 1684956521,
];
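// A sketch of how TEST_KEY_WORDS relates to TEST_KEY: each group of four key
// bytes is read as one little-endian u32. The helper name
// `words_from_le_bytes_32` is hypothetical and uses only the standard library.

```rust
/// Interprets 32 bytes as eight little-endian 32-bit words.
fn words_from_le_bytes_32(bytes: &[u8; 32]) -> [u32; 8] {
    let mut words = [0u32; 8];
    for (word, chunk) in words.iter_mut().zip(bytes.chunks_exact(4)) {
        *word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    words
}
```

// For example, the first four key bytes "what" are 0x77, 0x68, 0x61, 0x74,
// which read little-endian give 0x74616877 = 1952540791, the first entry of
// TEST_KEY_WORDS.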
// Paint the input with a repeating byte pattern. We use a cycle length of 251,
// because that's the largest prime number less than 256. This makes it
// unlikely that swapping any two adjacent input blocks or chunks will give the
// same answer.
fn paint_test_input(buf: &mut [u8]) {
for (i, b) in buf.iter_mut().enumerate() {
*b = (i % 251) as u8;
}
}
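// A sketch of why the cycle length matters. The helper names and the `period`
// parameter are hypothetical. With a period of 256, every 1024-byte chunk
// would hold identical bytes, so swapping two chunks would go undetected. With
// a period of 251, neither 64 nor 1024 is a multiple of the period, so
// adjacent blocks and chunks always differ.

```rust
/// Same scheme as paint_test_input, with a configurable period (at most 256).
fn paint_with_period(buf: &mut [u8], period: usize) {
    for (i, b) in buf.iter_mut().enumerate() {
        *b = (i % period) as u8;
    }
}

/// Returns true if any two adjacent `unit_len`-byte units of a painted
/// buffer of `total` bytes are byte-for-byte identical.
fn has_identical_adjacent_units(total: usize, unit_len: usize, period: usize) -> bool {
    let mut buf = vec![0u8; total];
    paint_with_period(&mut buf, period);
    let units: Vec<&[u8]> = buf.chunks_exact(unit_len).collect();
    units.windows(2).any(|pair| pair[0] == pair[1])
}
```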
#[inline(always)]
fn le_bytes_from_words_32(words: &[u32; 8]) -> [u8; 32] {
let mut out = [0; 32];
*array_mut_ref!(out, 0 * 4, 4) = words[0].to_le_bytes();
*array_mut_ref!(out, 1 * 4, 4) = words[1].to_le_bytes();
*array_mut_ref!(out, 2 * 4, 4) = words[2].to_le_bytes();
*array_mut_ref!(out, 3 * 4, 4) = words[3].to_le_bytes();
*array_mut_ref!(out, 4 * 4, 4) = words[4].to_le_bytes();
*array_mut_ref!(out, 5 * 4, 4) = words[5].to_le_bytes();
*array_mut_ref!(out, 6 * 4, 4) = words[6].to_le_bytes();
*array_mut_ref!(out, 7 * 4, 4) = words[7].to_le_bytes();
out
}
type CompressInPlaceFn =
unsafe extern "C" fn(cv: *mut u32, block: *const u8, block_len: u8, counter: u64, flags: u8);
type CompressXofFn = unsafe extern "C" fn(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
);
// A shared helper function for platform-specific tests.
pub fn test_compress_fn(compress_in_place_fn: CompressInPlaceFn, compress_xof_fn: CompressXofFn) {
let initial_state = TEST_KEY_WORDS;
let block_len: u8 = 61;
let mut block = [0; BLOCK_LEN];
paint_test_input(&mut block[..block_len as usize]);
// Use a counter with set bits in both 32-bit words.
let counter = (5u64 << 32) + 6;
let flags = CHUNK_END | ROOT | KEYED_HASH;
let mut portable_out = [0; 64];
unsafe {
crate::ffi::blake3_compress_xof_portable(
initial_state.as_ptr(),
block.as_ptr(),
block_len,
counter,
flags,
portable_out.as_mut_ptr(),
);
}
let mut test_state = initial_state;
unsafe {
compress_in_place_fn(
test_state.as_mut_ptr(),
block.as_ptr(),
block_len,
counter,
flags,
)
};
let test_state_bytes = le_bytes_from_words_32(&test_state);
let mut test_xof = [0; 64];
unsafe {
compress_xof_fn(
initial_state.as_ptr(),
block.as_ptr(),
block_len,
counter,
flags,
test_xof.as_mut_ptr(),
)
};
assert_eq!(&portable_out[..32], &test_state_bytes[..]);
assert_eq!(&portable_out[..], &test_xof[..]);
}
// Testing the portable implementation against itself is circular, but why not.
#[test]
fn test_compress_portable() {
test_compress_fn(
crate::ffi::blake3_compress_in_place_portable,
crate::ffi::blake3_compress_xof_portable,
);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_compress_sse2() {
if !crate::sse2_detected() {
return;
}
test_compress_fn(
crate::ffi::x86::blake3_compress_in_place_sse2,
crate::ffi::x86::blake3_compress_xof_sse2,
);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_compress_sse41() {
if !crate::sse41_detected() {
return;
}
test_compress_fn(
crate::ffi::x86::blake3_compress_in_place_sse41,
crate::ffi::x86::blake3_compress_xof_sse41,
);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_compress_avx512() {
if !crate::avx512_detected() {
return;
}
test_compress_fn(
crate::ffi::x86::blake3_compress_in_place_avx512,
crate::ffi::x86::blake3_compress_xof_avx512,
);
}
type HashManyFn = unsafe extern "C" fn(
inputs: *const *const u8,
num_inputs: usize,
blocks: usize,
key: *const u32,
counter: u64,
increment_counter: bool,
flags: u8,
flags_start: u8,
flags_end: u8,
out: *mut u8,
);
// A shared helper function for platform-specific tests.
pub fn test_hash_many_fn(hash_many_fn: HashManyFn) {
// Test a few different initial counter values.
// - 0: The base case.
// - u32::MAX: The low word of the counter overflows for all inputs except the first.
// - i32::MAX: *No* overflow. But carry bugs in tricky SIMD code can screw this up, if you XOR
// when you're supposed to ANDNOT...
let initial_counters = [0, u32::MAX as u64, i32::MAX as u64];
for counter in initial_counters {
dbg!(counter);
// 31 (16 + 8 + 4 + 2 + 1) inputs
const NUM_INPUTS: usize = 31;
let mut input_buf = [0; CHUNK_LEN * NUM_INPUTS];
crate::test::paint_test_input(&mut input_buf);
// First hash chunks.
let mut chunks = ArrayVec::<&[u8; CHUNK_LEN], NUM_INPUTS>::new();
for i in 0..NUM_INPUTS {
chunks.push(array_ref!(input_buf, i * CHUNK_LEN, CHUNK_LEN));
}
let mut portable_chunks_out = [0; NUM_INPUTS * OUT_LEN];
unsafe {
crate::ffi::blake3_hash_many_portable(
chunks.as_ptr() as _,
chunks.len(),
CHUNK_LEN / BLOCK_LEN,
TEST_KEY_WORDS.as_ptr(),
counter,
true,
KEYED_HASH,
CHUNK_START,
CHUNK_END,
portable_chunks_out.as_mut_ptr(),
);
}
let mut test_chunks_out = [0; NUM_INPUTS * OUT_LEN];
unsafe {
hash_many_fn(
chunks.as_ptr() as _,
chunks.len(),
CHUNK_LEN / BLOCK_LEN,
TEST_KEY_WORDS.as_ptr(),
counter,
true,
KEYED_HASH,
CHUNK_START,
CHUNK_END,
test_chunks_out.as_mut_ptr(),
);
}
for n in 0..NUM_INPUTS {
dbg!(n);
assert_eq!(
&portable_chunks_out[n * OUT_LEN..][..OUT_LEN],
&test_chunks_out[n * OUT_LEN..][..OUT_LEN]
);
}
// Then hash parents.
let mut parents = ArrayVec::<&[u8; 2 * OUT_LEN], NUM_INPUTS>::new();
for i in 0..NUM_INPUTS {
parents.push(array_ref!(input_buf, i * 2 * OUT_LEN, 2 * OUT_LEN));
}
let mut portable_parents_out = [0; NUM_INPUTS * OUT_LEN];
unsafe {
crate::ffi::blake3_hash_many_portable(
parents.as_ptr() as _,
parents.len(),
1,
TEST_KEY_WORDS.as_ptr(),
counter,
false,
KEYED_HASH | PARENT,
0,
0,
portable_parents_out.as_mut_ptr(),
);
}
let mut test_parents_out = [0; NUM_INPUTS * OUT_LEN];
unsafe {
hash_many_fn(
parents.as_ptr() as _,
parents.len(),
1,
TEST_KEY_WORDS.as_ptr(),
counter,
false,
KEYED_HASH | PARENT,
0,
0,
test_parents_out.as_mut_ptr(),
);
}
for n in 0..NUM_INPUTS {
dbg!(n);
assert_eq!(
&portable_parents_out[n * OUT_LEN..][..OUT_LEN],
&test_parents_out[n * OUT_LEN..][..OUT_LEN]
);
}
}
}
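// A sketch of the counter arithmetic behind the initial counter values chosen
// in test_hash_many_fn above. It assumes the counter advances by one per input
// (increment_counter = true) and is handled as separate low and high 32-bit
// words, as the comment in that function implies. The helper name
// `counter_words` is hypothetical.

```rust
/// Splits the counter used for input number `index` into (low, high) words.
fn counter_words(initial_counter: u64, index: u64) -> (u32, u32) {
    let counter = initial_counter.wrapping_add(index);
    (counter as u32, (counter >> 32) as u32)
}
```

// Starting from u32::MAX, the second input already carries into the high
// word. Starting from i32::MAX, the low word only gains its top bit, with no
// carry.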
// Testing the portable implementation against itself is circular, but why not.
#[test]
fn test_hash_many_portable() {
test_hash_many_fn(crate::ffi::blake3_hash_many_portable);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_hash_many_sse2() {
if !crate::sse2_detected() {
return;
}
test_hash_many_fn(crate::ffi::x86::blake3_hash_many_sse2);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_hash_many_sse41() {
if !crate::sse41_detected() {
return;
}
test_hash_many_fn(crate::ffi::x86::blake3_hash_many_sse41);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_hash_many_avx2() {
if !crate::avx2_detected() {
return;
}
test_hash_many_fn(crate::ffi::x86::blake3_hash_many_avx2);
}
#[test]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_hash_many_avx512() {
if !crate::avx512_detected() {
return;
}
test_hash_many_fn(crate::ffi::x86::blake3_hash_many_avx512);
}
#[test]
#[cfg(feature = "neon")]
fn test_hash_many_neon() {
test_hash_many_fn(crate::ffi::neon::blake3_hash_many_neon);
}
#[allow(unused)]
type XofManyFunction = unsafe extern "C" fn(
cv: *const u32,
block: *const u8,
block_len: u8,
counter: u64,
flags: u8,
out: *mut u8,
outblocks: usize,
);
// A shared helper function for platform-specific tests.
#[allow(unused)]
pub fn test_xof_many_fn(xof_many_function: XofManyFunction) {
let mut block = [0; BLOCK_LEN];
let block_len = 42;
crate::test::paint_test_input(&mut block[..block_len]);
let cv = [40, 41, 42, 43, 44, 45, 46, 47];
let flags = KEYED_HASH;
// Test a few different initial counter values.
// - 0: The base case.
// - u32::MAX: The low word of the counter overflows for all inputs except the first.
// - i32::MAX: *No* overflow. But carry bugs in tricky SIMD code can screw this up, if you XOR
// when you're supposed to ANDNOT...
let initial_counters = [0, u32::MAX as u64, i32::MAX as u64];
for counter in initial_counters {
dbg!(counter);
// 31 (16 + 8 + 4 + 2 + 1) outputs
const OUTPUT_SIZE: usize = 31 * BLOCK_LEN;
let mut portable_out = [0u8; OUTPUT_SIZE];
for (i, out_block) in portable_out.chunks_exact_mut(BLOCK_LEN).enumerate() {
unsafe {
crate::ffi::blake3_compress_xof_portable(
cv.as_ptr(),
block.as_ptr(),
block_len as u8,
counter + i as u64,
flags,
out_block.as_mut_ptr(),
);
}
}
let mut test_out = [0u8; OUTPUT_SIZE];
unsafe {
xof_many_function(
cv.as_ptr(),
block.as_ptr(),
block_len as u8,
counter,
flags,
test_out.as_mut_ptr(),
OUTPUT_SIZE / BLOCK_LEN,
);
}
assert_eq!(portable_out, test_out);
}
// Test that xof_many doesn't write more blocks than requested. Note that the current assembly
// implementation always outputs at least one block, so we don't test the zero case.
for block_count in 1..=32 {
let mut array = [0; BLOCK_LEN * 33];
let output_start = 17;
let output_len = block_count * BLOCK_LEN;
let output_end = output_start + output_len;
let output = &mut array[output_start..output_end];
unsafe {
xof_many_function(
cv.as_ptr(),
block.as_ptr(),
block_len as u8,
0,
flags,
output.as_mut_ptr(),
block_count,
);
}
for i in 0..array.len() {
if i < output_start || output_end <= i {
assert_eq!(0, array[i], "index {i}");
}
}
}
}
#[test]
#[cfg(unix)]
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn test_xof_many_avx512() {
if !crate::avx512_detected() {
return;
}
test_xof_many_fn(crate::ffi::x86::blake3_xof_many_avx512);
}
#[test]
fn test_compare_reference_impl() {
const OUT: usize = 303; // more than 64, not a multiple of 4
let mut input_buf = [0; TEST_CASES_MAX];
paint_test_input(&mut input_buf);
for &case in TEST_CASES {
let input = &input_buf[..case];
dbg!(case);
// regular
{
let mut reference_hasher = reference_impl::Hasher::new();
reference_hasher.update(input);
let mut expected_out = [0; OUT];
reference_hasher.finalize(&mut expected_out);
let mut test_hasher = crate::Hasher::new();
test_hasher.update(input);
let mut test_out = [0; OUT];
test_hasher.finalize(&mut test_out);
assert_eq!(test_out[..], expected_out[..]);
#[cfg(feature = "tbb")]
{
let mut tbb_hasher = crate::Hasher::new();
tbb_hasher.update_tbb(input);
let mut tbb_out = [0; OUT];
tbb_hasher.finalize(&mut tbb_out);
assert_eq!(tbb_out[..], expected_out[..]);
}
}
// keyed
{
let mut reference_hasher = reference_impl::Hasher::new_keyed(&TEST_KEY);
reference_hasher.update(input);
let mut expected_out = [0; OUT];
reference_hasher.finalize(&mut expected_out);
let mut test_hasher = crate::Hasher::new_keyed(&TEST_KEY);
test_hasher.update(input);
let mut test_out = [0; OUT];
test_hasher.finalize(&mut test_out);
assert_eq!(test_out[..], expected_out[..]);
#[cfg(feature = "tbb")]
{
let mut tbb_hasher = crate::Hasher::new_keyed(&TEST_KEY);
tbb_hasher.update_tbb(input);
let mut tbb_out = [0; OUT];
tbb_hasher.finalize(&mut tbb_out);
assert_eq!(tbb_out[..], expected_out[..]);
}
}
// derive_key
{
let context = "BLAKE3 2019-12-27 16:13:59 example context (not the test vector one)";
let mut reference_hasher = reference_impl::Hasher::new_derive_key(context);
reference_hasher.update(input);
let mut expected_out = [0; OUT];
reference_hasher.finalize(&mut expected_out);
// the regular C string API
let mut test_hasher = crate::Hasher::new_derive_key(context);
test_hasher.update(input);
let mut test_out = [0; OUT];
test_hasher.finalize(&mut test_out);
assert_eq!(test_out[..], expected_out[..]);
// the raw bytes API
let mut test_hasher_raw = crate::Hasher::new_derive_key_raw(context.as_bytes());
test_hasher_raw.update(input);
let mut test_out_raw = [0; OUT];
test_hasher_raw.finalize(&mut test_out_raw);
assert_eq!(test_out_raw[..], expected_out[..]);
#[cfg(feature = "tbb")]
{
let mut tbb_hasher = crate::Hasher::new_derive_key(context);
tbb_hasher.update_tbb(input);
let mut tbb_out = [0; OUT];
tbb_hasher.finalize(&mut tbb_out);
assert_eq!(tbb_out[..], expected_out[..]);
}
}
}
}
fn reference_hash(input: &[u8]) -> [u8; OUT_LEN] {
let mut hasher = reference_impl::Hasher::new();
hasher.update(input);
let mut bytes = [0; OUT_LEN];
hasher.finalize(&mut bytes);
bytes.into()
}
#[test]
fn test_compare_update_multiple() {
// Don't use all the long test cases here, since that's unnecessarily slow
// in debug mode.
let mut short_test_cases = TEST_CASES;
while *short_test_cases.last().unwrap() > 4 * CHUNK_LEN {
short_test_cases = &short_test_cases[..short_test_cases.len() - 1];
}
assert_eq!(*short_test_cases.last().unwrap(), 4 * CHUNK_LEN);
let mut input_buf = [0; 2 * TEST_CASES_MAX];
paint_test_input(&mut input_buf);
for &first_update in short_test_cases {
dbg!(first_update);
let first_input = &input_buf[..first_update];
let mut test_hasher = crate::Hasher::new();
test_hasher.update(first_input);
for &second_update in short_test_cases {
dbg!(second_update);
let second_input = &input_buf[first_update..][..second_update];
let total_input = &input_buf[..first_update + second_update];
// Clone the hasher with first_update bytes already written, so
// that the next iteration can reuse it.
let mut test_hasher = test_hasher.clone();
test_hasher.update(second_input);
let mut test_out = [0; OUT_LEN];
test_hasher.finalize(&mut test_out);
let expected = reference_hash(total_input);
assert_eq!(expected, test_out);
}
}
}
#[test]
fn test_fuzz_hasher() {
const INPUT_MAX: usize = 4 * CHUNK_LEN;
let mut input_buf = [0; 3 * INPUT_MAX];
paint_test_input(&mut input_buf);
// Don't do too many iterations in debug mode, to keep the tests under a
// second or so. CI should also run tests in release mode. Note that the
// iteration count is fixed here; no environment variable overrides it.
let num_tests = if cfg!(debug_assertions) { 100 } else { 10_000 };
// Use a fixed RNG seed for reproducibility.
let mut rng = rand_chacha::ChaCha8Rng::from_seed([1; 32]);
for _num_test in 0..num_tests {
dbg!(_num_test);
let mut hasher = crate::Hasher::new();
let mut total_input = 0;
// For each test, write 3 inputs of random length.
for _ in 0..3 {
let input_len = rng.random_range(0..INPUT_MAX + 1);
dbg!(input_len);
let input = &input_buf[total_input..][..input_len];
hasher.update(input);
total_input += input_len;
}
let expected = reference_hash(&input_buf[..total_input]);
let mut test_out = [0; 32];
hasher.finalize(&mut test_out);
assert_eq!(expected, test_out);
}
}
#[test]
fn test_finalize_seek() {
let mut expected = [0; 1000];
{
let mut reference_hasher = reference_impl::Hasher::new();
reference_hasher.update(b"foobarbaz");
reference_hasher.finalize(&mut expected);
}
let mut test_hasher = crate::Hasher::new();
test_hasher.update(b"foobarbaz");
let mut out = [0; 103];
for &seek in &[0, 1, 7, 59, 63, 64, 65, 501, expected.len() - out.len()] {
dbg!(seek);
test_hasher.finalize_seek(seek as u64, &mut out);
assert_eq!(&expected[seek..][..out.len()], &out[..]);
}
}
#[test]
fn test_reset() {
{
let mut hasher = crate::Hasher::new();
hasher.update(&[42; 3 * CHUNK_LEN + 7]);
hasher.reset();
hasher.update(&[42; CHUNK_LEN + 3]);
let mut output = [0; 32];
hasher.finalize(&mut output);
let mut reference_hasher = reference_impl::Hasher::new();
reference_hasher.update(&[42; CHUNK_LEN + 3]);
let mut reference_hash = [0; 32];
reference_hasher.finalize(&mut reference_hash);
assert_eq!(reference_hash, output);
}
{
let key = &[99; 32];
let mut hasher = crate::Hasher::new_keyed(key);
hasher.update(&[42; 3 * CHUNK_LEN + 7]);
hasher.reset();
hasher.update(&[42; CHUNK_LEN + 3]);
let mut output = [0; 32];
hasher.finalize(&mut output);
let mut reference_hasher = reference_impl::Hasher::new_keyed(key);
reference_hasher.update(&[42; CHUNK_LEN + 3]);
let mut reference_hash = [0; 32];
reference_hasher.finalize(&mut reference_hash);
assert_eq!(reference_hash, output);
}
{
let context = "BLAKE3 2020-02-12 10:20:58 reset test";
let mut hasher = crate::Hasher::new_derive_key(context);
hasher.update(&[42; 3 * CHUNK_LEN + 7]);
hasher.reset();
hasher.update(&[42; CHUNK_LEN + 3]);
let mut output = [0; 32];
hasher.finalize(&mut output);
let mut reference_hasher = reference_impl::Hasher::new_derive_key(context);
reference_hasher.update(&[42; CHUNK_LEN + 3]);
let mut reference_hash = [0; 32];
reference_hasher.finalize(&mut reference_hash);
assert_eq!(reference_hash, output);
}
}
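
The initial counter values chosen in `test_xof_many_fn` above (0, `u32::MAX`, `i32::MAX`) target the boundary between the low and high 32-bit counter words. The following is a minimal C sketch of that boundary, assuming the same low/high split that `counter_low` and `counter_high` perform in `blake3_impl.h`; `low_word_carries` is a hypothetical helper introduced only for illustration and is not part of BLAKE3.

```c
#include <assert.h>
#include <stdint.h>

/* Same split as the helpers in blake3_impl.h: low and high 32-bit words. */
static uint32_t counter_low(uint64_t counter) { return (uint32_t)counter; }
static uint32_t counter_high(uint64_t counter) {
  return (uint32_t)(counter >> 32);
}

/* Hypothetical helper (not part of BLAKE3): returns 1 if adding `i` to
 * `counter` changes the high word, i.e. the low word carried out. */
static int low_word_carries(uint64_t counter, uint64_t i) {
  return counter_high(counter + i) != counter_high(counter);
}
```

Starting at `UINT32_MAX`, every block after the first carries into the high word, while starting at `INT32_MAX` leaves the high word untouched for all 31 blocks the test requests.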

332
external/blake3/blake3_dispatch.c vendored Normal file

@@ -0,0 +1,332 @@
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "blake3_impl.h"
#if defined(_MSC_VER)
#include <Windows.h>
#endif
#if defined(IS_X86)
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__)
#include <immintrin.h>
#else
#undef IS_X86 /* Unimplemented! */
#endif
#endif
#if !defined(BLAKE3_ATOMICS)
#if defined(__has_include)
#if __has_include(<stdatomic.h>) && !defined(_MSC_VER)
#define BLAKE3_ATOMICS 1
#else
#define BLAKE3_ATOMICS 0
#endif /* __has_include(<stdatomic.h>) && !defined(_MSC_VER) */
#else
#define BLAKE3_ATOMICS 0
#endif /* defined(__has_include) */
#endif /* BLAKE3_ATOMICS */
#if BLAKE3_ATOMICS
#define ATOMIC_INT _Atomic int
#define ATOMIC_LOAD(x) x
#define ATOMIC_STORE(x, y) x = y
#elif defined(_MSC_VER)
#define ATOMIC_INT LONG
#define ATOMIC_LOAD(x) InterlockedOr(&x, 0)
#define ATOMIC_STORE(x, y) InterlockedExchange(&x, y)
#else
#define ATOMIC_INT int
#define ATOMIC_LOAD(x) x
#define ATOMIC_STORE(x, y) x = y
#endif
#define MAYBE_UNUSED(x) (void)((x))
#if defined(IS_X86)
static uint64_t xgetbv(void) {
#if defined(_MSC_VER)
return _xgetbv(0);
#else
uint32_t eax = 0, edx = 0;
__asm__ __volatile__("xgetbv\n" : "=a"(eax), "=d"(edx) : "c"(0));
return ((uint64_t)edx << 32) | eax;
#endif
}
static void cpuid(uint32_t out[4], uint32_t id) {
#if defined(_MSC_VER)
__cpuid((int *)out, id);
#elif defined(__i386__) || defined(_M_IX86)
__asm__ __volatile__("movl %%ebx, %1\n"
"cpuid\n"
"xchgl %1, %%ebx\n"
: "=a"(out[0]), "=r"(out[1]), "=c"(out[2]), "=d"(out[3])
: "a"(id));
#else
__asm__ __volatile__("cpuid\n"
: "=a"(out[0]), "=b"(out[1]), "=c"(out[2]), "=d"(out[3])
: "a"(id));
#endif
}
static void cpuidex(uint32_t out[4], uint32_t id, uint32_t sid) {
#if defined(_MSC_VER)
__cpuidex((int *)out, id, sid);
#elif defined(__i386__) || defined(_M_IX86)
__asm__ __volatile__("movl %%ebx, %1\n"
"cpuid\n"
"xchgl %1, %%ebx\n"
: "=a"(out[0]), "=r"(out[1]), "=c"(out[2]), "=d"(out[3])
: "a"(id), "c"(sid));
#else
__asm__ __volatile__("cpuid\n"
: "=a"(out[0]), "=b"(out[1]), "=c"(out[2]), "=d"(out[3])
: "a"(id), "c"(sid));
#endif
}
#endif
enum cpu_feature {
SSE2 = 1 << 0,
SSSE3 = 1 << 1,
SSE41 = 1 << 2,
AVX = 1 << 3,
AVX2 = 1 << 4,
AVX512F = 1 << 5,
AVX512VL = 1 << 6,
/* ... */
UNDEFINED = 1 << 30
};
#if !defined(BLAKE3_TESTING)
static /* Allow the variable to be controlled manually for testing */
#endif
ATOMIC_INT g_cpu_features = UNDEFINED;
#if !defined(BLAKE3_TESTING)
static
#endif
enum cpu_feature
get_cpu_features(void) {
/* If TSAN detects a data race here, try compiling with -DBLAKE3_ATOMICS=1 */
enum cpu_feature features = ATOMIC_LOAD(g_cpu_features);
if (features != UNDEFINED) {
return features;
} else {
#if defined(IS_X86)
uint32_t regs[4] = {0};
uint32_t *eax = &regs[0], *ebx = &regs[1], *ecx = &regs[2], *edx = &regs[3];
(void)edx;
features = 0;
cpuid(regs, 0);
const int max_id = *eax;
cpuid(regs, 1);
#if defined(__amd64__) || defined(_M_X64)
features |= SSE2;
#else
if (*edx & (1UL << 26))
features |= SSE2;
#endif
if (*ecx & (1UL << 9))
features |= SSSE3;
if (*ecx & (1UL << 19))
features |= SSE41;
if (*ecx & (1UL << 27)) { // OSXSAVE
const uint64_t mask = xgetbv();
if ((mask & 6) == 6) { // SSE and AVX states
if (*ecx & (1UL << 28))
features |= AVX;
if (max_id >= 7) {
cpuidex(regs, 7, 0);
if (*ebx & (1UL << 5))
features |= AVX2;
if ((mask & 224) == 224) { // Opmask, ZMM_Hi256, Hi16_Zmm
if (*ebx & (1UL << 31))
features |= AVX512VL;
if (*ebx & (1UL << 16))
features |= AVX512F;
}
}
}
}
ATOMIC_STORE(g_cpu_features, features);
return features;
#else
/* How to detect NEON? */
return 0;
#endif
}
}
void blake3_compress_in_place(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags) {
#if defined(IS_X86)
const enum cpu_feature features = get_cpu_features();
MAYBE_UNUSED(features);
#if !defined(BLAKE3_NO_AVX512)
if (features & AVX512VL) {
blake3_compress_in_place_avx512(cv, block, block_len, counter, flags);
return;
}
#endif
#if !defined(BLAKE3_NO_SSE41)
if (features & SSE41) {
blake3_compress_in_place_sse41(cv, block, block_len, counter, flags);
return;
}
#endif
#if !defined(BLAKE3_NO_SSE2)
if (features & SSE2) {
blake3_compress_in_place_sse2(cv, block, block_len, counter, flags);
return;
}
#endif
#endif
blake3_compress_in_place_portable(cv, block, block_len, counter, flags);
}
void blake3_compress_xof(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags,
uint8_t out[64]) {
#if defined(IS_X86)
const enum cpu_feature features = get_cpu_features();
MAYBE_UNUSED(features);
#if !defined(BLAKE3_NO_AVX512)
if (features & AVX512VL) {
blake3_compress_xof_avx512(cv, block, block_len, counter, flags, out);
return;
}
#endif
#if !defined(BLAKE3_NO_SSE41)
if (features & SSE41) {
blake3_compress_xof_sse41(cv, block, block_len, counter, flags, out);
return;
}
#endif
#if !defined(BLAKE3_NO_SSE2)
if (features & SSE2) {
blake3_compress_xof_sse2(cv, block, block_len, counter, flags, out);
return;
}
#endif
#endif
blake3_compress_xof_portable(cv, block, block_len, counter, flags, out);
}
void blake3_xof_many(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags,
uint8_t out[64], size_t outblocks) {
if (outblocks == 0) {
// The current assembly implementation always outputs at least 1 block.
return;
}
#if defined(IS_X86)
const enum cpu_feature features = get_cpu_features();
MAYBE_UNUSED(features);
#if !defined(_WIN32) && !defined(BLAKE3_NO_AVX512)
if (features & AVX512VL) {
blake3_xof_many_avx512(cv, block, block_len, counter, flags, out, outblocks);
return;
}
#endif
#endif
for(size_t i = 0; i < outblocks; ++i) {
blake3_compress_xof(cv, block, block_len, counter + i, flags, out + 64*i);
}
}
void blake3_hash_many(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8], uint64_t counter,
bool increment_counter, uint8_t flags,
uint8_t flags_start, uint8_t flags_end, uint8_t *out) {
#if defined(IS_X86)
const enum cpu_feature features = get_cpu_features();
MAYBE_UNUSED(features);
#if !defined(BLAKE3_NO_AVX512)
if ((features & (AVX512F|AVX512VL)) == (AVX512F|AVX512VL)) {
blake3_hash_many_avx512(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end,
out);
return;
}
#endif
#if !defined(BLAKE3_NO_AVX2)
if (features & AVX2) {
blake3_hash_many_avx2(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end,
out);
return;
}
#endif
#if !defined(BLAKE3_NO_SSE41)
if (features & SSE41) {
blake3_hash_many_sse41(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end,
out);
return;
}
#endif
#if !defined(BLAKE3_NO_SSE2)
if (features & SSE2) {
blake3_hash_many_sse2(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end,
out);
return;
}
#endif
#endif
#if BLAKE3_USE_NEON == 1
blake3_hash_many_neon(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end, out);
return;
#endif
blake3_hash_many_portable(inputs, num_inputs, blocks, key, counter,
increment_counter, flags, flags_start, flags_end,
out);
}
// The dynamically detected SIMD degree of the current platform.
size_t blake3_simd_degree(void) {
#if defined(IS_X86)
const enum cpu_feature features = get_cpu_features();
MAYBE_UNUSED(features);
#if !defined(BLAKE3_NO_AVX512)
if ((features & (AVX512F|AVX512VL)) == (AVX512F|AVX512VL)) {
return 16;
}
#endif
#if !defined(BLAKE3_NO_AVX2)
if (features & AVX2) {
return 8;
}
#endif
#if !defined(BLAKE3_NO_SSE41)
if (features & SSE41) {
return 4;
}
#endif
#if !defined(BLAKE3_NO_SSE2)
if (features & SSE2) {
return 4;
}
#endif
#endif
#if BLAKE3_USE_NEON == 1
return 4;
#endif
return 1;
}
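
The feature detection in `get_cpu_features` above can be hard to exercise because it reads CPUID and XGETBV directly. The sketch below restates the same bit tests as a pure function over register values, so the gating on OSXSAVE and on the XCR0 state bits is visible. `decode_features` and its parameter names are hypothetical and exist only for illustration. Unlike the original, this sketch always checks the SSE2 bit in EDX rather than assuming SSE2 on x86-64.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_feature {
  SSE2 = 1 << 0,
  SSSE3 = 1 << 1,
  SSE41 = 1 << 2,
  AVX = 1 << 3,
  AVX2 = 1 << 4,
  AVX512F = 1 << 5,
  AVX512VL = 1 << 6
};

/* Hypothetical helper (not part of BLAKE3). Inputs:
 *   max_id: EAX from CPUID leaf 0
 *   ecx1, edx1: ECX and EDX from CPUID leaf 1
 *   ebx7: EBX from CPUID leaf 7, subleaf 0
 *   xcr0: value returned by XGETBV with ECX = 0 */
static int decode_features(uint32_t max_id, uint32_t ecx1, uint32_t edx1,
                           uint32_t ebx7, uint64_t xcr0) {
  int features = 0;
  if (edx1 & (1UL << 26)) features |= SSE2;
  if (ecx1 & (1UL << 9)) features |= SSSE3;
  if (ecx1 & (1UL << 19)) features |= SSE41;
  if (ecx1 & (1UL << 27)) {   /* OSXSAVE: the OS uses XSAVE/XGETBV */
    if ((xcr0 & 6) == 6) {    /* OS saves SSE and AVX register state */
      if (ecx1 & (1UL << 28)) features |= AVX;
      if (max_id >= 7) {
        if (ebx7 & (1UL << 5)) features |= AVX2;
        if ((xcr0 & 224) == 224) { /* Opmask, ZMM_Hi256, Hi16_ZMM state */
          if (ebx7 & (1UL << 31)) features |= AVX512VL;
          if (ebx7 & (1UL << 16)) features |= AVX512F;
        }
      }
    }
  }
  return features;
}
```

Note that a CPU advertising AVX2 or AVX-512 in CPUID still yields neither flag unless the operating system has enabled the matching register state, which is why the XCR0 checks wrap the CPUID checks.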

333
external/blake3/blake3_impl.h vendored Normal file

@@ -0,0 +1,333 @@
#ifndef BLAKE3_IMPL_H
#define BLAKE3_IMPL_H
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include "blake3.h"
#ifdef __cplusplus
extern "C" {
#endif
// internal flags
enum blake3_flags {
CHUNK_START = 1 << 0,
CHUNK_END = 1 << 1,
PARENT = 1 << 2,
ROOT = 1 << 3,
KEYED_HASH = 1 << 4,
DERIVE_KEY_CONTEXT = 1 << 5,
DERIVE_KEY_MATERIAL = 1 << 6,
};
// This C implementation tries to support recent versions of GCC, Clang, and
// MSVC.
#if defined(_MSC_VER)
#define INLINE static __forceinline
#else
#define INLINE static inline __attribute__((always_inline))
#endif
#ifdef __cplusplus
#define NOEXCEPT noexcept
#else
#define NOEXCEPT
#endif
#if (defined(__x86_64__) || defined(_M_X64)) && !defined(_M_ARM64EC)
#define IS_X86
#define IS_X86_64
#endif
#if defined(__i386__) || defined(_M_IX86)
#define IS_X86
#define IS_X86_32
#endif
#if defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
#define IS_AARCH64
#endif
#if defined(IS_X86)
#if defined(_MSC_VER)
#include <intrin.h>
#endif
#endif
#if !defined(BLAKE3_USE_NEON)
// If BLAKE3_USE_NEON is not manually set, autodetect based on AArch64ness
#if defined(IS_AARCH64)
#if defined(__ARM_BIG_ENDIAN)
#define BLAKE3_USE_NEON 0
#else
#define BLAKE3_USE_NEON 1
#endif
#else
#define BLAKE3_USE_NEON 0
#endif
#endif
#if defined(IS_X86)
#define MAX_SIMD_DEGREE 16
#elif BLAKE3_USE_NEON == 1
#define MAX_SIMD_DEGREE 4
#else
#define MAX_SIMD_DEGREE 1
#endif
// There are some places where we want a static size that's equal to the
// MAX_SIMD_DEGREE, but also at least 2.
#define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)
static const uint32_t IV[8] = {0x6A09E667UL, 0xBB67AE85UL, 0x3C6EF372UL,
0xA54FF53AUL, 0x510E527FUL, 0x9B05688CUL,
0x1F83D9ABUL, 0x5BE0CD19UL};
static const uint8_t MSG_SCHEDULE[7][16] = {
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
{2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8},
{3, 4, 10, 12, 13, 2, 7, 14, 6, 5, 9, 0, 11, 15, 8, 1},
{10, 7, 12, 9, 14, 3, 13, 15, 4, 0, 11, 2, 5, 8, 1, 6},
{12, 13, 9, 11, 15, 10, 14, 8, 7, 2, 5, 3, 0, 1, 6, 4},
{9, 14, 11, 5, 8, 12, 15, 1, 13, 3, 0, 10, 2, 6, 4, 7},
{11, 15, 5, 0, 1, 9, 8, 6, 14, 10, 2, 12, 3, 4, 7, 13},
};
/* Find index of the highest set bit */
/* x is assumed to be nonzero. */
static unsigned int highest_one(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
return 63 ^ (unsigned int)__builtin_clzll(x);
#elif defined(_MSC_VER) && defined(IS_X86_64)
unsigned long index;
_BitScanReverse64(&index, x);
return index;
#elif defined(_MSC_VER) && defined(IS_X86_32)
if(x >> 32) {
unsigned long index;
_BitScanReverse(&index, (unsigned long)(x >> 32));
return 32 + index;
} else {
unsigned long index;
_BitScanReverse(&index, (unsigned long)x);
return index;
}
#else
unsigned int c = 0;
if(x & 0xffffffff00000000ULL) { x >>= 32; c += 32; }
if(x & 0x00000000ffff0000ULL) { x >>= 16; c += 16; }
if(x & 0x000000000000ff00ULL) { x >>= 8; c += 8; }
if(x & 0x00000000000000f0ULL) { x >>= 4; c += 4; }
if(x & 0x000000000000000cULL) { x >>= 2; c += 2; }
if(x & 0x0000000000000002ULL) { c += 1; }
return c;
#endif
}
// Count the number of 1 bits.
INLINE unsigned int popcnt(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
return (unsigned int)__builtin_popcountll(x);
#else
unsigned int count = 0;
while (x != 0) {
count += 1;
x &= x - 1;
}
return count;
#endif
}
// Largest power of two less than or equal to x. As a special case, returns 1
// when x is 0.
INLINE uint64_t round_down_to_power_of_2(uint64_t x) {
return 1ULL << highest_one(x | 1);
}
INLINE uint32_t counter_low(uint64_t counter) { return (uint32_t)counter; }
INLINE uint32_t counter_high(uint64_t counter) {
return (uint32_t)(counter >> 32);
}
INLINE uint32_t load32(const void *src) {
const uint8_t *p = (const uint8_t *)src;
return ((uint32_t)(p[0]) << 0) | ((uint32_t)(p[1]) << 8) |
((uint32_t)(p[2]) << 16) | ((uint32_t)(p[3]) << 24);
}
INLINE void load_key_words(const uint8_t key[BLAKE3_KEY_LEN],
uint32_t key_words[8]) {
key_words[0] = load32(&key[0 * 4]);
key_words[1] = load32(&key[1 * 4]);
key_words[2] = load32(&key[2 * 4]);
key_words[3] = load32(&key[3 * 4]);
key_words[4] = load32(&key[4 * 4]);
key_words[5] = load32(&key[5 * 4]);
key_words[6] = load32(&key[6 * 4]);
key_words[7] = load32(&key[7 * 4]);
}
INLINE void load_block_words(const uint8_t block[BLAKE3_BLOCK_LEN],
uint32_t block_words[16]) {
for (size_t i = 0; i < 16; i++) {
block_words[i] = load32(&block[i * 4]);
}
}
INLINE void store32(void *dst, uint32_t w) {
uint8_t *p = (uint8_t *)dst;
p[0] = (uint8_t)(w >> 0);
p[1] = (uint8_t)(w >> 8);
p[2] = (uint8_t)(w >> 16);
p[3] = (uint8_t)(w >> 24);
}
INLINE void store_cv_words(uint8_t bytes_out[32], uint32_t cv_words[8]) {
store32(&bytes_out[0 * 4], cv_words[0]);
store32(&bytes_out[1 * 4], cv_words[1]);
store32(&bytes_out[2 * 4], cv_words[2]);
store32(&bytes_out[3 * 4], cv_words[3]);
store32(&bytes_out[4 * 4], cv_words[4]);
store32(&bytes_out[5 * 4], cv_words[5]);
store32(&bytes_out[6 * 4], cv_words[6]);
store32(&bytes_out[7 * 4], cv_words[7]);
}
void blake3_compress_in_place(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags);
void blake3_compress_xof(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags,
uint8_t out[64]);
void blake3_xof_many(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags,
uint8_t out[64], size_t outblocks);
void blake3_hash_many(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8], uint64_t counter,
bool increment_counter, uint8_t flags,
uint8_t flags_start, uint8_t flags_end, uint8_t *out);
size_t blake3_simd_degree(void);
BLAKE3_PRIVATE size_t blake3_compress_subtree_wide(const uint8_t *input, size_t input_len,
const uint32_t key[8],
uint64_t chunk_counter, uint8_t flags,
uint8_t *out, bool use_tbb);
#if defined(BLAKE3_USE_TBB)
BLAKE3_PRIVATE void blake3_compress_subtree_wide_join_tbb(
// shared params
const uint32_t key[8], uint8_t flags, bool use_tbb,
// left-hand side params
const uint8_t *l_input, size_t l_input_len, uint64_t l_chunk_counter,
uint8_t *l_cvs, size_t *l_n,
// right-hand side params
const uint8_t *r_input, size_t r_input_len, uint64_t r_chunk_counter,
uint8_t *r_cvs, size_t *r_n) NOEXCEPT;
#endif
// Declarations for implementation-specific functions.
void blake3_compress_in_place_portable(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags);
void blake3_compress_xof_portable(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]);
void blake3_hash_many_portable(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#if defined(IS_X86)
#if !defined(BLAKE3_NO_SSE2)
void blake3_compress_in_place_sse2(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags);
void blake3_compress_xof_sse2(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]);
void blake3_hash_many_sse2(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#endif
#if !defined(BLAKE3_NO_SSE41)
void blake3_compress_in_place_sse41(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags);
void blake3_compress_xof_sse41(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]);
void blake3_hash_many_sse41(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#endif
#if !defined(BLAKE3_NO_AVX2)
void blake3_hash_many_avx2(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#endif
#if !defined(BLAKE3_NO_AVX512)
void blake3_compress_in_place_avx512(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags);
void blake3_compress_xof_avx512(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]);
void blake3_hash_many_avx512(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#if !defined(_WIN32)
void blake3_xof_many_avx512(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags,
uint8_t* out, size_t outblocks);
#endif
#endif
#endif
#if BLAKE3_USE_NEON == 1
void blake3_hash_many_neon(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out);
#endif
#ifdef __cplusplus
}
#endif
#endif /* BLAKE3_IMPL_H */
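
Two of the small helpers in this header are easy to check in isolation: the portable fallback branch of `highest_one` (and `round_down_to_power_of_2`, which builds on it) and the little-endian `load32`/`store32` pair. The sketch below copies that logic into a standalone unit. The names `highest_one_portable` and `roundtrip_ok` are hypothetical, chosen here to avoid clashing with the real definitions, and plain `static` is used in place of the header's `INLINE` macro.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback from highest_one: index of the highest set bit.
 * x is assumed to be nonzero. */
static unsigned int highest_one_portable(uint64_t x) {
  unsigned int c = 0;
  if (x & 0xffffffff00000000ULL) { x >>= 32; c += 32; }
  if (x & 0x00000000ffff0000ULL) { x >>= 16; c += 16; }
  if (x & 0x000000000000ff00ULL) { x >>= 8; c += 8; }
  if (x & 0x00000000000000f0ULL) { x >>= 4; c += 4; }
  if (x & 0x000000000000000cULL) { x >>= 2; c += 2; }
  if (x & 0x0000000000000002ULL) { c += 1; }
  return c;
}

/* Largest power of two <= x; the `x | 1` makes the x == 0 case return 1. */
static uint64_t round_down_to_power_of_2(uint64_t x) {
  return 1ULL << highest_one_portable(x | 1);
}

/* Little-endian load and store, independent of host byte order. */
static uint32_t load32(const void *src) {
  const uint8_t *p = (const uint8_t *)src;
  return ((uint32_t)(p[0]) << 0) | ((uint32_t)(p[1]) << 8) |
         ((uint32_t)(p[2]) << 16) | ((uint32_t)(p[3]) << 24);
}

static void store32(void *dst, uint32_t w) {
  uint8_t *p = (uint8_t *)dst;
  p[0] = (uint8_t)(w >> 0);
  p[1] = (uint8_t)(w >> 8);
  p[2] = (uint8_t)(w >> 16);
  p[3] = (uint8_t)(w >> 24);
}

/* Hypothetical check: store then load returns the same word, and the
 * least significant byte lands first in memory. */
static int roundtrip_ok(uint32_t w) {
  uint8_t bytes[4];
  store32(bytes, w);
  return bytes[0] == (uint8_t)w && load32(bytes) == w;
}
```

Because both helpers work byte by byte, they give the same results on big-endian and little-endian hosts and do not require aligned pointers.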

366
external/blake3/blake3_neon.c vendored Normal file

@@ -0,0 +1,366 @@
#include "blake3_impl.h"
#include <arm_neon.h>
#ifdef __ARM_BIG_ENDIAN
#error "This implementation only supports little-endian ARM."
// It might be that all we need for big-endian support here is to get the loads
// and stores right, but step zero would be finding a way to test it in CI.
#endif
INLINE uint32x4_t loadu_128(const uint8_t src[16]) {
// vld1q_u32 has alignment requirements. Don't use it.
return vreinterpretq_u32_u8(vld1q_u8(src));
}
INLINE void storeu_128(uint32x4_t src, uint8_t dest[16]) {
// vst1q_u32 has alignment requirements. Don't use it.
vst1q_u8(dest, vreinterpretq_u8_u32(src));
}
INLINE uint32x4_t add_128(uint32x4_t a, uint32x4_t b) {
return vaddq_u32(a, b);
}
INLINE uint32x4_t xor_128(uint32x4_t a, uint32x4_t b) {
return veorq_u32(a, b);
}
INLINE uint32x4_t set1_128(uint32_t x) { return vld1q_dup_u32(&x); }
INLINE uint32x4_t set4(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
uint32_t array[4] = {a, b, c, d};
return vld1q_u32(array);
}
INLINE uint32x4_t rot16_128(uint32x4_t x) {
// The straightforward implementation would be two shifts and an or, but that's
// slower on microarchitectures we've tested. See
// https://github.com/BLAKE3-team/BLAKE3/pull/319.
// return vorrq_u32(vshrq_n_u32(x, 16), vshlq_n_u32(x, 32 - 16));
return vreinterpretq_u32_u16(vrev32q_u16(vreinterpretq_u16_u32(x)));
}
INLINE uint32x4_t rot12_128(uint32x4_t x) {
// See comment in rot16_128.
// return vorrq_u32(vshrq_n_u32(x, 12), vshlq_n_u32(x, 32 - 12));
return vsriq_n_u32(vshlq_n_u32(x, 32-12), x, 12);
}
INLINE uint32x4_t rot8_128(uint32x4_t x) {
// See comment in rot16_128.
// return vorrq_u32(vshrq_n_u32(x, 8), vshlq_n_u32(x, 32 - 8));
#if defined(__clang__)
return vreinterpretq_u32_u8(__builtin_shufflevector(vreinterpretq_u8_u32(x), vreinterpretq_u8_u32(x), 1,2,3,0,5,6,7,4,9,10,11,8,13,14,15,12));
#elif __GNUC__ * 10000 + __GNUC_MINOR__ * 100 >=40700
static const uint8x16_t r8 = {1,2,3,0,5,6,7,4,9,10,11,8,13,14,15,12};
return vreinterpretq_u32_u8(__builtin_shuffle(vreinterpretq_u8_u32(x), vreinterpretq_u8_u32(x), r8));
#else
return vsriq_n_u32(vshlq_n_u32(x, 32-8), x, 8);
#endif
}
INLINE uint32x4_t rot7_128(uint32x4_t x) {
// See comment in rot16_128.
// return vorrq_u32(vshrq_n_u32(x, 7), vshlq_n_u32(x, 32 - 7));
return vsriq_n_u32(vshlq_n_u32(x, 32-7), x, 7);
}
// TODO: compress_neon
// TODO: hash2_neon
/*
* ----------------------------------------------------------------------------
* hash4_neon
* ----------------------------------------------------------------------------
*/
INLINE void round_fn4(uint32x4_t v[16], uint32x4_t m[16], size_t r) {
v[0] = add_128(v[0], m[(size_t)MSG_SCHEDULE[r][0]]);
v[1] = add_128(v[1], m[(size_t)MSG_SCHEDULE[r][2]]);
v[2] = add_128(v[2], m[(size_t)MSG_SCHEDULE[r][4]]);
v[3] = add_128(v[3], m[(size_t)MSG_SCHEDULE[r][6]]);
v[0] = add_128(v[0], v[4]);
v[1] = add_128(v[1], v[5]);
v[2] = add_128(v[2], v[6]);
v[3] = add_128(v[3], v[7]);
v[12] = xor_128(v[12], v[0]);
v[13] = xor_128(v[13], v[1]);
v[14] = xor_128(v[14], v[2]);
v[15] = xor_128(v[15], v[3]);
v[12] = rot16_128(v[12]);
v[13] = rot16_128(v[13]);
v[14] = rot16_128(v[14]);
v[15] = rot16_128(v[15]);
v[8] = add_128(v[8], v[12]);
v[9] = add_128(v[9], v[13]);
v[10] = add_128(v[10], v[14]);
v[11] = add_128(v[11], v[15]);
v[4] = xor_128(v[4], v[8]);
v[5] = xor_128(v[5], v[9]);
v[6] = xor_128(v[6], v[10]);
v[7] = xor_128(v[7], v[11]);
v[4] = rot12_128(v[4]);
v[5] = rot12_128(v[5]);
v[6] = rot12_128(v[6]);
v[7] = rot12_128(v[7]);
v[0] = add_128(v[0], m[(size_t)MSG_SCHEDULE[r][1]]);
v[1] = add_128(v[1], m[(size_t)MSG_SCHEDULE[r][3]]);
v[2] = add_128(v[2], m[(size_t)MSG_SCHEDULE[r][5]]);
v[3] = add_128(v[3], m[(size_t)MSG_SCHEDULE[r][7]]);
v[0] = add_128(v[0], v[4]);
v[1] = add_128(v[1], v[5]);
v[2] = add_128(v[2], v[6]);
v[3] = add_128(v[3], v[7]);
v[12] = xor_128(v[12], v[0]);
v[13] = xor_128(v[13], v[1]);
v[14] = xor_128(v[14], v[2]);
v[15] = xor_128(v[15], v[3]);
v[12] = rot8_128(v[12]);
v[13] = rot8_128(v[13]);
v[14] = rot8_128(v[14]);
v[15] = rot8_128(v[15]);
v[8] = add_128(v[8], v[12]);
v[9] = add_128(v[9], v[13]);
v[10] = add_128(v[10], v[14]);
v[11] = add_128(v[11], v[15]);
v[4] = xor_128(v[4], v[8]);
v[5] = xor_128(v[5], v[9]);
v[6] = xor_128(v[6], v[10]);
v[7] = xor_128(v[7], v[11]);
v[4] = rot7_128(v[4]);
v[5] = rot7_128(v[5]);
v[6] = rot7_128(v[6]);
v[7] = rot7_128(v[7]);
v[0] = add_128(v[0], m[(size_t)MSG_SCHEDULE[r][8]]);
v[1] = add_128(v[1], m[(size_t)MSG_SCHEDULE[r][10]]);
v[2] = add_128(v[2], m[(size_t)MSG_SCHEDULE[r][12]]);
v[3] = add_128(v[3], m[(size_t)MSG_SCHEDULE[r][14]]);
v[0] = add_128(v[0], v[5]);
v[1] = add_128(v[1], v[6]);
v[2] = add_128(v[2], v[7]);
v[3] = add_128(v[3], v[4]);
v[15] = xor_128(v[15], v[0]);
v[12] = xor_128(v[12], v[1]);
v[13] = xor_128(v[13], v[2]);
v[14] = xor_128(v[14], v[3]);
v[15] = rot16_128(v[15]);
v[12] = rot16_128(v[12]);
v[13] = rot16_128(v[13]);
v[14] = rot16_128(v[14]);
v[10] = add_128(v[10], v[15]);
v[11] = add_128(v[11], v[12]);
v[8] = add_128(v[8], v[13]);
v[9] = add_128(v[9], v[14]);
v[5] = xor_128(v[5], v[10]);
v[6] = xor_128(v[6], v[11]);
v[7] = xor_128(v[7], v[8]);
v[4] = xor_128(v[4], v[9]);
v[5] = rot12_128(v[5]);
v[6] = rot12_128(v[6]);
v[7] = rot12_128(v[7]);
v[4] = rot12_128(v[4]);
v[0] = add_128(v[0], m[(size_t)MSG_SCHEDULE[r][9]]);
v[1] = add_128(v[1], m[(size_t)MSG_SCHEDULE[r][11]]);
v[2] = add_128(v[2], m[(size_t)MSG_SCHEDULE[r][13]]);
v[3] = add_128(v[3], m[(size_t)MSG_SCHEDULE[r][15]]);
v[0] = add_128(v[0], v[5]);
v[1] = add_128(v[1], v[6]);
v[2] = add_128(v[2], v[7]);
v[3] = add_128(v[3], v[4]);
v[15] = xor_128(v[15], v[0]);
v[12] = xor_128(v[12], v[1]);
v[13] = xor_128(v[13], v[2]);
v[14] = xor_128(v[14], v[3]);
v[15] = rot8_128(v[15]);
v[12] = rot8_128(v[12]);
v[13] = rot8_128(v[13]);
v[14] = rot8_128(v[14]);
v[10] = add_128(v[10], v[15]);
v[11] = add_128(v[11], v[12]);
v[8] = add_128(v[8], v[13]);
v[9] = add_128(v[9], v[14]);
v[5] = xor_128(v[5], v[10]);
v[6] = xor_128(v[6], v[11]);
v[7] = xor_128(v[7], v[8]);
v[4] = xor_128(v[4], v[9]);
v[5] = rot7_128(v[5]);
v[6] = rot7_128(v[6]);
v[7] = rot7_128(v[7]);
v[4] = rot7_128(v[4]);
}
INLINE void transpose_vecs_128(uint32x4_t vecs[4]) {
// Individually transpose the four 2x2 sub-matrices in each corner.
uint32x4x2_t rows01 = vtrnq_u32(vecs[0], vecs[1]);
uint32x4x2_t rows23 = vtrnq_u32(vecs[2], vecs[3]);
// Swap the top-right and bottom-left 2x2s (which just got transposed).
vecs[0] =
vcombine_u32(vget_low_u32(rows01.val[0]), vget_low_u32(rows23.val[0]));
vecs[1] =
vcombine_u32(vget_low_u32(rows01.val[1]), vget_low_u32(rows23.val[1]));
vecs[2] =
vcombine_u32(vget_high_u32(rows01.val[0]), vget_high_u32(rows23.val[0]));
vecs[3] =
vcombine_u32(vget_high_u32(rows01.val[1]), vget_high_u32(rows23.val[1]));
}
INLINE void transpose_msg_vecs4(const uint8_t *const *inputs,
size_t block_offset, uint32x4_t out[16]) {
out[0] = loadu_128(&inputs[0][block_offset + 0 * sizeof(uint32x4_t)]);
out[1] = loadu_128(&inputs[1][block_offset + 0 * sizeof(uint32x4_t)]);
out[2] = loadu_128(&inputs[2][block_offset + 0 * sizeof(uint32x4_t)]);
out[3] = loadu_128(&inputs[3][block_offset + 0 * sizeof(uint32x4_t)]);
out[4] = loadu_128(&inputs[0][block_offset + 1 * sizeof(uint32x4_t)]);
out[5] = loadu_128(&inputs[1][block_offset + 1 * sizeof(uint32x4_t)]);
out[6] = loadu_128(&inputs[2][block_offset + 1 * sizeof(uint32x4_t)]);
out[7] = loadu_128(&inputs[3][block_offset + 1 * sizeof(uint32x4_t)]);
out[8] = loadu_128(&inputs[0][block_offset + 2 * sizeof(uint32x4_t)]);
out[9] = loadu_128(&inputs[1][block_offset + 2 * sizeof(uint32x4_t)]);
out[10] = loadu_128(&inputs[2][block_offset + 2 * sizeof(uint32x4_t)]);
out[11] = loadu_128(&inputs[3][block_offset + 2 * sizeof(uint32x4_t)]);
out[12] = loadu_128(&inputs[0][block_offset + 3 * sizeof(uint32x4_t)]);
out[13] = loadu_128(&inputs[1][block_offset + 3 * sizeof(uint32x4_t)]);
out[14] = loadu_128(&inputs[2][block_offset + 3 * sizeof(uint32x4_t)]);
out[15] = loadu_128(&inputs[3][block_offset + 3 * sizeof(uint32x4_t)]);
transpose_vecs_128(&out[0]);
transpose_vecs_128(&out[4]);
transpose_vecs_128(&out[8]);
transpose_vecs_128(&out[12]);
}
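// Build the per-lane counter words. When increment_counter is true the mask
// is all ones and lane i receives counter + i; otherwise the mask is zero and
// all four lanes receive the same counter.
INLINE void load_counters4(uint64_t counter, bool increment_counter,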
uint32x4_t *out_low, uint32x4_t *out_high) {
uint64_t mask = (increment_counter ? ~0 : 0);
*out_low = set4(
counter_low(counter + (mask & 0)), counter_low(counter + (mask & 1)),
counter_low(counter + (mask & 2)), counter_low(counter + (mask & 3)));
*out_high = set4(
counter_high(counter + (mask & 0)), counter_high(counter + (mask & 1)),
counter_high(counter + (mask & 2)), counter_high(counter + (mask & 3)));
}
void blake3_hash4_neon(const uint8_t *const *inputs, size_t blocks,
const uint32_t key[8], uint64_t counter,
bool increment_counter, uint8_t flags,
uint8_t flags_start, uint8_t flags_end, uint8_t *out) {
uint32x4_t h_vecs[8] = {
set1_128(key[0]), set1_128(key[1]), set1_128(key[2]), set1_128(key[3]),
set1_128(key[4]), set1_128(key[5]), set1_128(key[6]), set1_128(key[7]),
};
uint32x4_t counter_low_vec, counter_high_vec;
load_counters4(counter, increment_counter, &counter_low_vec,
&counter_high_vec);
uint8_t block_flags = flags | flags_start;
for (size_t block = 0; block < blocks; block++) {
if (block + 1 == blocks) {
block_flags |= flags_end;
}
uint32x4_t block_len_vec = set1_128(BLAKE3_BLOCK_LEN);
uint32x4_t block_flags_vec = set1_128(block_flags);
uint32x4_t msg_vecs[16];
transpose_msg_vecs4(inputs, block * BLAKE3_BLOCK_LEN, msg_vecs);
uint32x4_t v[16] = {
h_vecs[0], h_vecs[1], h_vecs[2], h_vecs[3],
h_vecs[4], h_vecs[5], h_vecs[6], h_vecs[7],
set1_128(IV[0]), set1_128(IV[1]), set1_128(IV[2]), set1_128(IV[3]),
counter_low_vec, counter_high_vec, block_len_vec, block_flags_vec,
};
round_fn4(v, msg_vecs, 0);
round_fn4(v, msg_vecs, 1);
round_fn4(v, msg_vecs, 2);
round_fn4(v, msg_vecs, 3);
round_fn4(v, msg_vecs, 4);
round_fn4(v, msg_vecs, 5);
round_fn4(v, msg_vecs, 6);
h_vecs[0] = xor_128(v[0], v[8]);
h_vecs[1] = xor_128(v[1], v[9]);
h_vecs[2] = xor_128(v[2], v[10]);
h_vecs[3] = xor_128(v[3], v[11]);
h_vecs[4] = xor_128(v[4], v[12]);
h_vecs[5] = xor_128(v[5], v[13]);
h_vecs[6] = xor_128(v[6], v[14]);
h_vecs[7] = xor_128(v[7], v[15]);
block_flags = flags;
}
transpose_vecs_128(&h_vecs[0]);
transpose_vecs_128(&h_vecs[4]);
// The first four vecs now contain the first half of each output, and the
// second four vecs contain the second half of each output.
storeu_128(h_vecs[0], &out[0 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[4], &out[1 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[1], &out[2 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[5], &out[3 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[2], &out[4 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[6], &out[5 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[3], &out[6 * sizeof(uint32x4_t)]);
storeu_128(h_vecs[7], &out[7 * sizeof(uint32x4_t)]);
}
/*
* ----------------------------------------------------------------------------
* hash_many_neon
* ----------------------------------------------------------------------------
*/
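// Forward declaration: there is no NEON single-block compress function yet,
// so hash_one_neon below falls back to the portable one.
void blake3_compress_in_place_portable(uint32_t cv[8],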
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags);
INLINE void hash_one_neon(const uint8_t *input, size_t blocks,
const uint32_t key[8], uint64_t counter,
uint8_t flags, uint8_t flags_start, uint8_t flags_end,
uint8_t out[BLAKE3_OUT_LEN]) {
uint32_t cv[8];
memcpy(cv, key, BLAKE3_KEY_LEN);
uint8_t block_flags = flags | flags_start;
while (blocks > 0) {
if (blocks == 1) {
block_flags |= flags_end;
}
// TODO: Implement compress_neon. However note that according to
// https://github.com/BLAKE2/BLAKE2/commit/7965d3e6e1b4193438b8d3a656787587d2579227,
// compress_neon might not be any faster than compress_portable.
blake3_compress_in_place_portable(cv, input, BLAKE3_BLOCK_LEN, counter,
block_flags);
input = &input[BLAKE3_BLOCK_LEN];
blocks -= 1;
block_flags = flags;
}
memcpy(out, cv, BLAKE3_OUT_LEN);
}
void blake3_hash_many_neon(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out) {
while (num_inputs >= 4) {
blake3_hash4_neon(inputs, blocks, key, counter, increment_counter, flags,
flags_start, flags_end, out);
if (increment_counter) {
counter += 4;
}
inputs += 4;
num_inputs -= 4;
out = &out[4 * BLAKE3_OUT_LEN];
}
while (num_inputs > 0) {
hash_one_neon(inputs[0], blocks, key, counter, flags, flags_start,
flags_end, out);
if (increment_counter) {
counter += 1;
}
inputs += 1;
num_inputs -= 1;
out = &out[BLAKE3_OUT_LEN];
}
}
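The masking trick in `load_counters4` above can be restated in scalar C. This is only a minimal sketch: `lane_counters` is a hypothetical name that does not exist in the vendored sources, and it assumes `counter_low` and `counter_high` (defined outside this excerpt) return the low and high 32 bits of the 64-bit counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar restatement of load_counters4 (not part of BLAKE3).
 * Assumes the low/high split is a plain 32-bit truncation and shift. */
static void lane_counters(uint64_t counter, bool increment_counter,
                          uint32_t lo[4], uint32_t hi[4]) {
  assert(lo != NULL && hi != NULL);
  /* All ones when incrementing, zero otherwise. */
  const uint64_t mask = increment_counter ? ~(uint64_t)0 : 0;
  for (uint64_t lane = 0; lane < 4; lane++) {
    /* (mask & lane) is lane when incrementing and 0 when not. */
    const uint64_t c = counter + (mask & lane);
    lo[lane] = (uint32_t)c;
    hi[lane] = (uint32_t)(c >> 32);
  }
}
```

The 64-bit addition happens before the split, so a carry out of the low word reaches the high word of the affected lanes without any extra logic.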

160
external/blake3/blake3_portable.c vendored Normal file

@@ -0,0 +1,160 @@
#include "blake3_impl.h"
#include <string.h>
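// Rotate w right by c bits. The callers in g() only pass 16, 12, 8 and 7;
// c == 0 would shift left by 32, which is undefined for a 32-bit type.
INLINE uint32_t rotr32(uint32_t w, uint32_t c) {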
return (w >> c) | (w << (32 - c));
}
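// The G mixing function: updates the four state words at indices a, b, c, d
// using the two message words x and y.
INLINE void g(uint32_t *state, size_t a, size_t b, size_t c, size_t d,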
uint32_t x, uint32_t y) {
state[a] = state[a] + state[b] + x;
state[d] = rotr32(state[d] ^ state[a], 16);
state[c] = state[c] + state[d];
state[b] = rotr32(state[b] ^ state[c], 12);
state[a] = state[a] + state[b] + y;
state[d] = rotr32(state[d] ^ state[a], 8);
state[c] = state[c] + state[d];
state[b] = rotr32(state[b] ^ state[c], 7);
}
INLINE void round_fn(uint32_t state[16], const uint32_t *msg, size_t round) {
// Select the message schedule based on the round.
const uint8_t *schedule = MSG_SCHEDULE[round];
// Mix the columns.
g(state, 0, 4, 8, 12, msg[schedule[0]], msg[schedule[1]]);
g(state, 1, 5, 9, 13, msg[schedule[2]], msg[schedule[3]]);
g(state, 2, 6, 10, 14, msg[schedule[4]], msg[schedule[5]]);
g(state, 3, 7, 11, 15, msg[schedule[6]], msg[schedule[7]]);
// Mix the diagonals.
g(state, 0, 5, 10, 15, msg[schedule[8]], msg[schedule[9]]);
g(state, 1, 6, 11, 12, msg[schedule[10]], msg[schedule[11]]);
g(state, 2, 7, 8, 13, msg[schedule[12]], msg[schedule[13]]);
g(state, 3, 4, 9, 14, msg[schedule[14]], msg[schedule[15]]);
}
INLINE void compress_pre(uint32_t state[16], const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags) {
uint32_t block_words[16];
block_words[0] = load32(block + 4 * 0);
block_words[1] = load32(block + 4 * 1);
block_words[2] = load32(block + 4 * 2);
block_words[3] = load32(block + 4 * 3);
block_words[4] = load32(block + 4 * 4);
block_words[5] = load32(block + 4 * 5);
block_words[6] = load32(block + 4 * 6);
block_words[7] = load32(block + 4 * 7);
block_words[8] = load32(block + 4 * 8);
block_words[9] = load32(block + 4 * 9);
block_words[10] = load32(block + 4 * 10);
block_words[11] = load32(block + 4 * 11);
block_words[12] = load32(block + 4 * 12);
block_words[13] = load32(block + 4 * 13);
block_words[14] = load32(block + 4 * 14);
block_words[15] = load32(block + 4 * 15);
state[0] = cv[0];
state[1] = cv[1];
state[2] = cv[2];
state[3] = cv[3];
state[4] = cv[4];
state[5] = cv[5];
state[6] = cv[6];
state[7] = cv[7];
state[8] = IV[0];
state[9] = IV[1];
state[10] = IV[2];
state[11] = IV[3];
state[12] = counter_low(counter);
state[13] = counter_high(counter);
state[14] = (uint32_t)block_len;
state[15] = (uint32_t)flags;
round_fn(state, &block_words[0], 0);
round_fn(state, &block_words[0], 1);
round_fn(state, &block_words[0], 2);
round_fn(state, &block_words[0], 3);
round_fn(state, &block_words[0], 4);
round_fn(state, &block_words[0], 5);
round_fn(state, &block_words[0], 6);
}
void blake3_compress_in_place_portable(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags) {
uint32_t state[16];
compress_pre(state, cv, block, block_len, counter, flags);
cv[0] = state[0] ^ state[8];
cv[1] = state[1] ^ state[9];
cv[2] = state[2] ^ state[10];
cv[3] = state[3] ^ state[11];
cv[4] = state[4] ^ state[12];
cv[5] = state[5] ^ state[13];
cv[6] = state[6] ^ state[14];
cv[7] = state[7] ^ state[15];
}
void blake3_compress_xof_portable(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]) {
uint32_t state[16];
compress_pre(state, cv, block, block_len, counter, flags);
store32(&out[0 * 4], state[0] ^ state[8]);
store32(&out[1 * 4], state[1] ^ state[9]);
store32(&out[2 * 4], state[2] ^ state[10]);
store32(&out[3 * 4], state[3] ^ state[11]);
store32(&out[4 * 4], state[4] ^ state[12]);
store32(&out[5 * 4], state[5] ^ state[13]);
store32(&out[6 * 4], state[6] ^ state[14]);
store32(&out[7 * 4], state[7] ^ state[15]);
store32(&out[8 * 4], state[8] ^ cv[0]);
store32(&out[9 * 4], state[9] ^ cv[1]);
store32(&out[10 * 4], state[10] ^ cv[2]);
store32(&out[11 * 4], state[11] ^ cv[3]);
store32(&out[12 * 4], state[12] ^ cv[4]);
store32(&out[13 * 4], state[13] ^ cv[5]);
store32(&out[14 * 4], state[14] ^ cv[6]);
store32(&out[15 * 4], state[15] ^ cv[7]);
}
INLINE void hash_one_portable(const uint8_t *input, size_t blocks,
const uint32_t key[8], uint64_t counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t out[BLAKE3_OUT_LEN]) {
uint32_t cv[8];
memcpy(cv, key, BLAKE3_KEY_LEN);
uint8_t block_flags = flags | flags_start;
while (blocks > 0) {
if (blocks == 1) {
block_flags |= flags_end;
}
blake3_compress_in_place_portable(cv, input, BLAKE3_BLOCK_LEN, counter,
block_flags);
input = &input[BLAKE3_BLOCK_LEN];
blocks -= 1;
block_flags = flags;
}
store_cv_words(out, cv);
}
void blake3_hash_many_portable(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out) {
while (num_inputs > 0) {
hash_one_portable(inputs[0], blocks, key, counter, flags, flags_start,
flags_end, out);
if (increment_counter) {
counter += 1;
}
inputs += 1;
num_inputs -= 1;
out = &out[BLAKE3_OUT_LEN];
}
}
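The eight index tuples hard-coded in `round_fn` above follow a regular pattern: four columns of the 4x4 state, then four diagonals. As a hedged illustration, the sketch below derives them arithmetically. `g_indices` is a hypothetical helper, not something the vendored code defines or uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLAKE3): writes the four state indices
 * used by the call-th g() invocation of a round. Calls 0..3 are the columns,
 * calls 4..7 are the diagonals. */
static void g_indices(size_t call, size_t idx[4]) {
  assert(call < 8);
  const size_t i = call % 4;
  /* Columns keep the same column in every row; diagonals step one column
   * to the right per row, wrapping around. */
  const size_t step = (call < 4) ? 0 : 1;
  idx[0] = 0 + i;
  idx[1] = 4 + (i + 1 * step) % 4;
  idx[2] = 8 + (i + 2 * step) % 4;
  idx[3] = 12 + (i + 3 * step) % 4;
}
```

For example, `call = 5` yields the indices 1, 6, 11, 12, matching the second diagonal call in `round_fn`. The unrolled form in the source avoids this index arithmetic at run time.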

566
external/blake3/blake3_sse2.c vendored Normal file

@@ -0,0 +1,566 @@
#include "blake3_impl.h"
#include <immintrin.h>
#define DEGREE 4
#define _mm_shuffle_ps2(a, b, c) \
(_mm_castps_si128( \
_mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b), (c))))
INLINE __m128i loadu(const uint8_t src[16]) {
return _mm_loadu_si128((const __m128i *)src);
}
INLINE void storeu(__m128i src, uint8_t dest[16]) {
_mm_storeu_si128((__m128i *)dest, src);
}
INLINE __m128i addv(__m128i a, __m128i b) { return _mm_add_epi32(a, b); }
// Note that clang-format doesn't like the name "xor" for some reason.
INLINE __m128i xorv(__m128i a, __m128i b) { return _mm_xor_si128(a, b); }
INLINE __m128i set1(uint32_t x) { return _mm_set1_epi32((int32_t)x); }
INLINE __m128i set4(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
return _mm_setr_epi32((int32_t)a, (int32_t)b, (int32_t)c, (int32_t)d);
}
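// Rotating a 32-bit lane by 16 is the same as swapping its two 16-bit halves.
// The 0xB1 immediate selects 16-bit elements in the order (1, 0, 3, 2).
INLINE __m128i rot16(__m128i x) {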
return _mm_shufflehi_epi16(_mm_shufflelo_epi16(x, 0xB1), 0xB1);
}
INLINE __m128i rot12(__m128i x) {
return xorv(_mm_srli_epi32(x, 12), _mm_slli_epi32(x, 32 - 12));
}
INLINE __m128i rot8(__m128i x) {
return xorv(_mm_srli_epi32(x, 8), _mm_slli_epi32(x, 32 - 8));
}
INLINE __m128i rot7(__m128i x) {
return xorv(_mm_srli_epi32(x, 7), _mm_slli_epi32(x, 32 - 7));
}
INLINE void g1(__m128i *row0, __m128i *row1, __m128i *row2, __m128i *row3,
__m128i m) {
*row0 = addv(addv(*row0, m), *row1);
*row3 = xorv(*row3, *row0);
*row3 = rot16(*row3);
*row2 = addv(*row2, *row3);
*row1 = xorv(*row1, *row2);
*row1 = rot12(*row1);
}
INLINE void g2(__m128i *row0, __m128i *row1, __m128i *row2, __m128i *row3,
__m128i m) {
*row0 = addv(addv(*row0, m), *row1);
*row3 = xorv(*row3, *row0);
*row3 = rot8(*row3);
*row2 = addv(*row2, *row3);
*row1 = xorv(*row1, *row2);
*row1 = rot7(*row1);
}
// Note the optimization here of leaving row1 as the unrotated row, rather than
// row0. All the message loads below are adjusted to compensate for this. See
// discussion at https://github.com/sneves/blake2-avx2/pull/4
INLINE void diagonalize(__m128i *row0, __m128i *row2, __m128i *row3) {
*row0 = _mm_shuffle_epi32(*row0, _MM_SHUFFLE(2, 1, 0, 3));
*row3 = _mm_shuffle_epi32(*row3, _MM_SHUFFLE(1, 0, 3, 2));
*row2 = _mm_shuffle_epi32(*row2, _MM_SHUFFLE(0, 3, 2, 1));
}
INLINE void undiagonalize(__m128i *row0, __m128i *row2, __m128i *row3) {
*row0 = _mm_shuffle_epi32(*row0, _MM_SHUFFLE(0, 3, 2, 1));
*row3 = _mm_shuffle_epi32(*row3, _MM_SHUFFLE(1, 0, 3, 2));
*row2 = _mm_shuffle_epi32(*row2, _MM_SHUFFLE(2, 1, 0, 3));
}
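// SSE2 emulation of the SSE4.1 _mm_blend_epi16 intrinsic: for each 16-bit
// lane i, take the lane from b if bit i of imm8 is set, otherwise from a.
INLINE __m128i blend_epi16(__m128i a, __m128i b, const int16_t imm8) {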
const __m128i bits = _mm_set_epi16(0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
__m128i mask = _mm_set1_epi16(imm8);
mask = _mm_and_si128(mask, bits);
mask = _mm_cmpeq_epi16(mask, bits);
return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}
INLINE void compress_pre(__m128i rows[4], const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags) {
rows[0] = loadu((uint8_t *)&cv[0]);
rows[1] = loadu((uint8_t *)&cv[4]);
rows[2] = set4(IV[0], IV[1], IV[2], IV[3]);
rows[3] = set4(counter_low(counter), counter_high(counter),
(uint32_t)block_len, (uint32_t)flags);
__m128i m0 = loadu(&block[sizeof(__m128i) * 0]);
__m128i m1 = loadu(&block[sizeof(__m128i) * 1]);
__m128i m2 = loadu(&block[sizeof(__m128i) * 2]);
__m128i m3 = loadu(&block[sizeof(__m128i) * 3]);
__m128i t0, t1, t2, t3, tt;
// Round 1. The first round permutes the message words from the original
// input order, into the groups that get mixed in parallel.
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(2, 0, 2, 0)); // 6 4 2 0
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 3, 1)); // 7 5 3 1
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(2, 0, 2, 0)); // 14 12 10 8
t2 = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2, 1, 0, 3)); // 12 10 8 14
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 1, 3, 1)); // 15 13 11 9
t3 = _mm_shuffle_epi32(t3, _MM_SHUFFLE(2, 1, 0, 3)); // 13 11 9 15
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 2. This round and all following rounds apply a fixed permutation
// to the message words from the round before.
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 3
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 4
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 5
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 6
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 7
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
}
void blake3_compress_in_place_sse2(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags) {
__m128i rows[4];
compress_pre(rows, cv, block, block_len, counter, flags);
storeu(xorv(rows[0], rows[2]), (uint8_t *)&cv[0]);
storeu(xorv(rows[1], rows[3]), (uint8_t *)&cv[4]);
}
void blake3_compress_xof_sse2(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]) {
__m128i rows[4];
compress_pre(rows, cv, block, block_len, counter, flags);
storeu(xorv(rows[0], rows[2]), &out[0]);
storeu(xorv(rows[1], rows[3]), &out[16]);
storeu(xorv(rows[2], loadu((uint8_t *)&cv[0])), &out[32]);
storeu(xorv(rows[3], loadu((uint8_t *)&cv[4])), &out[48]);
}
INLINE void round_fn(__m128i v[16], __m128i m[16], size_t r) {
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][0]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][2]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][4]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][6]]);
v[0] = addv(v[0], v[4]);
v[1] = addv(v[1], v[5]);
v[2] = addv(v[2], v[6]);
v[3] = addv(v[3], v[7]);
v[12] = xorv(v[12], v[0]);
v[13] = xorv(v[13], v[1]);
v[14] = xorv(v[14], v[2]);
v[15] = xorv(v[15], v[3]);
v[12] = rot16(v[12]);
v[13] = rot16(v[13]);
v[14] = rot16(v[14]);
v[15] = rot16(v[15]);
v[8] = addv(v[8], v[12]);
v[9] = addv(v[9], v[13]);
v[10] = addv(v[10], v[14]);
v[11] = addv(v[11], v[15]);
v[4] = xorv(v[4], v[8]);
v[5] = xorv(v[5], v[9]);
v[6] = xorv(v[6], v[10]);
v[7] = xorv(v[7], v[11]);
v[4] = rot12(v[4]);
v[5] = rot12(v[5]);
v[6] = rot12(v[6]);
v[7] = rot12(v[7]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][1]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][3]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][5]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][7]]);
v[0] = addv(v[0], v[4]);
v[1] = addv(v[1], v[5]);
v[2] = addv(v[2], v[6]);
v[3] = addv(v[3], v[7]);
v[12] = xorv(v[12], v[0]);
v[13] = xorv(v[13], v[1]);
v[14] = xorv(v[14], v[2]);
v[15] = xorv(v[15], v[3]);
v[12] = rot8(v[12]);
v[13] = rot8(v[13]);
v[14] = rot8(v[14]);
v[15] = rot8(v[15]);
v[8] = addv(v[8], v[12]);
v[9] = addv(v[9], v[13]);
v[10] = addv(v[10], v[14]);
v[11] = addv(v[11], v[15]);
v[4] = xorv(v[4], v[8]);
v[5] = xorv(v[5], v[9]);
v[6] = xorv(v[6], v[10]);
v[7] = xorv(v[7], v[11]);
v[4] = rot7(v[4]);
v[5] = rot7(v[5]);
v[6] = rot7(v[6]);
v[7] = rot7(v[7]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][8]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][10]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][12]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][14]]);
v[0] = addv(v[0], v[5]);
v[1] = addv(v[1], v[6]);
v[2] = addv(v[2], v[7]);
v[3] = addv(v[3], v[4]);
v[15] = xorv(v[15], v[0]);
v[12] = xorv(v[12], v[1]);
v[13] = xorv(v[13], v[2]);
v[14] = xorv(v[14], v[3]);
v[15] = rot16(v[15]);
v[12] = rot16(v[12]);
v[13] = rot16(v[13]);
v[14] = rot16(v[14]);
v[10] = addv(v[10], v[15]);
v[11] = addv(v[11], v[12]);
v[8] = addv(v[8], v[13]);
v[9] = addv(v[9], v[14]);
v[5] = xorv(v[5], v[10]);
v[6] = xorv(v[6], v[11]);
v[7] = xorv(v[7], v[8]);
v[4] = xorv(v[4], v[9]);
v[5] = rot12(v[5]);
v[6] = rot12(v[6]);
v[7] = rot12(v[7]);
v[4] = rot12(v[4]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][9]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][11]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][13]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][15]]);
v[0] = addv(v[0], v[5]);
v[1] = addv(v[1], v[6]);
v[2] = addv(v[2], v[7]);
v[3] = addv(v[3], v[4]);
v[15] = xorv(v[15], v[0]);
v[12] = xorv(v[12], v[1]);
v[13] = xorv(v[13], v[2]);
v[14] = xorv(v[14], v[3]);
v[15] = rot8(v[15]);
v[12] = rot8(v[12]);
v[13] = rot8(v[13]);
v[14] = rot8(v[14]);
v[10] = addv(v[10], v[15]);
v[11] = addv(v[11], v[12]);
v[8] = addv(v[8], v[13]);
v[9] = addv(v[9], v[14]);
v[5] = xorv(v[5], v[10]);
v[6] = xorv(v[6], v[11]);
v[7] = xorv(v[7], v[8]);
v[4] = xorv(v[4], v[9]);
v[5] = rot7(v[5]);
v[6] = rot7(v[6]);
v[7] = rot7(v[7]);
v[4] = rot7(v[4]);
}
INLINE void transpose_vecs(__m128i vecs[DEGREE]) {
// Interleave 32-bit lanes. The low unpack is lanes 00/11 and the high is
// 22/33. Note that this doesn't split the vector into two lanes, as the
// AVX2 counterparts do.
__m128i ab_01 = _mm_unpacklo_epi32(vecs[0], vecs[1]);
__m128i ab_23 = _mm_unpackhi_epi32(vecs[0], vecs[1]);
__m128i cd_01 = _mm_unpacklo_epi32(vecs[2], vecs[3]);
__m128i cd_23 = _mm_unpackhi_epi32(vecs[2], vecs[3]);
// Interleave 64-bit lanes.
__m128i abcd_0 = _mm_unpacklo_epi64(ab_01, cd_01);
__m128i abcd_1 = _mm_unpackhi_epi64(ab_01, cd_01);
__m128i abcd_2 = _mm_unpacklo_epi64(ab_23, cd_23);
__m128i abcd_3 = _mm_unpackhi_epi64(ab_23, cd_23);
vecs[0] = abcd_0;
vecs[1] = abcd_1;
vecs[2] = abcd_2;
vecs[3] = abcd_3;
}
INLINE void transpose_msg_vecs(const uint8_t *const *inputs,
size_t block_offset, __m128i out[16]) {
out[0] = loadu(&inputs[0][block_offset + 0 * sizeof(__m128i)]);
out[1] = loadu(&inputs[1][block_offset + 0 * sizeof(__m128i)]);
out[2] = loadu(&inputs[2][block_offset + 0 * sizeof(__m128i)]);
out[3] = loadu(&inputs[3][block_offset + 0 * sizeof(__m128i)]);
out[4] = loadu(&inputs[0][block_offset + 1 * sizeof(__m128i)]);
out[5] = loadu(&inputs[1][block_offset + 1 * sizeof(__m128i)]);
out[6] = loadu(&inputs[2][block_offset + 1 * sizeof(__m128i)]);
out[7] = loadu(&inputs[3][block_offset + 1 * sizeof(__m128i)]);
out[8] = loadu(&inputs[0][block_offset + 2 * sizeof(__m128i)]);
out[9] = loadu(&inputs[1][block_offset + 2 * sizeof(__m128i)]);
out[10] = loadu(&inputs[2][block_offset + 2 * sizeof(__m128i)]);
out[11] = loadu(&inputs[3][block_offset + 2 * sizeof(__m128i)]);
out[12] = loadu(&inputs[0][block_offset + 3 * sizeof(__m128i)]);
out[13] = loadu(&inputs[1][block_offset + 3 * sizeof(__m128i)]);
out[14] = loadu(&inputs[2][block_offset + 3 * sizeof(__m128i)]);
out[15] = loadu(&inputs[3][block_offset + 3 * sizeof(__m128i)]);
for (size_t i = 0; i < 4; ++i) {
_mm_prefetch((const void *)&inputs[i][block_offset + 256], _MM_HINT_T0);
}
transpose_vecs(&out[0]);
transpose_vecs(&out[4]);
transpose_vecs(&out[8]);
transpose_vecs(&out[12]);
}
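// Build the per-lane low and high counter words. SSE2 has no unsigned 32-bit
// compare, so the carry out of the low word is detected by flipping the sign
// bit of both operands and using the signed compare: the low word wrapped
// exactly when the added offset is (unsigned) greater than the sum.
INLINE void load_counters(uint64_t counter, bool increment_counter,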
__m128i *out_lo, __m128i *out_hi) {
const __m128i mask = _mm_set1_epi32(-(int32_t)increment_counter);
const __m128i add0 = _mm_set_epi32(3, 2, 1, 0);
const __m128i add1 = _mm_and_si128(mask, add0);
__m128i l = _mm_add_epi32(_mm_set1_epi32((int32_t)counter), add1);
__m128i carry = _mm_cmpgt_epi32(_mm_xor_si128(add1, _mm_set1_epi32(0x80000000)),
_mm_xor_si128( l, _mm_set1_epi32(0x80000000)));
__m128i h = _mm_sub_epi32(_mm_set1_epi32((int32_t)(counter >> 32)), carry);
*out_lo = l;
*out_hi = h;
}
static
void blake3_hash4_sse2(const uint8_t *const *inputs, size_t blocks,
const uint32_t key[8], uint64_t counter,
bool increment_counter, uint8_t flags,
uint8_t flags_start, uint8_t flags_end, uint8_t *out) {
__m128i h_vecs[8] = {
set1(key[0]), set1(key[1]), set1(key[2]), set1(key[3]),
set1(key[4]), set1(key[5]), set1(key[6]), set1(key[7]),
};
__m128i counter_low_vec, counter_high_vec;
load_counters(counter, increment_counter, &counter_low_vec,
&counter_high_vec);
uint8_t block_flags = flags | flags_start;
for (size_t block = 0; block < blocks; block++) {
if (block + 1 == blocks) {
block_flags |= flags_end;
}
__m128i block_len_vec = set1(BLAKE3_BLOCK_LEN);
__m128i block_flags_vec = set1(block_flags);
__m128i msg_vecs[16];
transpose_msg_vecs(inputs, block * BLAKE3_BLOCK_LEN, msg_vecs);
__m128i v[16] = {
h_vecs[0], h_vecs[1], h_vecs[2], h_vecs[3],
h_vecs[4], h_vecs[5], h_vecs[6], h_vecs[7],
set1(IV[0]), set1(IV[1]), set1(IV[2]), set1(IV[3]),
counter_low_vec, counter_high_vec, block_len_vec, block_flags_vec,
};
round_fn(v, msg_vecs, 0);
round_fn(v, msg_vecs, 1);
round_fn(v, msg_vecs, 2);
round_fn(v, msg_vecs, 3);
round_fn(v, msg_vecs, 4);
round_fn(v, msg_vecs, 5);
round_fn(v, msg_vecs, 6);
h_vecs[0] = xorv(v[0], v[8]);
h_vecs[1] = xorv(v[1], v[9]);
h_vecs[2] = xorv(v[2], v[10]);
h_vecs[3] = xorv(v[3], v[11]);
h_vecs[4] = xorv(v[4], v[12]);
h_vecs[5] = xorv(v[5], v[13]);
h_vecs[6] = xorv(v[6], v[14]);
h_vecs[7] = xorv(v[7], v[15]);
block_flags = flags;
}
transpose_vecs(&h_vecs[0]);
transpose_vecs(&h_vecs[4]);
// The first four vecs now contain the first half of each output, and the
// second four vecs contain the second half of each output.
storeu(h_vecs[0], &out[0 * sizeof(__m128i)]);
storeu(h_vecs[4], &out[1 * sizeof(__m128i)]);
storeu(h_vecs[1], &out[2 * sizeof(__m128i)]);
storeu(h_vecs[5], &out[3 * sizeof(__m128i)]);
storeu(h_vecs[2], &out[4 * sizeof(__m128i)]);
storeu(h_vecs[6], &out[5 * sizeof(__m128i)]);
storeu(h_vecs[3], &out[6 * sizeof(__m128i)]);
storeu(h_vecs[7], &out[7 * sizeof(__m128i)]);
}
INLINE void hash_one_sse2(const uint8_t *input, size_t blocks,
const uint32_t key[8], uint64_t counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t out[BLAKE3_OUT_LEN]) {
uint32_t cv[8];
memcpy(cv, key, BLAKE3_KEY_LEN);
uint8_t block_flags = flags | flags_start;
while (blocks > 0) {
if (blocks == 1) {
block_flags |= flags_end;
}
blake3_compress_in_place_sse2(cv, input, BLAKE3_BLOCK_LEN, counter,
block_flags);
input = &input[BLAKE3_BLOCK_LEN];
blocks -= 1;
block_flags = flags;
}
memcpy(out, cv, BLAKE3_OUT_LEN);
}
void blake3_hash_many_sse2(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out) {
while (num_inputs >= DEGREE) {
blake3_hash4_sse2(inputs, blocks, key, counter, increment_counter, flags,
flags_start, flags_end, out);
if (increment_counter) {
counter += DEGREE;
}
inputs += DEGREE;
num_inputs -= DEGREE;
out = &out[DEGREE * BLAKE3_OUT_LEN];
}
while (num_inputs > 0) {
hash_one_sse2(inputs[0], blocks, key, counter, flags, flags_start,
flags_end, out);
if (increment_counter) {
counter += 1;
}
inputs += 1;
num_inputs -= 1;
out = &out[BLAKE3_OUT_LEN];
}
}
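The carry detection in `load_counters` above relies on a standard trick: XOR-ing both operands with `0x80000000` turns an unsigned comparison into a signed one. The scalar sketch below illustrates this for a single lane. The names `unsigned_gt` and `split_lane_counter` are hypothetical and do not appear in the vendored sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns an all-ones mask when a > b as unsigned
 * values, using only a signed comparison (as _mm_cmpgt_epi32 does).
 * The casts to int32_t are implementation-defined for values above
 * INT32_MAX; this sketch assumes the usual two's-complement wraparound. */
static uint32_t unsigned_gt(uint32_t a, uint32_t b) {
  const int32_t sa = (int32_t)(a ^ UINT32_C(0x80000000));
  const int32_t sb = (int32_t)(b ^ UINT32_C(0x80000000));
  return (sa > sb) ? UINT32_C(0xFFFFFFFF) : 0;
}

/* Hypothetical scalar version of one lane of load_counters. */
static void split_lane_counter(uint64_t counter, uint32_t lane_offset,
                               uint32_t *lo, uint32_t *hi) {
  assert(lane_offset < 4);
  /* 32-bit add that may wrap around. */
  const uint32_t l = (uint32_t)counter + lane_offset;
  /* The add wrapped exactly when the offset exceeds the wrapped sum. */
  const uint32_t carry = unsigned_gt(lane_offset, l);
  *lo = l;
  /* Subtracting the all-ones mask (-1) adds 1 to the high word. */
  *hi = (uint32_t)(counter >> 32) - carry;
}
```

With `counter = 0x1FFFFFFFE` and `lane_offset = 3`, the low word wraps to 1 and the high word becomes 2, which agrees with the 64-bit sum `0x200000001`.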

2291
external/blake3/blake3_sse2_x86-64_unix.S vendored Normal file

File diff suppressed because it is too large

560
external/blake3/blake3_sse41.c vendored Normal file

@@ -0,0 +1,560 @@
#include "blake3_impl.h"
#include <immintrin.h>
#define DEGREE 4
#define _mm_shuffle_ps2(a, b, c) \
(_mm_castps_si128( \
_mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b), (c))))
INLINE __m128i loadu(const uint8_t src[16]) {
return _mm_loadu_si128((const __m128i *)src);
}
INLINE void storeu(__m128i src, uint8_t dest[16]) {
_mm_storeu_si128((__m128i *)dest, src);
}
INLINE __m128i addv(__m128i a, __m128i b) { return _mm_add_epi32(a, b); }
// Note that clang-format doesn't like the name "xor" for some reason.
INLINE __m128i xorv(__m128i a, __m128i b) { return _mm_xor_si128(a, b); }
INLINE __m128i set1(uint32_t x) { return _mm_set1_epi32((int32_t)x); }
INLINE __m128i set4(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
return _mm_setr_epi32((int32_t)a, (int32_t)b, (int32_t)c, (int32_t)d);
}
INLINE __m128i rot16(__m128i x) {
return _mm_shuffle_epi8(
x, _mm_set_epi8(13, 12, 15, 14, 9, 8, 11, 10, 5, 4, 7, 6, 1, 0, 3, 2));
}
INLINE __m128i rot12(__m128i x) {
return xorv(_mm_srli_epi32(x, 12), _mm_slli_epi32(x, 32 - 12));
}
INLINE __m128i rot8(__m128i x) {
return _mm_shuffle_epi8(
x, _mm_set_epi8(12, 15, 14, 13, 8, 11, 10, 9, 4, 7, 6, 5, 0, 3, 2, 1));
}
INLINE __m128i rot7(__m128i x) {
return xorv(_mm_srli_epi32(x, 7), _mm_slli_epi32(x, 32 - 7));
}
INLINE void g1(__m128i *row0, __m128i *row1, __m128i *row2, __m128i *row3,
__m128i m) {
*row0 = addv(addv(*row0, m), *row1);
*row3 = xorv(*row3, *row0);
*row3 = rot16(*row3);
*row2 = addv(*row2, *row3);
*row1 = xorv(*row1, *row2);
*row1 = rot12(*row1);
}
INLINE void g2(__m128i *row0, __m128i *row1, __m128i *row2, __m128i *row3,
__m128i m) {
*row0 = addv(addv(*row0, m), *row1);
*row3 = xorv(*row3, *row0);
*row3 = rot8(*row3);
*row2 = addv(*row2, *row3);
*row1 = xorv(*row1, *row2);
*row1 = rot7(*row1);
}
// Note the optimization here of leaving row1 as the unrotated row, rather than
// row0. All the message loads below are adjusted to compensate for this. See
// discussion at https://github.com/sneves/blake2-avx2/pull/4
INLINE void diagonalize(__m128i *row0, __m128i *row2, __m128i *row3) {
*row0 = _mm_shuffle_epi32(*row0, _MM_SHUFFLE(2, 1, 0, 3));
*row3 = _mm_shuffle_epi32(*row3, _MM_SHUFFLE(1, 0, 3, 2));
*row2 = _mm_shuffle_epi32(*row2, _MM_SHUFFLE(0, 3, 2, 1));
}
INLINE void undiagonalize(__m128i *row0, __m128i *row2, __m128i *row3) {
*row0 = _mm_shuffle_epi32(*row0, _MM_SHUFFLE(0, 3, 2, 1));
*row3 = _mm_shuffle_epi32(*row3, _MM_SHUFFLE(1, 0, 3, 2));
*row2 = _mm_shuffle_epi32(*row2, _MM_SHUFFLE(2, 1, 0, 3));
}
INLINE void compress_pre(__m128i rows[4], const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter, uint8_t flags) {
rows[0] = loadu((uint8_t *)&cv[0]);
rows[1] = loadu((uint8_t *)&cv[4]);
rows[2] = set4(IV[0], IV[1], IV[2], IV[3]);
rows[3] = set4(counter_low(counter), counter_high(counter),
(uint32_t)block_len, (uint32_t)flags);
__m128i m0 = loadu(&block[sizeof(__m128i) * 0]);
__m128i m1 = loadu(&block[sizeof(__m128i) * 1]);
__m128i m2 = loadu(&block[sizeof(__m128i) * 2]);
__m128i m3 = loadu(&block[sizeof(__m128i) * 3]);
__m128i t0, t1, t2, t3, tt;
// Round 1. The first round permutes the message words from the original
// input order, into the groups that get mixed in parallel.
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(2, 0, 2, 0)); // 6 4 2 0
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 3, 1)); // 7 5 3 1
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(2, 0, 2, 0)); // 14 12 10 8
t2 = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2, 1, 0, 3)); // 12 10 8 14
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 1, 3, 1)); // 15 13 11 9
t3 = _mm_shuffle_epi32(t3, _MM_SHUFFLE(2, 1, 0, 3)); // 13 11 9 15
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 2. This round and all following rounds apply a fixed permutation
// to the message words from the round before.
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = _mm_blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = _mm_blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 3
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = _mm_blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = _mm_blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 4
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = _mm_blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = _mm_blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 5
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = _mm_blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = _mm_blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 6
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = _mm_blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = _mm_blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
m0 = t0;
m1 = t1;
m2 = t2;
m3 = t3;
// Round 7
t0 = _mm_shuffle_ps2(m0, m1, _MM_SHUFFLE(3, 1, 1, 2));
t0 = _mm_shuffle_epi32(t0, _MM_SHUFFLE(0, 3, 2, 1));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t0);
t1 = _mm_shuffle_ps2(m2, m3, _MM_SHUFFLE(3, 3, 2, 2));
tt = _mm_shuffle_epi32(m0, _MM_SHUFFLE(0, 0, 3, 3));
t1 = _mm_blend_epi16(tt, t1, 0xCC);
g2(&rows[0], &rows[1], &rows[2], &rows[3], t1);
diagonalize(&rows[0], &rows[2], &rows[3]);
t2 = _mm_unpacklo_epi64(m3, m1);
tt = _mm_blend_epi16(t2, m2, 0xC0);
t2 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(1, 3, 2, 0));
g1(&rows[0], &rows[1], &rows[2], &rows[3], t2);
t3 = _mm_unpackhi_epi32(m1, m3);
tt = _mm_unpacklo_epi32(m2, t3);
t3 = _mm_shuffle_epi32(tt, _MM_SHUFFLE(0, 1, 3, 2));
g2(&rows[0], &rows[1], &rows[2], &rows[3], t3);
undiagonalize(&rows[0], &rows[2], &rows[3]);
}
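
Reviewer note: rounds 2 through 7 in compress_pre repeat the same shuffle sequence because BLAKE3 applies one fixed permutation to the message words between rounds. The sketch below shows that permutation on a plain array of word indices. The vector code works on a rearranged layout (see the row1 comment above), so its shuffles do not map one-to-one onto this table. `MSG_PERMUTATION_SKETCH` and `permute_message_words_sketch` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Logical BLAKE3 message permutation: new word i is old word PERM[i]. */
static const uint8_t MSG_PERMUTATION_SKETCH[16] = {
    2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8};

/* Apply the permutation once, as happens between consecutive rounds. */
static void permute_message_words_sketch(uint32_t m[16]) {
  uint32_t permuted[16];
  for (size_t i = 0; i < 16; i++) {
    permuted[i] = m[MSG_PERMUTATION_SKETCH[i]];
  }
  memcpy(m, permuted, sizeof(permuted));
}
```

Starting from the identity order and applying the function repeatedly should reproduce the successive rows of the MSG_SCHEDULE table that round_fn indexes further down.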
void blake3_compress_in_place_sse41(uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags) {
__m128i rows[4];
compress_pre(rows, cv, block, block_len, counter, flags);
storeu(xorv(rows[0], rows[2]), (uint8_t *)&cv[0]);
storeu(xorv(rows[1], rows[3]), (uint8_t *)&cv[4]);
}
void blake3_compress_xof_sse41(const uint32_t cv[8],
const uint8_t block[BLAKE3_BLOCK_LEN],
uint8_t block_len, uint64_t counter,
uint8_t flags, uint8_t out[64]) {
__m128i rows[4];
compress_pre(rows, cv, block, block_len, counter, flags);
storeu(xorv(rows[0], rows[2]), &out[0]);
storeu(xorv(rows[1], rows[3]), &out[16]);
storeu(xorv(rows[2], loadu((uint8_t *)&cv[0])), &out[32]);
storeu(xorv(rows[3], loadu((uint8_t *)&cv[4])), &out[48]);
}
INLINE void round_fn(__m128i v[16], __m128i m[16], size_t r) {
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][0]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][2]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][4]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][6]]);
v[0] = addv(v[0], v[4]);
v[1] = addv(v[1], v[5]);
v[2] = addv(v[2], v[6]);
v[3] = addv(v[3], v[7]);
v[12] = xorv(v[12], v[0]);
v[13] = xorv(v[13], v[1]);
v[14] = xorv(v[14], v[2]);
v[15] = xorv(v[15], v[3]);
v[12] = rot16(v[12]);
v[13] = rot16(v[13]);
v[14] = rot16(v[14]);
v[15] = rot16(v[15]);
v[8] = addv(v[8], v[12]);
v[9] = addv(v[9], v[13]);
v[10] = addv(v[10], v[14]);
v[11] = addv(v[11], v[15]);
v[4] = xorv(v[4], v[8]);
v[5] = xorv(v[5], v[9]);
v[6] = xorv(v[6], v[10]);
v[7] = xorv(v[7], v[11]);
v[4] = rot12(v[4]);
v[5] = rot12(v[5]);
v[6] = rot12(v[6]);
v[7] = rot12(v[7]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][1]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][3]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][5]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][7]]);
v[0] = addv(v[0], v[4]);
v[1] = addv(v[1], v[5]);
v[2] = addv(v[2], v[6]);
v[3] = addv(v[3], v[7]);
v[12] = xorv(v[12], v[0]);
v[13] = xorv(v[13], v[1]);
v[14] = xorv(v[14], v[2]);
v[15] = xorv(v[15], v[3]);
v[12] = rot8(v[12]);
v[13] = rot8(v[13]);
v[14] = rot8(v[14]);
v[15] = rot8(v[15]);
v[8] = addv(v[8], v[12]);
v[9] = addv(v[9], v[13]);
v[10] = addv(v[10], v[14]);
v[11] = addv(v[11], v[15]);
v[4] = xorv(v[4], v[8]);
v[5] = xorv(v[5], v[9]);
v[6] = xorv(v[6], v[10]);
v[7] = xorv(v[7], v[11]);
v[4] = rot7(v[4]);
v[5] = rot7(v[5]);
v[6] = rot7(v[6]);
v[7] = rot7(v[7]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][8]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][10]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][12]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][14]]);
v[0] = addv(v[0], v[5]);
v[1] = addv(v[1], v[6]);
v[2] = addv(v[2], v[7]);
v[3] = addv(v[3], v[4]);
v[15] = xorv(v[15], v[0]);
v[12] = xorv(v[12], v[1]);
v[13] = xorv(v[13], v[2]);
v[14] = xorv(v[14], v[3]);
v[15] = rot16(v[15]);
v[12] = rot16(v[12]);
v[13] = rot16(v[13]);
v[14] = rot16(v[14]);
v[10] = addv(v[10], v[15]);
v[11] = addv(v[11], v[12]);
v[8] = addv(v[8], v[13]);
v[9] = addv(v[9], v[14]);
v[5] = xorv(v[5], v[10]);
v[6] = xorv(v[6], v[11]);
v[7] = xorv(v[7], v[8]);
v[4] = xorv(v[4], v[9]);
v[5] = rot12(v[5]);
v[6] = rot12(v[6]);
v[7] = rot12(v[7]);
v[4] = rot12(v[4]);
v[0] = addv(v[0], m[(size_t)MSG_SCHEDULE[r][9]]);
v[1] = addv(v[1], m[(size_t)MSG_SCHEDULE[r][11]]);
v[2] = addv(v[2], m[(size_t)MSG_SCHEDULE[r][13]]);
v[3] = addv(v[3], m[(size_t)MSG_SCHEDULE[r][15]]);
v[0] = addv(v[0], v[5]);
v[1] = addv(v[1], v[6]);
v[2] = addv(v[2], v[7]);
v[3] = addv(v[3], v[4]);
v[15] = xorv(v[15], v[0]);
v[12] = xorv(v[12], v[1]);
v[13] = xorv(v[13], v[2]);
v[14] = xorv(v[14], v[3]);
v[15] = rot8(v[15]);
v[12] = rot8(v[12]);
v[13] = rot8(v[13]);
v[14] = rot8(v[14]);
v[10] = addv(v[10], v[15]);
v[11] = addv(v[11], v[12]);
v[8] = addv(v[8], v[13]);
v[9] = addv(v[9], v[14]);
v[5] = xorv(v[5], v[10]);
v[6] = xorv(v[6], v[11]);
v[7] = xorv(v[7], v[8]);
v[4] = xorv(v[4], v[9]);
v[5] = rot7(v[5]);
v[6] = rot7(v[6]);
v[7] = rot7(v[7]);
v[4] = rot7(v[4]);
}
INLINE void transpose_vecs(__m128i vecs[DEGREE]) {
// Interleave 32-bit lanes. The low unpack is lanes 00/11 and the high is
// 22/33. Note that this doesn't split the vector into two lanes, as the
// AVX2 counterparts do.
__m128i ab_01 = _mm_unpacklo_epi32(vecs[0], vecs[1]);
__m128i ab_23 = _mm_unpackhi_epi32(vecs[0], vecs[1]);
__m128i cd_01 = _mm_unpacklo_epi32(vecs[2], vecs[3]);
__m128i cd_23 = _mm_unpackhi_epi32(vecs[2], vecs[3]);
// Interleave 64-bit lanes.
__m128i abcd_0 = _mm_unpacklo_epi64(ab_01, cd_01);
__m128i abcd_1 = _mm_unpackhi_epi64(ab_01, cd_01);
__m128i abcd_2 = _mm_unpacklo_epi64(ab_23, cd_23);
__m128i abcd_3 = _mm_unpackhi_epi64(ab_23, cd_23);
vecs[0] = abcd_0;
vecs[1] = abcd_1;
vecs[2] = abcd_2;
vecs[3] = abcd_3;
}
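
Reviewer note: transpose_vecs is a 4x4 transpose of 32-bit words done with unpack instructions. A scalar sketch of the same result, with the hypothetical name `transpose4x4_sketch`, for anyone checking the lane bookkeeping:

```c
#include <stddef.h>
#include <stdint.h>

/* In-place transpose of a 4x4 matrix of 32-bit words. */
static void transpose4x4_sketch(uint32_t m[4][4]) {
  for (size_t i = 0; i < 4; i++) {
    for (size_t j = i + 1; j < 4; j++) {
      uint32_t tmp = m[i][j];
      m[i][j] = m[j][i];
      m[j][i] = tmp;
    }
  }
}
```

Here row i plays the role of vecs[i]: after the call, row 0 holds the first word of each input, which matches the abcd_0 value built above.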
INLINE void transpose_msg_vecs(const uint8_t *const *inputs,
size_t block_offset, __m128i out[16]) {
out[0] = loadu(&inputs[0][block_offset + 0 * sizeof(__m128i)]);
out[1] = loadu(&inputs[1][block_offset + 0 * sizeof(__m128i)]);
out[2] = loadu(&inputs[2][block_offset + 0 * sizeof(__m128i)]);
out[3] = loadu(&inputs[3][block_offset + 0 * sizeof(__m128i)]);
out[4] = loadu(&inputs[0][block_offset + 1 * sizeof(__m128i)]);
out[5] = loadu(&inputs[1][block_offset + 1 * sizeof(__m128i)]);
out[6] = loadu(&inputs[2][block_offset + 1 * sizeof(__m128i)]);
out[7] = loadu(&inputs[3][block_offset + 1 * sizeof(__m128i)]);
out[8] = loadu(&inputs[0][block_offset + 2 * sizeof(__m128i)]);
out[9] = loadu(&inputs[1][block_offset + 2 * sizeof(__m128i)]);
out[10] = loadu(&inputs[2][block_offset + 2 * sizeof(__m128i)]);
out[11] = loadu(&inputs[3][block_offset + 2 * sizeof(__m128i)]);
out[12] = loadu(&inputs[0][block_offset + 3 * sizeof(__m128i)]);
out[13] = loadu(&inputs[1][block_offset + 3 * sizeof(__m128i)]);
out[14] = loadu(&inputs[2][block_offset + 3 * sizeof(__m128i)]);
out[15] = loadu(&inputs[3][block_offset + 3 * sizeof(__m128i)]);
for (size_t i = 0; i < 4; ++i) {
_mm_prefetch((const void *)&inputs[i][block_offset + 256], _MM_HINT_T0);
}
transpose_vecs(&out[0]);
transpose_vecs(&out[4]);
transpose_vecs(&out[8]);
transpose_vecs(&out[12]);
}
INLINE void load_counters(uint64_t counter, bool increment_counter,
__m128i *out_lo, __m128i *out_hi) {
const __m128i mask = _mm_set1_epi32(-(int32_t)increment_counter);
const __m128i add0 = _mm_set_epi32(3, 2, 1, 0);
const __m128i add1 = _mm_and_si128(mask, add0);
__m128i l = _mm_add_epi32(_mm_set1_epi32((int32_t)counter), add1);
__m128i carry = _mm_cmpgt_epi32(_mm_xor_si128(add1, _mm_set1_epi32(0x80000000)),
_mm_xor_si128( l, _mm_set1_epi32(0x80000000)));
__m128i h = _mm_sub_epi32(_mm_set1_epi32((int32_t)(counter >> 32)), carry);
*out_lo = l;
*out_hi = h;
}
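
Reviewer note: load_counters needs an unsigned "did the low word wrap" test, but `_mm_cmpgt_epi32` compares signed values, so both operands have their sign bit flipped first. The sketch below does the same thing for a single lane in scalar C. `flip_to_signed_sketch` and `counter_for_lane_sketch` are hypothetical names, and the lane argument stands in for the 0..3 offsets in add0.

```c
#include <stdint.h>

/* XOR with 0x80000000 and reinterpreting as signed equals x - 2^31, which
 * maps unsigned ordering onto signed ordering. */
static int32_t flip_to_signed_sketch(uint32_t x) {
  return (int32_t)((int64_t)x - INT64_C(0x80000000));
}

/* Compute the low and high 32-bit counter words for one lane. */
static void counter_for_lane_sketch(uint64_t counter, uint32_t lane,
                                    int increment_counter, uint32_t *out_lo,
                                    uint32_t *out_hi) {
  uint32_t add = increment_counter ? lane : 0;
  uint32_t lo = (uint32_t)counter + add; /* wraps modulo 2^32 */
  /* The sum wrapped exactly when add > lo as unsigned values. */
  uint32_t carry =
      flip_to_signed_sketch(add) > flip_to_signed_sketch(lo) ? 1u : 0u;
  *out_lo = lo;
  *out_hi = (uint32_t)(counter >> 32) + carry;
}
```

The vector version subtracts the comparison result instead of adding 1, because a true comparison yields all ones (that is, -1) in each lane.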
static
void blake3_hash4_sse41(const uint8_t *const *inputs, size_t blocks,
const uint32_t key[8], uint64_t counter,
bool increment_counter, uint8_t flags,
uint8_t flags_start, uint8_t flags_end, uint8_t *out) {
__m128i h_vecs[8] = {
set1(key[0]), set1(key[1]), set1(key[2]), set1(key[3]),
set1(key[4]), set1(key[5]), set1(key[6]), set1(key[7]),
};
__m128i counter_low_vec, counter_high_vec;
load_counters(counter, increment_counter, &counter_low_vec,
&counter_high_vec);
uint8_t block_flags = flags | flags_start;
for (size_t block = 0; block < blocks; block++) {
if (block + 1 == blocks) {
block_flags |= flags_end;
}
__m128i block_len_vec = set1(BLAKE3_BLOCK_LEN);
__m128i block_flags_vec = set1(block_flags);
__m128i msg_vecs[16];
transpose_msg_vecs(inputs, block * BLAKE3_BLOCK_LEN, msg_vecs);
__m128i v[16] = {
h_vecs[0], h_vecs[1], h_vecs[2], h_vecs[3],
h_vecs[4], h_vecs[5], h_vecs[6], h_vecs[7],
set1(IV[0]), set1(IV[1]), set1(IV[2]), set1(IV[3]),
counter_low_vec, counter_high_vec, block_len_vec, block_flags_vec,
};
round_fn(v, msg_vecs, 0);
round_fn(v, msg_vecs, 1);
round_fn(v, msg_vecs, 2);
round_fn(v, msg_vecs, 3);
round_fn(v, msg_vecs, 4);
round_fn(v, msg_vecs, 5);
round_fn(v, msg_vecs, 6);
h_vecs[0] = xorv(v[0], v[8]);
h_vecs[1] = xorv(v[1], v[9]);
h_vecs[2] = xorv(v[2], v[10]);
h_vecs[3] = xorv(v[3], v[11]);
h_vecs[4] = xorv(v[4], v[12]);
h_vecs[5] = xorv(v[5], v[13]);
h_vecs[6] = xorv(v[6], v[14]);
h_vecs[7] = xorv(v[7], v[15]);
block_flags = flags;
}
transpose_vecs(&h_vecs[0]);
transpose_vecs(&h_vecs[4]);
// The first four vecs now contain the first half of each output, and the
// second four vecs contain the second half of each output.
storeu(h_vecs[0], &out[0 * sizeof(__m128i)]);
storeu(h_vecs[4], &out[1 * sizeof(__m128i)]);
storeu(h_vecs[1], &out[2 * sizeof(__m128i)]);
storeu(h_vecs[5], &out[3 * sizeof(__m128i)]);
storeu(h_vecs[2], &out[4 * sizeof(__m128i)]);
storeu(h_vecs[6], &out[5 * sizeof(__m128i)]);
storeu(h_vecs[3], &out[6 * sizeof(__m128i)]);
storeu(h_vecs[7], &out[7 * sizeof(__m128i)]);
}
INLINE void hash_one_sse41(const uint8_t *input, size_t blocks,
const uint32_t key[8], uint64_t counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t out[BLAKE3_OUT_LEN]) {
uint32_t cv[8];
memcpy(cv, key, BLAKE3_KEY_LEN);
uint8_t block_flags = flags | flags_start;
while (blocks > 0) {
if (blocks == 1) {
block_flags |= flags_end;
}
blake3_compress_in_place_sse41(cv, input, BLAKE3_BLOCK_LEN, counter,
block_flags);
input = &input[BLAKE3_BLOCK_LEN];
blocks -= 1;
block_flags = flags;
}
memcpy(out, cv, BLAKE3_OUT_LEN);
}
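
Reviewer note: both blake3_hash4_sse41 and hash_one_sse41 use the same flag rule per block: the first block gets flags_start, the last gets flags_end, and a single-block input gets both. A small sketch of that rule as a pure function, under the hypothetical name `block_flags_for_sketch`:

```c
#include <stddef.h>
#include <stdint.h>

/* Flags for block `index` out of `blocks`, following the loops above. */
static uint8_t block_flags_for_sketch(size_t index, size_t blocks,
                                      uint8_t flags, uint8_t flags_start,
                                      uint8_t flags_end) {
  uint8_t block_flags = flags;
  if (index == 0) {
    block_flags |= flags_start; /* only the first block */
  }
  if (index + 1 == blocks) {
    block_flags |= flags_end; /* only the last block */
  }
  return block_flags;
}
```

The loops above reach the same values by starting from `flags | flags_start` and resetting to `flags` after each block.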
void blake3_hash_many_sse41(const uint8_t *const *inputs, size_t num_inputs,
size_t blocks, const uint32_t key[8],
uint64_t counter, bool increment_counter,
uint8_t flags, uint8_t flags_start,
uint8_t flags_end, uint8_t *out) {
while (num_inputs >= DEGREE) {
blake3_hash4_sse41(inputs, blocks, key, counter, increment_counter, flags,
flags_start, flags_end, out);
if (increment_counter) {
counter += DEGREE;
}
inputs += DEGREE;
num_inputs -= DEGREE;
out = &out[DEGREE * BLAKE3_OUT_LEN];
}
while (num_inputs > 0) {
hash_one_sse41(inputs[0], blocks, key, counter, flags, flags_start,
flags_end, out);
if (increment_counter) {
counter += 1;
}
inputs += 1;
num_inputs -= 1;
out = &out[BLAKE3_OUT_LEN];
}
}

2028
external/blake3/blake3_sse41_x86-64_unix.S vendored Normal file

File diff suppressed because it is too large

37
external/blake3/blake3_tbb.cpp vendored Normal file

@@ -0,0 +1,37 @@
#include <cstddef>
#include <cstdint>
#include <oneapi/tbb/parallel_invoke.h>
#include "blake3_impl.h"
static_assert(TBB_USE_EXCEPTIONS == 0,
"This file should be compiled with C++ exceptions disabled.");
extern "C" void blake3_compress_subtree_wide_join_tbb(
// shared params
const uint32_t key[8], uint8_t flags, bool use_tbb,
// left-hand side params
const uint8_t *l_input, size_t l_input_len, uint64_t l_chunk_counter,
uint8_t *l_cvs, size_t *l_n,
// right-hand side params
const uint8_t *r_input, size_t r_input_len, uint64_t r_chunk_counter,
uint8_t *r_cvs, size_t *r_n) noexcept {
if (!use_tbb) {
*l_n = blake3_compress_subtree_wide(l_input, l_input_len, key,
l_chunk_counter, flags, l_cvs, use_tbb);
*r_n = blake3_compress_subtree_wide(r_input, r_input_len, key,
r_chunk_counter, flags, r_cvs, use_tbb);
return;
}
oneapi::tbb::parallel_invoke(
[=]() {
*l_n = blake3_compress_subtree_wide(
l_input, l_input_len, key, l_chunk_counter, flags, l_cvs, use_tbb);
},
[=]() {
*r_n = blake3_compress_subtree_wide(
r_input, r_input_len, key, r_chunk_counter, flags, r_cvs, use_tbb);
});
}


@@ -0,0 +1,235 @@
cmake_minimum_required(VERSION 3.13 FATAL_ERROR)
if(BUILD_SHARED_LIBS)
message(FATAL_ERROR "BUILD_SHARED_LIBS is incompatible with BLAKE3_TESTING_CI")
endif()
include(CTest)
# Declare a testing specific variant of the `blake3` library target.
#
# We use a separate library target in order to be able to perform compilation with various
# combinations of features which are too noisy to specify in the main CMake config as options for
# the normal `blake3` target.
#
# Initially this target has no properties but eventually we will populate them by copying all of the
# relevant properties from the normal `blake3` target.
add_library(blake3-testing
blake3.c
blake3_dispatch.c
blake3_portable.c
)
if(BLAKE3_USE_TBB AND TBB_FOUND)
target_sources(blake3-testing
PRIVATE
blake3_tbb.cpp)
endif()
if(BLAKE3_SIMD_TYPE STREQUAL "amd64-asm")
# Conditionally add amd64 asm files to `blake3-testing` sources
if(MSVC)
if(NOT BLAKE3_NO_AVX2)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_avx2_x86-64_windows_msvc.asm)
endif()
if(NOT BLAKE3_NO_AVX512)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_avx512_x86-64_windows_msvc.asm)
endif()
if(NOT BLAKE3_NO_SSE2)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_sse2_x86-64_windows_msvc.asm)
endif()
if(NOT BLAKE3_NO_SSE41)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_sse41_x86-64_windows_msvc.asm)
endif()
elseif(CMAKE_C_COMPILER_ID STREQUAL "GNU"
OR CMAKE_C_COMPILER_ID STREQUAL "Clang"
OR CMAKE_C_COMPILER_ID STREQUAL "AppleClang")
if (WIN32)
if(NOT BLAKE3_NO_AVX2)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_avx2_x86-64_windows_gnu.S)
endif()
if(NOT BLAKE3_NO_AVX512)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_avx512_x86-64_windows_gnu.S)
endif()
if(NOT BLAKE3_NO_SSE2)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_sse2_x86-64_windows_gnu.S)
endif()
if(NOT BLAKE3_NO_SSE41)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_sse41_x86-64_windows_gnu.S)
endif()
elseif(UNIX)
if(NOT BLAKE3_NO_AVX2)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_avx2_x86-64_unix.S)
endif()
if(NOT BLAKE3_NO_AVX512)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_avx512_x86-64_unix.S)
endif()
if(NOT BLAKE3_NO_SSE2)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_sse2_x86-64_unix.S)
endif()
if(NOT BLAKE3_NO_SSE41)
list(APPEND BLAKE3_TESTING_AMD64_ASM_SOURCES blake3_sse41_x86-64_unix.S)
endif()
endif()
endif()
target_sources(blake3-testing PRIVATE ${BLAKE3_TESTING_AMD64_ASM_SOURCES})
elseif(BLAKE3_SIMD_TYPE STREQUAL "x86-intrinsics")
# Conditionally add amd64 C files to `blake3-testing` sources
if (NOT DEFINED BLAKE3_CFLAGS_SSE2
OR NOT DEFINED BLAKE3_CFLAGS_SSE4.1
OR NOT DEFINED BLAKE3_CFLAGS_AVX2
OR NOT DEFINED BLAKE3_CFLAGS_AVX512)
message(WARNING "BLAKE3_SIMD_TYPE is set to 'x86-intrinsics' but no compiler flags are available for the target architecture.")
else()
set(BLAKE3_SIMD_X86_INTRINSICS ON)
endif()
if(NOT BLAKE3_NO_AVX2)
target_sources(blake3-testing PRIVATE blake3_avx2.c)
set_source_files_properties(blake3_avx2.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_AVX2}")
endif()
if(NOT BLAKE3_NO_AVX512)
target_sources(blake3-testing PRIVATE blake3_avx512.c)
set_source_files_properties(blake3_avx512.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_AVX512}")
endif()
if(NOT BLAKE3_NO_SSE2)
target_sources(blake3-testing PRIVATE blake3_sse2.c)
set_source_files_properties(blake3_sse2.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_SSE2}")
endif()
if(NOT BLAKE3_NO_SSE41)
target_sources(blake3-testing PRIVATE blake3_sse41.c)
set_source_files_properties(blake3_sse41.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_SSE4.1}")
endif()
elseif(BLAKE3_SIMD_TYPE STREQUAL "neon-intrinsics")
# Conditionally add neon C files to `blake3-testing` sources
target_sources(blake3-testing PRIVATE
blake3_neon.c
)
target_compile_definitions(blake3-testing PRIVATE
BLAKE3_USE_NEON=1
)
if (DEFINED BLAKE3_CFLAGS_NEON)
set_source_files_properties(blake3_neon.c PROPERTIES COMPILE_FLAGS "${BLAKE3_CFLAGS_NEON}")
endif()
elseif(BLAKE3_SIMD_TYPE STREQUAL "none")
# Disable neon if simd type is "none". We check for individual amd64 features further below.
target_compile_definitions(blake3-testing PRIVATE
BLAKE3_USE_NEON=0
)
endif()
if(BLAKE3_NO_AVX2)
target_compile_definitions(blake3-testing PRIVATE BLAKE3_NO_AVX2)
endif()
if(BLAKE3_NO_AVX512)
target_compile_definitions(blake3-testing PRIVATE BLAKE3_NO_AVX512)
endif()
if(BLAKE3_NO_SSE2)
target_compile_definitions(blake3-testing PRIVATE BLAKE3_NO_SSE2)
endif()
if(BLAKE3_NO_SSE41)
target_compile_definitions(blake3-testing PRIVATE BLAKE3_NO_SSE41)
endif()
target_compile_definitions(blake3-testing PUBLIC BLAKE3_TESTING)
get_target_property(BLAKE3_COMPILE_DEFINITIONS blake3 COMPILE_DEFINITIONS)
if(BLAKE3_COMPILE_DEFINITIONS)
target_compile_definitions(blake3-testing PUBLIC
${BLAKE3_COMPILE_DEFINITIONS})
endif()
get_target_property(BLAKE3_COMPILE_OPTIONS blake3 COMPILE_OPTIONS)
if(BLAKE3_COMPILE_OPTIONS)
target_compile_options(blake3-testing PRIVATE
${BLAKE3_COMPILE_OPTIONS}
-O3
-Wall
-Wextra
-pedantic
-fstack-protector-strong
-D_FORTIFY_SOURCE=2
-fPIE
-fvisibility=hidden
-fsanitize=address,undefined
)
endif()
get_target_property(BLAKE3_INCLUDE_DIRECTORIES blake3 INCLUDE_DIRECTORIES)
if(BLAKE3_INCLUDE_DIRECTORIES)
target_include_directories(blake3-testing PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)
endif()
get_target_property(BLAKE3_LINK_LIBRARIES blake3 LINK_LIBRARIES)
if(BLAKE3_LINK_LIBRARIES)
target_link_libraries(blake3-testing PRIVATE ${BLAKE3_LINK_LIBRARIES})
endif()
get_target_property(BLAKE3_LINK_OPTIONS blake3 LINK_OPTIONS)
if(BLAKE3_LINK_OPTIONS)
target_link_options(blake3-testing PRIVATE
${BLAKE3_LINK_OPTIONS}
-fsanitize=address,undefined
-pie
-Wl,-z,relro,-z,now
)
endif()
# test asm target
add_executable(blake3-asm-test
main.c
)
set_target_properties(blake3-asm-test PROPERTIES
OUTPUT_NAME blake3
RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR})
target_link_libraries(blake3-asm-test PRIVATE blake3-testing)
target_compile_definitions(blake3-asm-test PRIVATE BLAKE3_TESTING)
target_compile_options(blake3-asm-test PRIVATE
-O3
-Wall
-Wextra
-pedantic
-fstack-protector-strong
-D_FORTIFY_SOURCE=2
-fPIE
-fvisibility=hidden
-fsanitize=address,undefined
)
target_link_options(blake3-asm-test PRIVATE
-fsanitize=address,undefined
-pie
-Wl,-z,relro,-z,now
)
add_test(NAME blake3-testing
COMMAND "${CMAKE_CTEST_COMMAND}"
--verbose
--extra-verbose
--build-and-test "${CMAKE_SOURCE_DIR}" "${CMAKE_BINARY_DIR}"
--build-generator "${CMAKE_GENERATOR}"
--build-makeprogram "${CMAKE_MAKE_PROGRAM}"
--build-project libblake3
--build-target blake3-asm-test
--build-options
--fresh
"-DBUILD_SHARED_LIBS=${BUILD_SHARED_LIBS}"
"-DBLAKE3_TESTING=${BLAKE3_TESTING}"
"-DBLAKE3_TESTING_CI=${BLAKE3_TESTING_CI}"
"-DBLAKE3_USE_TBB=${BLAKE3_USE_TBB}"
"-DBLAKE3_SIMD_TYPE=${BLAKE3_SIMD_TYPE}"
"-DBLAKE3_NO_SSE2=${BLAKE3_NO_SSE2}"
"-DBLAKE3_NO_SSE41=${BLAKE3_NO_SSE41}"
"-DBLAKE3_NO_AVX2=${BLAKE3_NO_AVX2}"
"-DBLAKE3_NO_AVX512=${BLAKE3_NO_AVX512}"
--test-command
"${CMAKE_SOURCE_DIR}/test.py"
)


@@ -0,0 +1,13 @@
if(NOT WIN32)
add_executable(blake3-example
example.c)
target_link_libraries(blake3-example PRIVATE blake3)
install(TARGETS blake3-example)
if(BLAKE3_USE_TBB)
add_executable(blake3-example-tbb
example_tbb.c)
target_link_libraries(blake3-example-tbb PRIVATE blake3)
install(TARGETS blake3-example-tbb)
endif()
endif()


@@ -0,0 +1,3 @@
if(BLAKE3_TESTING_CI)
include(BLAKE3/ContinuousIntegration)
endif()


@@ -0,0 +1,3 @@
if(BLAKE3_USE_TBB)
add_subdirectory(tbb)
endif()


@@ -0,0 +1,28 @@
find_package(TBB 2021.11.0 QUIET)
if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.11)
include(FetchContent)
if(NOT TBB_FOUND AND BLAKE3_FETCH_TBB)
set(CMAKE_C_STANDARD 99)
set(CMAKE_C_EXTENSIONS OFF)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_EXTENSIONS ON)
option(TBB_TEST "" OFF)
option(TBBMALLOC_BUILD "" OFF)
mark_as_advanced(TBB_TEST)
mark_as_advanced(TBBMALLOC_BUILD)
FetchContent_Declare(
TBB
GIT_REPOSITORY https://github.com/uxlfoundation/oneTBB
GIT_TAG 0c0ff192a2304e114bc9e6557582dfba101360ff # v2022.0.0
GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(TBB)
endif()
endif()

36
external/blake3/example.c vendored Normal file

@@ -0,0 +1,36 @@
#include "blake3.h"
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
// Initialize the hasher.
blake3_hasher hasher;
blake3_hasher_init(&hasher);
// Read input bytes from stdin.
unsigned char buf[65536];
while (1) {
ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
if (n > 0) {
blake3_hasher_update(&hasher, buf, n);
} else if (n == 0) {
break; // end of file
} else {
fprintf(stderr, "read failed: %s\n", strerror(errno));
return 1;
}
}
// Finalize the hash. BLAKE3_OUT_LEN is the default output length, 32 bytes.
uint8_t output[BLAKE3_OUT_LEN];
blake3_hasher_finalize(&hasher, output, BLAKE3_OUT_LEN);
// Print the hash as hexadecimal.
for (size_t i = 0; i < BLAKE3_OUT_LEN; i++) {
printf("%02x", output[i]);
}
printf("\n");
return 0;
}

57
external/blake3/example_tbb.c vendored Normal file

@@ -0,0 +1,57 @@
#include "blake3.h"
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
int main(int argc, char **argv) {
// For each filepath argument, memory map it and hash it.
for (int i = 1; i < argc; i++) {
// Open and memory map the file.
int fd = open(argv[i], O_RDONLY);
if (fd == -1) {
fprintf(stderr, "open failed: %s\n", strerror(errno));
return 1;
}
struct stat statbuf;
if (fstat(fd, &statbuf) == -1) {
fprintf(stderr, "fstat failed: %s\n", strerror(errno));
return 1;
}
void *mapped = mmap(NULL, statbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (mapped == MAP_FAILED) {
fprintf(stderr, "mmap failed: %s\n", strerror(errno));
return 1;
}
// Initialize the hasher.
blake3_hasher hasher;
blake3_hasher_init(&hasher);
// Hash the mapped file using multiple threads.
blake3_hasher_update_tbb(&hasher, mapped, statbuf.st_size);
// Unmap and close the file.
if (munmap(mapped, statbuf.st_size) == -1) {
fprintf(stderr, "munmap failed: %s\n", strerror(errno));
return 1;
}
if (close(fd) == -1) {
fprintf(stderr, "close failed: %s\n", strerror(errno));
return 1;
}
// Finalize the hash. BLAKE3_OUT_LEN is the default output length, 32 bytes.
uint8_t output[BLAKE3_OUT_LEN];
blake3_hasher_finalize(&hasher, output, BLAKE3_OUT_LEN);
// Print the hash as hexadecimal.
for (size_t i = 0; i < BLAKE3_OUT_LEN; i++) {
printf("%02x", output[i]);
}
printf("\n");
}
}

12
external/blake3/libblake3.pc.in vendored Normal file

@@ -0,0 +1,12 @@
prefix="@CMAKE_INSTALL_PREFIX@"
exec_prefix="${prefix}"
libdir="@PKG_CONFIG_INSTALL_LIBDIR@"
includedir="@PKG_CONFIG_INSTALL_INCLUDEDIR@"
Name: @PROJECT_NAME@
Description: @PROJECT_DESCRIPTION@
Version: @PROJECT_VERSION@
Requires: @PKG_CONFIG_REQUIRES@
Libs: -L"${libdir}" -lblake3 @PKG_CONFIG_LIBS@
Cflags: -I"${includedir}" @PKG_CONFIG_CFLAGS@

166
external/blake3/main.c vendored Normal file

@@ -0,0 +1,166 @@
/*
* This main file is intended for testing via `make test`. It does not build in
* other settings. See README.md in this directory for examples of how to build
* C code.
*/
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "blake3.h"
#include "blake3_impl.h"
#define HASH_MODE 0
#define KEYED_HASH_MODE 1
#define DERIVE_KEY_MODE 2
static void hex_char_value(uint8_t c, uint8_t *value, bool *valid) {
if ('0' <= c && c <= '9') {
*value = c - '0';
*valid = true;
} else if ('a' <= c && c <= 'f') {
*value = 10 + c - 'a';
*valid = true;
} else {
*valid = false;
}
}
static int parse_key(char *hex_key, uint8_t out[BLAKE3_KEY_LEN]) {
size_t hex_len = strlen(hex_key);
if (hex_len != 64) {
fprintf(stderr, "Expected a 64-char hexadecimal key, got %zu chars.\n",
hex_len);
return 1;
}
for (size_t i = 0; i < 64; i++) {
uint8_t value;
bool valid;
hex_char_value(hex_key[i], &value, &valid);
if (!valid) {
fprintf(stderr, "Invalid hex char.\n");
return 1;
}
if (i % 2 == 0) {
out[i / 2] = 0;
value <<= 4;
}
out[i / 2] += value;
}
return 0;
}
/* A little repetition here */
enum cpu_feature {
SSE2 = 1 << 0,
SSSE3 = 1 << 1,
SSE41 = 1 << 2,
AVX = 1 << 3,
AVX2 = 1 << 4,
AVX512F = 1 << 5,
AVX512VL = 1 << 6,
/* ... */
UNDEFINED = 1 << 30
};
extern enum cpu_feature g_cpu_features;
enum cpu_feature get_cpu_features(void);
int main(int argc, char **argv) {
size_t out_len = BLAKE3_OUT_LEN;
uint8_t key[BLAKE3_KEY_LEN];
char *context = "";
uint8_t mode = HASH_MODE;
while (argc > 1) {
if (argc <= 2) {
fprintf(stderr, "Odd number of arguments.\n");
return 1;
}
if (strcmp("--length", argv[1]) == 0) {
char *endptr = NULL;
errno = 0;
unsigned long long out_len_ll = strtoull(argv[2], &endptr, 10);
if (errno != 0 || out_len_ll > SIZE_MAX || endptr == argv[2] ||
*endptr != 0) {
fprintf(stderr, "Bad length argument.\n");
return 1;
}
out_len = (size_t)out_len_ll;
} else if (strcmp("--keyed", argv[1]) == 0) {
mode = KEYED_HASH_MODE;
int ret = parse_key(argv[2], key);
if (ret != 0) {
return ret;
}
} else if (strcmp("--derive-key", argv[1]) == 0) {
mode = DERIVE_KEY_MODE;
context = argv[2];
} else {
fprintf(stderr, "Unknown flag.\n");
return 1;
}
argc -= 2;
argv += 2;
}
/*
* We're going to hash the input multiple times, so we need to buffer it all.
* This is just for test cases, so go ahead and assume that the input is less
* than 1 MiB.
*/
size_t buf_capacity = 1 << 20;
uint8_t *buf = malloc(buf_capacity);
assert(buf != NULL);
size_t buf_len = 0;
while (1) {
size_t n = fread(&buf[buf_len], 1, buf_capacity - buf_len, stdin);
if (n == 0) {
break;
}
buf_len += n;
assert(buf_len < buf_capacity);
}
const int mask = get_cpu_features();
int feature = 0;
do {
fprintf(stderr, "Testing 0x%08X\n", feature);
g_cpu_features = feature;
blake3_hasher hasher;
switch (mode) {
case HASH_MODE:
blake3_hasher_init(&hasher);
break;
case KEYED_HASH_MODE:
blake3_hasher_init_keyed(&hasher, key);
break;
case DERIVE_KEY_MODE:
blake3_hasher_init_derive_key(&hasher, context);
break;
default:
abort();
}
blake3_hasher_update(&hasher, buf, buf_len);
/* TODO: An incremental output reader API to avoid this allocation. */
uint8_t *out = malloc(out_len);
if (out_len > 0 && out == NULL) {
fprintf(stderr, "malloc() failed.\n");
return 1;
}
blake3_hasher_finalize(&hasher, out, out_len);
for (size_t i = 0; i < out_len; i++) {
printf("%02x", out[i]);
}
printf("\n");
free(out);
feature = (feature - mask) & mask;
} while (feature != 0);
free(buf);
return 0;
}
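Note on the feature loop in the C test driver above: the update `feature = (feature - mask) & mask` steps through every subset of the bits set in `mask`, starting at 0 and wrapping back to 0 after the full mask has been visited. The driver therefore hashes the buffered input once per combination of detected CPU features. A minimal Python sketch of the same idiom follows; the name `submasks` is made up for illustration and is not part of BLAKE3.

```python
def submasks(mask):
    """Return every subset of the set bits in mask, in increasing order."""
    out = []
    feature = 0
    while True:
        out.append(feature)
        # Same update as the C do/while loop: the subtraction borrows through
        # the bits outside the mask, and the AND clears those bits again.
        feature = (feature - mask) & mask
        if feature == 0:
            break
    return out


print([bin(s) for s in submasks(0b101)])
# prints ['0b0', '0b1', '0b100', '0b101']
```

A mask with n set bits yields 2**n iterations, so the number of test runs doubles with every additional CPU feature the host reports.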

97
external/blake3/test.py vendored Executable file
View File

@@ -0,0 +1,97 @@
#! /usr/bin/env python3
from binascii import hexlify
import json
from os import path
import subprocess
HERE = path.dirname(__file__)
TEST_VECTORS_PATH = path.join(HERE, "..", "test_vectors", "test_vectors.json")
TEST_VECTORS = json.load(open(TEST_VECTORS_PATH))
def run_blake3(args, input):
    output = subprocess.run([path.join(HERE, "blake3")] + args,
                            input=input,
                            stdout=subprocess.PIPE,
                            check=True)
    return output.stdout.decode().strip()
# Fill the input with a repeating byte pattern. We use a cycle length of 251,
# because that's the largest prime number less than 256. This makes it unlikely
# that swapping any two adjacent input blocks or chunks will give the same
# answer.
def make_test_input(length):
    i = 0
    buf = bytearray()
    while len(buf) < length:
        buf.append(i)
        i = (i + 1) % 251
    return buf
def main():
    for case in TEST_VECTORS["cases"]:
        input_len = case["input_len"]
        input = make_test_input(input_len)
        hex_key = hexlify(TEST_VECTORS["key"].encode())
        context_string = TEST_VECTORS["context_string"]
        expected_hash_xof = case["hash"]
        expected_hash = expected_hash_xof[:64]
        expected_keyed_hash_xof = case["keyed_hash"]
        expected_keyed_hash = expected_keyed_hash_xof[:64]
        expected_derive_key_xof = case["derive_key"]
        expected_derive_key = expected_derive_key_xof[:64]
        # Test the default hash.
        test_hash = run_blake3([], input)
        for line in test_hash.splitlines():
            assert expected_hash == line, \
                "hash({}): {} != {}".format(input_len, expected_hash, line)
        # Test the extended hash.
        xof_len = len(expected_hash_xof) // 2
        test_hash_xof = run_blake3(["--length", str(xof_len)], input)
        for line in test_hash_xof.splitlines():
            assert expected_hash_xof == line, \
                "hash_xof({}): {} != {}".format(
                    input_len, expected_hash_xof, line)
        # Test the default keyed hash.
        test_keyed_hash = run_blake3(["--keyed", hex_key], input)
        for line in test_keyed_hash.splitlines():
            assert expected_keyed_hash == line, \
                "keyed_hash({}): {} != {}".format(
                    input_len, expected_keyed_hash, line)
        # Test the extended keyed hash.
        xof_len = len(expected_keyed_hash_xof) // 2
        test_keyed_hash_xof = run_blake3(
            ["--keyed", hex_key, "--length",
             str(xof_len)], input)
        for line in test_keyed_hash_xof.splitlines():
            assert expected_keyed_hash_xof == line, \
                "keyed_hash_xof({}): {} != {}".format(
                    input_len, expected_keyed_hash_xof, line)
        # Test the default derive key.
        test_derive_key = run_blake3(["--derive-key", context_string], input)
        for line in test_derive_key.splitlines():
            assert expected_derive_key == line, \
                "derive_key({}): {} != {}".format(
                    input_len, expected_derive_key, line)
        # Test the extended derive key.
        xof_len = len(expected_derive_key_xof) // 2
        test_derive_key_xof = run_blake3(
            ["--derive-key", context_string, "--length",
             str(xof_len)], input)
        for line in test_derive_key_xof.splitlines():
            assert expected_derive_key_xof == line, \
                "derive_key_xof({}): {} != {}".format(
                    input_len, expected_derive_key_xof, line)
if __name__ == "__main__":
    main()
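Note on the 251-byte cycle used by `make_test_input`: BLAKE3 processes input in 64-byte blocks and 1024-byte chunks, and both sizes are coprime with 251, so consecutive blocks start at different points in the pattern. The sketch below illustrates the effect by counting distinct 64-byte blocks for a cycle of 251 versus a cycle of 256. The helpers `patterned_input` and `distinct_blocks` are hypothetical names introduced only for this comparison; they are not part of the vendored script.

```python
BLOCK_LEN = 64  # BLAKE3 block size in bytes


def patterned_input(length, cycle=251):
    # Same byte pattern as make_test_input when cycle == 251:
    # byte i of the buffer holds the value i mod cycle.
    return bytearray(i % cycle for i in range(length))


def distinct_blocks(cycle, num_blocks=251):
    # Count how many of the first num_blocks blocks have unique contents.
    data = patterned_input(num_blocks * BLOCK_LEN, cycle)
    blocks = {bytes(data[i:i + BLOCK_LEN])
              for i in range(0, len(data), BLOCK_LEN)}
    return len(blocks)


print(distinct_blocks(251))  # prints 251: every block is unique
print(distinct_blocks(256))  # prints 4: the pattern realigns every 4 blocks
```

With a 256-byte cycle, an implementation that swapped two blocks four positions apart would still see identical input, which is the kind of bug the prime cycle length is meant to expose.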

View File

@@ -1,27 +1,12 @@
sources:
"6.29.5":
url: "https://github.com/facebook/rocksdb/archive/refs/tags/v6.29.5.tar.gz"
sha256: "ddbf84791f0980c0bbce3902feb93a2c7006f6f53bfd798926143e31d4d756f0"
"6.27.3":
url: "https://github.com/facebook/rocksdb/archive/refs/tags/v6.27.3.tar.gz"
sha256: "ee29901749b9132692b26f0a6c1d693f47d1a9ed8e3771e60556afe80282bf58"
"6.20.3":
url: "https://github.com/facebook/rocksdb/archive/refs/tags/v6.20.3.tar.gz"
sha256: "c6502c7aae641b7e20fafa6c2b92273d935d2b7b2707135ebd9a67b092169dca"
"8.8.1":
url: "https://github.com/facebook/rocksdb/archive/refs/tags/v8.8.1.tar.gz"
sha256: "056c7e21ad8ae36b026ac3b94b9d6e0fcc60e1d937fc80330921e4181be5c36e"
"9.7.3":
url: "https://github.com/facebook/rocksdb/archive/refs/tags/v9.7.3.tar.gz"
sha256: "acfabb989cbfb5b5c4d23214819b059638193ec33dad2d88373c46448d16d38b"
patches:
"6.29.5":
- patch_file: "patches/6.29.5-0001-add-include-cstdint-for-gcc-13.patch"
patch_description: "Fix build with gcc 13 by including cstdint"
patch_type: "portability"
patch_source: "https://github.com/facebook/rocksdb/pull/11118"
- patch_file: "patches/6.29.5-0002-exclude-thirdparty.patch"
"9.7.3":
- patch_file: "patches/9.x.x-0001-exclude-thirdparty.patch"
patch_description: "Do not include thirdparty.inc"
patch_type: "portability"
"6.27.3":
- patch_file: "patches/6.27.3-0001-add-include-cstdint-for-gcc-13.patch"
patch_description: "Fix build with gcc 13 by including cstdint"
- patch_file: "patches/9.7.3-0001-memory-leak.patch"
patch_description: "Fix a leak of obsolete blob files left open until DB::Close()"
patch_type: "portability"
patch_source: "https://github.com/facebook/rocksdb/pull/11118"

View File

@@ -15,10 +15,10 @@ required_conan_version = ">=1.53.0"
class RocksDBConan(ConanFile):
name = "rocksdb"
homepage = "https://github.com/facebook/rocksdb"
description = "A library that provides an embeddable, persistent key-value store for fast storage"
license = ("GPL-2.0-only", "Apache-2.0")
url = "https://github.com/conan-io/conan-center-index"
description = "A library that provides an embeddable, persistent key-value store for fast storage"
homepage = "https://github.com/facebook/rocksdb"
topics = ("database", "leveldb", "facebook", "key-value")
package_type = "library"
settings = "os", "arch", "compiler", "build_type"
@@ -58,12 +58,12 @@ class RocksDBConan(ConanFile):
@property
def _compilers_minimum_version(self):
return {} if self._min_cppstd == "11" else {
"apple-clang": "10",
"clang": "7",
"gcc": "7",
"msvc": "191",
"Visual Studio": "15",
}
"apple-clang": "10",
"clang": "7",
"gcc": "7",
"msvc": "191",
"Visual Studio": "15",
}
def export_sources(self):
export_conandata_patches(self)
@@ -115,9 +115,9 @@ class RocksDBConan(ConanFile):
check_min_vs(self, "191")
if self.version == "6.20.3" and \
self.settings.os == "Linux" and \
self.settings.compiler == "gcc" and \
Version(self.settings.compiler.version) < "5":
self.settings.os == "Linux" and \
self.settings.compiler == "gcc" and \
Version(self.settings.compiler.version) < "5":
raise ConanInvalidConfiguration("Rocksdb 6.20.3 is not compilable with gcc <5.") # See https://github.com/facebook/rocksdb/issues/3522
def source(self):
@@ -163,6 +163,8 @@ class RocksDBConan(ConanFile):
if self.options.with_jemalloc:
deps.set_property("jemalloc", "cmake_file_name", "JeMalloc")
deps.set_property("jemalloc", "cmake_target_name", "JeMalloc::JeMalloc")
if self.options.with_zstd:
deps.set_property("zstd", "cmake_target_name", "zstd::zstd")
deps.generate()
def build(self):

View File

@@ -1,30 +0,0 @@
--- a/include/rocksdb/utilities/checkpoint.h
+++ b/include/rocksdb/utilities/checkpoint.h
@@ -8,6 +8,7 @@
#pragma once
#ifndef ROCKSDB_LITE
+#include <cstdint>
#include <string>
#include <vector>
#include "rocksdb/status.h"
--- a/table/block_based/data_block_hash_index.h
+++ b/table/block_based/data_block_hash_index.h
@@ -5,6 +5,7 @@
#pragma once
+#include <cstdint>
#include <string>
#include <vector>
--- a/util/string_util.h
+++ b/util/string_util.h
@@ -6,6 +6,7 @@
#pragma once
+#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>

View File

@@ -1,16 +0,0 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index ec59d4491..35577c998 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -101 +100,0 @@ if(MSVC)
- option(WITH_GFLAGS "build with GFlags" OFF)
@@ -103,2 +102,2 @@ if(MSVC)
- include(${CMAKE_CURRENT_SOURCE_DIR}/thirdparty.inc)
-else()
+endif()
+
@@ -117 +116 @@ else()
- if(MINGW)
+ if(MINGW OR MSVC)
@@ -183 +181,0 @@ else()
-endif()

View File

@@ -0,0 +1,319 @@
diff --git a/HISTORY.md b/HISTORY.md
index 36d472229..05ad1a202 100644
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -1,6 +1,10 @@
# Rocksdb Change Log
> NOTE: Entries for next release do not go here. Follow instructions in `unreleased_history/README.txt`
+## 9.7.4 (10/31/2024)
+### Bug Fixes
+* Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
+
## 9.7.3 (10/16/2024)
### Behavior Changes
* OPTIONS file to be loaded by remote worker is now preserved so that it does not get purged by the primary host. A similar technique as how we are preserving new SST files from getting purged is used for this. min_options_file_numbers_ is tracked like pending_outputs_ is tracked.
diff --git a/db/blob/blob_file_cache.cc b/db/blob/blob_file_cache.cc
index 5f340aadf..1b9faa238 100644
--- a/db/blob/blob_file_cache.cc
+++ b/db/blob/blob_file_cache.cc
@@ -42,6 +42,7 @@ Status BlobFileCache::GetBlobFileReader(
assert(blob_file_reader);
assert(blob_file_reader->IsEmpty());
+ // NOTE: sharing same Cache with table_cache
const Slice key = GetSliceForKey(&blob_file_number);
assert(cache_);
@@ -98,4 +99,13 @@ Status BlobFileCache::GetBlobFileReader(
return Status::OK();
}
+void BlobFileCache::Evict(uint64_t blob_file_number) {
+ // NOTE: sharing same Cache with table_cache
+ const Slice key = GetSliceForKey(&blob_file_number);
+
+ assert(cache_);
+
+ cache_.get()->Erase(key);
+}
+
} // namespace ROCKSDB_NAMESPACE
diff --git a/db/blob/blob_file_cache.h b/db/blob/blob_file_cache.h
index 740e67ada..6858d012b 100644
--- a/db/blob/blob_file_cache.h
+++ b/db/blob/blob_file_cache.h
@@ -36,6 +36,15 @@ class BlobFileCache {
uint64_t blob_file_number,
CacheHandleGuard<BlobFileReader>* blob_file_reader);
+ // Called when a blob file is obsolete to ensure it is removed from the cache
+ // to avoid effectively leaking the open file and associated memory
+ void Evict(uint64_t blob_file_number);
+
+ // Used to identify cache entries for blob files (not normally useful)
+ static const Cache::CacheItemHelper* GetHelper() {
+ return CacheInterface::GetBasicHelper();
+ }
+
private:
using CacheInterface =
BasicTypedCacheInterface<BlobFileReader, CacheEntryRole::kMisc>;
diff --git a/db/column_family.h b/db/column_family.h
index e4b7adde8..86637736a 100644
--- a/db/column_family.h
+++ b/db/column_family.h
@@ -401,6 +401,7 @@ class ColumnFamilyData {
SequenceNumber earliest_seq);
TableCache* table_cache() const { return table_cache_.get(); }
+ BlobFileCache* blob_file_cache() const { return blob_file_cache_.get(); }
BlobSource* blob_source() const { return blob_source_.get(); }
// See documentation in compaction_picker.h
diff --git a/db/db_impl/db_impl.cc b/db/db_impl/db_impl.cc
index 261593423..06573ac2e 100644
--- a/db/db_impl/db_impl.cc
+++ b/db/db_impl/db_impl.cc
@@ -659,8 +659,9 @@ Status DBImpl::CloseHelper() {
// We need to release them before the block cache is destroyed. The block
// cache may be destroyed inside versions_.reset(), when column family data
// list is destroyed, so leaving handles in table cache after
- // versions_.reset() may cause issues.
- // Here we clean all unreferenced handles in table cache.
+ // versions_.reset() may cause issues. Here we clean all unreferenced handles
+ // in table cache, and (for certain builds/conditions) assert that no obsolete
+ // files are hanging around unreferenced (leak) in the table/blob file cache.
// Now we assume all user queries have finished, so only version set itself
// can possibly hold the blocks from block cache. After releasing unreferenced
// handles here, only handles held by version set left and inside
@@ -668,6 +669,9 @@ Status DBImpl::CloseHelper() {
// time a handle is released, we erase it from the cache too. By doing that,
// we can guarantee that after versions_.reset(), table cache is empty
// so the cache can be safely destroyed.
+#ifndef NDEBUG
+ TEST_VerifyNoObsoleteFilesCached(/*db_mutex_already_held=*/true);
+#endif // !NDEBUG
table_cache_->EraseUnRefEntries();
for (auto& txn_entry : recovered_transactions_) {
@@ -3227,6 +3231,8 @@ Status DBImpl::MultiGetImpl(
s = Status::Aborted();
break;
}
+ // This could be a long-running operation
+ ROCKSDB_THREAD_YIELD_HOOK();
}
// Post processing (decrement reference counts and record statistics)
diff --git a/db/db_impl/db_impl.h b/db/db_impl/db_impl.h
index 5e4fa310b..ccc0abfa7 100644
--- a/db/db_impl/db_impl.h
+++ b/db/db_impl/db_impl.h
@@ -1241,9 +1241,14 @@ class DBImpl : public DB {
static Status TEST_ValidateOptions(const DBOptions& db_options) {
return ValidateOptions(db_options);
}
-
#endif // NDEBUG
+ // In certain configurations, verify that the table/blob file cache only
+ // contains entries for live files, to check for effective leaks of open
+ // files. This can only be called when purging of obsolete files has
+ // "settled," such as during parts of DB Close().
+ void TEST_VerifyNoObsoleteFilesCached(bool db_mutex_already_held) const;
+
// persist stats to column family "_persistent_stats"
void PersistStats();
diff --git a/db/db_impl/db_impl_debug.cc b/db/db_impl/db_impl_debug.cc
index 790a50d7a..67f5b4aaf 100644
--- a/db/db_impl/db_impl_debug.cc
+++ b/db/db_impl/db_impl_debug.cc
@@ -9,6 +9,7 @@
#ifndef NDEBUG
+#include "db/blob/blob_file_cache.h"
#include "db/column_family.h"
#include "db/db_impl/db_impl.h"
#include "db/error_handler.h"
@@ -328,5 +329,49 @@ size_t DBImpl::TEST_EstimateInMemoryStatsHistorySize() const {
InstrumentedMutexLock l(&const_cast<DBImpl*>(this)->stats_history_mutex_);
return EstimateInMemoryStatsHistorySize();
}
+
+void DBImpl::TEST_VerifyNoObsoleteFilesCached(
+ bool db_mutex_already_held) const {
+ // This check is somewhat expensive and obscure to make a part of every
+ // unit test in every build variety. Thus, we only enable it for ASAN builds.
+ if (!kMustFreeHeapAllocations) {
+ return;
+ }
+
+ std::optional<InstrumentedMutexLock> l;
+ if (db_mutex_already_held) {
+ mutex_.AssertHeld();
+ } else {
+ l.emplace(&mutex_);
+ }
+
+ std::vector<uint64_t> live_files;
+ for (auto cfd : *versions_->GetColumnFamilySet()) {
+ if (cfd->IsDropped()) {
+ continue;
+ }
+ // Sneakily add both SST and blob files to the same list
+ cfd->current()->AddLiveFiles(&live_files, &live_files);
+ }
+ std::sort(live_files.begin(), live_files.end());
+
+ auto fn = [&live_files](const Slice& key, Cache::ObjectPtr, size_t,
+ const Cache::CacheItemHelper* helper) {
+ if (helper != BlobFileCache::GetHelper()) {
+ // Skip non-blob files for now
+ // FIXME: diagnose and fix the leaks of obsolete SST files revealed in
+ // unit tests.
+ return;
+ }
+ // See TableCache and BlobFileCache
+ assert(key.size() == sizeof(uint64_t));
+ uint64_t file_number;
+ GetUnaligned(reinterpret_cast<const uint64_t*>(key.data()), &file_number);
+ // Assert file is in sorted live_files
+ assert(
+ std::binary_search(live_files.begin(), live_files.end(), file_number));
+ };
+ table_cache_->ApplyToAllEntries(fn, {});
+}
} // namespace ROCKSDB_NAMESPACE
#endif // NDEBUG
diff --git a/db/db_iter.cc b/db/db_iter.cc
index e02586377..bf4749eb9 100644
--- a/db/db_iter.cc
+++ b/db/db_iter.cc
@@ -540,6 +540,8 @@ bool DBIter::FindNextUserEntryInternal(bool skipping_saved_key,
} else {
iter_.Next();
}
+ // This could be a long-running operation due to tombstones, etc.
+ ROCKSDB_THREAD_YIELD_HOOK();
} while (iter_.Valid());
valid_ = false;
diff --git a/db/table_cache.cc b/db/table_cache.cc
index 71fc29c32..8a5be75e8 100644
--- a/db/table_cache.cc
+++ b/db/table_cache.cc
@@ -164,6 +164,7 @@ Status TableCache::GetTableReader(
}
Cache::Handle* TableCache::Lookup(Cache* cache, uint64_t file_number) {
+ // NOTE: sharing same Cache with BlobFileCache
Slice key = GetSliceForFileNumber(&file_number);
return cache->Lookup(key);
}
@@ -179,6 +180,7 @@ Status TableCache::FindTable(
size_t max_file_size_for_l0_meta_pin, Temperature file_temperature) {
PERF_TIMER_GUARD_WITH_CLOCK(find_table_nanos, ioptions_.clock);
uint64_t number = file_meta.fd.GetNumber();
+ // NOTE: sharing same Cache with BlobFileCache
Slice key = GetSliceForFileNumber(&number);
*handle = cache_.Lookup(key);
TEST_SYNC_POINT_CALLBACK("TableCache::FindTable:0",
diff --git a/db/version_builder.cc b/db/version_builder.cc
index ed8ab8214..c98f53f42 100644
--- a/db/version_builder.cc
+++ b/db/version_builder.cc
@@ -24,6 +24,7 @@
#include <vector>
#include "cache/cache_reservation_manager.h"
+#include "db/blob/blob_file_cache.h"
#include "db/blob/blob_file_meta.h"
#include "db/dbformat.h"
#include "db/internal_stats.h"
@@ -744,12 +745,9 @@ class VersionBuilder::Rep {
return Status::Corruption("VersionBuilder", oss.str());
}
- // Note: we use C++11 for now but in C++14, this could be done in a more
- // elegant way using generalized lambda capture.
- VersionSet* const vs = version_set_;
- const ImmutableCFOptions* const ioptions = ioptions_;
-
- auto deleter = [vs, ioptions](SharedBlobFileMetaData* shared_meta) {
+ auto deleter = [vs = version_set_, ioptions = ioptions_,
+ bc = cfd_ ? cfd_->blob_file_cache()
+ : nullptr](SharedBlobFileMetaData* shared_meta) {
if (vs) {
assert(ioptions);
assert(!ioptions->cf_paths.empty());
@@ -758,6 +756,9 @@ class VersionBuilder::Rep {
vs->AddObsoleteBlobFile(shared_meta->GetBlobFileNumber(),
ioptions->cf_paths.front().path);
}
+ if (bc) {
+ bc->Evict(shared_meta->GetBlobFileNumber());
+ }
delete shared_meta;
};
@@ -766,7 +767,7 @@ class VersionBuilder::Rep {
blob_file_number, blob_file_addition.GetTotalBlobCount(),
blob_file_addition.GetTotalBlobBytes(),
blob_file_addition.GetChecksumMethod(),
- blob_file_addition.GetChecksumValue(), deleter);
+ blob_file_addition.GetChecksumValue(), std::move(deleter));
mutable_blob_file_metas_.emplace(
blob_file_number, MutableBlobFileMetaData(std::move(shared_meta)));
diff --git a/db/version_set.h b/db/version_set.h
index 9336782b1..024f869e7 100644
--- a/db/version_set.h
+++ b/db/version_set.h
@@ -1514,7 +1514,6 @@ class VersionSet {
void GetLiveFilesMetaData(std::vector<LiveFileMetaData>* metadata);
void AddObsoleteBlobFile(uint64_t blob_file_number, std::string path) {
- // TODO: Erase file from BlobFileCache?
obsolete_blob_files_.emplace_back(blob_file_number, std::move(path));
}
diff --git a/include/rocksdb/version.h b/include/rocksdb/version.h
index 2a19796b8..0afa2cab1 100644
--- a/include/rocksdb/version.h
+++ b/include/rocksdb/version.h
@@ -13,7 +13,7 @@
// minor or major version number planned for release.
#define ROCKSDB_MAJOR 9
#define ROCKSDB_MINOR 7
-#define ROCKSDB_PATCH 3
+#define ROCKSDB_PATCH 4
// Do not use these. We made the mistake of declaring macros starting with
// double underscore. Now we have to live with our choice. We'll deprecate these
diff --git a/port/port.h b/port/port.h
index 13aa56d47..141716e5b 100644
--- a/port/port.h
+++ b/port/port.h
@@ -19,3 +19,19 @@
#elif defined(OS_WIN)
#include "port/win/port_win.h"
#endif
+
+#ifdef OS_LINUX
+// A temporary hook into long-running RocksDB threads to support modifying their
+// priority etc. This should become a public API hook once the requirements
+// are better understood.
+extern "C" void RocksDbThreadYield() __attribute__((__weak__));
+#define ROCKSDB_THREAD_YIELD_HOOK() \
+ { \
+ if (RocksDbThreadYield) { \
+ RocksDbThreadYield(); \
+ } \
+ }
+#else
+#define ROCKSDB_THREAD_YIELD_HOOK() \
+ {}
+#endif
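Note on `TEST_VerifyNoObsoleteFilesCached` in the patch above: the check collects the live file numbers of every non-dropped column family, sorts them, and then binary-searches that list for each blob-file cache key, where a key is the raw 8 bytes of a `uint64_t` file number. The following Python sketch models that logic under simplified assumptions. The function name and the list-based "cache" are hypothetical stand-ins, and it returns the offending file numbers instead of asserting as the C++ code does.

```python
import sys
from bisect import bisect_left


def find_obsolete_cached(live_files, cached_keys):
    """Return file numbers that are cached but not in the live set."""
    live = sorted(live_files)
    leaked = []
    for key in cached_keys:
        # Cache keys are the in-memory bytes of a 64-bit file number.
        assert len(key) == 8
        file_number = int.from_bytes(key, sys.byteorder)
        # Binary search, mirroring std::binary_search on the sorted list.
        pos = bisect_left(live, file_number)
        if pos == len(live) or live[pos] != file_number:
            leaked.append(file_number)
    return leaked


keys = [n.to_bytes(8, sys.byteorder) for n in (3, 7, 9)]
print(find_obsolete_cached([9, 3, 5], keys))  # prints [7]
```

Sorting once and searching per key keeps the check at O((n + m) log n) for n live files and m cache entries, which matters because the patch runs it during `DBImpl::CloseHelper()` in ASAN builds.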

View File

@@ -0,0 +1,30 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 93b884d..b715cb6 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -106,14 +106,9 @@ endif()
include(CMakeDependentOption)
if(MSVC)
- option(WITH_GFLAGS "build with GFlags" OFF)
option(WITH_XPRESS "build with windows built in compression" OFF)
- option(ROCKSDB_SKIP_THIRDPARTY "skip thirdparty.inc" OFF)
-
- if(NOT ROCKSDB_SKIP_THIRDPARTY)
- include(${CMAKE_CURRENT_SOURCE_DIR}/thirdparty.inc)
- endif()
-else()
+endif()
+if(TRUE)
if(CMAKE_SYSTEM_NAME MATCHES "FreeBSD" AND NOT CMAKE_SYSTEM_NAME MATCHES "kFreeBSD")
# FreeBSD has jemalloc as default malloc
# but it does not have all the jemalloc files in include/...
@@ -126,7 +121,7 @@ else()
endif()
endif()
- if(MINGW)
+ if(MSVC OR MINGW)
option(WITH_GFLAGS "build with GFlags" OFF)
else()
option(WITH_GFLAGS "build with GFlags" ON)

View File

@@ -27,7 +27,6 @@
#include <algorithm>
#include <optional>
#include <ostream>
#include <string>
#include <unordered_map>
#include <vector>
@@ -368,7 +367,7 @@ get(Section const& section,
}
inline std::string
get(Section const& section, std::string const& name, const char* defaultValue)
get(Section const& section, std::string const& name, char const* defaultValue)
{
try
{

View File

@@ -22,10 +22,9 @@
#include <xrpl/basics/Slice.h>
#include <xrpl/beast/utility/instrumentation.h>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
namespace ripple {

View File

@@ -21,9 +21,11 @@
#define RIPPLED_COMPRESSIONALGORITHMS_H_INCLUDED
#include <xrpl/basics/contract.h>
#include <lz4.h>
#include <algorithm>
#include <cstdint>
#include <lz4.h>
#include <stdexcept>
#include <vector>
@@ -53,7 +55,7 @@ lz4Compress(void const* in, std::size_t inSize, BufferFactory&& bf)
auto compressed = bf(outCapacity);
auto compressedSize = LZ4_compress_default(
reinterpret_cast<const char*>(in),
reinterpret_cast<char const*>(in),
reinterpret_cast<char*>(compressed),
inSize,
outCapacity);
@@ -87,7 +89,7 @@ lz4Decompress(
Throw<std::runtime_error>("lz4Decompress: integer overflow (output)");
if (LZ4_decompress_safe(
reinterpret_cast<const char*>(in),
reinterpret_cast<char const*>(in),
reinterpret_cast<char*>(decompressed),
inSize,
decompressedSize) != decompressedSize)

View File

@@ -21,6 +21,7 @@
#define RIPPLE_BASICS_COUNTEDOBJECT_H_INCLUDED
#include <xrpl/beast/type_name.h>
#include <atomic>
#include <string>
#include <utility>

View File

@@ -24,9 +24,7 @@
#include <boost/outcome.hpp>
#include <concepts>
#include <stdexcept>
#include <type_traits>
namespace ripple {
@@ -95,7 +93,7 @@ public:
{
}
constexpr const E&
constexpr E const&
value() const&
{
return val_;
@@ -113,7 +111,7 @@ public:
return std::move(val_);
}
constexpr const E&&
constexpr E const&&
value() const&&
{
return std::move(val_);

View File

@@ -0,0 +1,515 @@
//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2023 Ripple Labs Inc.
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================
#ifndef RIPPLE_BASICS_INTRUSIVEPOINTER_H_INCLUDED
#define RIPPLE_BASICS_INTRUSIVEPOINTER_H_INCLUDED
#include <concepts>
#include <cstdint>
#include <type_traits>
#include <utility>
namespace ripple {
//------------------------------------------------------------------------------
/** Tag to create an intrusive pointer from another intrusive pointer by using a
static cast. This is useful to create an intrusive pointer to a derived
class from an intrusive pointer to a base class.
*/
struct StaticCastTagSharedIntrusive
{
};
/** Tag to create an intrusive pointer from another intrusive pointer by using a
dynamic cast. This is useful to create an intrusive pointer to a derived
class from an intrusive pointer to a base class. If the cast fails an empty
(null) intrusive pointer is created.
*/
struct DynamicCastTagSharedIntrusive
{
};
/** When creating or adopting a raw pointer, controls whether the strong count
is incremented or not. Use this tag to increment the strong count.
*/
struct SharedIntrusiveAdoptIncrementStrongTag
{
};
/** When creating or adopting a raw pointer, controls whether the strong count
is incremented or not. Use this tag to leave the strong count unchanged.
*/
struct SharedIntrusiveAdoptNoIncrementTag
{
};
//------------------------------------------------------------------------------
//
template <class T>
concept CAdoptTag = std::is_same_v<T, SharedIntrusiveAdoptIncrementStrongTag> ||
std::is_same_v<T, SharedIntrusiveAdoptNoIncrementTag>;
//------------------------------------------------------------------------------
/** A shared intrusive pointer class that supports weak pointers.
This is meant to be used for SHAMapInnerNodes, but may be useful for other
cases. Since the reference counts are stored on the pointee, the pointee is
not destroyed until both the strong _and_ weak pointer counts go to zero.
When the strong pointer count goes to zero, the "partialDestructor" is
called. This can be used to destroy as much of the object as possible while
still retaining the reference counts. For example, for SHAMapInnerNodes the
children may be reset in that function. Note that std::shared_ptr WILL
run the destructor when the strong count reaches zero, but may not free the
memory used by the object until the weak count reaches zero. In rippled, we
typically allocate shared pointers with the `make_shared` function. When
that is used, the memory is not reclaimed until the weak count reaches zero.
*/
template <class T>
class SharedIntrusive
{
public:
SharedIntrusive() = default;
template <CAdoptTag TAdoptTag>
SharedIntrusive(T* p, TAdoptTag) noexcept;
SharedIntrusive(SharedIntrusive const& rhs);
template <class TT>
// TODO: convertible_to isn't quite right. That include a static castable.
// Find the right concept.
requires std::convertible_to<TT*, T*>
SharedIntrusive(SharedIntrusive<TT> const& rhs);
SharedIntrusive(SharedIntrusive&& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedIntrusive(SharedIntrusive<TT>&& rhs);
SharedIntrusive&
operator=(SharedIntrusive const& rhs);
bool
operator!=(std::nullptr_t) const;
bool
operator==(std::nullptr_t) const;
template <class TT>
requires std::convertible_to<TT*, T*>
SharedIntrusive&
operator=(SharedIntrusive<TT> const& rhs);
SharedIntrusive&
operator=(SharedIntrusive&& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedIntrusive&
operator=(SharedIntrusive<TT>&& rhs);
/** Adopt the raw pointer. The strong reference may or may not be
incremented, depending on the TAdoptTag
*/
template <CAdoptTag TAdoptTag = SharedIntrusiveAdoptIncrementStrongTag>
void
adopt(T* p);
~SharedIntrusive();
/** Create a new SharedIntrusive by statically casting the pointer
controlled by the rhs param.
*/
template <class TT>
SharedIntrusive(
StaticCastTagSharedIntrusive,
SharedIntrusive<TT> const& rhs);
/** Create a new SharedIntrusive by statically casting the pointer
controlled by the rhs param.
*/
template <class TT>
SharedIntrusive(StaticCastTagSharedIntrusive, SharedIntrusive<TT>&& rhs);
/** Create a new SharedIntrusive by dynamically casting the pointer
controlled by the rhs param.
*/
template <class TT>
SharedIntrusive(
DynamicCastTagSharedIntrusive,
SharedIntrusive<TT> const& rhs);
/** Create a new SharedIntrusive by dynamically casting the pointer
controlled by the rhs param.
*/
template <class TT>
SharedIntrusive(DynamicCastTagSharedIntrusive, SharedIntrusive<TT>&& rhs);
T&
operator*() const noexcept;
T*
operator->() const noexcept;
explicit
operator bool() const noexcept;
/** Set the pointer to null, decrement the strong count, and run the
appropriate release action.
*/
void
reset();
/** Get the raw pointer */
T*
get() const;
/** Return the strong count */
std::size_t
use_count() const;
template <class TT, class... Args>
friend SharedIntrusive<TT>
make_SharedIntrusive(Args&&... args);
template <class TT>
friend class SharedIntrusive;
template <class TT>
friend class SharedWeakUnion;
template <class TT>
friend class WeakIntrusive;
private:
/** Return the raw pointer held by this object. */
T*
unsafeGetRawPtr() const;
/** Exchange the current raw pointer held by this object with the given
pointer. Decrement the strong count of the raw pointer previously held
by this object and run the appropriate release action.
*/
void
unsafeReleaseAndStore(T* next);
/** Set the raw pointer directly. This is wrapped in a function so the class
can support both atomic and non-atomic pointers in a future patch.
*/
void
unsafeSetRawPtr(T* p);
/** Exchange the raw pointer directly.
This sets the raw pointer to the given value and returns the previous
value. This is wrapped in a function so the class can support both
atomic and non-atomic pointers in a future patch.
*/
T*
unsafeExchange(T* p);
/** pointer to the type with an intrusive count */
T* ptr_{nullptr};
};
//------------------------------------------------------------------------------
/** A weak intrusive pointer class for the SharedIntrusive pointer class.
Note that this weak pointer class acts differently from normal weak pointer
classes. When the strong pointer count goes to zero, the "partialDestructor"
is called. See the comment on SharedIntrusive for a fuller explanation.
*/
template <class T>
class WeakIntrusive
{
public:
WeakIntrusive() = default;
WeakIntrusive(WeakIntrusive const& rhs);
WeakIntrusive(WeakIntrusive&& rhs);
WeakIntrusive(SharedIntrusive<T> const& rhs);
// There is no move constructor from a strong intrusive ptr because
// moving would be more expensive than copying in this case (the strong
// ref would need to be decremented)
WeakIntrusive(SharedIntrusive<T> const&& rhs) = delete;
// Since there are no current use cases for copy assignment in
// WeakIntrusive, we delete this operator to simplify the implementation. If
// a need arises in the future, we can reintroduce it with proper
// consideration.
WeakIntrusive&
operator=(WeakIntrusive const&) = delete;
template <class TT>
requires std::convertible_to<TT*, T*>
WeakIntrusive&
operator=(SharedIntrusive<TT> const& rhs);
/** Adopt the raw pointer and increment the weak count. */
void
adopt(T* ptr);
~WeakIntrusive();
/** Get a strong pointer from the weak pointer, if possible. This will
only return a seated pointer if the strong count on the raw pointer
is non-zero before locking.
*/
SharedIntrusive<T>
lock() const;
/** Return true if the strong count is zero. */
bool
expired() const;
/** Set the pointer to null and decrement the weak count.
Note: This may run the destructor if the strong count is zero.
*/
void
reset();
private:
T* ptr_ = nullptr;
/** Decrement the weak count. This does _not_ set the raw pointer to
null.
Note: This may run the destructor if the strong count is zero.
*/
void
unsafeReleaseNoStore();
};
//------------------------------------------------------------------------------
/** A combination of a strong and a weak intrusive pointer stored in the
space of a single pointer.
This class is similar to a `std::variant<SharedIntrusive,WeakIntrusive>`
with some optimizations. In particular, it uses a low-order bit to
determine if the raw pointer represents a strong pointer or a weak
pointer. It can also be quickly switched between its strong pointer and
weak pointer representations. This class is useful for storing intrusive
pointers in tagged caches.
*/
template <class T>
class SharedWeakUnion
{
// Tagged pointer. Low bit determines if this is a strong or a weak
// pointer. The low bit must be masked to zero when converting back to a
// pointer. If the low bit is '1', this is a weak pointer.
static_assert(
alignof(T) >= 2,
"Bad alignment: Combo pointer requires low bit to be zero");
public:
SharedWeakUnion() = default;
SharedWeakUnion(SharedWeakUnion const& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakUnion(SharedIntrusive<TT> const& rhs);
SharedWeakUnion(SharedWeakUnion&& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakUnion(SharedIntrusive<TT>&& rhs);
SharedWeakUnion&
operator=(SharedWeakUnion const& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakUnion&
operator=(SharedIntrusive<TT> const& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakUnion&
operator=(SharedIntrusive<TT>&& rhs);
~SharedWeakUnion();
/** Return a strong pointer only if this already holds a strong pointer.
A weak pointer is not locked; use the `lock` method for that.
*/
SharedIntrusive<T>
getStrong() const;
/** Return true if this is a strong pointer and the strong pointer is
seated.
*/
explicit
operator bool() const noexcept;
/** Set the pointer to null, decrement the appropriate ref count, and
run the appropriate release action.
*/
void
reset();
/** If this is a strong pointer, return the raw pointer. Otherwise
return null.
*/
T*
get() const;
/** If this is a strong pointer, return the strong count. Otherwise
return 0.
*/
std::size_t
use_count() const;
/** Return true if the pointer is null or the strong count is zero. */
bool
expired() const;
/** If this is a strong pointer, return the strong pointer. Otherwise
attempt to lock the weak pointer.
*/
SharedIntrusive<T>
lock() const;
/** Return true if this represents a strong pointer. */
bool
isStrong() const;
/** Return true if this represents a weak pointer. */
bool
isWeak() const;
/** If this is a weak pointer, attempt to convert it to a strong
pointer.
@return true if successfully converted to a strong pointer (or was
already a strong pointer). Otherwise false.
*/
bool
convertToStrong();
/** If this is a strong pointer, attempt to convert it to a weak
pointer.
@return false if the pointer is null. Otherwise return true (including
when this was already a weak pointer).
*/
bool
convertToWeak();
private:
// Tagged pointer. Low bit determines if this is a strong or a weak
// pointer. The low bit must be masked to zero when converting back to a
// pointer. If the low bit is '1', this is a weak pointer.
std::uintptr_t tp_{0};
static constexpr std::uintptr_t tagMask = 1;
static constexpr std::uintptr_t ptrMask = ~tagMask;
private:
/** Return the raw pointer held by this object.
*/
T*
unsafeGetRawPtr() const;
enum class RefStrength { strong, weak };
/** Set the raw pointer and tag bit directly.
*/
void
unsafeSetRawPtr(T* p, RefStrength rs);
/** Set the raw pointer and tag bit to all zeros (strong null pointer).
*/
void unsafeSetRawPtr(std::nullptr_t);
/** Decrement the appropriate ref count, and run the appropriate release
action. Note: this does _not_ set the raw pointer to null.
*/
void
unsafeReleaseNoStore();
};
//------------------------------------------------------------------------------
/** Create a shared intrusive pointer.
Note: unlike std::shared_ptr, where there is an advantage of allocating
the pointer and control block together, there is no benefit for intrusive
pointers.
*/
template <class TT, class... Args>
SharedIntrusive<TT>
make_SharedIntrusive(Args&&... args)
{
auto p = new TT(std::forward<Args>(args)...);
static_assert(
noexcept(SharedIntrusive<TT>(
std::declval<TT*>(),
std::declval<SharedIntrusiveAdoptNoIncrementTag>())),
"SharedIntrusive constructor should not throw or this can leak "
"memory");
return SharedIntrusive<TT>(p, SharedIntrusiveAdoptNoIncrementTag{});
}
//------------------------------------------------------------------------------
namespace intr_ptr {
template <class T>
using SharedPtr = SharedIntrusive<T>;
template <class T>
using WeakPtr = WeakIntrusive<T>;
template <class T>
using SharedWeakUnionPtr = SharedWeakUnion<T>;
template <class T, class... A>
SharedPtr<T>
make_shared(A&&... args)
{
return make_SharedIntrusive<T>(std::forward<A>(args)...);
}
template <class T, class TT>
SharedPtr<T>
static_pointer_cast(TT const& v)
{
return SharedPtr<T>(StaticCastTagSharedIntrusive{}, v);
}
template <class T, class TT>
SharedPtr<T>
dynamic_pointer_cast(TT const& v)
{
return SharedPtr<T>(DynamicCastTagSharedIntrusive{}, v);
}
} // namespace intr_ptr
} // namespace ripple
#endif
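The tagged-pointer representation described for `SharedWeakUnion` can be sketched in isolation. This is a minimal illustration, not the class itself: `Node`, `TaggedPtr`, and `demoTaggedPtr` are hypothetical names, and no reference counting is performed. It only shows how the low bit of an aligned pointer can carry the strong/weak tag and how a null pointer is always stored as a "strong null".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointee. An alignment of at least 2 guarantees that the low
// bit of any valid Node* is zero, which frees that bit for the tag.
struct alignas(8) Node
{
    int value = 0;
};

static_assert(alignof(Node) >= 2, "low bit must be available for the tag");

// Hypothetical tagged pointer: low bit set means weak, clear means strong.
class TaggedPtr
{
    std::uintptr_t tp_ = 0;
    static constexpr std::uintptr_t tagMask = 1;
    static constexpr std::uintptr_t ptrMask = ~tagMask;

public:
    void
    set(Node* p, bool weak)
    {
        tp_ = reinterpret_cast<std::uintptr_t>(p);
        // A null pointer is never tagged, so null is always "strong".
        if (tp_ && weak)
            tp_ |= tagMask;
    }

    // Mask the tag bit away before converting back to a pointer.
    Node*
    raw() const
    {
        return reinterpret_cast<Node*>(tp_ & ptrMask);
    }

    bool
    isWeak() const
    {
        return (tp_ & tagMask) != 0;
    }

    bool
    isStrong() const
    {
        return !isWeak();
    }
};

// Exercise the three interesting states: weak, strong, and null.
inline void
demoTaggedPtr()
{
    Node n;
    TaggedPtr t;
    t.set(&n, true);
    assert(t.isWeak() && t.raw() == &n);
    t.set(&n, false);
    assert(t.isStrong() && t.raw() == &n);
    t.set(nullptr, true);
    assert(t.isStrong() && t.raw() == nullptr);
}
```

Calling `demoTaggedPtr()` from any test driver runs the checks. The real class additionally adjusts the strong or weak count whenever the tag changes.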


@@ -0,0 +1,740 @@
//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2023 Ripple Labs Inc.
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================
#ifndef RIPPLE_BASICS_INTRUSIVEPOINTER_IPP_INCLUDED
#define RIPPLE_BASICS_INTRUSIVEPOINTER_IPP_INCLUDED
#include <xrpl/basics/IntrusivePointer.h>
#include <xrpl/basics/IntrusiveRefCounts.h>
#include <utility>
namespace ripple {
template <class T>
template <CAdoptTag TAdoptTag>
SharedIntrusive<T>::SharedIntrusive(T* p, TAdoptTag) noexcept : ptr_{p}
{
if constexpr (std::is_same_v<
TAdoptTag,
SharedIntrusiveAdoptIncrementStrongTag>)
{
if (p)
p->addStrongRef();
}
}
template <class T>
SharedIntrusive<T>::SharedIntrusive(SharedIntrusive const& rhs)
: ptr_{[&] {
auto p = rhs.unsafeGetRawPtr();
if (p)
p->addStrongRef();
return p;
}()}
{
}
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedIntrusive<T>::SharedIntrusive(SharedIntrusive<TT> const& rhs)
: ptr_{[&] {
auto p = rhs.unsafeGetRawPtr();
if (p)
p->addStrongRef();
return p;
}()}
{
}
template <class T>
SharedIntrusive<T>::SharedIntrusive(SharedIntrusive&& rhs)
: ptr_{rhs.unsafeExchange(nullptr)}
{
}
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedIntrusive<T>::SharedIntrusive(SharedIntrusive<TT>&& rhs)
: ptr_{rhs.unsafeExchange(nullptr)}
{
}
template <class T>
SharedIntrusive<T>&
SharedIntrusive<T>::operator=(SharedIntrusive const& rhs)
{
if (this == &rhs)
return *this;
auto p = rhs.unsafeGetRawPtr();
if (p)
p->addStrongRef();
unsafeReleaseAndStore(p);
return *this;
}
template <class T>
template <class TT>
// clang-format off
requires std::convertible_to<TT*, T*>
// clang-format on
SharedIntrusive<T>&
SharedIntrusive<T>::operator=(SharedIntrusive<TT> const& rhs)
{
if constexpr (std::is_same_v<T, TT>)
{
// This case should never be hit. The operator above will run instead.
// (The normal operator= is needed or it will be marked `deleted`)
if (this == &rhs)
return *this;
}
auto p = rhs.unsafeGetRawPtr();
if (p)
p->addStrongRef();
unsafeReleaseAndStore(p);
return *this;
}
template <class T>
SharedIntrusive<T>&
SharedIntrusive<T>::operator=(SharedIntrusive&& rhs)
{
if (this == &rhs)
return *this;
unsafeReleaseAndStore(rhs.unsafeExchange(nullptr));
return *this;
}
template <class T>
template <class TT>
// clang-format off
requires std::convertible_to<TT*, T*>
// clang-format on
SharedIntrusive<T>&
SharedIntrusive<T>::operator=(SharedIntrusive<TT>&& rhs)
{
static_assert(
!std::is_same_v<T, TT>,
"This overload should not be instantiated for T == TT");
unsafeReleaseAndStore(rhs.unsafeExchange(nullptr));
return *this;
}
template <class T>
bool
SharedIntrusive<T>::operator!=(std::nullptr_t) const
{
return this->get() != nullptr;
}
template <class T>
bool
SharedIntrusive<T>::operator==(std::nullptr_t) const
{
return this->get() == nullptr;
}
template <class T>
template <CAdoptTag TAdoptTag>
void
SharedIntrusive<T>::adopt(T* p)
{
if constexpr (std::is_same_v<
TAdoptTag,
SharedIntrusiveAdoptIncrementStrongTag>)
{
if (p)
p->addStrongRef();
}
unsafeReleaseAndStore(p);
}
template <class T>
SharedIntrusive<T>::~SharedIntrusive()
{
unsafeReleaseAndStore(nullptr);
}
template <class T>
template <class TT>
SharedIntrusive<T>::SharedIntrusive(
StaticCastTagSharedIntrusive,
SharedIntrusive<TT> const& rhs)
: ptr_{[&] {
auto p = static_cast<T*>(rhs.unsafeGetRawPtr());
if (p)
p->addStrongRef();
return p;
}()}
{
}
template <class T>
template <class TT>
SharedIntrusive<T>::SharedIntrusive(
StaticCastTagSharedIntrusive,
SharedIntrusive<TT>&& rhs)
: ptr_{static_cast<T*>(rhs.unsafeExchange(nullptr))}
{
}
template <class T>
template <class TT>
SharedIntrusive<T>::SharedIntrusive(
DynamicCastTagSharedIntrusive,
SharedIntrusive<TT> const& rhs)
: ptr_{[&] {
auto p = dynamic_cast<T*>(rhs.unsafeGetRawPtr());
if (p)
p->addStrongRef();
return p;
}()}
{
}
template <class T>
template <class TT>
SharedIntrusive<T>::SharedIntrusive(
DynamicCastTagSharedIntrusive,
SharedIntrusive<TT>&& rhs)
{
// This can be simplified without the `exchange`, but the `exchange` is kept
// in anticipation of supporting atomic operations.
auto toSet = rhs.unsafeExchange(nullptr);
if (toSet)
{
ptr_ = dynamic_cast<T*>(toSet);
if (!ptr_)
// need to set the pointer back or will leak
rhs.unsafeExchange(toSet);
}
}
template <class T>
T&
SharedIntrusive<T>::operator*() const noexcept
{
return *unsafeGetRawPtr();
}
template <class T>
T*
SharedIntrusive<T>::operator->() const noexcept
{
return unsafeGetRawPtr();
}
template <class T>
SharedIntrusive<T>::operator bool() const noexcept
{
return bool(unsafeGetRawPtr());
}
template <class T>
void
SharedIntrusive<T>::reset()
{
unsafeReleaseAndStore(nullptr);
}
template <class T>
T*
SharedIntrusive<T>::get() const
{
return unsafeGetRawPtr();
}
template <class T>
std::size_t
SharedIntrusive<T>::use_count() const
{
if (auto p = unsafeGetRawPtr())
return p->use_count();
return 0;
}
template <class T>
T*
SharedIntrusive<T>::unsafeGetRawPtr() const
{
return ptr_;
}
template <class T>
void
SharedIntrusive<T>::unsafeSetRawPtr(T* p)
{
ptr_ = p;
}
template <class T>
T*
SharedIntrusive<T>::unsafeExchange(T* p)
{
return std::exchange(ptr_, p);
}
template <class T>
void
SharedIntrusive<T>::unsafeReleaseAndStore(T* next)
{
auto prev = unsafeExchange(next);
if (!prev)
return;
using enum ReleaseStrongRefAction;
auto action = prev->releaseStrongRef();
switch (action)
{
case noop:
break;
case destroy:
delete prev;
break;
case partialDestroy:
prev->partialDestructor();
partialDestructorFinished(&prev);
// prev is null and may no longer be used
break;
}
}
//------------------------------------------------------------------------------
template <class T>
WeakIntrusive<T>::WeakIntrusive(WeakIntrusive const& rhs) : ptr_{rhs.ptr_}
{
if (ptr_)
ptr_->addWeakRef();
}
template <class T>
WeakIntrusive<T>::WeakIntrusive(WeakIntrusive&& rhs) : ptr_{rhs.ptr_}
{
rhs.ptr_ = nullptr;
}
template <class T>
WeakIntrusive<T>::WeakIntrusive(SharedIntrusive<T> const& rhs)
: ptr_{rhs.unsafeGetRawPtr()}
{
if (ptr_)
ptr_->addWeakRef();
}
template <class T>
template <class TT>
// clang-format off
requires std::convertible_to<TT*, T*>
// clang-format on
WeakIntrusive<T>&
WeakIntrusive<T>::operator=(SharedIntrusive<TT> const& rhs)
{
unsafeReleaseNoStore();
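auto p = rhs.unsafeGetRawPtr();
if (p)
p->addWeakRef();
// Store the new pointer. Without this, `ptr_` would still refer to the
// object that was just released and the new weak ref would leak.
ptr_ = p;
return *this;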
}
template <class T>
void
WeakIntrusive<T>::adopt(T* ptr)
{
unsafeReleaseNoStore();
if (ptr)
ptr->addWeakRef();
ptr_ = ptr;
}
template <class T>
WeakIntrusive<T>::~WeakIntrusive()
{
unsafeReleaseNoStore();
}
template <class T>
SharedIntrusive<T>
WeakIntrusive<T>::lock() const
{
if (ptr_ && ptr_->checkoutStrongRefFromWeak())
{
return SharedIntrusive<T>{ptr_, SharedIntrusiveAdoptNoIncrementTag{}};
}
return {};
}
template <class T>
bool
WeakIntrusive<T>::expired() const
{
return (!ptr_ || ptr_->expired());
}
template <class T>
void
WeakIntrusive<T>::reset()
{
unsafeReleaseNoStore();
ptr_ = nullptr;
}
template <class T>
void
WeakIntrusive<T>::unsafeReleaseNoStore()
{
if (!ptr_)
return;
using enum ReleaseWeakRefAction;
auto action = ptr_->releaseWeakRef();
switch (action)
{
case noop:
break;
case destroy:
delete ptr_;
break;
}
}
//------------------------------------------------------------------------------
template <class T>
SharedWeakUnion<T>::SharedWeakUnion(SharedWeakUnion const& rhs) : tp_{rhs.tp_}
{
auto p = rhs.unsafeGetRawPtr();
if (!p)
return;
if (rhs.isStrong())
p->addStrongRef();
else
p->addWeakRef();
}
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakUnion<T>::SharedWeakUnion(SharedIntrusive<TT> const& rhs)
{
auto p = rhs.unsafeGetRawPtr();
if (p)
p->addStrongRef();
unsafeSetRawPtr(p, RefStrength::strong);
}
template <class T>
SharedWeakUnion<T>::SharedWeakUnion(SharedWeakUnion&& rhs) : tp_{rhs.tp_}
{
rhs.unsafeSetRawPtr(nullptr);
}
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakUnion<T>::SharedWeakUnion(SharedIntrusive<TT>&& rhs)
{
auto p = rhs.unsafeGetRawPtr();
if (p)
unsafeSetRawPtr(p, RefStrength::strong);
rhs.unsafeSetRawPtr(nullptr);
}
template <class T>
SharedWeakUnion<T>&
SharedWeakUnion<T>::operator=(SharedWeakUnion const& rhs)
{
if (this == &rhs)
return *this;
unsafeReleaseNoStore();
if (auto p = rhs.unsafeGetRawPtr())
{
if (rhs.isStrong())
{
p->addStrongRef();
unsafeSetRawPtr(p, RefStrength::strong);
}
else
{
p->addWeakRef();
unsafeSetRawPtr(p, RefStrength::weak);
}
}
else
{
unsafeSetRawPtr(nullptr);
}
return *this;
}
template <class T>
template <class TT>
// clang-format off
requires std::convertible_to<TT*, T*>
// clang-format on
SharedWeakUnion<T>&
SharedWeakUnion<T>::operator=(SharedIntrusive<TT> const& rhs)
{
unsafeReleaseNoStore();
auto p = rhs.unsafeGetRawPtr();
if (p)
p->addStrongRef();
unsafeSetRawPtr(p, RefStrength::strong);
return *this;
}
template <class T>
template <class TT>
// clang-format off
requires std::convertible_to<TT*, T*>
// clang-format on
SharedWeakUnion<T>&
SharedWeakUnion<T>::operator=(SharedIntrusive<TT>&& rhs)
{
unsafeReleaseNoStore();
unsafeSetRawPtr(rhs.unsafeGetRawPtr(), RefStrength::strong);
rhs.unsafeSetRawPtr(nullptr);
return *this;
}
template <class T>
SharedWeakUnion<T>::~SharedWeakUnion()
{
unsafeReleaseNoStore();
}
// Return a strong pointer if this is already a strong pointer (i.e. don't
// lock the weak pointer. Use the `lock` method if that's what's needed)
template <class T>
SharedIntrusive<T>
SharedWeakUnion<T>::getStrong() const
{
SharedIntrusive<T> result;
auto p = unsafeGetRawPtr();
if (p && isStrong())
{
result.template adopt<SharedIntrusiveAdoptIncrementStrongTag>(p);
}
return result;
}
template <class T>
SharedWeakUnion<T>::operator bool() const noexcept
{
return bool(get());
}
template <class T>
void
SharedWeakUnion<T>::reset()
{
unsafeReleaseNoStore();
unsafeSetRawPtr(nullptr);
}
template <class T>
T*
SharedWeakUnion<T>::get() const
{
return isStrong() ? unsafeGetRawPtr() : nullptr;
}
template <class T>
std::size_t
SharedWeakUnion<T>::use_count() const
{
if (auto p = get())
return p->use_count();
return 0;
}
template <class T>
bool
SharedWeakUnion<T>::expired() const
{
auto p = unsafeGetRawPtr();
return (!p || p->expired());
}
template <class T>
SharedIntrusive<T>
SharedWeakUnion<T>::lock() const
{
SharedIntrusive<T> result;
auto p = unsafeGetRawPtr();
if (!p)
return result;
if (isStrong())
{
result.template adopt<SharedIntrusiveAdoptIncrementStrongTag>(p);
return result;
}
if (p->checkoutStrongRefFromWeak())
{
result.template adopt<SharedIntrusiveAdoptNoIncrementTag>(p);
return result;
}
return result;
}
template <class T>
bool
SharedWeakUnion<T>::isStrong() const
{
return !(tp_ & tagMask);
}
template <class T>
bool
SharedWeakUnion<T>::isWeak() const
{
return tp_ & tagMask;
}
template <class T>
bool
SharedWeakUnion<T>::convertToStrong()
{
if (isStrong())
return true;
auto p = unsafeGetRawPtr();
if (p && p->checkoutStrongRefFromWeak())
{
[[maybe_unused]] auto action = p->releaseWeakRef();
XRPL_ASSERT(
(action == ReleaseWeakRefAction::noop),
"ripple::SharedWeakUnion::convertToStrong : "
"action is noop");
unsafeSetRawPtr(p, RefStrength::strong);
return true;
}
return false;
}
template <class T>
bool
SharedWeakUnion<T>::convertToWeak()
{
if (isWeak())
return true;
auto p = unsafeGetRawPtr();
if (!p)
return false;
using enum ReleaseStrongRefAction;
auto action = p->addWeakReleaseStrongRef();
switch (action)
{
case noop:
break;
case destroy:
// We just added a weak ref. How could we destroy?
UNREACHABLE(
"ripple::SharedWeakUnion::convertToWeak : destroying freshly "
"added ref");
delete p;
unsafeSetRawPtr(nullptr);
return true; // Should never happen
case partialDestroy:
// This is a weird case. We just converted the last strong
// pointer to a weak pointer.
p->partialDestructor();
partialDestructorFinished(&p);
// p is null and may no longer be used
break;
}
unsafeSetRawPtr(p, RefStrength::weak);
return true;
}
template <class T>
T*
SharedWeakUnion<T>::unsafeGetRawPtr() const
{
return reinterpret_cast<T*>(tp_ & ptrMask);
}
template <class T>
void
SharedWeakUnion<T>::unsafeSetRawPtr(T* p, RefStrength rs)
{
tp_ = reinterpret_cast<std::uintptr_t>(p);
if (tp_ && rs == RefStrength::weak)
tp_ |= tagMask;
}
template <class T>
void
SharedWeakUnion<T>::unsafeSetRawPtr(std::nullptr_t)
{
tp_ = 0;
}
template <class T>
void
SharedWeakUnion<T>::unsafeReleaseNoStore()
{
auto p = unsafeGetRawPtr();
if (!p)
return;
if (isStrong())
{
using enum ReleaseStrongRefAction;
auto strongAction = p->releaseStrongRef();
switch (strongAction)
{
case noop:
break;
case destroy:
delete p;
break;
case partialDestroy:
p->partialDestructor();
partialDestructorFinished(&p);
// p is null and may no longer be used
break;
}
}
else
{
using enum ReleaseWeakRefAction;
auto weakAction = p->releaseWeakRef();
switch (weakAction)
{
case noop:
break;
case destroy:
delete p;
break;
}
}
}
} // namespace ripple
#endif
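The exchange-then-dispatch pattern used by the release functions in this file can be sketched with a single-threaded toy model. Everything here is an assumption made for illustration: `ToyObject`, `StrongAction`, `WeakAction`, and the helper functions are hypothetical, the counts are plain integers rather than a packed atomic, and the started/finished handshake of the real code is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class StrongAction { noop, partialDestroy, destroy };
enum class WeakAction { noop, destroy };

// Hypothetical, single-threaded stand-in for an intrusively counted object.
struct ToyObject
{
    unsigned strong = 1;  // created holding one strong reference
    unsigned weak = 0;
    std::vector<int> payload = std::vector<int>(1024, 7);
    bool* destroyedFlag = nullptr;

    ~ToyObject()
    {
        if (destroyedFlag)
            *destroyedFlag = true;
    }

    StrongAction
    releaseStrong()
    {
        if (--strong != 0)
            return StrongAction::noop;
        // Last strong ref: keep the shell alive if weak refs remain.
        return weak != 0 ? StrongAction::partialDestroy
                         : StrongAction::destroy;
    }

    WeakAction
    releaseWeak()
    {
        --weak;
        return (weak == 0 && strong == 0) ? WeakAction::destroy
                                          : WeakAction::noop;
    }

    // Free the expensive part only; the counts must stay readable.
    void
    partialDestructor()
    {
        std::vector<int>().swap(payload);
    }
};

// Store `next` first, then act on what releasing the old pointer requires.
inline void
releaseStrongAndStore(ToyObject*& slot, ToyObject* next)
{
    ToyObject* prev = std::exchange(slot, next);
    if (!prev)
        return;
    switch (prev->releaseStrong())
    {
        case StrongAction::noop:
            break;
        case StrongAction::destroy:
            delete prev;
            break;
        case StrongAction::partialDestroy:
            prev->partialDestructor();
            break;
    }
}

inline void
releaseWeakAndClear(ToyObject*& slot)
{
    ToyObject* prev = std::exchange(slot, nullptr);
    if (prev && prev->releaseWeak() == WeakAction::destroy)
        delete prev;
}

// One strong and one weak reference; the strong one is dropped first.
inline bool
demoPartialThenFullDestroy()
{
    bool destroyed = false;
    ToyObject* strongSlot = new ToyObject;
    strongSlot->destroyedFlag = &destroyed;
    ToyObject* weakSlot = strongSlot;
    ++weakSlot->weak;

    releaseStrongAndStore(strongSlot, nullptr);
    bool const partial = !destroyed && weakSlot->payload.empty();
    assert(partial);

    releaseWeakAndClear(weakSlot);
    return partial && destroyed && !strongSlot && !weakSlot;
}
```

In this model the payload is released as soon as the last strong reference goes away, while the object itself is deleted only when the last weak reference follows. The real implementation reaches the same outcome with atomic operations and an explicit `partialDestructorFinished` call.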


@@ -0,0 +1,502 @@
//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2023 Ripple Labs Inc.
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================
#ifndef RIPPLE_BASICS_INTRUSIVEREFCOUNTS_H_INCLUDED
#define RIPPLE_BASICS_INTRUSIVEREFCOUNTS_H_INCLUDED
#include <xrpl/beast/utility/instrumentation.h>
#include <atomic>
#include <cstdint>
namespace ripple {
/** Action to perform when releasing a strong pointer.
noop: Do nothing. For example, a `noop` action will occur when a count is
decremented to a non-zero value.
partialDestroy: Run the `partialDestructor`. This action will happen when a
strong count is decremented to zero and the weak count is non-zero.
destroy: Run the destructor. This action will occur when either the strong
count or weak count is decremented and the other count is also zero.
*/
enum class ReleaseStrongRefAction { noop, partialDestroy, destroy };
/** Action to perform when releasing a weak pointer.
noop: Do nothing. For example, a `noop` action will occur when a count is
decremented to a non-zero value.
destroy: Run the destructor. This action will occur when either the strong
count or weak count is decremented and the other count is also zero.
*/
enum class ReleaseWeakRefAction { noop, destroy };
/** Implement the strong count, weak count, and bit flags for an intrusive
pointer.
A class can satisfy the requirements of a ripple::IntrusivePointer by
inheriting from this class.
*/
struct IntrusiveRefCounts
{
virtual ~IntrusiveRefCounts() noexcept;
// This must be `noexcept` or the make_SharedIntrusive function could leak
// memory.
void
addStrongRef() const noexcept;
void
addWeakRef() const noexcept;
ReleaseStrongRefAction
releaseStrongRef() const;
// Same as:
// {
// addWeakRef();
// return releaseStrongRef();
// }
// but done as one atomic operation
ReleaseStrongRefAction
addWeakReleaseStrongRef() const;
ReleaseWeakRefAction
releaseWeakRef() const;
// Returns true if able to check out a strong ref, false otherwise.
bool
checkoutStrongRefFromWeak() const noexcept;
bool
expired() const noexcept;
std::size_t
use_count() const noexcept;
// This function MUST be called after a partial destructor finishes running.
// Calling this function may cause other threads to delete the object
// pointed to by `o`, so `o` should never be used after calling this
// function. The parameter will be set to a `nullptr` after calling this
// function to emphasize that it should not be used.
// Note: This is intentionally NOT called at the end of `partialDestructor`.
// If it were, every new class written to support this smart pointer would
// have to remember to call `partialDestructorFinished` at the end of its
// own `partialDestructor`. Putting the call inside the smart pointer class
// itself is expected to be less error prone.
// Note: The "two-star" programming is intentional. It emphasizes that `o`
// may be deleted and the unergonomic API is meant to signal the special
// nature of this function call to callers.
// Note: This is a template to support incompletely defined classes.
template <class T>
friend void
partialDestructorFinished(T** o);
private:
// TODO: We may need to use a uint64_t for both counts. This will reduce the
// memory savings. We need to audit the code to make sure 16 bit counts are
// enough for strong pointers and 14 bit counts are enough for weak
// pointers. Use type aliases to make it easy to switch types.
using CountType = std::uint16_t;
static constexpr size_t StrongCountNumBits = sizeof(CountType) * 8;
static constexpr size_t WeakCountNumBits = StrongCountNumBits - 2;
using FieldType = std::uint32_t;
static constexpr size_t FieldTypeBits = sizeof(FieldType) * 8;
static constexpr FieldType one = 1;
/** `refCounts` consists of four fields that are treated atomically:
1. Strong count. This is a count of the number of shared pointers that
hold a reference to this object. When the strong count goes to zero,
if the weak count is zero, the destructor is run. If the weak count is
non-zero when the strong count goes to zero then the partialDestructor
is run.
2. Weak count. This is a count of the number of weak pointers that hold
a reference to this object. When the weak count goes to zero and the
strong count is also zero, then the destructor is run.
3. Partial destroy started bit. This bit is set if the
`partialDestructor` function has been started (or is about to be
started). This is used to prevent the destructor from running
concurrently with the partial destructor. This can easily happen when
the last strong pointer releases its reference in one thread and starts
the partialDestructor, while in another thread the last weak pointer
goes out of scope and starts the destructor while the partialDestructor
is still running. Both a started and a finished bit are needed to handle
a corner case where the last strong pointer goes out of scope, then the
last weak pointer goes out of scope, but this happens before the
partial destroy started bit is set. It would be possible to use a single
bit if it could also be set atomically when the strong count goes to
zero and the weak count is non-zero, but that would add complexity (and
likely slow down common cases as well).
4. Partial destroy finished bit. This bit is set when the
`partialDestructor` has finished running. See (3) above for more
information.
*/
mutable std::atomic<FieldType> refCounts{strongDelta};
/** Amount to change the strong count when adding or releasing a reference
Note: The strong count is stored in the low `StrongCountNumBits` bits
of refCounts
*/
static constexpr FieldType strongDelta = 1;
/** Amount to change the weak count when adding or releasing a reference
Note: The weak count is stored in the high `WeakCountNumBits` bits of
refCounts
*/
static constexpr FieldType weakDelta = (one << StrongCountNumBits);
/** Flag that is set when the partialDestroy function has started running
(or is about to start running).
See description of the `refCounts` field for a fuller description of
this field.
*/
static constexpr FieldType partialDestroyStartedMask =
(one << (FieldTypeBits - 1));
/** Flag that is set when the partialDestroy function has finished running
See description of the `refCounts` field for a fuller description of
this field.
*/
static constexpr FieldType partialDestroyFinishedMask =
(one << (FieldTypeBits - 2));
/** Mask that will zero out all the `count` bits and leave the tag bits
unchanged.
*/
static constexpr FieldType tagMask =
partialDestroyStartedMask | partialDestroyFinishedMask;
/** Mask that will zero out the `tag` bits and leave the count bits
unchanged.
*/
static constexpr FieldType valueMask = ~tagMask;
/** Mask that will zero out everything except the strong count.
*/
static constexpr FieldType strongMask =
((one << StrongCountNumBits) - 1) & valueMask;
/** Mask that will zero out everything except the weak count.
*/
static constexpr FieldType weakMask =
(((one << WeakCountNumBits) - 1) << StrongCountNumBits) & valueMask;
/** Unpack the count and tag fields from the packed atomic integer form. */
struct RefCountPair
{
CountType strong;
CountType weak;
/** The `partialDestroyStartedBit` is set when the partial destroy
function is started. It is not a boolean; it is a uint32 with all bits
zero with the possible exception of the `partialDestroyStartedMask`
bit. This is done so it can be directly masked into the
`combinedValue`.
*/
FieldType partialDestroyStartedBit{0};
/** The `partialDestroyFinishedBit` is set when the partial destroy
function has finished.
*/
FieldType partialDestroyFinishedBit{0};
RefCountPair(FieldType v) noexcept;
RefCountPair(CountType s, CountType w) noexcept;
/** Convert back to the packed integer form. */
FieldType
combinedValue() const noexcept;
static constexpr CountType maxStrongValue =
static_cast<CountType>((one << StrongCountNumBits) - 1);
static constexpr CountType maxWeakValue =
static_cast<CountType>((one << WeakCountNumBits) - 1);
/** Put an extra margin to detect when running up against limits.
This is only used in debug code, and is useful if we reduce the
number of bits in the strong and weak counts (to 16 and 14 bits).
*/
static constexpr CountType checkStrongMaxValue = maxStrongValue - 32;
static constexpr CountType checkWeakMaxValue = maxWeakValue - 32;
};
};
inline void
IntrusiveRefCounts::addStrongRef() const noexcept
{
refCounts.fetch_add(strongDelta, std::memory_order_acq_rel);
}
inline void
IntrusiveRefCounts::addWeakRef() const noexcept
{
refCounts.fetch_add(weakDelta, std::memory_order_acq_rel);
}
inline ReleaseStrongRefAction
IntrusiveRefCounts::releaseStrongRef() const
{
// Subtract `strongDelta` from refCounts. If this releases the last strong
// ref, set the `partialDestroyStarted` bit. It is important that the ref
// count and the `partialDestroyStartedBit` are changed atomically (hence
// the loop and `compare_exchange` op). If this didn't need to be done
// atomically, the loop could be replaced with a `fetch_sub` and a
// conditional `fetch_or`. This loop will almost always run once.
using enum ReleaseStrongRefAction;
auto prevIntVal = refCounts.load(std::memory_order_acquire);
while (1)
{
RefCountPair const prevVal{prevIntVal};
XRPL_ASSERT(
(prevVal.strong >= strongDelta),
"ripple::IntrusiveRefCounts::releaseStrongRef : previous ref "
"higher than new");
auto nextIntVal = prevIntVal - strongDelta;
ReleaseStrongRefAction action = noop;
if (prevVal.strong == 1)
{
if (prevVal.weak == 0)
{
action = destroy;
}
else
{
nextIntVal |= partialDestroyStartedMask;
action = partialDestroy;
}
}
if (refCounts.compare_exchange_weak(
prevIntVal, nextIntVal, std::memory_order_acq_rel))
{
// Can't be in partial destroy because only decrementing the strong
// count to zero can start a partial destroy, and that can't happen
// twice.
XRPL_ASSERT(
(action == noop) || !(prevIntVal & partialDestroyStartedMask),
"ripple::IntrusiveRefCounts::releaseStrongRef : not in partial "
"destroy");
return action;
}
}
}
inline ReleaseStrongRefAction
IntrusiveRefCounts::addWeakReleaseStrongRef() const
{
using enum ReleaseStrongRefAction;
static_assert(weakDelta > strongDelta);
auto constexpr delta = weakDelta - strongDelta;
auto prevIntVal = refCounts.load(std::memory_order_acquire);
// This loop will almost always run once. The loop is needed to atomically
// change the counts and flags (the count could be atomically changed, but
// the flags depend on the current value of the counts).
//
// Note: If this becomes a perf bottleneck, the `partialDestroyStartedMask`
// may be able to be set non-atomically. But it is easier to reason about
// the code if the flag is set atomically.
while (1)
{
RefCountPair const prevVal{prevIntVal};
// Converted the last strong pointer to a weak pointer.
//
// Can't be in partial destroy because only decrementing the
// strong count to zero can start a partial destroy, and that
// can't happen twice.
XRPL_ASSERT(
(!prevVal.partialDestroyStartedBit),
"ripple::IntrusiveRefCounts::addWeakReleaseStrongRef : not in "
"partial destroy");
auto nextIntVal = prevIntVal + delta;
ReleaseStrongRefAction action = noop;
if (prevVal.strong == 1)
{
if (prevVal.weak == 0)
{
action = noop;
}
else
{
nextIntVal |= partialDestroyStartedMask;
action = partialDestroy;
}
}
if (refCounts.compare_exchange_weak(
prevIntVal, nextIntVal, std::memory_order_acq_rel))
{
XRPL_ASSERT(
(!(prevIntVal & partialDestroyStartedMask)),
"ripple::IntrusiveRefCounts::addWeakReleaseStrongRef : not "
"started partial destroy");
return action;
}
}
}
inline ReleaseWeakRefAction
IntrusiveRefCounts::releaseWeakRef() const
{
auto prevIntVal = refCounts.fetch_sub(weakDelta, std::memory_order_acq_rel);
RefCountPair prev = prevIntVal;
if (prev.weak == 1 && prev.strong == 0)
{
if (!prev.partialDestroyStartedBit)
{
// This case should only be hit if the partialDestroyStartedBit is
// set non-atomically (and even then very rarely). The code is kept
// in case we need to set the flag non-atomically for perf reasons.
refCounts.wait(prevIntVal, std::memory_order_acquire);
prevIntVal = refCounts.load(std::memory_order_acquire);
prev = RefCountPair{prevIntVal};
}
if (!prev.partialDestroyFinishedBit)
{
// partial destroy MUST finish before running a full destroy (when
// using weak pointers)
refCounts.wait(prevIntVal - weakDelta, std::memory_order_acquire);
}
return ReleaseWeakRefAction::destroy;
}
return ReleaseWeakRefAction::noop;
}
inline bool
IntrusiveRefCounts::checkoutStrongRefFromWeak() const noexcept
{
auto curValue = RefCountPair{1, 1}.combinedValue();
auto desiredValue = RefCountPair{2, 1}.combinedValue();
while (!refCounts.compare_exchange_weak(
curValue, desiredValue, std::memory_order_acq_rel))
{
RefCountPair const prev{curValue};
if (!prev.strong)
return false;
desiredValue = curValue + strongDelta;
}
return true;
}
inline bool
IntrusiveRefCounts::expired() const noexcept
{
RefCountPair const val = refCounts.load(std::memory_order_acquire);
return val.strong == 0;
}
inline std::size_t
IntrusiveRefCounts::use_count() const noexcept
{
RefCountPair const val = refCounts.load(std::memory_order_acquire);
return val.strong;
}
inline IntrusiveRefCounts::~IntrusiveRefCounts() noexcept
{
#ifndef NDEBUG
auto v = refCounts.load(std::memory_order_acquire);
XRPL_ASSERT(
(!(v & valueMask)),
"ripple::IntrusiveRefCounts::~IntrusiveRefCounts : count must be zero");
auto t = v & tagMask;
XRPL_ASSERT(
(!t || t == tagMask),
"ripple::IntrusiveRefCounts::~IntrusiveRefCounts : valid tag");
#endif
}
//------------------------------------------------------------------------------
inline IntrusiveRefCounts::RefCountPair::RefCountPair(
IntrusiveRefCounts::FieldType v) noexcept
: strong{static_cast<CountType>(v & strongMask)}
, weak{static_cast<CountType>((v & weakMask) >> StrongCountNumBits)}
, partialDestroyStartedBit{v & partialDestroyStartedMask}
, partialDestroyFinishedBit{v & partialDestroyFinishedMask}
{
XRPL_ASSERT(
(strong < checkStrongMaxValue && weak < checkWeakMaxValue),
"ripple::IntrusiveRefCounts::RefCountPair(FieldType) : inputs inside "
"range");
}
inline IntrusiveRefCounts::RefCountPair::RefCountPair(
IntrusiveRefCounts::CountType s,
IntrusiveRefCounts::CountType w) noexcept
: strong{s}, weak{w}
{
XRPL_ASSERT(
(strong < checkStrongMaxValue && weak < checkWeakMaxValue),
"ripple::IntrusiveRefCounts::RefCountPair(CountType, CountType) : "
"inputs inside range");
}
inline IntrusiveRefCounts::FieldType
IntrusiveRefCounts::RefCountPair::combinedValue() const noexcept
{
XRPL_ASSERT(
(strong < checkStrongMaxValue && weak < checkWeakMaxValue),
"ripple::IntrusiveRefCounts::RefCountPair::combinedValue : inputs "
"inside range");
return (static_cast<IntrusiveRefCounts::FieldType>(weak)
<< IntrusiveRefCounts::StrongCountNumBits) |
static_cast<IntrusiveRefCounts::FieldType>(strong) |
partialDestroyStartedBit | partialDestroyFinishedBit;
}
template <class T>
inline void
partialDestructorFinished(T** o)
{
T& self = **o;
IntrusiveRefCounts::RefCountPair p =
self.refCounts.fetch_or(IntrusiveRefCounts::partialDestroyFinishedMask);
XRPL_ASSERT(
(!p.partialDestroyFinishedBit && p.partialDestroyStartedBit &&
!p.strong),
"ripple::partialDestructorFinished : not a weak ref");
if (!p.weak)
{
// There was a weak count before the partial destructor ran (or we would
// have run the full destructor) and now there isn't a weak count. Some
// thread is waiting to run the destructor.
self.refCounts.notify_one();
}
// Set the pointer to null to emphasize that the object shouldn't be used
// after calling this function as it may be destroyed in another thread.
*o = nullptr;
}
//------------------------------------------------------------------------------
} // namespace ripple
#endif
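
Review note: the compare-exchange loop in `checkoutStrongRefFromWeak` above can be hard to follow without the bit layout in view. Below is a minimal sketch of the same idea, assuming a hypothetical layout with the strong count in the low 16 bits and the weak count in the next 16 bits. `PackedCounts`, its bit widths, and its accessors are illustrative names and are not part of this diff. The real `IntrusiveRefCounts` also carries the partial-destroy flag bits, which are omitted here. Unlike the original, which starts from a guessed value of `{1, 1}`, this sketch loads the current value first.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout for illustration only: low 16 bits = strong count,
// next 16 bits = weak count. The real class may use different widths and
// reserves extra bits for the partial-destroy flags.
class PackedCounts
{
public:
    static constexpr std::uint32_t strongBits = 16;
    static constexpr std::uint32_t strongMask = (1u << strongBits) - 1;
    static constexpr std::uint32_t strongDelta = 1;

    PackedCounts(std::uint32_t strong, std::uint32_t weak)
        : counts_{(weak << strongBits) | strong}
    {
        assert(strong <= strongMask && weak <= strongMask);
    }

    // Try to take a strong reference while holding only a weak one.
    // Fails once the strong count has reached zero, because the object is
    // then already being (or has been) destroyed.
    bool
    checkoutStrongRefFromWeak() noexcept
    {
        std::uint32_t cur = counts_.load(std::memory_order_acquire);
        for (;;)
        {
            if ((cur & strongMask) == 0)
                return false;
            // On failure compare_exchange_weak stores the observed value in
            // `cur`, so the zero check above is repeated on fresh data.
            if (counts_.compare_exchange_weak(
                    cur,
                    cur + strongDelta,
                    std::memory_order_acq_rel,
                    std::memory_order_acquire))
                return true;
        }
    }

    std::uint32_t
    strong() const noexcept
    {
        return counts_.load(std::memory_order_acquire) & strongMask;
    }

    std::uint32_t
    weak() const noexcept
    {
        return counts_.load(std::memory_order_acquire) >> strongBits;
    }

private:
    std::atomic<std::uint32_t> counts_;
};
```

Because both counts live in one atomic word, the "is the strong count still non-zero" check and the increment happen as a single atomic step, which is what lets a weak reference be upgraded without a separate lock. Overflow of the strong count is not handled in this sketch.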

View File

@@ -21,6 +21,7 @@
#define RIPPLE_BASICS_LOCALVALUE_H_INCLUDED
#include <boost/thread/tss.hpp>
#include <memory>
#include <unordered_map>

View File

@@ -22,8 +22,10 @@
#include <xrpl/basics/UnorderedContainers.h>
#include <xrpl/beast/utility/Journal.h>
#include <boost/beast/core/string.hpp>
#include <boost/filesystem.hpp>
#include <map>
#include <memory>
#include <mutex>

View File

@@ -20,11 +20,11 @@
#ifndef RIPPLE_BASICS_RESOLVER_H_INCLUDED
#define RIPPLE_BASICS_RESOLVER_H_INCLUDED
#include <xrpl/beast/net/IPEndpoint.h>
#include <functional>
#include <vector>
#include <xrpl/beast/net/IPEndpoint.h>
namespace ripple {
class Resolver

View File

@@ -22,6 +22,7 @@
#include <xrpl/basics/Resolver.h>
#include <xrpl/beast/utility/Journal.h>
#include <boost/asio/io_service.hpp>
namespace ripple {

View File

@@ -0,0 +1,135 @@
//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2023 Ripple Labs Inc.
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================
#ifndef RIPPLE_BASICS_SHAREDWEAKCACHEPOINTER_H_INCLUDED
#define RIPPLE_BASICS_SHAREDWEAKCACHEPOINTER_H_INCLUDED
#include <concepts>
#include <memory>
#include <variant>
namespace ripple {
/** A combination of a std::shared_ptr and a std::weak_ptr.
This class is a wrapper around a
`std::variant<std::shared_ptr<T>, std::weak_ptr<T>>`. It is useful for
storing pointers in tagged caches using less memory than storing both a
strong and a weak pointer directly.
*/
template <class T>
class SharedWeakCachePointer
{
public:
SharedWeakCachePointer() = default;
SharedWeakCachePointer(SharedWeakCachePointer const& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer(std::shared_ptr<TT> const& rhs);
SharedWeakCachePointer(SharedWeakCachePointer&& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer(std::shared_ptr<TT>&& rhs);
SharedWeakCachePointer&
operator=(SharedWeakCachePointer const& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer&
operator=(std::shared_ptr<TT> const& rhs);
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer&
operator=(std::shared_ptr<TT>&& rhs);
~SharedWeakCachePointer();
/** Return a strong pointer if this is already a strong pointer (i.e. don't
lock the weak pointer. Use the `lock` method if that's what's needed)
*/
std::shared_ptr<T> const&
getStrong() const;
/** Return true if this is a strong pointer and the strong pointer is
seated.
*/
explicit
operator bool() const noexcept;
/** Set the pointer to null, decrement the appropriate ref count, and run
the appropriate release action.
*/
void
reset();
/** If this is a strong pointer, return the raw pointer. Otherwise return
null.
*/
T*
get() const;
/** If this is a strong pointer, return the strong count. Otherwise return 0.
*/
std::size_t
use_count() const;
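/** Return true if this is a weak pointer whose object has already been destroyed (the strong count is zero). A strong pointer is never reported as expired. */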
bool
expired() const;
/** If this is a strong pointer, return the strong pointer. Otherwise
attempt to lock the weak pointer.
*/
std::shared_ptr<T>
lock() const;
/** Return true if this represents a strong pointer. */
bool
isStrong() const;
/** Return true if this represents a weak pointer. */
bool
isWeak() const;
/** If this is a weak pointer, attempt to convert it to a strong pointer.
@return true if successfully converted to a strong pointer (or was
already a strong pointer). Otherwise false.
*/
bool
convertToStrong();
/** If this is a strong pointer, attempt to convert it to a weak pointer.
@return false if the pointer is null. Otherwise return true.
*/
bool
convertToWeak();
private:
std::variant<std::shared_ptr<T>, std::weak_ptr<T>> combo_;
};
} // namespace ripple
#endif

View File

@@ -0,0 +1,192 @@
//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2023 Ripple Labs Inc.
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL , DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
//==============================================================================
#ifndef RIPPLE_BASICS_SHAREDWEAKCACHEPOINTER_IPP_INCLUDED
#define RIPPLE_BASICS_SHAREDWEAKCACHEPOINTER_IPP_INCLUDED
#include <xrpl/basics/SharedWeakCachePointer.h>
namespace ripple {
template <class T>
SharedWeakCachePointer<T>::SharedWeakCachePointer(
SharedWeakCachePointer const& rhs) = default;
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer<T>::SharedWeakCachePointer(
std::shared_ptr<TT> const& rhs)
: combo_{rhs}
{
}
template <class T>
SharedWeakCachePointer<T>::SharedWeakCachePointer(
SharedWeakCachePointer&& rhs) = default;
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer<T>::SharedWeakCachePointer(std::shared_ptr<TT>&& rhs)
: combo_{std::move(rhs)}
{
}
template <class T>
SharedWeakCachePointer<T>&
SharedWeakCachePointer<T>::operator=(SharedWeakCachePointer const& rhs) =
default;
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer<T>&
SharedWeakCachePointer<T>::operator=(std::shared_ptr<TT> const& rhs)
{
combo_ = rhs;
return *this;
}
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedWeakCachePointer<T>&
SharedWeakCachePointer<T>::operator=(std::shared_ptr<TT>&& rhs)
{
combo_ = std::move(rhs);
return *this;
}
template <class T>
SharedWeakCachePointer<T>::~SharedWeakCachePointer() = default;
// Return a strong pointer if this is already a strong pointer (i.e. don't
// lock the weak pointer. Use the `lock` method if that's what's needed)
template <class T>
std::shared_ptr<T> const&
SharedWeakCachePointer<T>::getStrong() const
{
static std::shared_ptr<T> const empty;
if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
return *p;
return empty;
}
template <class T>
SharedWeakCachePointer<T>::operator bool() const noexcept
{
return !!std::get_if<std::shared_ptr<T>>(&combo_);
}
template <class T>
void
SharedWeakCachePointer<T>::reset()
{
combo_ = std::shared_ptr<T>{};
}
template <class T>
T*
SharedWeakCachePointer<T>::get() const
{
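// std::get_if returns a pointer to the held shared_ptr (or nullptr), so
// it must be dereferenced with `->` before calling get().
if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
return p->get();
return nullptr;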
}
template <class T>
std::size_t
SharedWeakCachePointer<T>::use_count() const
{
if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
return p->use_count();
return 0;
}
template <class T>
bool
SharedWeakCachePointer<T>::expired() const
{
if (auto p = std::get_if<std::weak_ptr<T>>(&combo_))
return p->expired();
return !std::get_if<std::shared_ptr<T>>(&combo_);
}
template <class T>
std::shared_ptr<T>
SharedWeakCachePointer<T>::lock() const
{
if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
return *p;
if (auto p = std::get_if<std::weak_ptr<T>>(&combo_))
return p->lock();
return {};
}
template <class T>
bool
SharedWeakCachePointer<T>::isStrong() const
{
if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
return !!p->get();
return false;
}
template <class T>
bool
SharedWeakCachePointer<T>::isWeak() const
{
return !isStrong();
}
template <class T>
bool
SharedWeakCachePointer<T>::convertToStrong()
{
if (isStrong())
return true;
if (auto p = std::get_if<std::weak_ptr<T>>(&combo_))
{
if (auto s = p->lock())
{
combo_ = std::move(s);
return true;
}
}
return false;
}
template <class T>
bool
SharedWeakCachePointer<T>::convertToWeak()
{
if (isWeak())
return true;
if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
{
combo_ = std::weak_ptr<T>(*p);
return true;
}
return false;
}
} // namespace ripple
#endif
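
Review note: the strong/weak conversions implemented above are the core of this class, so a compact, standalone sketch may help when reading the cache changes that follow. `StrongOrWeak` is a hypothetical stand-in written for this note, not the class from the diff, and it simplifies the null-pointer handling (here `convertToWeak` returns false for a null strong pointer).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-in for SharedWeakCachePointer<T>.
// One variant holds either a strong or a weak pointer, never both.
template <class T>
class StrongOrWeak
{
public:
    explicit StrongOrWeak(std::shared_ptr<T> p) : combo_{std::move(p)}
    {
    }

    bool
    isStrong() const
    {
        auto p = std::get_if<std::shared_ptr<T>>(&combo_);
        return p && *p;
    }

    // Downgrade: give up our strong reference but keep tracking the object.
    bool
    convertToWeak()
    {
        if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
        {
            if (!*p)
                return false;
            // The temporary weak_ptr is built before the variant releases
            // the shared_ptr alternative, so the object stays tracked.
            combo_ = std::weak_ptr<T>(*p);
        }
        return true;
    }

    // Upgrade: succeeds only while some other owner keeps the object alive.
    bool
    convertToStrong()
    {
        if (isStrong())
            return true;
        if (auto p = std::get_if<std::weak_ptr<T>>(&combo_))
        {
            if (auto s = p->lock())
            {
                combo_ = std::move(s);
                return true;
            }
        }
        return false;
    }

    std::shared_ptr<T>
    lock() const
    {
        if (auto p = std::get_if<std::shared_ptr<T>>(&combo_))
            return *p;
        return std::get<std::weak_ptr<T>>(combo_).lock();
    }

private:
    std::variant<std::shared_ptr<T>, std::weak_ptr<T>> combo_;
};
```

A cache entry built this way can drop to a weak reference when it ages out and be promoted back to a strong reference on the next hit, provided another owner kept the object alive in the meantime.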

View File

@@ -23,6 +23,7 @@
#include <xrpl/basics/contract.h>
#include <xrpl/basics/strHex.h>
#include <xrpl/beast/utility/instrumentation.h>
#include <algorithm>
#include <array>
#include <cstdint>

View File

@@ -20,11 +20,14 @@
#ifndef RIPPLE_BASICS_TAGGEDCACHE_H_INCLUDED
#define RIPPLE_BASICS_TAGGEDCACHE_H_INCLUDED
#include <xrpl/basics/IntrusivePointer.h>
#include <xrpl/basics/Log.h>
#include <xrpl/basics/SharedWeakCachePointer.ipp>
#include <xrpl/basics/UnorderedContainers.h>
#include <xrpl/basics/hardened_hash.h>
#include <xrpl/beast/clock/abstract_clock.h>
#include <xrpl/beast/insight/Insight.h>
#include <atomic>
#include <functional>
#include <mutex>
@@ -50,6 +53,8 @@ template <
class Key,
class T,
bool IsKeyCache = false,
class SharedWeakUnionPointerType = SharedWeakCachePointer<T>,
class SharedPointerType = std::shared_ptr<T>,
class Hash = hardened_hash<>,
class KeyEqual = std::equal_to<Key>,
class Mutex = std::recursive_mutex>
@@ -60,6 +65,8 @@ public:
using key_type = Key;
using mapped_type = T;
using clock_type = beast::abstract_clock<std::chrono::steady_clock>;
using shared_weak_combo_pointer_type = SharedWeakUnionPointerType;
using shared_pointer_type = SharedPointerType;
public:
TaggedCache(
@@ -69,231 +76,48 @@ public:
clock_type& clock,
beast::Journal journal,
beast::insight::Collector::ptr const& collector =
beast::insight::NullCollector::New())
: m_journal(journal)
, m_clock(clock)
, m_stats(
name,
std::bind(&TaggedCache::collect_metrics, this),
collector)
, m_name(name)
, m_target_size(size)
, m_target_age(expiration)
, m_cache_count(0)
, m_hits(0)
, m_misses(0)
{
}
beast::insight::NullCollector::New());
public:
/** Return the clock associated with the cache. */
clock_type&
clock()
{
return m_clock;
}
clock();
/** Returns the number of items in the container. */
std::size_t
size() const
{
std::lock_guard lock(m_mutex);
return m_cache.size();
}
void
setTargetSize(int s)
{
std::lock_guard lock(m_mutex);
m_target_size = s;
if (s > 0)
{
for (auto& partition : m_cache.map())
{
partition.rehash(static_cast<std::size_t>(
(s + (s >> 2)) /
(partition.max_load_factor() * m_cache.partitions()) +
1));
}
}
JLOG(m_journal.debug()) << m_name << " target size set to " << s;
}
clock_type::duration
getTargetAge() const
{
std::lock_guard lock(m_mutex);
return m_target_age;
}
void
setTargetAge(clock_type::duration s)
{
std::lock_guard lock(m_mutex);
m_target_age = s;
JLOG(m_journal.debug())
<< m_name << " target age set to " << m_target_age.count();
}
size() const;
int
getCacheSize() const
{
std::lock_guard lock(m_mutex);
return m_cache_count;
}
getCacheSize() const;
int
getTrackSize() const
{
std::lock_guard lock(m_mutex);
return m_cache.size();
}
getTrackSize() const;
float
getHitRate()
{
std::lock_guard lock(m_mutex);
auto const total = static_cast<float>(m_hits + m_misses);
return m_hits * (100.0f / std::max(1.0f, total));
}
getHitRate();
void
clear()
{
std::lock_guard lock(m_mutex);
m_cache.clear();
m_cache_count = 0;
}
clear();
void
reset()
{
std::lock_guard lock(m_mutex);
m_cache.clear();
m_cache_count = 0;
m_hits = 0;
m_misses = 0;
}
reset();
/** Refresh the last access time on a key if present.
@return `true` If the key was found.
*/
template <class KeyComparable>
bool
touch_if_exists(KeyComparable const& key)
{
std::lock_guard lock(m_mutex);
auto const iter(m_cache.find(key));
if (iter == m_cache.end())
{
++m_stats.misses;
return false;
}
iter->second.touch(m_clock.now());
++m_stats.hits;
return true;
}
touch_if_exists(KeyComparable const& key);
using SweptPointersVector = std::pair<
std::vector<std::shared_ptr<mapped_type>>,
std::vector<std::weak_ptr<mapped_type>>>;
using SweptPointersVector = std::vector<SharedWeakUnionPointerType>;
void
sweep()
{
// Keep references to all the stuff we sweep
// For performance, each worker thread should exit before the swept data
// is destroyed but still within the main cache lock.
std::vector<SweptPointersVector> allStuffToSweep(m_cache.partitions());
clock_type::time_point const now(m_clock.now());
clock_type::time_point when_expire;
auto const start = std::chrono::steady_clock::now();
{
std::lock_guard lock(m_mutex);
if (m_target_size == 0 ||
(static_cast<int>(m_cache.size()) <= m_target_size))
{
when_expire = now - m_target_age;
}
else
{
when_expire =
now - m_target_age * m_target_size / m_cache.size();
clock_type::duration const minimumAge(std::chrono::seconds(1));
if (when_expire > (now - minimumAge))
when_expire = now - minimumAge;
JLOG(m_journal.trace())
<< m_name << " is growing fast " << m_cache.size() << " of "
<< m_target_size << " aging at "
<< (now - when_expire).count() << " of "
<< m_target_age.count();
}
std::vector<std::thread> workers;
workers.reserve(m_cache.partitions());
std::atomic<int> allRemovals = 0;
for (std::size_t p = 0; p < m_cache.partitions(); ++p)
{
workers.push_back(sweepHelper(
when_expire,
now,
m_cache.map()[p],
allStuffToSweep[p],
allRemovals,
lock));
}
for (std::thread& worker : workers)
worker.join();
m_cache_count -= allRemovals;
}
// At this point allStuffToSweep will go out of scope outside the lock
// and decrement the reference count on each strong pointer.
JLOG(m_journal.debug())
<< m_name << " TaggedCache sweep lock duration "
<< std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::steady_clock::now() - start)
.count()
<< "ms";
}
sweep();
bool
del(const key_type& key, bool valid)
{
// Remove from cache, if !valid, remove from map too. Returns true if
// removed from cache
std::lock_guard lock(m_mutex);
auto cit = m_cache.find(key);
if (cit == m_cache.end())
return false;
Entry& entry = cit->second;
bool ret = false;
if (entry.isCached())
{
--m_cache_count;
entry.ptr.reset();
ret = true;
}
if (!valid || entry.isExpired())
m_cache.erase(cit);
return ret;
}
del(key_type const& key, bool valid);
public:
/** Replace aliased objects with originals.
Due to concurrency it is possible for two separate objects with
@@ -307,100 +131,23 @@ public:
@return `true` If the key already existed.
*/
public:
template <class R>
bool
canonicalize(
const key_type& key,
std::shared_ptr<T>& data,
std::function<bool(std::shared_ptr<T> const&)>&& replace)
{
// Return canonical value, store if needed, refresh in cache
// Return values: true=we had the data already
std::lock_guard lock(m_mutex);
auto cit = m_cache.find(key);
if (cit == m_cache.end())
{
m_cache.emplace(
std::piecewise_construct,
std::forward_as_tuple(key),
std::forward_as_tuple(m_clock.now(), data));
++m_cache_count;
return false;
}
Entry& entry = cit->second;
entry.touch(m_clock.now());
if (entry.isCached())
{
if (replace(entry.ptr))
{
entry.ptr = data;
entry.weak_ptr = data;
}
else
{
data = entry.ptr;
}
return true;
}
auto cachedData = entry.lock();
if (cachedData)
{
if (replace(entry.ptr))
{
entry.ptr = data;
entry.weak_ptr = data;
}
else
{
entry.ptr = cachedData;
data = cachedData;
}
++m_cache_count;
return true;
}
entry.ptr = data;
entry.weak_ptr = data;
++m_cache_count;
return false;
}
key_type const& key,
SharedPointerType& data,
R&& replaceCallback);
bool
canonicalize_replace_cache(
const key_type& key,
std::shared_ptr<T> const& data)
{
return canonicalize(
key,
const_cast<std::shared_ptr<T>&>(data),
[](std::shared_ptr<T> const&) { return true; });
}
key_type const& key,
SharedPointerType const& data);
bool
canonicalize_replace_client(const key_type& key, std::shared_ptr<T>& data)
{
return canonicalize(
key, data, [](std::shared_ptr<T> const&) { return false; });
}
canonicalize_replace_client(key_type const& key, SharedPointerType& data);
std::shared_ptr<T>
fetch(const key_type& key)
{
std::lock_guard<mutex_type> l(m_mutex);
auto ret = initialFetch(key, l);
if (!ret)
++m_misses;
return ret;
}
SharedPointerType
fetch(key_type const& key);
/** Insert the element into the container.
If the key already exists, nothing happens.
@@ -409,26 +156,11 @@ public:
template <class ReturnType = bool>
auto
insert(key_type const& key, T const& value)
-> std::enable_if_t<!IsKeyCache, ReturnType>
{
auto p = std::make_shared<T>(std::cref(value));
return canonicalize_replace_client(key, p);
}
-> std::enable_if_t<!IsKeyCache, ReturnType>;
template <class ReturnType = bool>
auto
insert(key_type const& key) -> std::enable_if_t<IsKeyCache, ReturnType>
{
std::lock_guard lock(m_mutex);
clock_type::time_point const now(m_clock.now());
auto [it, inserted] = m_cache.emplace(
std::piecewise_construct,
std::forward_as_tuple(key),
std::forward_as_tuple(now));
if (!inserted)
it->second.last_access = now;
return inserted;
}
insert(key_type const& key) -> std::enable_if_t<IsKeyCache, ReturnType>;
// VFALCO NOTE It looks like this returns a copy of the data in
// the output parameter 'data'. This could be expensive.
@@ -436,50 +168,18 @@ public:
// simply return an iterator.
//
bool
retrieve(const key_type& key, T& data)
{
// retrieve the value of the stored data
auto entry = fetch(key);
if (!entry)
return false;
data = *entry;
return true;
}
retrieve(key_type const& key, T& data);
mutex_type&
peekMutex()
{
return m_mutex;
}
peekMutex();
std::vector<key_type>
getKeys() const
{
std::vector<key_type> v;
{
std::lock_guard lock(m_mutex);
v.reserve(m_cache.size());
for (auto const& _ : m_cache)
v.push_back(_.first);
}
return v;
}
getKeys() const;
// CachedSLEs functions.
/** Returns the fraction of cache hits. */
double
rate() const
{
std::lock_guard lock(m_mutex);
auto const tot = m_hits + m_misses;
if (tot == 0)
return 0;
return double(m_hits) / tot;
}
rate() const;
/** Fetch an item from the cache.
If the digest was not found, Handler
@@ -487,73 +187,16 @@ public:
std::shared_ptr<SLE const>(void)
*/
template <class Handler>
std::shared_ptr<T>
fetch(key_type const& digest, Handler const& h)
{
{
std::lock_guard l(m_mutex);
if (auto ret = initialFetch(digest, l))
return ret;
}
auto sle = h();
if (!sle)
return {};
std::lock_guard l(m_mutex);
++m_misses;
auto const [it, inserted] =
m_cache.emplace(digest, Entry(m_clock.now(), std::move(sle)));
if (!inserted)
it->second.touch(m_clock.now());
return it->second.ptr;
}
SharedPointerType
fetch(key_type const& digest, Handler const& h);
// End CachedSLEs functions.
private:
std::shared_ptr<T>
initialFetch(key_type const& key, std::lock_guard<mutex_type> const& l)
{
auto cit = m_cache.find(key);
if (cit == m_cache.end())
return {};
Entry& entry = cit->second;
if (entry.isCached())
{
++m_hits;
entry.touch(m_clock.now());
return entry.ptr;
}
entry.ptr = entry.lock();
if (entry.isCached())
{
// independent of cache size, so not counted as a hit
++m_cache_count;
entry.touch(m_clock.now());
return entry.ptr;
}
m_cache.erase(cit);
return {};
}
SharedPointerType
initialFetch(key_type const& key, std::lock_guard<mutex_type> const& l);
void
collect_metrics()
{
m_stats.size.set(getCacheSize());
{
beast::insight::Gauge::value_type hit_rate(0);
{
std::lock_guard lock(m_mutex);
auto const total(m_hits + m_misses);
if (total != 0)
hit_rate = (m_hits * 100) / total;
}
m_stats.hit_rate.set(hit_rate);
}
}
collect_metrics();
private:
struct Stats
@@ -599,36 +242,37 @@ private:
class ValueEntry
{
public:
std::shared_ptr<mapped_type> ptr;
std::weak_ptr<mapped_type> weak_ptr;
shared_weak_combo_pointer_type ptr;
clock_type::time_point last_access;
ValueEntry(
clock_type::time_point const& last_access_,
std::shared_ptr<mapped_type> const& ptr_)
: ptr(ptr_), weak_ptr(ptr_), last_access(last_access_)
shared_pointer_type const& ptr_)
: ptr(ptr_), last_access(last_access_)
{
}
bool
isWeak() const
{
return ptr == nullptr;
if (!ptr)
return true;
return ptr.isWeak();
}
bool
isCached() const
{
return ptr != nullptr;
return ptr && ptr.isStrong();
}
bool
isExpired() const
{
return weak_ptr.expired();
return ptr.expired();
}
std::shared_ptr<mapped_type>
SharedPointerType
lock()
{
return weak_ptr.lock();
return ptr.lock();
}
void
touch(clock_type::time_point const& now)
@@ -657,72 +301,7 @@ private:
typename KeyValueCacheType::map_type& partition,
SweptPointersVector& stuffToSweep,
std::atomic<int>& allRemovals,
std::lock_guard<std::recursive_mutex> const&)
{
return std::thread([&, this]() {
int cacheRemovals = 0;
int mapRemovals = 0;
// Keep references to all the stuff we sweep
// so that we can destroy them outside the lock.
stuffToSweep.first.reserve(partition.size());
stuffToSweep.second.reserve(partition.size());
{
auto cit = partition.begin();
while (cit != partition.end())
{
if (cit->second.isWeak())
{
// weak
if (cit->second.isExpired())
{
stuffToSweep.second.push_back(
std::move(cit->second.weak_ptr));
++mapRemovals;
cit = partition.erase(cit);
}
else
{
++cit;
}
}
else if (cit->second.last_access <= when_expire)
{
// strong, expired
++cacheRemovals;
if (cit->second.ptr.use_count() == 1)
{
stuffToSweep.first.push_back(
std::move(cit->second.ptr));
++mapRemovals;
cit = partition.erase(cit);
}
else
{
// remains weakly cached
cit->second.ptr.reset();
++cit;
}
}
else
{
// strong, not expired
++cit;
}
}
}
if (mapRemovals || cacheRemovals)
{
JLOG(m_journal.debug())
<< "TaggedCache partition sweep " << m_name
<< ": cache = " << partition.size() << "-" << cacheRemovals
<< ", map-=" << mapRemovals;
}
allRemovals += cacheRemovals;
});
}
std::lock_guard<std::recursive_mutex> const&);
[[nodiscard]] std::thread
sweepHelper(
@@ -731,45 +310,7 @@ private:
typename KeyOnlyCacheType::map_type& partition,
SweptPointersVector&,
std::atomic<int>& allRemovals,
std::lock_guard<std::recursive_mutex> const&)
{
return std::thread([&, this]() {
int cacheRemovals = 0;
int mapRemovals = 0;
// Keep references to all the stuff we sweep
// so that we can destroy them outside the lock.
{
auto cit = partition.begin();
while (cit != partition.end())
{
if (cit->second.last_access > now)
{
cit->second.last_access = now;
++cit;
}
else if (cit->second.last_access <= when_expire)
{
cit = partition.erase(cit);
}
else
{
++cit;
}
}
}
if (mapRemovals || cacheRemovals)
{
JLOG(m_journal.debug())
<< "TaggedCache partition sweep " << m_name
<< ": cache = " << partition.size() << "-" << cacheRemovals
<< ", map-=" << mapRemovals;
}
allRemovals += cacheRemovals;
});
};
std::lock_guard<std::recursive_mutex> const&);
beast::Journal m_journal;
clock_type& m_clock;
@@ -781,10 +322,10 @@ private:
std::string m_name;
// Desired number of cache entries (0 = ignore)
int m_target_size;
int const m_target_size;
// Desired maximum cache age
clock_type::duration m_target_age;
clock_type::duration const m_target_age;
// Number of items cached
int m_cache_count;
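
Review note: the expiry cutoff computed at the top of `sweep()` (shown above in the inline body that this change moves out of the header) is easy to misread. The following is a minimal sketch of just that arithmetic as a free function. `expiryCutoff` and its parameter names are hypothetical and chosen for this note. The formula and the one-second minimum age are taken from the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Hypothetical helper mirroring the cutoff arithmetic in sweep().
// Entries last accessed at or before the returned time point are expired.
Clock::time_point
expiryCutoff(
    Clock::time_point now,
    Clock::duration targetAge,
    int targetSize,
    std::size_t currentSize)
{
    assert(targetSize >= 0);

    // At or under the target size (or no target set): use the full age.
    if (targetSize == 0 || static_cast<int>(currentSize) <= targetSize)
        return now - targetAge;

    // Over target: shrink the allowed age in proportion to the overshoot.
    Clock::time_point const cutoff =
        now - targetAge * targetSize / static_cast<Clock::rep>(currentSize);

    // Never expire entries younger than one second.
    Clock::time_point const newestAllowed = now - std::chrono::seconds(1);
    return std::min(cutoff, newestAllowed);
}
```

For example, with a 60 second target age and a target size of 100, a cache holding 200 entries expires anything older than 30 seconds, and a cache holding 100000 entries is clamped to the one-second minimum.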

File diff suppressed because it is too large

View File

@@ -25,6 +25,7 @@
#include <xrpl/beast/hash/hash_append.h>
#include <xrpl/beast/hash/uhash.h>
#include <xrpl/beast/hash/xxhasher.h>
#include <unordered_map>
#include <unordered_set>

View File

@@ -33,12 +33,13 @@
#include <xrpl/basics/strHex.h>
#include <xrpl/beast/utility/Zero.h>
#include <xrpl/beast/utility/instrumentation.h>
#include <boost/endian/conversion.hpp>
#include <boost/functional/hash.hpp>
#include <algorithm>
#include <array>
#include <cstring>
#include <functional>
#include <type_traits>
namespace ripple {
@@ -373,7 +374,7 @@ public:
}
base_uint&
operator^=(const base_uint& b)
operator^=(base_uint const& b)
{
for (int i = 0; i < WIDTH; i++)
data_[i] ^= b.data_[i];
@@ -382,7 +383,7 @@ public:
}
base_uint&
operator&=(const base_uint& b)
operator&=(base_uint const& b)
{
for (int i = 0; i < WIDTH; i++)
data_[i] &= b.data_[i];
@@ -391,7 +392,7 @@ public:
}
base_uint&
operator|=(const base_uint& b)
operator|=(base_uint const& b)
{
for (int i = 0; i < WIDTH; i++)
data_[i] |= b.data_[i];
@@ -414,11 +415,11 @@ public:
return *this;
}
const base_uint
base_uint const
operator++(int)
{
// postfix operator
const base_uint ret = *this;
base_uint const ret = *this;
++(*this);
return ret;
@@ -440,11 +441,11 @@ public:
return *this;
}
const base_uint
base_uint const
operator--(int)
{
// postfix operator
const base_uint ret = *this;
base_uint const ret = *this;
--(*this);
return ret;
@@ -465,7 +466,7 @@ public:
}
base_uint&
operator+=(const base_uint& b)
operator+=(base_uint const& b)
{
std::uint64_t carry = 0;
@@ -510,7 +511,7 @@ public:
}
[[nodiscard]] constexpr bool
parseHex(const char* str)
parseHex(char const* str)
{
return parseHex(std::string_view{str});
}

View File

@@ -20,17 +20,16 @@
#ifndef RIPPLE_BASICS_CHRONO_H_INCLUDED
#define RIPPLE_BASICS_CHRONO_H_INCLUDED
#include <date/date.h>
#include <xrpl/beast/clock/abstract_clock.h>
#include <xrpl/beast/clock/basic_seconds_clock.h>
#include <xrpl/beast/clock/manual_clock.h>
#include <date/date.h>
#include <chrono>
#include <cstdint>
#include <ratio>
#include <string>
#include <type_traits>
namespace ripple {

View File

@@ -43,7 +43,7 @@ struct less
using result_type = bool;
constexpr bool
operator()(const T& left, const T& right) const
operator()(T const& left, T const& right) const
{
return std::less<T>()(left, right);
}
@@ -55,7 +55,7 @@ struct equal_to
using result_type = bool;
constexpr bool
operator()(const T& left, const T& right) const
operator()(T const& left, T const& right) const
{
return std::equal_to<T>()(left, right);
}

Some files were not shown because too many files have changed in this diff