Merge pull request #3664 from XRPLF/fix-release-notes-script

Fix release notes script
2026-07-31 02:50:16 +00:00 · 2026-06-05 10:15:55 -07:00
parent 08c8114da2 c0925e1bb3
commit d4e9c6aeff
2 changed files with 234 additions and 65 deletions
--- a/.claude/skills/generate-release-notes/SKILL.md
+++ b/.claude/skills/generate-release-notes/SKILL.md
@@ -13,7 +13,7 @@ This skill generates a draft release notes blog post for a new rippled version,
 ## Execution constraints

 - **Do NOT write scripts** to sort or process the file. Prefer the Edit tool for targeted changes. Use Write only when replacing large sections that are impractical to edit incrementally.
- **Output progress**: Before each major step (generating raw release notes, reviewing file, processing amendments, sorting entries, reformatting, cleanup), output a brief status message so the user can see progress.
+- **Output progress**: Before each major step (generating raw release notes, reviewing file, verifying potential duplicates, processing amendments, sorting entries, reformatting, cleanup), output a brief status message so the user can see progress.

 ## Step 1: Generate the raw release notes

@@ -36,26 +36,52 @@ The script will:
 ## Step 2: Review the generated file

 Read the output file (path shown in script output). Note the **Full Changelog** structure:
- **Amendments section**: Contains auto-sorted entries and an HTML comment listing which amendments to include or remove
+- **Amendments section**: Contains auto-sorted entries and an HTML comment listing which amendments to include or remove. An entry lands here automatically whenever its PR touches `features.macro`, so the section sometimes catches PRs that aren't actually about a specific amendment (e.g. macro renames, list-sorting, formatting cleanups).
 - **Empty subsections**: Features, Breaking Changes, Bug Fixes, Refactors, Documentation, Testing, CI/Build
 - **Unsorted entries**: After the **Bug Bounties and Responsible Disclosures** section is an unsorted list of entries with title, link, labels, files, and description for context
+- **Potential duplicates**: Some entries are prefixed with `[POTENTIAL DUPE — VERIFY]`. These are reviewer scaffolding and must be handled in Step 3 before sorting.

-## Step 3: Process amendments
+## Step 3: Verify and remove potential duplicate entries
+
+Some entries are prefixed with `[POTENTIAL DUPE — VERIFY]` at the start of their title. They represent changes that already shipped in an earlier release. They are scaffolding for the reviewer only — they must NOT appear in the published release notes.
+
+Their purpose is to help spot changes that were cherry-picked into an earlier release but whose original commits may falsely show up again as new changes in this one.
+
+For each entry with the `[POTENTIAL DUPE — VERIFY]` prefix:
+
+1. Read its title (the text after the dupe marker).
+2. Scan the other (unmarked) entries for one that describes the same logical change with a slightly different title. Consider:
+  - Typos or missing words (e.g. `overwriting` vs `overwritting`, missing `the`)
+  - Different PR-number suffixes from a backport (e.g. `(#6217)` vs `(#6217) (#6957)`)
+  - Same PR number with reworded title
+  - Same author and same general topic
+  - The labels, files, and description of both entries, when the title match alone is borderline
+3. **If a match is found**:
+  - Delete the `[POTENTIAL DUPE — VERIFY]` entry.
+  - **Also delete the matched entry** — it represents a change that already shipped in a previous release, so it does NOT belong in this release's notes.
+4. **If no match is found**:
+  - Delete the `[POTENTIAL DUPE — VERIFY]` entry only.
+
+By the end of this step, **every** `[POTENTIAL DUPE — VERIFY]` entry must be removed.
+
+## Step 4: Process amendments

 Handle Amendments first, before sorting other entries.

-**3a. Process the auto-sorted Amendments subsection:**
-The HTML comment contains three lists — follow them exactly:
- **Include**: Keep these entries.
- **Exclude**: Remove these entries.
- Entries on **neither** list: Remove these entries.
+1. **Process the auto-sorted Amendments subsection**:
+  If an entry's title and description don't introduce, enable, retire, or fix a named amendment, it was auto-sorted here by mistake — move it to the appropriate other section (typically Refactors or CI/Build) instead of applying the rules below.

-**3b. Scan unsorted entries for unreleased amendment work:**
-Search through ALL unsorted entries for titles, labels, descriptions, or files that reference amendments on the "Exclude" or "Other amendments not part of this release" lists. Remove entries that directly implement, enable, fix, or refactor these amendments. Keep entries that are general changes that merely reference the amendment as motivation — if the code change is useful on its own regardless of whether the amendment ships, keep it.
+  The HTML comment contains three lists — follow them exactly:
+  - **Include**: Keep these entries.
+  - **Exclude**: Remove these entries.
+  - Entries on **neither** list: Remove these entries.

-**3c. If you disagree with any amendment decisions, make a note to the user but do NOT deviate from the rules.**
+2. **Scan unsorted entries for unreleased amendment work**:
+  Search through ALL unsorted entries for titles, labels, descriptions, or files that reference amendments on the "Exclude" or "Other amendments not part of this release" lists. Remove entries that directly implement, enable, fix, or refactor these amendments. Keep entries that are general changes that merely reference the amendment as motivation — if the code change is useful on its own regardless of whether the amendment ships, keep it.

-## Step 4: Sort remaining unsorted entries into subsections
+3. If you disagree with any amendment decisions, make a note to the user but do NOT deviate from the rules.
+
+## Step 5: Sort remaining unsorted entries into subsections

 Move each remaining unsorted entry into the appropriate subsection.

@@ -84,7 +110,7 @@ Use these signals to categorize:
 Additional sorting guidance:
 - Watch for revert pairs: If a PR was committed and then reverted (or vice versa), check that the net effect is accounted for — don't include both.

-## Step 5: Reformat sorted entries
+## Step 6: Reformat sorted entries

 After sorting, reformat each entry to match the release notes style.

@@ -107,10 +133,10 @@ After sorting, reformat each entry to match the release notes style.
 ```
 - Copy the PR title as-is. Only fix capitalization, remove conventional commit prefixes (fix:, feat:, ci:, refactor:, docs:, test:, chore:, build:), and adjust to past tense if needed. Do NOT rewrite, paraphrase, or summarize.

-## Step 6: Clean up
+## Step 7: Clean up

 - Add a short and generic description of changes to the existing `seo.description` frontmatter, e.g., "This version introduces new amendments and bug fixes." Do not create long lists of detailed changes.
- Add a more detailed summary of the release to the existing "Introducing XRP Ledger Version X.Y.Z" section. Include amendment names (organized in a list if more than 2), featuress, and breaking changes. Limit this to 1 paragraph.
+- Add a more detailed summary of the release to the existing "Introducing XRP Ledger version X.Y.Z" section. Include amendment names (organized in a list if more than 2), features, and breaking changes. Limit this to 1 paragraph.
 - Do NOT delete the **Credits** or **Bug Bounties and Responsible Disclosures** sections
 - Remove empty subsections that have no entries
 - Remove all HTML comments (sorting instructions)
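Review note: the manual check in Step 3 exists because the script's automatic matching is exact after normalization. The sketch below reproduces that normalization using the two patterns added in `tools/generate-release-notes.py`. The function name `normalize_title` and the sample titles are hypothetical, chosen for illustration; in the script the same logic is the nested `key` helper inside `fetch_commits`.

```python
import re

# Patterns copied from the diff to tools/generate-release-notes.py.
PR_NUM_RE = re.compile(r"\s*\(#\d+\)")
CONV_COMMIT_RE = re.compile(
    r"^(fix|feat|refactor|chore|docs|test|tests|ci|build|style|perf|revert|release|bugfix)"
    r"(\([^)]*\))?\s*:\s*",
    re.IGNORECASE,
)


def normalize_title(title):
    """Hypothetical standalone version of the script's nested `key` helper."""
    t = PR_NUM_RE.sub("", title)        # drop every "(#NNNN)" suffix
    t = CONV_COMMIT_RE.sub("", t)       # drop a leading "fix:", "feat(scope):", ...
    return re.sub(r"\s+", " ", t).strip().lower()


# Hypothetical titles: a backport suffix and a prefix change still match...
original = "fix: Stop overwriting the ledger (#6217)"
backport = "Stop overwriting the ledger (#6217) (#6957)"
print(normalize_title(original) == normalize_title(backport))  # True

# ...but a one-letter typo does not, so the entry would be flagged for review.
typo = "fix: Stop overwritting the ledger (#6217)"
print(normalize_title(original) == normalize_title(typo))  # False
```

Titles that differ only by PR-number suffixes, conventional-commit prefixes, case, or whitespace collapse to the same key and are filtered automatically. Anything else, such as a typo fix or a reworded title, escapes the match and surfaces with the `[POTENTIAL DUPE — VERIFY]` prefix.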
--- a/tools/generate-release-notes.py
+++ b/tools/generate-release-notes.py
@@ -23,26 +23,8 @@ import sys
 from datetime import date, datetime


-# Emails to exclude from credits (Ripple employees not using @ripple.com).
-# Commits from @ripple.com addresses are already filtered automatically.
-EXCLUDED_EMAILS = {
-    "3maisons@gmail.com",                                   # Luc des Trois Maisons
-    "a1q123456@users.noreply.github.com",                   # Jingchen Wu
-    "bthomee@users.noreply.github.com",                     # Bart Thomee
-    "21219765+ckeshava@users.noreply.github.com",           # Chenna Keshava B S
-    "gregtatcam@users.noreply.github.com",                  # Gregory Tsipenyuk
-    "kuzzz99@gmail.com",                                    # Sergey Kuznetsov
-    "legleux@users.noreply.github.com",                     # Michael Legleux
-    "mathbunnyru@users.noreply.github.com",                 # Ayaz Salikhov
-    "mvadari@gmail.com",                                    # Mayukha Vadari
-    "115580134+oleks-rip@users.noreply.github.com",         # Oleksandr Pidskopnyi
-    "3397372+pratikmankawde@users.noreply.github.com",      # Pratik Mankawde
-    "35279399+shawnxie999@users.noreply.github.com",        # Shawn Xie
-    "5780819+Tapanito@users.noreply.github.com",            # Vito Tumas
-    "13349202+vlntb@users.noreply.github.com",              # Valentin Balaschenko
-    "129996061+vvysokikh1@users.noreply.github.com",        # Vladislav Vysokikh
-    "vvysokikh@gmail.com",                                  # Vladislav Vysokikh
-}
+# Repo root, so paths resolve correctly regardless of where the script is invoked from.
+REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


 # Pre-compiled patterns for skipping version commits
@@ -54,6 +36,16 @@ SKIP_PATTERNS = [
 ]


+# Patterns for normalizing commit titles when detecting cherry-pick duplicates.
+# Strips trailing "(#NNNN)" PR-number suffixes and conventional-commit prefixes.
+PR_NUM_RE = re.compile(r"\s*\(#\d+\)")
+CONV_COMMIT_RE = re.compile(
+    r"^(fix|feat|refactor|chore|docs|test|tests|ci|build|style|perf|revert|release|bugfix)"
+    r"(\([^)]*\))?\s*:\s*",
+    re.IGNORECASE,
+)
+
+
 # --- API helpers ---

 def run_gh_rest(endpoint):
@@ -164,19 +156,54 @@ def fetch_version_info(ref):


 def fetch_commits(from_ref, to_ref):
-    """Fetch all commits between two refs using the GitHub compare API."""
-    commits = []
-    page = 1
-    while True:
-        data = run_gh_rest(
-            f"repos/XRPLF/rippled/compare/{from_ref}...{to_ref}?per_page=250&page={page}"
-        )
-        batch = data.get("commits", [])
-        commits.extend(batch)
-        if len(batch) < 250:
-            break
-        page += 1
-    return commits
+    """Fetch commits between two refs, filtering out incoming cherry-pick duplicates."""
+
+    def key(c):
+        t = c["commit"]["message"].split("\n")[0]
+        t = PR_NUM_RE.sub("", t)
+        t = CONV_COMMIT_RE.sub("", t)
+        return re.sub(r"\s+", " ", t).strip().lower()
+
+    def paginate(base, head):
+        results, page = [], 1
+        while True:
+            data = run_gh_rest(
+                f"repos/XRPLF/rippled/compare/{base}...{head}?per_page=250&page={page}"
+            )
+            batch = data.get("commits", [])
+            results.extend(batch)
+            if len(batch) < 250:
+                break
+            page += 1
+        return results
+
+    incoming = paginate(from_ref, to_ref)
+    shipped = paginate(to_ref, from_ref)
+    incoming_keys = {key(c) for c in incoming}
+    shipped_keys = {key(c) for c in shipped}
+
+    before = len(incoming)
+    deduped = [c for c in incoming if key(c) not in shipped_keys]
+    dropped = before - len(deduped)
+    if dropped:
+        print(f"  Filtered {dropped} cherry-pick duplicates.")
+
+    # Surface backward-diff commits with no forward-diff match. These are
+    # either real release-branch originals or cherry-pick dupes that drifted
+    # enough to escape matching.
+    unmatched = [
+        c for c in shipped
+        if key(c) not in incoming_keys
+        and not should_skip(c["commit"]["message"].split("\n")[0])
+    ]
+    for c in unmatched:
+        c["_potential_dupe"] = True
+    if unmatched:
+        print(f"  Adding {len(unmatched)} unmatched {from_ref} commit(s) to draft "
+              f"flagged as [POTENTIAL DUPE — VERIFY].")
+    deduped.extend(unmatched)
+
+    return deduped


 def parse_features_macro(text):
@@ -332,6 +359,39 @@ def fetch_prs_graphql(pr_numbers):
    return results


+def filter_to_ripple_members(logins, org="ripple"):
+    """Return the subset of `logins` that are members of ripple, batched via GraphQL.
+
+    Org membership is resolved with the authenticated viewer's privileges, so the lookup
+    probably won't work when the script is run by someone outside the Ripple org. In that
+    case most Ripple org members will remain in the credits section.
+    """
+    if not logins:
+        return set()
+
+    members = set()
+    batch_size = 50
+    target = org.lower()
+    logins = list(logins)
+
+    for i in range(0, len(logins), batch_size):
+        batch = logins[i:i + batch_size]
+        fragments = [
+            f'u{idx}: user(login: "{l}") {{ organizations(first: 20) {{ nodes {{ login }} }} }}'
+            for idx, l in enumerate(batch)
+        ]
+        data = run_gh_graphql("{ " + " ".join(fragments) + " }")
+        nodes = data.get("data") or {}
+        for idx, l in enumerate(batch):
+            user = nodes.get(f"u{idx}")
+            if not user:
+                continue
+            orgs = {n["login"].lower() for n in user.get("organizations", {}).get("nodes", [])}
+            if target in orgs:
+                members.add(l)
+    return members
+
+
 # --- Utilities ---

 def clean_pr_body(text):
@@ -531,7 +591,11 @@ def main():
    print(f"Version: {version}")

    year = args.date.split("-")[0]
-    output_path = args.output or f"blog/{year}/rippled-{version}.md"
+    # Resolve --output relative to REPO_ROOT (not CWD). Absolute paths pass through unchanged.
+    if args.output:
+        output_path = args.output if os.path.isabs(args.output) else os.path.join(REPO_ROOT, args.output)
+    else:
+        output_path = os.path.join(REPO_ROOT, "blog", year, f"rippled-{version}.md")

    print(f"Fetching commits: {args.from_ref}...{args.to_ref}")
    commits = fetch_commits(args.from_ref, args.to_ref)
@@ -542,7 +606,16 @@ def main():
    pr_shas = {}       # PR/issue number → commit SHA (for file lookups on Issues)
    pr_bodies = {}     # PR/issue number → commit body (for fallback descriptions)
    orphan_commits = []  # Commits with no PR/Issues link
-    authors = set()
+    # Potential dupe commits are kept in their own parallel buckets so they
+    # don't collide with real entries by PR number. They go through the same
+    # PR-enrichment pipeline to give reviewers full side-by-side context.
+    dupe_pr_numbers = {}
+    dupe_pr_shas = {}
+    dupe_pr_bodies = {}
+    dupe_orphan_commits = []
+    # Contributors are collected here and filtered against the Ripple org
+    contributor_logins = set()
+    contributors_without_login = set()

    for commit in commits:
        full_message = commit["commit"]["message"]
@@ -550,20 +623,30 @@ def main():
        body = "\n".join(full_message.split("\n")[1:]).strip()
        sha = commit["sha"]
        author = commit["commit"]["author"]["name"]
-        email = commit["commit"]["author"].get("email", "")

-        # Skip Ripple employees from credits
-        login = (commit.get("author") or {}).get("login")
-        if not email.lower().endswith("@ripple.com") and email not in EXCLUDED_EMAILS:
-            if login:
-                authors.add(f"@{login}")
-            else:
-                authors.add(author)
+        # Collect contributors for the credits section. Dupe commits and bots are skipped.
+        if not commit.get("_potential_dupe"):
+            github_user = commit.get("author") or {}
+            if github_user.get("type") != "Bot":
+                login = github_user.get("login")
+                if login:
+                    contributor_logins.add(login)
+                else:
+                    contributors_without_login.add(author)

        if should_skip(message):
            continue

        pr_number = extract_pr_number(message)
+        if commit.get("_potential_dupe"):
+            if pr_number:
+                dupe_pr_numbers[pr_number] = message
+                dupe_pr_shas[pr_number] = sha
+                dupe_pr_bodies[pr_number] = body
+            else:
+                dupe_orphan_commits.append({"sha": sha, "message": message, "body": body})
+            continue
+
        if pr_number:
            pr_numbers[pr_number] = message
            pr_shas[pr_number] = sha
@@ -586,12 +669,14 @@ def main():

    print(f"Building changelog entries...")

-    # Fetch all PR details in batches via GraphQL
-    pr_details = fetch_prs_graphql(list(pr_numbers.keys()))
+    # Fetch all PR details in batches via GraphQL.
+    all_pr_numbers = list(set(pr_numbers.keys()) | set(dupe_pr_numbers.keys()))
+    pr_details = fetch_prs_graphql(all_pr_numbers)

    # Build entries, sorting amendments automatically
    amendment_entries = []
    entries = []
+    DUPE_MARKER = "[POTENTIAL DUPE — VERIFY]"
    for pr_number, commit_msg in pr_numbers.items():
        pr_data = pr_details.get(pr_number)

@@ -638,6 +723,59 @@ def main():
            entry = format_commit_entry(sha, orphan["message"], orphan["body"], files)
            entries.append(entry)

+    # Build entries for potential dupes
+    for pr_number, commit_msg in dupe_pr_numbers.items():
+        sha = dupe_pr_shas[pr_number]
+        pr_data = pr_details.get(pr_number)
+        print(f"  Building potential-dupe entry for #{pr_number} ({sha[:7]})...")
+
+        if pr_data:
+            title = f"{DUPE_MARKER} {pr_data['title']}"
+            body = pr_data.get("body", "")
+            labels = pr_data.get("labels", [])
+            files = pr_data.get("files", [])
+            link_type = pr_data.get("type", "pull")
+            if not files:
+                files = fetch_commit_files(sha)
+            if is_amendment(files) and amendment_diff:
+                entry = format_uncategorized_entry(pr_number, title, labels, body, link_type=link_type)
+                amendment_entries.append(entry)
+            else:
+                entry = format_uncategorized_entry(pr_number, title, labels, body, files, link_type)
+                entries.append(entry)
+        else:
+            # PR/Issue lookup failed — fall back to commit-only entry
+            files = fetch_commit_files(sha)
+            title = f"{DUPE_MARKER} {commit_msg}"
+            if is_amendment(files) and amendment_diff:
+                entry = format_commit_entry(sha, title, dupe_pr_bodies[pr_number])
+                amendment_entries.append(entry)
+            else:
+                entry = format_commit_entry(sha, title, dupe_pr_bodies[pr_number], files)
+                entries.append(entry)
+
+    # Potential dupe orphans (no PR link at all)
+    for orphan in dupe_orphan_commits:
+        sha = orphan["sha"]
+        print(f"  Building potential-dupe orphan entry for {sha[:7]}...")
+        files = fetch_commit_files(sha)
+        title = f"{DUPE_MARKER} {orphan['message']}"
+        if is_amendment(files) and amendment_diff:
+            entry = format_commit_entry(sha, title, orphan["body"])
+            amendment_entries.append(entry)
+        else:
+            entry = format_commit_entry(sha, title, orphan["body"], files)
+            entries.append(entry)
+
+    # Build the credits list.
+    print(f"Checking Ripple org membership for {len(contributor_logins)} contributor login(s)...")
+    ripple_members = filter_to_ripple_members(contributor_logins)
+    authors = set()
+    for login in contributor_logins:
+        if login not in ripple_members:
+            authors.add(f"@{login}")
+    authors |= contributors_without_login
+
    # Generate markdown
    markdown = generate_markdown(version, args.date, amendment_diff, amendment_unchanged, amendment_entries, entries, authors, version_commit)

@@ -648,11 +786,16 @@ def main():

    print(f"\nRelease notes written to: {output_path}")

-    # Update blog/sidebars.yaml
-    sidebars_path = "blog/sidebars.yaml"
-    # Derive sidebar path and year from actual output path
-    relative_path = output_path.removeprefix("blog/")
-    sidebar_year = relative_path.split("/")[0]
+    # Update blog/sidebars.yaml only if the output actually lives under blog/.
+    # Custom --output paths outside blog/ are skipped.
+    sidebars_path = os.path.join(REPO_ROOT, "blog", "sidebars.yaml")
+    blog_dir = os.path.join(REPO_ROOT, "blog")
+    abs_output = os.path.abspath(output_path)
+    if not abs_output.startswith(blog_dir + os.sep):
+        print(f"Output {output_path} is outside {blog_dir} — skipping sidebar update.")
+        return
+    relative_path = os.path.relpath(abs_output, blog_dir)
+    sidebar_year = relative_path.split(os.sep)[0]
    new_entry = f"        - page: {relative_path}"
    try:
        with open(sidebars_path, "r") as f: