rename parse_pages to Dactyl; add experimental style checking

This commit is contained in:
mDuo13
2016-03-17 02:53:54 -07:00
parent dc808f32f4
commit 6b5297c2e0
14 changed files with 614 additions and 156 deletions

.gitignore vendored

@@ -2,3 +2,4 @@
*~
.DS_Store
content/code_samples/*/node_modules/
__pycache__


@@ -9,14 +9,25 @@ Repository Layout
The HTML pages in this portal are generated from the markdown files in the [content/](content/) folder. Always edit the markdown files, not the HTML files.
The [tool/](tool/) folder contains tools and templates for generating the HTML files in the top level. The `parse_pages.py` script (requires Python 3, [Jinja 2](http://jinja.pocoo.org/), and various pip modules) uses the templates and a YAML config file to generate HTML output. In general, `parse_pages.py` assumes you are running it with `tool/` as the current working directory, and the default config uses the top level of this repository as the output directory.
The [tool/](tool/) folder contains a tool, called **Dactyl**, for generating the HTML files in the top level. The `dactyl_build.py` script uses the templates and a YAML config file to generate HTML output. The `dactyl_link_checker.py` script checks the generated HTML content for broken links. The `dactyl_style_checker.py` script (experimental) checks the content for style guide violations.
In general, Dactyl assumes you are running it with `tool/` as the current working directory, and the included config uses the top level of this repository as the output directory.
Dactyl Setup
------------
Dactyl uses Python 3 and a number of modules. First, make sure you have Python 3 installed in your local operating system, then use [PIP](https://pip.pypa.io/en/stable/) to install the dependencies:
`pip3 install -r tool/packages.txt`
Building
--------
The default configuration is [`devportal-config.yml`](tool/devportal-config.yml). You can specify an alternate config file with the `-c` or `--config` parameter:
The default configuration is [`dactyl-config.yml`](tool/dactyl-config.yml). You can specify an alternate config file with the `-c` or `--config` parameter:
`./parse_pages.py -c alt-config.yml`
`./dactyl_build.py -c alt-config.yml`
`./dactyl_link_checker.py -c alt-config.yml`
This script does the following:
@@ -26,12 +37,7 @@ This script does the following:
4. For all pages that have a URL as the `md` parameter, it fetches the file via HTTP(S). No pre-processing occurs on such contents.
5. For all the retrieved and preprocessed markdown files, it parses them using Python's markdown library, with extras enabled to approximate GitHub-flavored markdown.
6. For each page, it parses the **template** configured for the page using Jinja2, falling back to the **default_template** (or **pdf_template**). If it produced HTML from a Markdown source, it passes that HTML as the `content` parameter to the template. It also passes in several other arguments from the config file, including definition of the current page as the `currentpage` parameter.
7. It applies several post-processing steps on the generated HTML, including:
* Automatic link substitution by target
* The removal of any content between certain DOCTOC comments
* Un-comments any `<div class="multicode">` and `</div>` tags (if building for a target with `multicode_tabs: true` configured).
* Modifies the `id` fields of generated header (`<h#>`) elements to use dashes instead of underscores. (This is for compatibility with previously-used doc tools.)
* Adds the `button` class to links whose text ends in `>`
7. It applies several post-processing steps on the generated HTML. Additional [filters](#filters) can be configured as plugins to run on a per-page or per-target basis. Dactyl always performs link substitution by target.
8. If building for PDF: It outputs the resulting HTML files to the configured **temporary_files_path**, using filenames specified in the **html** parameter of each page in the config file. It also copies the **template_static_path** and **content_static_path** folders to the temporary folder. Then it runs [Prince](http://www.princexml.com/) to generate a PDF. It writes the generated PDF to the **out_path**.
9. Otherwise: It outputs the resulting HTML files to the **out_path**. This does not generate a working copy of the site unless the necessary static files are also available at the **out_path**. (This is true by default, since the default output directory is the top level of the repository.) You can have it copy the configured **template_static_path** (containing files referenced by the templates) and the **content_static_path** (containing files referenced by the content) to the output directory using the `--copy_static` or `-s` parameter.
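The target-based link substitution mentioned in step 7 can be sketched roughly as follows (a minimal illustration of the behavior, not Dactyl's actual implementation; the page fields follow the config conventions described in this README):

```python
from bs4 import BeautifulSoup

def substitute_links(html, pages, target_name):
    """Rewrite links that point at local HTML filenames to the URL
    given by the page's target-name parameter, keeping anchors."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        # Split off any #anchor so it can carry over to the new URL
        local, _, anchor = a["href"].partition("#")
        for page in pages:
            if page.get("html") == local and target_name in page:
                a["href"] = page[target_name] + ("#" + anchor if anchor else "")
    return str(soup)
```

For example, with a page defined as `{"html": "rippleapi.html", "ripple.com": "https://ripple.com/build/rippleapi/"}`, a link to `rippleapi.html#usage` becomes `https://ripple.com/build/rippleapi/#usage` when building the `ripple.com` target.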
@@ -41,23 +47,43 @@ Githubify Mode
Alternate usage: `-g` produces a GitHub-flavored Markdown version of a single file. This parameter takes one argument, which should be a markdown file in the **content_path**. The tool uses Jinja2 to "preprocess" the file, as above, but stops after assembling GitHub-flavored Markdown and writes the output to the same filename in the **out_path**.
**Note:** The tool never erases files from the **out_path**. You may want to do that yourself, especially if you remove files.
**Note:** The tool never erases files from the **out_path** or the **temporary_files_path**. You may want to do that yourself, especially if you remove files from your config or rename them.
Multiple Targets
----------------
You can define multiple **targets** in the config file with arbitrary key-value parameters. The two parameters that the tool cares about by default are **name** (used to identify the target on the commandline and in the pages section of the config) and **multicode_tabs** (which determines whether or not to uncomment certain divs).
You can define multiple **targets** in the config file with arbitrary key-value parameters. The two parameters that the tool cares about by default are **name** (used to identify the target on the commandline and in the pages section of the config) and **filters** (which lists filter plugins to apply, if provided).
By default, the tool builds the first target in the list. Every page in the `pages` array is included in every target unless the page definition includes an explicit list of **targets** to build. (Each member in the list should be the **name** of a target.)
The tool can perform automatic substitution of links in the resulting HTML (or Markdown, when using [githubify](#githubify-mode)). For each parameter in the page definition that matches the name of a target, it replaces links to the `html` file with the contents of the target-name-parameter. Anchors from the original link carry over. This allows you to link to other pages using the filenames from the local version of the site, but replace them with different links for a live site. (As seen in the default configuration, Ripple.com has very different URLs for many pages.)
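For example, a hypothetical fragment in the style of the included config file (the page name and URLs here are illustrative only):

```yaml
targets:
  - name: local
  - name: ripple.com

pages:
  - name: Example Page
    md: example.md
    html: example.html
    # When building the ripple.com target, links to example.html
    # get replaced with this URL (anchors carry over):
    ripple.com: https://ripple.com/build/example/
```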
Filters
-------
Dactyl can apply various filters on document content, which is useful for handling compatibility with alternate Markdown flavors, among other things. The **filters** option, at the target or page level of the config, should contain an array of filters to apply. (Filters at the target level apply to all pages in that target; filters at the page level apply to that page only.) Each filter is implemented by a Python script in the Dactyl directory named `filter_{filter_name}.py`, based on the {filter_name} specified in the config.
Filters can apply at any or all of three steps: Raw markdown, raw HTML, or BeautifulSoup, as follows:
Raw markdown filters implement a `filter_markdown(md)` function, which takes a string of markdown text and returns the modified string.
Raw HTML filters implement a `filter_html(html)` function, which takes a string of HTML text and returns the modified string.
BeautifulSoup filters implement a `filter_soup(soup)` function, which takes a BeautifulSoup4 representation of the parsed HTML content as input. Because the input is passed by reference, the function modifies the soup in place instead of returning it.
Dactyl comes with the following filters:
* `remove_doctoc` - Remove DOCTOC-generated tables of contents
* `multicode_tabs` - Un-comment `<div class="multicode">` and `</div>` tags
* `standardize_header_ids` - Modify the `id` fields of generated header (`<h#>`) elements to use dashes instead of underscores. (This is for compatibility with previously-used doc tools.)
* `buttonize` - Adds the `button` class to links whose text ends in `>`
* `markdown_in_divs` - Automatically add the `markdown="1"` attribute to `<div>` elements so that their contents get parsed as Markdown. (No longer used by the Dev Portal, but useful for compatibility with Markdown flavors that do this automatically.)
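As a sketch, a hypothetical plugin saved as `filter_collapse_blanks.py` in the Dactyl directory (the filter name and behavior here are illustrative, not one of the bundled filters) could implement the two string-based hooks like this:

```python
# filter_collapse_blanks.py -- hypothetical example plugin.
# Dactyl calls whichever hook functions the module defines, so a
# filter only needs to implement the stages it cares about.
import re

def filter_markdown(md):
    """Markdown-stage hook: takes a markdown string and returns the
    modified string. Here: collapse runs of blank lines to one."""
    return re.sub(r"\n{3,}", "\n\n", md)

def filter_html(html):
    """HTML-stage hook: takes a raw HTML string and returns the
    modified string. Here: curl straight double quotes in text."""
    return re.sub(r'"([^"<>]*)"', "\u201c\\1\u201d", html)
```

Enabling it would then be a matter of adding `collapse_blanks` to a target's or page's `filters` list in the config.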
Contributing
------------
The Developer Portal welcomes outside contributions, especially to the documentation contents. If you have any corrections, improvements, or expansions of the portal, please contribute pull requests to the **gh-pages** branch.
The Developer Portal welcomes outside contributions, especially to the documentation contents. If you have any corrections, improvements, or expansions of the portal, please contribute pull requests to the **master** branch.
Contributions become copyright Ripple and are provided under the MIT [LICENSE](LICENSE).


@@ -4,8 +4,8 @@ dependencies:
pre:
- pyenv global 3.4.0
override:
- pip3 install jinja2 requests beautifulsoup4 markdown watchdog
- pip3 install -r packages.txt
test:
override:
- ./parse_pages.py
- ./check_links.py
- ./dactyl_build.py
- ./dactyl_link_checker.py


@@ -25,9 +25,15 @@ prince_executable: prince
targets:
# First member is the default that gets built when no target is specified
- name: local
multicode_tabs: true
filters:
- multicode_tabs
- standardize_header_ids
- buttonize
- name: ripple.com
multicode_tabs: true
filters:
- multicode_tabs
- standardize_header_ids
- buttonize
pages:
# Intro pages are not directly replicated on ripple.com at this time
@@ -64,6 +70,8 @@ pages:
# Currently this is the only page that's fetched remotely.
md: https://raw.githubusercontent.com/ripple/ripple-lib/0.16.7/docs/index.md
ripple.com: https://ripple.com/build/rippleapi/
filters:
- remove_doctoc
sidebar: true
- name: Ripple Data API v2
@@ -214,4 +222,3 @@ known_broken_links:
- https://validators.ripple.com
# Zendesk hasn't updated the cert. See ticket IN-1168
- https://support.ripplelabs.com/hc/en-us/categories/200194196-Set-Up-Activation


@@ -1,12 +1,13 @@
#!/usr/bin/env python3
################################################################################
# ripple-dev-portal doc parser
# Dactyl - a tool for heroic epics of documentation
#
# Generate the html for all the Ripple Dev Portal files from a template
# Generates a website from Markdown and Jinja templates, with filtering
# along the way.
################################################################################
DEFAULT_CONFIG_FILE = "devportal-config.yml"
DEFAULT_CONFIG_FILE = "dactyl-config.yml"
import os
import re
@@ -17,6 +18,9 @@ import logging
# Necessary to copy static files to the output dir
from distutils.dir_util import copy_tree
# Used to import filters.
from importlib import import_module
# Necessary for prince
import subprocess
@@ -36,10 +40,10 @@ from watchdog.events import PatternMatchingEventHandler
# The log level is configurable at runtime (see __main__ below)
logger = logging.getLogger()
filters = {}
def load_config(config_file=DEFAULT_CONFIG_FILE):
"""Reload config from a YAML file."""
global config
global config, filters
logger.info("loading config file %s..." % config_file)
with open(config_file, "r") as f:
config = yaml.load(f)
@@ -55,73 +59,23 @@ def load_config(config_file=DEFAULT_CONFIG_FILE):
if "prince_executable" not in config or not config["prince_executable"]:
config["prince_executable"] = "prince" # A reasonable default
# Figure out which filters we need and import them
filternames = set()
for target in config["targets"]:
if "filters" in target:
filternames.update(target["filters"])
for page in config["pages"]:
if "filters" in page:
filternames.update(page["filters"])
for filter_name in filternames:
filters[filter_name] = import_module("filter_"+filter_name)
def remove_doctoc(md):
"""Strip out doctoc Table of Contents for RippleAPI"""
DOCTOC_START = "<!-- START doctoc generated TOC please keep comment here to allow auto update -->"
DOCTOC_END = "<!-- END doctoc generated TOC please keep comment here to allow auto update -->"
doctoc_start_i = md.find(DOCTOC_START)
doctoc_end_i = md.find(DOCTOC_END)
if doctoc_start_i != -1 and doctoc_end_i != -1:
logger.info("... stripping doctoc...")
md = md[:doctoc_start_i]+md[doctoc_end_i+len(DOCTOC_END):]
return md
def enable_multicode(html):
"""Uncomment multicode tab divs"""
MC_START_REGEX = re.compile("<!-- *<div class='multicode'[^>]*> *-->")
MC_END_REGEX = re.compile("<!-- *</div> *-->")
logger.info("... enabling multicode tabs...")
html = re.sub(MC_START_REGEX, "<div class='multicode'>", html)
html = re.sub(MC_END_REGEX, "</div>", html)
return html
def standardize_header_ids(soup):
"""replace underscores with dashes in h1,h2,etc. for backwards compatibility"""
logger.info("... standardizing headers...")
headers = soup.find_all(name=re.compile("h[0-9]"), id=True)
for h in headers:
if "_" in h["id"]:
h["id"] = h["id"].replace("_", "-")
def buttonize_try_it(soup):
"""make links ending in > render like buttons"""
logger.info("... buttonizing try-it links...")
buttonlinks = soup.find_all("a", string=re.compile(">$"))
for link in buttonlinks:
if "class" in link.attrs:
link["class"].append("button")
else:
link["class"] = "button"
def markdown_in_div_elements(md):
"""Python markdown requires markdown="1" on HTML block elements
that contain markdown. AND there's a bug where if you use
markdown.extensions.extra, it replaces code fences in HTML
block elements with garbled text."""
def add_markdown_class(m):
if m.group(0).find("markdown=") == -1:
return m.group(1) + ' markdown="1">'
else:
return m.group(0)
logger.info("... adding markdown class to embedded divs...")
md = re.sub(r"(<div[^>]*)>", add_markdown_class, md)
return md
def substitute_links_for_target(soup, target):
"""Replaces local-html-links with appropriate substitutions
for the given target"""
target = get_target(target)
logger.info("... modifying links for target: %s" % target["name"])
# We actually want to get all pages, even the ones that aren't built as
# part of this target, in case those pages have replacement links.
@@ -140,49 +94,71 @@ def substitute_links_for_target(soup, target):
def get_target(target):
"""Get a target by name, or return the default target object.
We can't use default args in function defs because the default is
set at runtime based on config"""
if target == None:
return config["targets"][0]
if type(target) == str:
try:
return next(t for t in config["targets"] if t["name"] == target)
except StopIteration:
logger.critical("Unknown target: %s" % target)
exit(1)
if "name" in target:
# Eh, it's probably a target, just return it
return target
def parse_markdown(md, target=None, pages=None):
def get_filters_for_page(page, target=None):
ffp = set()
target = get_target(target)
if "filters" in target:
ffp.update(target["filters"])
if "filters" in page:
ffp.update(page["filters"])
return ffp
def parse_markdown(page, target=None, pages=None):
"""Take a markdown string and output HTML for that content"""
target = get_target(target)
logging.info("Preparing page %s" % page["name"])
# Mostly unnecessary as long as the multicode divs are commented out
## markdown_in_div_elements(md)
# Preprocess Markdown using this Jinja environment
pp_env = setup_pp_env()
# RippleAPI doc file has an extra table-of-contents at the start
md = remove_doctoc(md)
# We'll apply these filters to the page
page_filters = get_filters_for_page(page, target)
md = get_markdown_for_page(page["md"], pp_env=pp_env, target=target)
# Apply markdown-based filters here
for filter_name in page_filters:
if "filter_markdown" in dir(filters[filter_name]):
logging.info("... applying markdown filter %s" % filter_name)
md = filters[filter_name].filter_markdown(md)
# Actually parse the markdown
logger.info("... parsing markdown...")
html = markdown(md, extensions=["markdown.extensions.extra",
"markdown.extensions.toc"])
# If target uses multicode tabs, uncomment the divs now
if "multicode_tabs" in target and target["multicode_tabs"]:
html = enable_multicode(html)
# Apply raw-HTML-string-based filters here
for filter_name in page_filters:
if "filter_html" in dir(filters[filter_name]):
logging.info("... applying HTML filter %s" % filter_name)
html = filters[filter_name].filter_html(html)
# At this point, HTML manipulations are easier on a soup than a string
# Some filters would rather operate on a soup than a string.
# May as well parse once and re-serialize once.
soup = BeautifulSoup(html, "html.parser")
# Mostly necessary for compatibility with legacy content
standardize_header_ids(soup)
#buttonize links ending in >
buttonize_try_it(soup)
# Apply soup-based filters here
for filter_name in page_filters:
if "filter_soup" in dir(filters[filter_name]):
logging.info("... applying soup filter %s" % filter_name)
filters[filter_name].filter_soup(soup)
# ^ the soup filters apply to the same object, passed by reference
# Replace links for any non-default target
if target["name"] != config["targets"][0]["name"]:
@@ -197,7 +173,7 @@ def githubify_markdown(md, target=None, pages=None):
"""Github-friendly markdown has absolute links, no md in divs"""
MARKDOWN_LINK_REGEX = re.compile(
r"(\[([^\]]+)\]\(([^:)]+)\)|\[([^\]]+)\]:\s*(\S+)$)", re.MULTILINE)
target = get_target(target)
if not pages:
pages = get_pages(target["name"])
@@ -262,7 +238,7 @@ def get_pages(target=None):
def get_categories(pages):
"""Produce an ordered, de-duplicated list of categories from
"""Produce an ordered, de-duplicated list of categories from
the page list"""
categories = []
for page in pages:
@@ -304,17 +280,17 @@ def copy_static_files(template_static=True, content_static=True, out_path=None):
"""Copy static files to the output directory."""
if out_path == None:
out_path = config["out_path"]
if template_static:
template_static_src = config["template_static_path"]
template_static_dst = os.path.join(out_path,
os.path.basename(template_static_src))
copy_tree(template_static_src, template_static_dst)
if content_static:
content_static_src = config["content_static_path"]
content_static_dst = os.path.join(out_path,
os.path.basename(content_static_src))
copy_tree(content_static_src, content_static_dst)
@@ -336,12 +312,9 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
pages = get_pages(target)
categories = get_categories(pages)
# Preprocess Markdown using this Jinja environment
pp_env = setup_pp_env()
# Insert generated HTML into templates using this Jinja environment
env = setup_html_env()
if for_pdf:
logging.info("reading pdf template...")
default_template = env.get_template(config["pdf_template"])
@@ -352,11 +325,11 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
for currentpage in pages:
if "md" in currentpage:
# Read and parse the markdown
try:
md_content = get_markdown_for_page(currentpage["md"],
pp_env=pp_env,
target=target)
html_content = parse_markdown(currentpage, target=target,
pages=pages)
except Exception as e:
if bypass_errors:
logging.warning( ("Skipping page %s " +
@@ -366,22 +339,20 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
else:
exit("Error when fetching page %s: %s" %
(currentpage["name"], e) )
html_content = parse_markdown(md_content, target, pages)
else:
html_content = ""
if "template" in currentpage:
# Use a template other than the default one
template = env.get_template(currentpage["template"])
#do link substitution for "doc_page" param
if "doc_page" in currentpage:
doc_page = next(p for p in pages
if p["html"] == currentpage["doc_page"])
if target["name"] in doc_page:
currentpage["doc_page"] = doc_page[target["name"]]
out_html = template.render(currentpage=currentpage,
categories=categories,
pages=pages,
@@ -393,13 +364,13 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
pages=pages,
content=html_content,
target=target)
# Experimental: replace links in full HTML, not just content
soup = BeautifulSoup(out_html, "html.parser")
if target["name"] != config["targets"][0]["name"]:
substitute_links_for_target(soup, target)
out_html = str(soup)
if for_pdf:
out_path = config["temporary_files_path"]
else:
@@ -414,7 +385,7 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
def watch(pdf_file, target):
"""Look for changed files and re-generate HTML (and optionally
"""Look for changed files and re-generate HTML (and optionally
PDF) whenever there's an update. Runs until interrupted."""
target = get_target(target)
@@ -456,21 +427,21 @@ def make_pdf(outfile, target=None, bypass_errors=False):
render_pages(target=target, for_pdf=True, bypass_errors=bypass_errors)
temp_files_path = config["temporary_files_path"]
# Prince will need the static files, so copy them over
copy_static_files(out_path=temp_files_path)
# Make sure the path we're going to write the PDF to exists
if not os.path.isdir(config["out_path"]):
logging.info("creating build folder %s" % config["out_path"])
os.makedirs(config["out_path"])
# Start preparing the prince command
args = [config["prince_executable"], '-o', outfile]
# Each HTML output file in the target is another arg to prince
pages = get_pages(target)
args += [os.path.join(temp_files_path, p["html"]) for p in pages]
logger.info("generating PDF: running %s..." % " ".join(args))
prince_resp = subprocess.check_output(args, universal_newlines=True)
print(prince_resp)
@@ -479,7 +450,7 @@ def make_pdf(outfile, target=None, bypass_errors=False):
def githubify(md_file_name, target=None):
"""Wrapper - make the markdown resemble GitHub flavor"""
target = get_target(target)
# filein = os.path.join(config["content_path"], md_file_name)
# logging.info("opening source md file %s"%filein)
# with open(filein, "r") as f:
@@ -489,14 +460,14 @@ def githubify(md_file_name, target=None):
md = get_markdown_for_page(md_file_name,
pp_env=setup_pp_env(),
target=target)
logging.info("githubifying markdown...")
rendered_md = githubify_markdown(md, target=target, pages=pages)
if not os.path.isdir(config["out_path"]):
logging.info("creating build folder %s" % config["out_path"])
os.makedirs(config["out_path"])
fileout = os.path.join(config["out_path"], md_file_name)
logging.info("writing generated file to path: %s"%fileout)
with open(fileout, "w") as f:
@@ -527,10 +498,10 @@ if __name__ == "__main__":
help="Copy static files to the out dir",
default=False)
cli_args = parser.parse_args()
if not cli_args.quiet:
logging.basicConfig(level=logging.INFO)
if cli_args.config:
load_config(cli_args.config)
else:
@@ -559,7 +530,7 @@ if __name__ == "__main__":
render_pages(target=cli_args.target,
bypass_errors=cli_args.bypass_errors)
logging.info("done rendering")
if cli_args.copy_static:
logging.info("copying static pages...")
copy_static_files()
@@ -571,4 +542,3 @@ if __name__ == "__main__":
watch(pdf_path, cli_args.target)
else:
watch(None, cli_args.target)


@@ -5,7 +5,7 @@ import yaml
import argparse
from bs4 import BeautifulSoup
DEFAULT_CONFIG_FILE = "devportal-config.yml"
DEFAULT_CONFIG_FILE = "dactyl-config.yml"
soupsCache = {}
def getSoup(fullPath):
@@ -43,10 +43,10 @@ def checkLinks(offline=False):
print("Empty link in",fullPath)
broken_links.append( (fullPath, endpoint) )
num_links_checked += 1
elif endpoint == "#":
continue
elif "mailto:" in endpoint:
print("Skipping email link in %s to %s"%(fullPath, endpoint))
continue
@@ -55,7 +55,7 @@ def checkLinks(offline=False):
if offline:
print("Offline - Skipping remote URL %s"%(endpoint))
continue
num_links_checked += 1
if endpoint not in externalCache:
print("Testing remote URL %s"%(endpoint))
@@ -71,15 +71,15 @@ def checkLinks(offline=False):
except Exception as e:
print("Error occurred:",e)
code = 500
if code < 200 or code >= 400:
print("Broken remote link in %s to %s"%(fullPath, endpoint))
broken_links.append( (fullPath, endpoint) )
else:
print("...success.")
externalCache.append(endpoint)
elif '#' in endpoint:
if fname in config["ignore_anchors_in"]:
print("Ignoring anchor %s in dynamic page %s"%(endpoint,fname))
@@ -98,7 +98,7 @@ def checkLinks(offline=False):
elif filename in config["ignore_anchors_in"]:
#Some pages are populated dynamically, so BeautifulSoup wouldn't
# be able to find anchors in them anyway
print("Skipping anchor link in %s to dynamic page %s" %
print("Skipping anchor link in %s to dynamic page %s" %
(fullPath, endpoint))
continue
@@ -153,26 +153,25 @@ if __name__ == "__main__":
load_config(args.config)
else:
load_config()
broken_links, num_links_checked = checkLinks(args.offline)
print("---------------------------------------")
print("Link check report. %d links checked."%num_links_checked)
if not args.strict:
unknown_broken_links = [ (page,link) for page,link in broken_links
if link not in config["known_broken_links"] ]
if not broken_links:
print("Success! No broken links found.")
else:
print("%d broken links found:"%(len(broken_links)))
[print("File:",fname,"Link:",link) for fname,link in broken_links]
if args.strict or unknown_broken_links:
exit(1)
print("Success - all broken links are known problems.")
print("Success - all broken links are known problems.")

tool/dactyl_style_checker.py Executable file

@@ -0,0 +1,113 @@
#!/usr/bin/env python3
###############################################################################
## Dactyl Style Police ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Reads the markdown files to try and enforce elements of good style. ##
###############################################################################
import logging
import argparse
#import nltk
import re
import collections
import yaml
from bs4 import BeautifulSoup
import dactyl_build
logger = logging.getLogger()
with open("word_substitutions.yaml", "r") as f:
UNPLAIN_WORDS = yaml.load(f)
with open("phrase_substitutions.yaml", "r") as f:
UNPLAIN_PHRASES = yaml.load(f)
def check_all_pages(target=None):
"""Reads all pages for a target and checks them for style."""
target = dactyl_build.get_target(target)
pages = dactyl_build.get_pages(target)
pp_env = dactyl_build.setup_pp_env()
style_issues = []
for page in pages:
if "md" not in page:
# Not a doc page, move on
continue
logging.info("Checking page %s" % page["name"])
page_issues = []
html = dactyl_build.parse_markdown(page, pages=pages, target=target)
soup = BeautifulSoup(html, "html.parser")
content_elements = ["p","li","h1","h2","h3","h4","h5","h6"]
passages = []
for el in soup.find_all(content_elements):
for passage in el.stripped_strings:
passage_issues = check_passage(passage)
if passage_issues:
page_issues += passage_issues
if page_issues:
style_issues.append( (page["name"], page_issues) )
return style_issues
def check_passage(passage):
"""Checks an individual string of text for style issues."""
issues = []
logging.debug("Checking passage %s" % passage)
#tokens = nltk.word_tokenize(passage)
tokens = re.split(r"\s+", passage)
for t in tokens:
if t.lower() in UNPLAIN_WORDS:
issues.append( ("Unplain Word", t) )
for phrase,sub in UNPLAIN_PHRASES.items():
if phrase in passage.lower():
#logging.warn("Unplain phrase: %s; suggest %s instead" % (phrase, sub))
issues.append( ("Unplain Phrase", phrase) )
return issues
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Check content files for style issues.")
parser.add_argument("--config", "-c", type=str,
help="Specify path to an alternate config file.")
parser.add_argument("--quiet", "-q", action="store_true",
help="Suppress status messages")
cli_args = parser.parse_args()
if not cli_args.quiet:
logging.basicConfig(level=logging.INFO)
if cli_args.config:
dactyl_build.load_config(cli_args.config)
else:
dactyl_build.load_config()
issues = check_all_pages()
if issues:
num_issues = sum(len(p[1]) for p in issues)
print("Found %d issues:" % num_issues)
for pagename,issuelist in issues:
print("Page: %s" % pagename)
c = collections.Counter(issuelist)
for i, count_i in c.items():
if i[0]=="Unplain Phrase":
print(" Discouraged phrase: %s (%d instances); suggest ''%s' instead." %
( i[1], count_i, UNPLAIN_PHRASES[i[1].lower()] ))
elif i[0]=="Unplain Word":
print(" Discouraged word: %s (%d instances); suggest ''%s' instead." %
( i[1], count_i, UNPLAIN_WORDS[i[1].lower()] ))
else:
print(" %s: %s (%d instances)" % (i[0], i[1], count_i))
exit(1)
else:
print("Style check passed with flying colors!")
exit(0)

tool/filter_buttonize.py Normal file

@@ -0,0 +1,18 @@
################################################################################
## Buttonize links ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Looks for links ending in >, and adds a "button" class to those links so ##
## they can be styled like buttons in the page. ##
################################################################################
import re
def filter_soup(soup):
"""make links ending in > render like buttons"""
buttonlinks = soup.find_all("a", string=re.compile(">$"))
for link in buttonlinks:
if "class" in link.attrs:
link["class"].append("button")
else:
link["class"] = "button"


@@ -0,0 +1,25 @@
################################################################################
## Add Markdown Class to Divs filter ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Finds raw divs in the markdown and adds the markdown=1 attribute to them ##
## so that HTML inside those divs gets parsed as markdown. ##
## Some flavors of markdown do this automatically, so this provides ##
## compatibility with those. ##
################################################################################
import re

def filter_markdown(md):
"""Python markdown requires markdown="1" on HTML block elements
that contain markdown. AND there's a bug where if you use
markdown.extensions.extra, it replaces code fences in HTML
block elements with garbled text."""
def add_markdown_class(m):
if m.group(0).find("markdown=") == -1:
return m.group(1) + ' markdown="1">'
else:
return m.group(0)
logger.info("... adding markdown class to embedded divs...")
md = re.sub(r"(<div[^>]*)>", add_markdown_class, md)
return md


@@ -0,0 +1,20 @@
################################################################################
## Multicode Tabs filter ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Finds and un-comments divs with the multicode class, for use with JS that ##
## turns the contents of those divs into tabs. ##
## It's necessary to have them as comments so the markdown inside the div ##
## gets processed correctly. ##
################################################################################
import re
def filter_html(html):
"""Uncomment multicode tab divs"""
MC_START_REGEX = re.compile("<!-- *<div class=['\"]multicode['\"][^>]*> *-->")
MC_END_REGEX = re.compile("<!-- *</div> *-->")
html = re.sub(MC_START_REGEX, "<div class='multicode'>", html)
html = re.sub(MC_END_REGEX, "</div>", html)
return html


@@ -0,0 +1,20 @@
################################################################################
## Remove doctoc filter ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Removes an automatically-generated "doctoc" table of contents, as ##
## delineated by HTML comments, from the markdown source. ##
################################################################################
def filter_markdown(md):
"""Strip out doctoc Table of Contents for RippleAPI"""
DOCTOC_START = "<!-- START doctoc generated TOC please keep comment here to allow auto update -->"
DOCTOC_END = "<!-- END doctoc generated TOC please keep comment here to allow auto update -->"
doctoc_start_i = md.find(DOCTOC_START)
doctoc_end_i = md.find(DOCTOC_END)
if doctoc_start_i != -1 and doctoc_end_i != -1:
md = md[:doctoc_start_i]+md[doctoc_end_i+len(DOCTOC_END):]
return md
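The slicing logic above removes everything from the start marker through the end marker, inclusive. A minimal demonstration with a hypothetical document:

```python
DOCTOC_START = "<!-- START doctoc generated TOC please keep comment here to allow auto update -->"
DOCTOC_END = "<!-- END doctoc generated TOC please keep comment here to allow auto update -->"

md = ("Intro paragraph.\n" + DOCTOC_START +
      "\n- [Heading](#heading)\n" + DOCTOC_END + "\nBody text.")
start_i = md.find(DOCTOC_START)
end_i = md.find(DOCTOC_END)
if start_i != -1 and end_i != -1:
    # Splice out the markers and the generated TOC between them
    md = md[:start_i] + md[end_i + len(DOCTOC_END):]
# md == "Intro paragraph.\n\nBody text."
```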

View File

@@ -0,0 +1,17 @@
################################################################################
## Standardize header IDs ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Replaces underscores with dashes in h1,h2,... element IDs. This provides ##
## compatibility with some other flavors of markdown that generate HTML IDs ##
## differently. ##
################################################################################
import re
def filter_soup(soup):
"""replace underscores with dashes in h1,h2,etc. for backwards compatibility"""
headers = soup.find_all(name=re.compile("h[0-9]"), id=True)
for h in headers:
if "_" in h["id"]:
h["id"] = h["id"].replace("_", "-")

View File

@@ -0,0 +1,77 @@
and/or: __ or __ or both
addressees are requested: (omit), please
adjacent to: next to
adversely impact on: hurt, set back
afford an opportunity: allow, let
a number of: some
arrive onboard: arrive
as a means of: to
as prescribed by: in, under
at the present time: at present, now
be advised: (omit)
by means of: by, with
close proximity: near
cold wallet: issuing address
combat environment: combat
comply with: follow
due to the fact that: due to, since
during the period: during
effect modifications: make changes
failed to: didn't
for a period of: for
has a requirement for: needs
hot wallet: operational address
in accordance with: by, following, per, under
in addition: also, besides, too
in an effort to: to
in a timely manner: on time, promptly
incumbent upon: must
in lieu of: instead
in order that: for, so
inter alia: (omit)
interpose no objection: don't object
in order to: to
in regard to: about, concerning, on
in relation to: about, with, to
in the amount of: for
in the event of: if
in the near future: shortly, soon
in the process of: (omit)
in view of: since
in view of the above: so
inasmuch as: since
is applicable to: applies to
is authorized to: may
is in consonance with: agrees with, follows
is responsible for: (omit) handles
it appears: seems
it is: (omit)
it is essential: must, need to
it is requested: please, we request, I request
limited number: limits
market maker: liquidity provider
not later than: by, before
pertaining to: about, of, on
prior to: before
provided that: if
provides guidance for: guides
pursuant to: by, following, per, under
relative to: about, on
ripple network: Ripple Consensus Ledger, Ripple peer-to-peer network
set forth in: in
similar to: like
successfully complete: complete, pass
take action to: (omit)
the month of: (omit)
there are: (omit)
there is: (omit)
the undersigned: I
the use of: (omit)
time period: (either one)
under the provisions of: under
until such time as: until
warm wallet: standby address
with reference to: about
with the exception of: except for
your office: you

View File

@@ -0,0 +1,165 @@
accompany: go with
accomplish: carry out, do
accorded: given
accordingly: so
accrue: add, gain
accurate: correct, exact, right
additional: added, more, other
addressees: you
advantageous: helpful
advise: recommend, tell
aircraft: plane
allocate: divide
anticipate: expect
api's: APIs
apparent: clear, plain
appreciable: many
appropriate: (omit), proper, right
approximate: about
ascertain: find out, learn
assist: aid, help
assistance: aid, help
attain: meet
attempt: try
basically: (omit)
benefit: help
capability: ability
caveat: warning
clearly: (omit)
combined: joint
commence: begin, start
component: part
comprise: form, include, make up
concerning: about, on
consequently: so
consolidate: combine, join, merge
constitutes: is, forms, makes up
contains: has
convene: meet
currently: (omit), now
deem: believe, consider, think
delete: cut, drop
demonstrate: prove, show
depart: leave
designate: appoint, choose, name
desire: want, wish
determine: decide, figure, find
disclose: show
discontinue: drop, stop
disseminate: give, issue, pass, send
easy: (omit)
easily: (omit)
elect: choose, pick
eliminate: cut, drop, end
employ: use
encounter: meet
endeavor: try
ensure: make sure
enumerate: count
equipments: equipment
equitable: fair
establish: set up, prove, show
evidenced: showed
evident: clear
exhibit: show
expedite: hasten, speed up
expeditious: fast, quick
expend: spend
expertise: ability
expiration: end
facilitate: ease, help
feasible: can be done, workable
females: women
finalize: complete, finish
forfeit: give up, lose
forward: send
frequently: often
furnish: give, send
gateway: financial institution
herein: here
heretofore: until now
herewith: below, here
identical: same
identify: find, name, show
immediately: at once
impacted: affected, changed
implement: carry out, start
inception: start
indicate: show, write down
indication: sign
initial: first
initiate: start
insane: (omit)
ious: issuances
just: (omit)
liaison: discussion
magnitude: size
maintain: keep, support
methodology: method
minimize: decrease, lessen
modify: change
monitor: check, watch
necessitate: cause, need
notify: let know, tell
notwithstanding: in spite of, still
numerous: many
objective: aim, goal
obligate: bind, compel
observe: see
obviously: (omit)
operate: run, use, work
optimum: best, greatest, most
participate: take part
perform: do
permit: let
portion: part
possess: have, own
practicable: practical
preclude: prevent
previous: earlier
previously: before
prioritize: rank
proceed: do, go ahead, try
procure: (omit)
proficiency: skill
promulgate: issue, publish
provide: give, offer, say
purchase: buy
reflect: say, show
regarding: about, of, on
relocate: move
remain: stay
remainder: rest
remuneration: pay, payment
render: give, make
represents: is
require: must, need
requirement: need
reside: live
retain: keep
selection: choice
sexy: (omit)
simple: (omit)
simply: (omit)
solicit: ask for, request
state-of-the-art: latest
subject: the, this, your
subsequent: later, next
subsequently: after, later, then
substantial: large, much
sufficient: enough
terminate: end, stop
therefore: so
therein: there
thereof: its, their
this activity, command: us, we
timely: prompt
transmit: send
utilize: use
utilization: use
viable: practical, workable
vice: instead of, versus
wallet: address, client application, account
warrant: call for, permit
whereas: because, since
witnessed: saw
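Lists in this `phrase: suggestion` format can feed a simple style checker. A minimal sketch of that idea (the loader and sample text below are assumptions for illustration, not Dactyl's actual implementation):

```python
import re

# A few entries in the same "phrase: suggestion" format as the lists above
WORD_LIST = """\
prior to: before
utilize: use
in order to: to
"""

# Parse each line into a discouraged phrase and its suggested replacement
substitutions = {}
for line in WORD_LIST.splitlines():
    phrase, _, suggestion = line.partition(": ")
    substitutions[phrase.lower()] = suggestion

# Flag any discouraged phrase that appears in the text, case-insensitively
text = "Prior to launch, utilize the checklist in order to verify settings."
flagged = []
for phrase, suggestion in substitutions.items():
    if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
        flagged.append((phrase, suggestion))
# flagged == [("prior to", "before"), ("utilize", "use"), ("in order to", "to")]
```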