rename parse_pages to Dactyl; add experimental style checking

This commit is contained in:
mDuo13
2016-03-17 02:53:54 -07:00
parent dc808f32f4
commit 6b5297c2e0
14 changed files with 614 additions and 156 deletions

.gitignore vendored

@@ -2,3 +2,4 @@
*~
.DS_Store
content/code_samples/*/node_modules/
__pycache__


@@ -9,14 +9,25 @@ Repository Layout
The HTML pages in this portal are generated from the markdown files in the [content/](content/) folder. Always edit the markdown files, not the HTML files.
The [tool/](tool/) folder contains tools and templates for generating the HTML files in the top level. The `parse_pages.py` script (requires Python 3, [Jinja 2](http://jinja.pocoo.org/), and various pip modules) uses the templates and a YAML config file to generate HTML output. In general, `parse_pages.py` assumes you are running it with `tool/` as the current working directory, and the default config uses the top level of this repository as the output directory.
The [tool/](tool/) folder contains a tool, called **Dactyl**, for generating the HTML files in the top level. The `dactyl_build.py` script uses the templates and a YAML config file to generate HTML output. The `dactyl_link_checker.py` script checks the generated HTML content for broken links. The `dactyl_style_checker.py` script (experimental) checks the content for style guide violations.
In general, Dactyl assumes you are running it with `tool/` as the current working directory, and the included config uses the top level of this repository as the output directory.
Dactyl Setup
------------
Dactyl uses Python 3 and a number of modules. First, make sure you have Python 3 installed in your local operating system, then use [PIP](https://pip.pypa.io/en/stable/) to install the dependencies:
`pip3 install -r tool/packages.txt`
Building
--------
The default configuration is [`devportal-config.yml`](tool/devportal-config.yml). You can specify an alternate config file with the `-c` or `--config` parameter:
The default configuration is [`dactyl-config.yml`](tool/dactyl-config.yml). You can specify an alternate config file with the `-c` or `--config` parameter:
`./parse_pages.py -c alt-config.yml`
`./dactyl_build.py -c alt-config.yml`
`./dactyl_link_checker.py -c alt-config.yml`
This script does the following:
@@ -26,12 +37,7 @@ This script does the following:
4. For all pages that have a URL as the `md` parameter, it fetches the file via HTTP(S). No pre-processing occurs on such contents.
5. For all the retrieved and preprocessed markdown files, it parses them using Python's markdown library, with extras enabled to approximate GitHub-flavored markdown.
6. For each page, it parses the **template** configured for the page using Jinja2, falling back to the **default_template** (or **pdf_template**). If it produced HTML from a Markdown source, it passes that HTML as the `content` parameter to the template. It also passes in several other arguments from the config file, including definition of the current page as the `currentpage` parameter.
7. It applies several post-processing steps on the generated HTML, including:
* Automatic link substitution by target
* The removal of any content between certain DOCTOC comments
* Un-comments any `<div class="multicode">` and `</div>` tags (if building for a target with `multicode_tabs: true` configured).
* Modifies the `id` fields of generated header (`<h#>`) elements to use dashes instead of underscores. (This is for compatibility with previously-used doc tools.)
* Adds the `button` class to links whose text ends in `>`
7. It applies several post-processing steps on the generated HTML. Additional [filters](#filters) can be configured as plugins to run on a per-page or per-target basis. Dactyl always performs link substitution by target.
8. If building for PDF: It outputs the resulting HTML files to the configured **temporary_files_path**, using filenames specified in the **html** parameter of each page in the config file. It also copies the **template_static_path** and **content_static_path** folders to the temporary folder. Then it runs [Prince](http://www.princexml.com/) to generate a PDF. It writes the generated PDF to the **out_path**.
9. Otherwise: It outputs the resulting HTML files to the **out_path**. This does not generate a working copy of the site unless the necessary static files are also available at the **out_path**. (This is true by default, since the default output directory is the top level of the repository.) You can have it copy the configured **template_static_path** (containing files referenced by the templates) and the **content_static_path** (containing files referenced by the content) to the output directory using the `--copy_static` or `-s` parameter.
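The target-based link substitution mentioned in step 7 can be sketched roughly as follows (a minimal illustration of the behavior, not Dactyl's actual implementation; the page fields follow the config conventions described in this README):

```python
from bs4 import BeautifulSoup

def substitute_links(html, pages, target_name):
    """Rewrite links that point at local HTML filenames to the URL
    given by the page's target-name parameter, keeping anchors."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        # Split off any #anchor so it can carry over to the new URL
        local, _, anchor = a["href"].partition("#")
        for page in pages:
            if page.get("html") == local and target_name in page:
                a["href"] = page[target_name] + ("#" + anchor if anchor else "")
    return str(soup)
```

For example, with a page defined as `{"html": "rippleapi.html", "ripple.com": "https://ripple.com/build/rippleapi/"}`, a link to `rippleapi.html#usage` becomes `https://ripple.com/build/rippleapi/#usage` when building the `ripple.com` target.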
@@ -41,23 +47,43 @@ Githubify Mode
Alternate usage: `-g` produces a GitHub-flavored Markdown version of a single file. This parameter takes one argument, which should be a markdown file in the **content_path**. The tool uses Jinja2 to "preprocess" the file, as above, but stops after assembling GitHub-flavored Markdown and writes the output to the same filename in the **out_path**.
**Note:** The tool never erases files from the **out_path**. You may want to do that yourself, especially if you remove files.
**Note:** The tool never erases files from the **out_path** or the **temporary_files_path**. You may want to do that yourself, especially if you remove files from your config or rename them.
Multiple Targets
----------------
You can define multiple **targets** in the config file with arbitrary key-value parameters. The two parameters that the tool cares about by default are **name** (used to identify the target on the commandline and in the pages section of the config) and **multicode_tabs** (which determines whether or not to uncomment certain divs).
You can define multiple **targets** in the config file with arbitrary key-value parameters. The two parameters that the tool cares about by default are **name** (used to identify the target on the commandline and in the pages section of the config) and **filters** (which lists filter plugins to apply, if provided).
By default, the tool builds the first target in the list. Every page in the `pages` array is included in every target unless the page definition includes an explicit list of **targets** to build. (Each member in the list should be the **name** of a target.)
The tool can perform automatic substitution of links in the resulting HTML (or Markdown, when using [githubify](#githubify-mode)). For each parameter in the page definition that matches the name of a target, it replaces links to the `html` file with the contents of the target-name-parameter. Anchors from the original link carry over. This allows you to link to other pages using the filenames from the local version of the site, but replace them with different links for a live site. (As seen in the default configuration, Ripple.com has very different URLs for many pages.)
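For example, a hypothetical fragment in the style of the included config file (the page name and URLs here are illustrative only):

```yaml
targets:
  - name: local
  - name: ripple.com

pages:
  - name: Example Page
    md: example.md
    html: example.html
    # When building the ripple.com target, links to example.html
    # get replaced with this URL (anchors carry over):
    ripple.com: https://ripple.com/build/example/
```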
Filters
-------
Dactyl can apply various filters on document content, which is useful for handling compatibility with alternate Markdown flavors, among other things. The **filters** option, at the target or page level of the config, should contain an array of filters to apply. (Filters at the target level apply to all pages in that target; filters at the page level apply to that page only.) Each filter is implemented by a Python script in the Dactyl directory named `filter_{filter_name}.py`, based on the {filter_name} specified in the config.
Filters can apply at any or all of three steps: Raw markdown, raw HTML, or BeautifulSoup, as follows:
Raw markdown filters implement a `filter_markdown(md)` function, which takes a string of markdown text and returns the modified string.
Raw HTML filters implement a `filter_html(html)` function, which takes a string of HTML text and returns the modified string.
BeautifulSoup filters implement a `filter_soup(soup)` function, which takes a BeautifulSoup4 representation of the parsed HTML content as input. Because the input is passed by reference, the function modifies the soup in place instead of returning it.
Dactyl comes with the following filters:
* `remove_doctoc` - Remove DOCTOC-generated tables of contents
* `multicode_tabs` - Un-comment `<div class="multicode">` and `</div>` tags
* `standardize_header_ids` - Modify the `id` fields of generated header (`<h#>`) elements to use dashes instead of underscores. (This is for compatibility with previously-used doc tools.)
* `buttonize` - Adds the `button` class to links whose text ends in `>`
* `markdown_in_divs` - Automatically add the `markdown="1"` attribute to `<div>` elements so that their contents get parsed as Markdown. (No longer used by the Dev Portal, but useful for compatibility with Markdown flavors that do this automatically.)
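As a sketch, a hypothetical plugin saved as `filter_collapse_blanks.py` in the Dactyl directory (the filter name and behavior here are illustrative, not one of the bundled filters) could implement the two string-based hooks like this:

```python
# filter_collapse_blanks.py -- hypothetical example plugin.
# Dactyl calls whichever hook functions the module defines, so a
# filter only needs to implement the stages it cares about.
import re

def filter_markdown(md):
    """Markdown-stage hook: takes a markdown string and returns the
    modified string. Here: collapse runs of blank lines to one."""
    return re.sub(r"\n{3,}", "\n\n", md)

def filter_html(html):
    """HTML-stage hook: takes a raw HTML string and returns the
    modified string. Here: curl straight double quotes in text."""
    return re.sub(r'"([^"<>]*)"', "\u201c\\1\u201d", html)
```

Enabling it would then be a matter of adding `collapse_blanks` to a target's or page's `filters` list in the config.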
Contributing
------------
The Developer Portal welcomes outside contributions, especially to the documentation contents. If you have any corrections, improvements, or expansions of the portal, please contribute pull requests to the **gh-pages** branch.
The Developer Portal welcomes outside contributions, especially to the documentation contents. If you have any corrections, improvements, or expansions of the portal, please contribute pull requests to the **master** branch.
Contributions become copyright Ripple and are provided under the MIT [LICENSE](LICENSE).


@@ -4,8 +4,8 @@ dependencies:
pre:
- pyenv global 3.4.0
override:
- pip3 install jinja2 requests beautifulsoup4 markdown watchdog
- pip3 install -r packages.txt
test:
override:
- ./parse_pages.py
- ./check_links.py
- ./dactyl_build.py
- ./dactyl_link_checker.py


@@ -25,9 +25,15 @@ prince_executable: prince
targets:
# First member is the default that gets built when no target is specified
- name: local
multicode_tabs: true
filters:
- multicode_tabs
- standardize_header_ids
- buttonize
- name: ripple.com
multicode_tabs: true
filters:
- multicode_tabs
- standardize_header_ids
- buttonize
pages:
# Intro pages are not directly replicated on ripple.com at this time
@@ -64,6 +70,8 @@ pages:
# Currently this is the only page that's fetched remotely.
md: https://raw.githubusercontent.com/ripple/ripple-lib/0.16.7/docs/index.md
ripple.com: https://ripple.com/build/rippleapi/
filters:
- remove_doctoc
sidebar: true
- name: Ripple Data API v2
@@ -214,4 +222,3 @@ known_broken_links:
- https://validators.ripple.com
# Zendesk hasn't updated the cert. See ticket IN-1168
- https://support.ripplelabs.com/hc/en-us/categories/200194196-Set-Up-Activation


@@ -1,12 +1,13 @@
#!/usr/bin/env python3
################################################################################
# ripple-dev-portal doc parser
# Dactyl - a tool for heroic epics of documentation
#
# Generate the html for all the Ripple Dev Portal files from a template
# Generates a website from Markdown and Jinja templates, with filtering
# along the way.
################################################################################
DEFAULT_CONFIG_FILE = "devportal-config.yml"
DEFAULT_CONFIG_FILE = "dactyl-config.yml"
import os
import re
@@ -17,6 +18,9 @@ import logging
# Necessary to copy static files to the output dir
from distutils.dir_util import copy_tree
# Used to import filters.
from importlib import import_module
# Necessary for prince
import subprocess
@@ -36,10 +40,10 @@ from watchdog.events import PatternMatchingEventHandler
# The log level is configurable at runtime (see __main__ below)
logger = logging.getLogger()
filters = {}
def load_config(config_file=DEFAULT_CONFIG_FILE):
"""Reload config from a YAML file."""
global config
global config, filters
logger.info("loading config file %s..." % config_file)
with open(config_file, "r") as f:
config = yaml.load(f)
@@ -55,73 +59,23 @@ def load_config(config_file=DEFAULT_CONFIG_FILE):
if "prince_executable" not in config or not config["prince_executable"]:
config["prince_executable"] = "prince" # A reasonable default
# Figure out which filters we need and import them
filternames = set()
for target in config["targets"]:
if "filters" in target:
filternames.update(target["filters"])
for page in config["pages"]:
if "filters" in page:
filternames.update(page["filters"])
for filter_name in filternames:
filters[filter_name] = import_module("filter_"+filter_name)
def remove_doctoc(md):
"""Strip out doctoc Table of Contents for RippleAPI"""
DOCTOC_START = "<!-- START doctoc generated TOC please keep comment here to allow auto update -->"
DOCTOC_END = "<!-- END doctoc generated TOC please keep comment here to allow auto update -->"
doctoc_start_i = md.find(DOCTOC_START)
doctoc_end_i = md.find(DOCTOC_END)
if doctoc_start_i != -1 and doctoc_end_i != -1:
logger.info("... stripping doctoc...")
md = md[:doctoc_start_i]+md[doctoc_end_i+len(DOCTOC_END):]
return md
def enable_multicode(html):
"""Uncomment multicode tab divs"""
MC_START_REGEX = re.compile("<!-- *<div class='multicode'[^>]*> *-->")
MC_END_REGEX = re.compile("<!-- *</div> *-->")
logger.info("... enabling multicode tabs...")
html = re.sub(MC_START_REGEX, "<div class='multicode'>", html)
html = re.sub(MC_END_REGEX, "</div>", html)
return html
def standardize_header_ids(soup):
"""replace underscores with dashes in h1,h2,etc. for backwards compatibility"""
logger.info("... standardizing headers...")
headers = soup.find_all(name=re.compile("h[0-9]"), id=True)
for h in headers:
if "_" in h["id"]:
h["id"] = h["id"].replace("_", "-")
def buttonize_try_it(soup):
"""make links ending in > render like buttons"""
logger.info("... buttonizing try-it links...")
buttonlinks = soup.find_all("a", string=re.compile(">$"))
for link in buttonlinks:
if "class" in link.attrs:
link["class"].append("button")
else:
link["class"] = "button"
def markdown_in_div_elements(md):
"""Python markdown requires markdown="1" on HTML block elements
that contain markdown. AND there's a bug where if you use
markdown.extensions.extra, it replaces code fences in HTML
block elements with garbled text."""
def add_markdown_class(m):
if m.group(0).find("markdown=") == -1:
return m.group(1) + ' markdown="1">'
else:
return m.group(0)
logger.info("... adding markdown class to embedded divs...")
md = re.sub(r"(<div[^>]*)>", add_markdown_class, md)
return md
def substitute_links_for_target(soup, target):
"""Replaces local-html-links with appropriate substitutions
for the given target"""
target = get_target(target)
logger.info("... modifying links for target: %s" % target["name"])
# We actually want to get all pages, even the ones that aren't built as
# part of this target, in case those pages have replacement links.
@@ -140,49 +94,71 @@ def substitute_links_for_target(soup, target):
def get_target(target):
"""Get a target by name, or return the default target object.
We can't use default args in function defs because the default is
set at runtime based on config"""
if target == None:
return config["targets"][0]
if type(target) == str:
try:
return next(t for t in config["targets"] if t["name"] == target)
except StopIteration:
logger.critical("Unknown target: %s" % target)
exit(1)
if "name" in target:
# Eh, it's probably a target, just return it
return target
def parse_markdown(md, target=None, pages=None):
def get_filters_for_page(page, target=None):
ffp = set()
target = get_target(target)
if "filters" in target:
ffp.update(target["filters"])
if "filters" in page:
ffp.update(page["filters"])
return ffp
def parse_markdown(page, target=None, pages=None):
"""Take a markdown string and output HTML for that content"""
target = get_target(target)
logging.info("Preparing page %s" % page["name"])
# Mostly unnecessary as long as the multicode divs are commented out
## markdown_in_div_elements(md)
# Preprocess Markdown using this Jinja environment
pp_env = setup_pp_env()
# RippleAPI doc file has an extra table-of-contents at the start
md = remove_doctoc(md)
# We'll apply these filters to the page
page_filters = get_filters_for_page(page, target)
md = get_markdown_for_page(page["md"], pp_env=pp_env, target=target)
# Apply markdown-based filters here
for filter_name in page_filters:
if "filter_markdown" in dir(filters[filter_name]):
logging.info("... applying markdown filter %s" % filter_name)
md = filters[filter_name].filter_markdown(md)
# Actually parse the markdown
logger.info("... parsing markdown...")
html = markdown(md, extensions=["markdown.extensions.extra",
"markdown.extensions.toc"])
# If target uses multicode tabs, uncomment the divs now
if "multicode_tabs" in target and target["multicode_tabs"]:
html = enable_multicode(html)
# Apply raw-HTML-string-based filters here
for filter_name in page_filters:
if "filter_html" in dir(filters[filter_name]):
logging.info("... applying HTML filter %s" % filter_name)
html = filters[filter_name].filter_html(html)
# At this point, HTML manipulations are easier on a soup than a string
# Some filters would rather operate on a soup than a string.
# May as well parse once and re-serialize once.
soup = BeautifulSoup(html, "html.parser")
# Mostly necessary for compatibility with legacy content
standardize_header_ids(soup)
#buttonize links ending in >
buttonize_try_it(soup)
# Apply soup-based filters here
for filter_name in page_filters:
if "filter_soup" in dir(filters[filter_name]):
logging.info("... applying soup filter %s" % filter_name)
filters[filter_name].filter_soup(soup)
# ^ the soup filters apply to the same object, passed by reference
# Replace links for any non-default target
if target["name"] != config["targets"][0]["name"]:
@@ -197,7 +173,7 @@ def githubify_markdown(md, target=None, pages=None):
"""Github-friendly markdown has absolute links, no md in divs"""
MARKDOWN_LINK_REGEX = re.compile(
r"(\[([^\]]+)\]\(([^:)]+)\)|\[([^\]]+)\]:\s*(\S+)$)", re.MULTILINE)
target = get_target(target)
if not pages:
pages = get_pages(target["name"])
@@ -262,7 +238,7 @@ def get_pages(target=None):
def get_categories(pages):
"""Produce an ordered, de-duplicated list of categories from
"""Produce an ordered, de-duplicated list of categories from
the page list"""
categories = []
for page in pages:
@@ -304,17 +280,17 @@ def copy_static_files(template_static=True, content_static=True, out_path=None):
"""Copy static files to the output directory."""
if out_path == None:
out_path = config["out_path"]
if template_static:
template_static_src = config["template_static_path"]
template_static_dst = os.path.join(out_path,
os.path.basename(template_static_src))
copy_tree(template_static_src, template_static_dst)
if content_static:
content_static_src = config["content_static_path"]
content_static_dst = os.path.join(out_path,
os.path.basename(content_static_src))
copy_tree(content_static_src, content_static_dst)
@@ -336,12 +312,9 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
pages = get_pages(target)
categories = get_categories(pages)
# Preprocess Markdown using this Jinja environment
pp_env = setup_pp_env()
# Insert generated HTML into templates using this Jinja environment
env = setup_html_env()
if for_pdf:
logging.info("reading pdf template...")
default_template = env.get_template(config["pdf_template"])
@@ -352,11 +325,11 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
for currentpage in pages:
if "md" in currentpage:
# Read and parse the markdown
try:
md_content = get_markdown_for_page(currentpage["md"],
pp_env=pp_env,
target=target)
html_content = parse_markdown(currentpage, target=target,
pages=pages)
except Exception as e:
if bypass_errors:
logging.warning( ("Skipping page %s " +
@@ -366,22 +339,20 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
else:
exit("Error when fetching page %s: %s" %
(currentpage["name"], e) )
html_content = parse_markdown(md_content, target, pages)
else:
html_content = ""
if "template" in currentpage:
# Use a template other than the default one
template = env.get_template(currentpage["template"])
#do link substitution for "doc_page" param
if "doc_page" in currentpage:
doc_page = next(p for p in pages
if p["html"] == currentpage["doc_page"])
if target["name"] in doc_page:
currentpage["doc_page"] = doc_page[target["name"]]
out_html = template.render(currentpage=currentpage,
categories=categories,
pages=pages,
@@ -393,13 +364,13 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
pages=pages,
content=html_content,
target=target)
# Experimental: replace links in full HTML, not just content
soup = BeautifulSoup(out_html, "html.parser")
if target["name"] != config["targets"][0]["name"]:
substitute_links_for_target(soup, target)
out_html = str(soup)
if for_pdf:
out_path = config["temporary_files_path"]
else:
@@ -414,7 +385,7 @@ def render_pages(target=None, for_pdf=False, bypass_errors=False):
def watch(pdf_file, target):
"""Look for changed files and re-generate HTML (and optionally
"""Look for changed files and re-generate HTML (and optionally
PDF) whenever there's an update. Runs until interrupted."""
target = get_target(target)
@@ -456,21 +427,21 @@ def make_pdf(outfile, target=None, bypass_errors=False):
render_pages(target=target, for_pdf=True, bypass_errors=bypass_errors)
temp_files_path = config["temporary_files_path"]
# Prince will need the static files, so copy them over
copy_static_files(out_path=temp_files_path)
# Make sure the path we're going to write the PDF to exists
if not os.path.isdir(config["out_path"]):
logging.info("creating build folder %s" % config["out_path"])
os.makedirs(config["out_path"])
# Start preparing the prince command
args = [config["prince_executable"], '-o', outfile]
# Each HTML output file in the target is another arg to prince
pages = get_pages(target)
args += [os.path.join(temp_files_path, p["html"]) for p in pages]
logger.info("generating PDF: running %s..." % " ".join(args))
prince_resp = subprocess.check_output(args, universal_newlines=True)
print(prince_resp)
@@ -479,7 +450,7 @@ def make_pdf(outfile, target=None, bypass_errors=False):
def githubify(md_file_name, target=None):
"""Wrapper - make the markdown resemble GitHub flavor"""
target = get_target(target)
# filein = os.path.join(config["content_path"], md_file_name)
# logging.info("opening source md file %s"%filein)
# with open(filein, "r") as f:
@@ -489,14 +460,14 @@ def githubify(md_file_name, target=None):
md = get_markdown_for_page(md_file_name,
pp_env=setup_pp_env(),
target=target)
logging.info("githubifying markdown...")
rendered_md = githubify_markdown(md, target=target, pages=pages)
if not os.path.isdir(config["out_path"]):
logging.info("creating build folder %s" % config["out_path"])
os.makedirs(config["out_path"])
fileout = os.path.join(config["out_path"], md_file_name)
logging.info("writing generated file to path: %s"%fileout)
with open(fileout, "w") as f:
@@ -527,10 +498,10 @@ if __name__ == "__main__":
help="Copy static files to the out dir",
default=False)
cli_args = parser.parse_args()
if not cli_args.quiet:
logging.basicConfig(level=logging.INFO)
if cli_args.config:
load_config(cli_args.config)
else:
@@ -559,7 +530,7 @@ if __name__ == "__main__":
render_pages(target=cli_args.target,
bypass_errors=cli_args.bypass_errors)
logging.info("done rendering")
if cli_args.copy_static:
logging.info("copying static pages...")
copy_static_files()
@@ -571,4 +542,3 @@ if __name__ == "__main__":
watch(pdf_path, cli_args.target)
else:
watch(None, cli_args.target)


@@ -5,7 +5,7 @@ import yaml
import argparse
from bs4 import BeautifulSoup
DEFAULT_CONFIG_FILE = "devportal-config.yml"
DEFAULT_CONFIG_FILE = "dactyl-config.yml"
soupsCache = {}
def getSoup(fullPath):
@@ -43,10 +43,10 @@ def checkLinks(offline=False):
print("Empty link in",fullPath)
broken_links.append( (fullPath, endpoint) )
num_links_checked += 1
elif endpoint == "#":
continue
elif "mailto:" in endpoint:
print("Skipping email link in %s to %s"%(fullPath, endpoint))
continue
@@ -55,7 +55,7 @@ def checkLinks(offline=False):
if offline:
print("Offline - Skipping remote URL %s"%(endpoint))
continue
num_links_checked += 1
if endpoint not in externalCache:
print("Testing remote URL %s"%(endpoint))
@@ -71,15 +71,15 @@ def checkLinks(offline=False):
except Exception as e:
print("Error occurred:",e)
code = 500
if code < 200 or code >= 400:
print("Broken remote link in %s to %s"%(fullPath, endpoint))
broken_links.append( (fullPath, endpoint) )
else:
print("...success.")
externalCache.append(endpoint)
elif '#' in endpoint:
if fname in config["ignore_anchors_in"]:
print("Ignoring anchor %s in dynamic page %s"%(endpoint,fname))
@@ -98,7 +98,7 @@ def checkLinks(offline=False):
elif filename in config["ignore_anchors_in"]:
#Some pages are populated dynamically, so BeautifulSoup wouldn't
# be able to find anchors in them anyway
print("Skipping anchor link in %s to dynamic page %s" %
print("Skipping anchor link in %s to dynamic page %s" %
(fullPath, endpoint))
continue
@@ -153,26 +153,25 @@ if __name__ == "__main__":
load_config(args.config)
else:
load_config()
broken_links, num_links_checked = checkLinks(args.offline)
print("---------------------------------------")
print("Link check report. %d links checked."%num_links_checked)
if not args.strict:
unknown_broken_links = [ (page,link) for page,link in broken_links
if link not in config["known_broken_links"] ]
if not broken_links:
print("Success! No broken links found.")
else:
print("%d broken links found:"%(len(broken_links)))
[print("File:",fname,"Link:",link) for fname,link in broken_links]
if args.strict or unknown_broken_links:
exit(1)
print("Success - all broken links are known problems.")
print("Success - all broken links are known problems.")

tool/dactyl_style_checker.py Executable file

@@ -0,0 +1,113 @@
#!/usr/bin/env python3
###############################################################################
## Dactyl Style Police ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Reads the markdown files to try and enforce elements of good style. ##
###############################################################################
import logging
import argparse
#import nltk
import re
import collections
import yaml
from bs4 import BeautifulSoup
import dactyl_build
logger = logging.getLogger()
with open("word_substitutions.yaml", "r") as f:
UNPLAIN_WORDS = yaml.load(f)
with open("phrase_substitutions.yaml", "r") as f:
UNPLAIN_PHRASES = yaml.load(f)
def check_all_pages(target=None):
"""Reads all pages for a target and checks them for style."""
target = dactyl_build.get_target(target)
pages = dactyl_build.get_pages(target)
pp_env = dactyl_build.setup_pp_env()
style_issues = []
for page in pages:
if "md" not in page:
# Not a doc page, move on
continue
logging.info("Checking page %s" % page["name"])
page_issues = []
html = dactyl_build.parse_markdown(page, pages=pages, target=target)
soup = BeautifulSoup(html, "html.parser")
content_elements = ["p","li","h1","h2","h3","h4","h5","h6"]
passages = []
for el in soup.find_all(content_elements):
for passage in el.stripped_strings:
passage_issues = check_passage(passage)
if passage_issues:
page_issues += passage_issues
if page_issues:
style_issues.append( (page["name"], page_issues) )
return style_issues
def check_passage(passage):
"""Checks an individual string of text for style issues."""
issues = []
logging.debug("Checking passage %s" % passage)
#tokens = nltk.word_tokenize(passage)
tokens = re.split(r"\s+", passage)
for t in tokens:
if t.lower() in UNPLAIN_WORDS:
issues.append( ("Unplain Word", t) )
for phrase,sub in UNPLAIN_PHRASES.items():
if phrase in passage.lower():
#logging.warn("Unplain phrase: %s; suggest %s instead" % (phrase, sub))
issues.append( ("Unplain Phrase", phrase) )
return issues
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Check content files for style issues.")
parser.add_argument("--config", "-c", type=str,
help="Specify path to an alternate config file.")
parser.add_argument("--quiet", "-q", action="store_true",
help="Suppress status messages")
cli_args = parser.parse_args()
if not cli_args.quiet:
logging.basicConfig(level=logging.INFO)
if cli_args.config:
dactyl_build.load_config(cli_args.config)
else:
dactyl_build.load_config()
issues = check_all_pages()
if issues:
num_issues = sum(len(p[1]) for p in issues)
print("Found %d issues:" % num_issues)
for pagename,issuelist in issues:
print("Page: %s" % pagename)
c = collections.Counter(issuelist)
for i, count_i in c.items():
if i[0]=="Unplain Phrase":
print(" Discouraged phrase: %s (%d instances); suggest ''%s' instead." %
( i[1], count_i, UNPLAIN_PHRASES[i[1].lower()] ))
elif i[0]=="Unplain Word":
print(" Discouraged word: %s (%d instances); suggest ''%s' instead." %
( i[1], count_i, UNPLAIN_WORDS[i[1].lower()] ))
else:
print(" %s: %s (%d instances)" % (i[0], i[1], count_i))
exit(1)
else:
print("Style check passed with flying colors!")
exit(0)

tool/filter_buttonize.py Normal file

@@ -0,0 +1,18 @@
################################################################################
## Buttonize links ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Looks for links ending in >, and adds a "button" class to those links so ##
## they can be styled like buttons in the page. ##
################################################################################
import re
def filter_soup(soup):
"""make links ending in > render like buttons"""
buttonlinks = soup.find_all("a", string=re.compile(">$"))
for link in buttonlinks:
if "class" in link.attrs:
link["class"].append("button")
else:
link["class"] = "button"


@@ -0,0 +1,25 @@
################################################################################
## Add Markdown Class to Divs filter ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Finds raw divs in the markdown and adds the markdown=1 attribute to them ##
## so that HTML inside those divs gets parsed as markdown. ##
## Some flavors of markdown do this automatically, so this provides ##
## compatibility with those. ##
################################################################################
import re

def filter_markdown(md):
"""Python markdown requires markdown="1" on HTML block elements
that contain markdown. AND there's a bug where if you use
markdown.extensions.extra, it replaces code fences in HTML
block elements with garbled text."""
def add_markdown_class(m):
if m.group(0).find("markdown=") == -1:
return m.group(1) + ' markdown="1">'
else:
return m.group(0)
logger.info("... adding markdown class to embedded divs...")
md = re.sub(r"(<div[^>]*)>", add_markdown_class, md)
return md


@@ -0,0 +1,20 @@
################################################################################
## Multicode Tabs filter ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Finds and un-comments divs with the multicode class, for use with JS that ##
## turns the contents of those divs into tabs. ##
## It's necessary to have them as comments so the markdown inside the div ##
## gets processed correctly. ##
################################################################################
import re
def filter_html(html):
"""Uncomment multicode tab divs"""
MC_START_REGEX = re.compile("<!-- *<div class=['\"]multicode['\"][^>]*> *-->")
MC_END_REGEX = re.compile("<!-- *</div> *-->")
html = re.sub(MC_START_REGEX, "<div class='multicode'>", html)
html = re.sub(MC_END_REGEX, "</div>", html)
return html


@@ -0,0 +1,20 @@
################################################################################
## Remove doctoc filter ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Removes an automatically-generated "doctoc" table of contents, as ##
## delineated by HTML comments, from the markdown source. ##
################################################################################
def filter_markdown(md):
"""Strip out doctoc Table of Contents for RippleAPI"""
DOCTOC_START = "<!-- START doctoc generated TOC please keep comment here to allow auto update -->"
DOCTOC_END = "<!-- END doctoc generated TOC please keep comment here to allow auto update -->"
doctoc_start_i = md.find(DOCTOC_START)
doctoc_end_i = md.find(DOCTOC_END)
if doctoc_start_i != -1 and doctoc_end_i != -1:
md = md[:doctoc_start_i]+md[doctoc_end_i+len(DOCTOC_END):]
return md
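The slicing logic above removes everything from the start marker through the end marker, inclusive. A minimal demonstration with a hypothetical document:

```python
DOCTOC_START = "<!-- START doctoc generated TOC please keep comment here to allow auto update -->"
DOCTOC_END = "<!-- END doctoc generated TOC please keep comment here to allow auto update -->"

md = ("Intro paragraph.\n" + DOCTOC_START +
      "\n- [Heading](#heading)\n" + DOCTOC_END + "\nBody text.")
start_i = md.find(DOCTOC_START)
end_i = md.find(DOCTOC_END)
if start_i != -1 and end_i != -1:
    # Splice out the markers and the generated TOC between them
    md = md[:start_i] + md[end_i + len(DOCTOC_END):]
# md == "Intro paragraph.\n\nBody text."
```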

View File

@@ -0,0 +1,17 @@
################################################################################
## Standardize header IDs ##
## Author: Rome Reginelli ##
## Copyright: Ripple Labs, Inc. 2016 ##
## ##
## Replaces underscores with dashes in h1,h2,... element IDs. This provides ##
## compatibility with some other flavors of markdown that generate HTML IDs ##
## differently. ##
################################################################################
import re
def filter_soup(soup):
"""replace underscores with dashes in h1,h2,etc. for backwards compatibility"""
headers = soup.find_all(name=re.compile("h[0-9]"), id=True)
for h in headers:
if "_" in h["id"]:
h["id"] = h["id"].replace("_", "-")

View File

@@ -0,0 +1,77 @@
and/or: __ or __ or both
addressees are requested: (omit), please
adjacent to: next to
adversely impact on: hurt, set back
afford an opportunity: allow, let
a number of: some
arrive onboard: arrive
as a means of: to
as prescribed by: in, under
at the present time: at present, now
be advised: (omit)
by means of: by, with
close proximity: near
cold wallet: issuing address
combat environment: combat
comply with: follow
due to the fact that: due to, since
during the period: during
effect modifications: make changes
failed to: didn't
for a period of: for
has a requirement for: needs
hot wallet: operational address
in accordance with: by, following, per, under
in addition: also, besides, too
in an effort to: to
in a timely manner: on time, promptly
incumbent upon: must
in lieu of: instead
in order that: for, so
inter alia: (omit)
interpose no objection: don't object
in order to: to
in regard to: about, concerning, on
in relation to: about, with, to
in the amount of: for
in the event of: if
in the near future: shortly, soon
in the process of: (omit)
in view of: since
in view of the above: so
inasmuch as: since
is applicable to: applies to
is authorized to: may
is in consonance with: agrees with, follows
is responsible for: (omit) handles
it appears: seems
it is: (omit)
it is essential: must, need to
it is requested: please, we request, I request
limited number: limits
market maker: liquidity provider
not later than: by, before
pertaining to: about, of, on
prior to: before
provided that: if
provides guidance for: guides
pursuant to: by, following, per, under
relative to: about, on
ripple network: Ripple Consensus Ledger, Ripple peer-to-peer network
set forth in: in
similar to: like
successfully complete: complete, pass
take action to: (omit)
the month of: (omit)
there are: (omit)
there is: (omit)
the undersigned: I
the use of: (omit)
time period: (either one)
under the provisions of: under
until such time as: until
warm wallet: standby address
with reference to: about
with the exception of: except for
your office: you

View File

@@ -0,0 +1,165 @@
accompany: go with
accomplish: carry out, do
accorded: given
accordingly: so
accrue: add, gain
accurate: correct, exact, right
additional: added, more, other
addressees: you
advantageous: helpful
advise: recommend, tell
aircraft: plane
allocate: divide
anticipate: expect
api's: APIs
apparent: clear, plain
appreciable: many
appropriate: (omit), proper, right
approximate: about
ascertain: find out, learn
assist: aid, help
assistance: aid, help
attain: meet
attempt: try
basically: (omit)
benefit: help
capability: ability
caveat: warning
clearly: (omit)
combined: joint
commence: begin, start
component: part
comprise: form, include, make up
concerning: about, on
consequently: so
consolidate: combine, join, merge
constitutes: is, forms, makes up
contains: has
convene: meet
currently: (omit), now
deem: believe, consider, think
delete: cut, drop
demonstrate: prove, show
depart: leave
designate: appoint, choose, name
desire: want, wish
determine: decide, figure, find
disclose: show
discontinue: drop, stop
disseminate: give, issue, pass, send
easy: (omit)
easily: (omit)
elect: choose, pick
eliminate: cut, drop, end
employ: use
encounter: meet
endeavor: try
ensure: make sure
enumerate: count
equipments: equipment
equitable: fair
establish: set up, prove, show
evidenced: showed
evident: clear
exhibit: show
expedite: hasten, speed up
expeditious: fast, quick
expend: spend
expertise: ability
expiration: end
facilitate: ease, help
feasible: can be done, workable
females: women
finalize: complete, finish
forfeit: give up, lose
forward: send
frequently: often
furnish: give, send
gateway: financial institution
herein: here
heretofore: until now
herewith: below, here
identical: same
identify: find, name, show
immediately: at once
impacted: affected, changed
implement: carry out, start
inception: start
indicate: show, write down
indication: sign
initial: first
initiate: start
insane: (omit)
ious: issuances
just: (omit)
liaison: discussion
magnitude: size
maintain: keep, support
methodology: method
minimize: decrease, lessen
modify: change
monitor: check, watch
necessitate: cause, need
notify: let know, tell
notwithstanding: in spite of, still
numerous: many
objective: aim, goal
obligate: bind, compel
observe: see
obviously: (omit)
operate: run, use, work
optimum: best, greatest, most
participate: take part
perform: do
permit: let
portion: part
possess: have, own
practicable: practical
preclude: prevent
previous: earlier
previously: before
prioritize: rank
proceed: do, go ahead, try
procure: (omit)
proficiency: skill
promulgate: issue, publish
provide: give, offer, say
purchase: buy
reflect: say, show
regarding: about, of, on
relocate: move
remain: stay
remainder: rest
remuneration: pay, payment
render: give, make
represents: is
require: must, need
requirement: need
reside: live
retain: keep
selection: choice
sexy: (omit)
simple: (omit)
simply: (omit)
solicit: ask for, request
state-of-the-art: latest
subject: the, this, your
subsequent: later, next
subsequently: after, later, then
substantial: large, much
sufficient: enough
terminate: end, stop
therefore: so
therein: there
thereof: its, their
this activity, command: us, we
timely: prompt
transmit: send
utilize: use
utilization: use
viable: practical, workable
vice: instead of, versus
wallet: address, client application, account
warrant: call for, permit
whereas: because, since
witnessed: saw
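Lists in this `phrase: suggestion` format can feed a simple style checker. A minimal sketch of that idea (the loader and sample text below are assumptions for illustration, not Dactyl's actual implementation):

```python
import re

# A few entries in the same "phrase: suggestion" format as the lists above
WORD_LIST = """\
prior to: before
utilize: use
in order to: to
"""

# Parse each line into a discouraged phrase and its suggested replacement
substitutions = {}
for line in WORD_LIST.splitlines():
    phrase, _, suggestion = line.partition(": ")
    substitutions[phrase.lower()] = suggestion

# Flag any discouraged phrase that appears in the text, case-insensitively
text = "Prior to launch, utilize the checklist in order to verify settings."
flagged = []
for phrase, suggestion in substitutions.items():
    if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
        flagged.append((phrase, suggestion))
# flagged == [("prior to", "before"), ("utilize", "use"), ("in order to", "to")]
```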