The JHTML format is a strict subset of HTML used to encode arbitrary JSON (or full JavaScript objects) within HTML. This library seeks to provide conversions while simultaneously validating the indicated JHTML structure(s).
Possible use cases include:
- Hierarchical data storage in a faithful, readily portable and readily viewable format.
- Building data files within (schema-constrained) WYSIWYG editors.
- Transforming JSON to XHTML for applying XSL, running XPath, CSS Selector, or DOM queries, etc.
JHTML ought to be round-trippable with canonical JSON, with one exception: when converting object-containing JSON to JHTML, the ECMAScript/JSON interpreter may not iterate the properties in definition order (ECMAScript interpreters are not obliged to do so).
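As a small illustration of the ordering caveat, the following sketch shows one case where current JavaScript engines do not preserve definition order: integer-like keys are visited first, in ascending numeric order. The sample data is made up for the example, and the sketch uses only the built-in JSON object, not this library.

```javascript
// JSON text whose keys are defined in the order b, 2, a, 1.
var source = '{"b": 1, "2": 2, "a": 3, "1": 4}';
var parsed = JSON.parse(source);

// Integer-like keys come first in ascending numeric order, followed by
// the remaining string keys in definition order.
var keys = Object.keys(parsed);
console.log(keys); // [ '1', '2', 'b', 'a' ]

// Re-serializing therefore yields a different key order than the source.
var reserialized = JSON.stringify(parsed);
console.log(reserialized); // {"1":4,"2":2,"b":1,"a":3}
```

Any converter that walks such an object will emit its entries in the iteration order, so the resulting JHTML may not match the order of the original JSON text.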
Note that when script tags of custom type are available (e.g., <script type="application/json">) it is probably easier to use them with JSON directly.
For representing XML as HTML, see hxml.
See a demo here.
Currently, comments (and processing instructions) and whitespace text nodes are allowed throughout, but any elements must be constrained to the expected types. For canonicalization, attributes beyond those explicitly allowed should not be present. Microdata might not care about hierarchy, but this specification adds such constraints.
- A top-level JSON string primitive will be encoded by the presence of <span> whose contents will be stringified into JSON upon serialization.
- Other JSON primitives (null, boolean, or number) will be encoded within <i>, whether at the top-level or elsewhere, with the exact type determined by the contained value (i.e., "null", "true", "false", and any of the allowable formats for a JSON number are the possible values).
- JSON arrays (in whatever context) will be encoded as <ol start="0"> whose individual child items (if any) will be represented by <li>. Pure text content will indicate a string, whereas a single <i> child will indicate null or a boolean or number type (as per the previous rule). A single <dl> or <ol start="0"> child will indicate a child object or array respectively (see the next rule for object rules).
- JSON objects (in whatever context) will be encoded as <dl> whose individual child items (if any) will be represented by alternating <dt>/<dd> pairs (only single instances are allowed for each within a pair). <dt> will represent the keys of the object, whereas <dd> will represent the values. Pure text content within <dd> will indicate a string, whereas a single <i> child of <dd> will indicate null or a boolean or number type (as per the second rule). A single <dl> or <ol> child will indicate a child object or array respectively (see the previous rule for array rules).
- The top-level element SHOULD include an XHTML namespace declaration (xmlns="http://www.w3.org/1999/xhtml") for polyglot compatibility and MUST contain the attributes itemscope="" itemtype="http://brett-zamir.me/ns/microdata/json-as-html/2".
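The encoding rules above can be sketched as a small string-building encoder. This is a minimal illustration and not the library's implementation: the function names `escapeHTML`, `encodeNested`, `encodeValue`, and `toJHTMLSketch` are hypothetical, the optional xmlns declaration is left out, and the input is assumed to be a plain JSON value.

```javascript
// Escape the characters that are significant inside HTML text content.
function escapeHTML(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Encode a value that sits inside <li> or <dd>: strings become bare text,
// everything else is delegated to encodeValue.
function encodeNested(value) {
  return typeof value === 'string' ? escapeHTML(value) : encodeValue(value);
}

// Encode a non-string JSON value (null, boolean, number, array, or object).
function encodeValue(value) {
  if (typeof value === 'number' && !isFinite(value)) {
    throw new TypeError('Non-finite numbers are not JSON values');
  }
  if (value === null || typeof value === 'boolean' || typeof value === 'number') {
    return '<i>' + JSON.stringify(value) + '</i>';
  }
  if (Array.isArray(value)) {
    return '<ol start="0">' + value.map(function (item) {
      return '<li>' + encodeNested(item) + '</li>';
    }).join('') + '</ol>';
  }
  if (typeof value === 'object') {
    return '<dl>' + Object.keys(value).map(function (key) {
      return '<dt>' + escapeHTML(key) + '</dt><dd>' + encodeNested(value[key]) + '</dd>';
    }).join('') + '</dl>';
  }
  throw new TypeError('Not a JSON value: ' + typeof value);
}

// Encode a top-level value: strings are wrapped in <span>, and the root
// element receives the required Microdata attributes.
function toJHTMLSketch(value) {
  var attrs = ' itemscope="" itemtype="http://brett-zamir.me/ns/microdata/json-as-html/2"';
  var html = typeof value === 'string' ?
    '<span>' + escapeHTML(value) + '</span>' :
    encodeValue(value);
  // Insert the attributes directly after the root tag name.
  return html.replace(/^<(\w+)/, '<$1' + attrs);
}

console.log(toJHTMLSketch({ name: 'JHTML', tags: ['a', 1, null] }));
```

Note how the string "1" and the number 1 would differ only by the surrounding <i>, which is what lets the format distinguish them on the way back.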
- Be as simple as possible while distinguishing types and be round-trippable (when using the valid subset of HTML) without picking up false positives (HTML markup which was not intended to represent JSON).
- Use mark-up which is as semantically clear as possible (e.g., ordered list well represents the concept of arrays, etc.).
- Be unambiguous in markup choice (e.g., while json2html offers display of "tabular" arrays in a manner distinct from other nested arrays, this would add ambiguity and complexity for a round-trippable format; we instead stick to always requiring nested ordered lists to represent tables).
- Minimize use of invisible mark-up which, if used in, say, a WYSIWYG editor, would not be readily discovered and thus could suffer from undetected maintenance problems.
- Distinguish types visually without need for CSS: Requiring null, boolean, and numbers (if not object keys) to be within <i> visually distinguishes them from strings of the same value. Although this adds some verbosity, and it would technically be possible with CSS to overcome this need, without it, bare HTML would not allow distinguishing between primitive types. The format should ideally allow them to be distinguishable from each other as well, though this is not provided for in the current spec.
- It should potentially be able to accommodate other JavaScript objects (e.g., undefined, functions (via toString()), non-finite numbers, date objects, and regular expression objects ought to appear within <i> without ambiguity).
- Visually distinguish depth of nesting.
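To illustrate the round-trip and false-positive goals, here is a minimal sketch of a strict decoder that walks DOM-like elements and rejects anything outside the expected element types. It is not the library's implementation: the function names are hypothetical, attribute checks (such as start="0" and the Microdata attributes) are omitted, and the input is assumed to expose `nodeName`, `children`, and `textContent` as browser elements do.

```javascript
function tagName(el) {
  return el.nodeName.toLowerCase();
}

function childElements(el) {
  return Array.prototype.slice.call(el.children);
}

// Decode the content of an <li> or <dd>: pure text is a string; otherwise
// exactly one <i>, <ol>, or <dl> child is allowed.
function decodeItem(el) {
  var kids = childElements(el);
  if (kids.length === 0) {
    return el.textContent;
  }
  if (kids.length === 1 && ['i', 'ol', 'dl'].indexOf(tagName(kids[0])) !== -1) {
    return decodeElement(kids[0]);
  }
  throw new Error('Unexpected content in <' + tagName(el) + '>');
}

// Decode a <span>, <i>, <ol>, or <dl> element into a JSON value.
function decodeElement(el) {
  var kids = childElements(el);
  switch (tagName(el)) {
    case 'span':
      if (kids.length) { throw new Error('<span> must contain only text'); }
      return el.textContent;
    case 'i':
      var value = JSON.parse(el.textContent);
      if (value !== null && typeof value !== 'boolean' && typeof value !== 'number') {
        throw new Error('<i> must contain null, a boolean, or a number');
      }
      return value;
    case 'ol':
      return kids.map(function (li) {
        if (tagName(li) !== 'li') { throw new Error('<ol> may only contain <li>'); }
        return decodeItem(li);
      });
    case 'dl':
      if (kids.length % 2) { throw new Error('<dl> requires <dt>/<dd> pairs'); }
      var obj = {};
      for (var i = 0; i < kids.length; i += 2) {
        if (tagName(kids[i]) !== 'dt' || tagName(kids[i + 1]) !== 'dd') {
          throw new Error('<dl> requires alternating <dt>/<dd>');
        }
        obj[kids[i].textContent] = decodeItem(kids[i + 1]);
      }
      return obj;
    default:
      throw new Error('Unexpected element <' + tagName(el) + '>');
  }
}
```

Throwing on any unexpected element is one way to avoid treating ordinary HTML as JSON by accident; a real implementation would also need to check attributes.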
npm install jhtml
var JHTML = require('jhtml');
<script src="jhtml.js"></script>
// The following code will look for all elements within the document
// belonging to the JHTML itemtype namespace (currently:
// http://brett-zamir.me/ns/microdata/json-as-html/2 ).
// Alternatively, one may supply the items as the first (and only)
// argument (there is no validation for namespace currently
// in such a case).
// These return a JSON array if multiple elements are found or a single object otherwise
JHTML.toJSONObject(); // returns a JSON object
JHTML.toJSONString(); // returns a JSON string
Note that if you wish to store the JHTML without displaying it, you can enclose it within a <script type="jhtml"> element and obtain the content via script (though you could also obtain regular JSON in a similar manner or simply use JSON within your JavaScript). Do not merely add the style display:none as this will still cause your JHTML content to display for users who have disabled CSS.
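A minimal sketch of obtaining such hidden content via script follows. The helper name `getHiddenJHTML` and its `doc` parameter are hypothetical and not part of this library; the sketch relies only on the standard `querySelectorAll` and `textContent` DOM APIs.

```javascript
// Collect the raw markup stored in every <script type="jhtml"> element.
// `doc` is any object exposing querySelectorAll, such as the browser's
// `document`.
function getHiddenJHTML(doc) {
  var scripts = doc.querySelectorAll('script[type="jhtml"]');
  return Array.prototype.map.call(scripts, function (script) {
    return script.textContent;
  });
}
```

In a browser this would be called as `getHiddenJHTML(document)`. The result is an array of markup strings, which would still need to be parsed into elements before being converted.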
If you intend to support older browsers, you will need polyfills for:
Array.prototype.map
Array.prototype.reduce
Element.prototype.textContent
Element.prototype.itemProp
HTMLDocument.prototype.getItems
Element.prototype.firstElementChild
- Reimplement JHTML.toJHTMLDOM() using JTLT (when ready)
- Reimplement JHTML.toJHTMLString() using JTLT (when ready)
- Define as ECMAScript 6 Module with polyfill plug-in
- Allow equivalents to JSON.parse's reviver or JSON.stringify's replacer and space arguments?
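For reference, this is how the reviver, replacer, and space arguments behave in the built-in JSON object; any JHTML equivalents would presumably mirror this behavior, though none exist in this library yet. The sample data is made up for the example.

```javascript
var data = { name: 'JHTML', created: '2014-01-01T00:00:00.000Z', secret: 'x' };

// replacer: called for each key/value pair; returning undefined omits the
// property. space: number of spaces used to indent the output.
var text = JSON.stringify(data, function (key, value) {
  return key === 'secret' ? undefined : value;
}, 2);
console.log(text);

// reviver: called for each key/value pair after parsing; here it turns
// the "created" string back into a Date object.
var revived = JSON.parse(text, function (key, value) {
  return key === 'created' ? new Date(value) : value;
});
console.log(revived.created instanceof Date); // true
```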
The following might perhaps be allowed in conjunction with JSON Schema, although I would also like to allow optional encoding of non-JSON JavaScript objects as well.
- This could be expanded to support types like: URL, Date, etc.
- Support a special HTML-aware string type to allow arbitrary nested HTML where JSON strings are expected (which might be encapsulated, say, by an <a>). This could still convert to JSON, but as a string.
- Could use itemid/itemref to encode linked references.
The following may loosen requirements, but may not be desirable as they would allow expansion of the size of JHTML files.
- Loosen requirements to allow dropping the start attribute in <ol start="0">? For portable proper structural readability, however, this seems like it should stay, even though CSS can mimic the correct 0-indexed display.
- Loosen requirements to allow <span> on string primitives (for parity with a string at the root) within object keys or within object or array values. Currently, the shortest possible expression is required behavior.
- Allow <table> to be used in place of nested <ol> arrays, especially when there are only two dimensions and the arrays are known to be of equal length at each level (any <thead> for visual purposes only but not converted to JSON?).
The following are possible tightening or other breaking changes:
- Disallow comment and processing instruction nodes? Despite the precedent with JSON disallowing comments, I am partial to allowing comment nodes in JHTML, despite the burden on implementers, as it is extremely convenient to be able to include such information within data files. Of course, they will not be round-trippable with JSON (unless encoded as a legitimate part of the JSON object) since JSON disallows comments.
- Require primitives to be within <data> elements (but the HTML spec currently requires a value attribute which would be redundant with the human-readable value).
- Change the Microdata attributes on the root to "data-*" attributes since the information is not necessarily semantic (and if it is, it is semantic to the specific JSON format). Although the "data-*" attributes are supposed to only have meaning within the application (e.g., not to be interpreted in a special way by search engines perhaps), their use would not imply that tools could not parse them in a similar manner.
- Move the itemtype properties to a container element such as <a> to avoid the need for an inconsistency with string requiring <span> at the top level.
- For null, booleans, and numbers, change <i> to <code> (or optionally to <code class="language-javascript"> as specifically allowed by the spec) for greater semantic accuracy (but at a cost of simplicity and slightly different presentation).
- Allow styling hooks to allow distinguishing between null, booleans, and numbers (or undefined, non-finite numbers, and functions).
The following are other possible changes:
- Change the itemtype namespace if standardized
- Allow multiple <dd>'s if taken to mean array children? (Probably more confusing even if more succinct than requiring a child <ol>).
- Anything else that comes up out of consultation with others (although I intend to change the namespace upon any breaking changes).
- Switch from internal SAJJ dependency to external dj dependency.
npm install
npm test
or, with nodeunit installed globally:
npm install
nodeunit test
For browser testing, open test/test.html.
JHTML was inspired by Netscape bookmark files as used when exporting bookmarks in Firefox. They brought to my attention that <dl> could be used to represent nestable key-value data hierarchies as also found in JSON objects.