Hypertext Abstract Syntax Tree format.
HAST exposes HTML as an abstract syntax tree. Abstract means not all information is stored in this tree, so an exact replica of the original document cannot be re-created. Syntax Tree means syntax is present in the tree, so an exact syntactic document can be re-created.
The reasons for introducing a new “virtual” DOM are primarily:
- The DOM is very heavy to implement outside of the browser; a lean, stripped-down virtual DOM can be used everywhere
- Most virtual DOMs do not focus on ease of use in transformations
- Other virtual DOMs cannot represent the syntax of HTML in its entirety (think comments, document types, and character data)
- Neither HTML nor virtual DOMs focus on positional information
HAST is a subset of Unist and implemented by rehype.
This document may not be released. See releases for released documents. The latest released version is 2.2.0.
- hastscript: Hyperscript compatible DSL for creating nodes
- hast-to-hyperscript: Convert a Node to React, Virtual DOM, Hyperscript, and more
- hast-util-assert: Assert HAST nodes
- hast-util-class-list: Simulate the browser's classList API for HAST nodes
- hast-util-embedded: Check if node is embedded content
- hast-util-find-and-replace: Find and replace text
- hast-util-from-parse5: Transform Parse5’s AST to HAST
- hast-util-from-string: Set the plain-text value of a node
- hast-util-has-property: Check if a node has a property
- hast-util-heading: Check if a node is heading content
- hast-util-interactive: Check if a node is interactive
- hast-util-is-body-ok-link: Check if a link element is “Body OK”
- hast-util-is-conditional-comment: Check if node is a conditional comment
- hast-util-is-css-link: Check if node is a CSS link
- hast-util-is-css-style: Check if node is a CSS style
- hast-util-is-element: Check if node is a (certain) element
- hast-util-is-event-handler: Check if property is an event handler
- hast-util-is-javascript: Check if node is a JavaScript script
- hast-util-labelable: Check if node is labelable
- hast-util-menu-state: Check the state of a menu element
- hast-util-parse-selector: Create an element from a simple CSS selector
- hast-util-phrasing: Check if a node is phrasing content
- hast-util-raw: Reparse a HAST tree
- hast-util-sanitize: Sanitise nodes
- hast-util-select: querySelector, querySelectorAll, and matches
- hast-util-script-supporting: Check if node is script-supporting content
- hast-util-sectioning: Check if node is sectioning content
- hast-util-table-cell-style: Transform deprecated styling attributes on table cells to inline styles
- hast-util-to-html: Stringify nodes to HTML
- hast-util-to-mdast: Transform HAST to MDAST
- hast-util-to-nlcst: Transform HAST to NLCST
- hast-util-to-parse5: Transform HAST to Parse5’s AST
- hast-util-to-string: Get the plain-text value of a node
- hast-util-transparent: Check if node is transparent content
- hast-util-whitespace: Check if node is inter-element whitespace
See the List of Unist Utilities for projects which work with HAST nodes too.
- a-rel: List of link types for rel on a / area
- aria-attributes: List of ARIA attributes
- collapse-white-space: Replace multiple white-space characters with a single space
- comma-separated-tokens: Parse/stringify comma-separated tokens
- html-tag-names: List of HTML tag-names
- html-dangerous-encodings: List of dangerous HTML character encoding labels
- html-encodings: List of HTML character encoding labels
- html-element-attributes: Map of HTML attributes
- html-void-elements: List of void HTML tag-names
- link-rel: List of link types for rel on link
- mathml-tag-names: List of MathML tag-names
- meta-name: List of values for name on meta
- property-information: Information on HTML properties
- space-separated-tokens: Parse/stringify space-separated tokens
- svg-tag-names: List of SVG tag-names
- svg-element-attributes: Map of SVG attributes
- web-namespaces: Map of web namespaces
Root (Parent) houses all nodes.
interface Root <: Parent {
  type: "root";
}
Element (Parent) represents an HTML Element. For example, a div. HAST Elements correspond to the HTML Element interface.
One element is special, and comes with another property: <template> with content. The contents of a template element are not exposed through its children, like other elements, but instead on a content property which houses a Root node.
<noscript> elements should house their tree in the same way as other elements, as if scripting was not enabled.
interface Element <: Parent {
  type: "element";
  tagName: string;
  properties: Properties;
  content: Root?;
}
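A minimal sketch of how a tree walker might account for the template exception described above. The helper name childNodesOf is hypothetical and not part of any hast utility; it assumes nodes are plain objects shaped like the Element interface.

```javascript
// Hypothetical helper: returns the nodes "inside" a node.
// For <template>, the tree lives on `content` (a Root), not on `children`.
function childNodesOf(node) {
  if (node.type === 'element' && node.tagName === 'template' && node.content) {
    return node.content.children;
  }
  // Nodes without children (text, comment, doctype) yield an empty list.
  return node.children || [];
}

const template = {
  type: 'element',
  tagName: 'template',
  properties: {},
  children: [],
  content: {
    type: 'root',
    children: [{type: 'text', value: 'Golf'}]
  }
};

console.log(childNodesOf(template).length); // 1
```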
For example, the following HTML:
<a href="http://alpha.com" class="bravo" download></a>
Yields:
{
  "type": "element",
  "tagName": "a",
  "properties": {
    "href": "http://alpha.com",
    "className": ["bravo"],
    "download": true
  },
  "children": []
}
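As a rough illustration, an element node for this HTML could also be built by hand in JavaScript. The element function below is a hypothetical stand-in written for this example; it is not hastscript's API, which is the proper tool for creating nodes.

```javascript
// Hypothetical factory, for illustration only (not hastscript's API).
function element(tagName, properties, children) {
  return {
    type: 'element',
    tagName: tagName,
    properties: properties || {},
    children: children || []
  };
}

// Mirrors <a href="http://alpha.com" class="bravo" download></a>.
const link = element('a', {
  href: 'http://alpha.com',
  className: ['bravo'],
  download: true
});

console.log(JSON.stringify(link, null, 2));
```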
A dictionary of property names to property values. Most virtual DOMs require a disambiguation between attributes and properties. HAST does not and defers this to compilers.
interface Properties {}
Property names are keys on properties objects and reflect HTML, SVG, ARIA, XML, XMLNS, or XLink attribute names. Often, they have the same value as the corresponding attribute (for example, id is a property name reflecting the id attribute name), but there are some notable differences.
These rules aren’t simple. Use hastscript (or property-information directly) to help.
The following rules are used to disambiguate the names of attributes and their corresponding HAST property name. These rules are based on how ARIA is reflected in the DOM, and differ from how some (older) HTML attributes are reflected in the DOM.
- Any name referencing a combination of multiple words (such as “stroke miter limit”) becomes a camel-cased property name capitalising each word boundary. This includes combinations that are sometimes written as several words. For example, stroke-miterlimit becomes strokeMiterLimit, autocorrect becomes autoCorrect, and allowfullscreen becomes allowFullScreen.
- Any name that can be hyphenated becomes a camel-cased property name capitalising each boundary. For example, “read-only” becomes readOnly.
- Compound words that are not used with spaces or hyphens are treated as a normal word and the previous rules apply. For example, “placeholder”, “strikethrough”, and “playback” stay the same.
- Acronyms in names are treated as a normal word and the previous rules apply. For example, itemid becomes itemId and bgcolor becomes bgColor.
Some jargon is seen as one word even though it may not be seen as such by dictionaries. For example, nohref becomes noHref, playsinline becomes playsInline, and accept-charset becomes acceptCharset.
The HTML attributes class and for respectively become className and htmlFor in alignment with the DOM.
No other attributes gain different names as properties, other than a change in casing.
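Because these rules depend on knowing where word boundaries fall, they cannot be derived from the attribute name alone; a lookup table is one way to apply them. The sketch below hard-codes a handful of the examples given above. Both the table and the toPropertyName helper are hypothetical and far from complete, and the helper assumes the attribute name is passed exactly as the parser produced it. property-information is the place to look for the real data.

```javascript
// Illustrative subset only, taken from the examples in this document.
const propertyNames = {
  class: 'className',
  for: 'htmlFor',
  'stroke-miterlimit': 'strokeMiterLimit',
  'accept-charset': 'acceptCharset',
  allowfullscreen: 'allowFullScreen',
  autocorrect: 'autoCorrect',
  itemid: 'itemId',
  bgcolor: 'bgColor',
  nohref: 'noHref',
  playsinline: 'playsInline'
};

// Hypothetical helper: names missing from the table are returned unchanged,
// which suits single words such as `id` or `placeholder`.
function toPropertyName(attribute) {
  return Object.prototype.hasOwnProperty.call(propertyNames, attribute)
    ? propertyNames[attribute]
    : attribute;
}

console.log(toPropertyName('class')); // className
console.log(toPropertyName('accept-charset')); // acceptCharset
console.log(toPropertyName('placeholder')); // placeholder
```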
The HAST rules for property names differ from how HTML is reflected in the DOM for the following attributes:
- charoff becomes charOff (not chOff)
- char stays char (does not become ch)
- rel stays rel (does not become relList)
- checked stays checked (does not become defaultChecked)
- muted stays muted (does not become defaultMuted)
- value stays value (does not become defaultValue)
- selected stays selected (does not become defaultSelected)
- allowfullscreen becomes allowFullScreen (not allowFullscreen)
- hreflang becomes hrefLang (not hreflang)
- autoplay becomes autoPlay (not autoplay)
- autocomplete becomes autoComplete (not autocomplete)
- autofocus becomes autoFocus (not autofocus)
- enctype becomes encType (not enctype)
- formenctype becomes formEncType (not formEnctype)
- vspace becomes vSpace (not vspace)
- hspace becomes hSpace (not hspace)
- lowsrc becomes lowSrc (not lowsrc)
Property values should reflect the data type determined by their property name. For example, the HTML <div hidden></div> contains a hidden (boolean) attribute, which is reflected as a hidden property name set to true (boolean) in HAST. Likewise, <input minlength="5">, which contains a minlength (valid integer) attribute, is reflected as a minLength property set to 5 (number) in HAST.
In JSON, the value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
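One way a utility might honour this rule is to skip such values when enumerating properties. This is a sketch only; definedPropertyNames is a hypothetical helper, not an existing hast utility.

```javascript
// Hypothetical helper: lists the property names that actually count,
// skipping values that are null or undefined.
function definedPropertyNames(properties) {
  return Object.keys(properties).filter(function (name) {
    return properties[name] !== null && properties[name] !== undefined;
  });
}

const properties = {id: 'delta', title: null, hidden: undefined, download: true};

console.log(definedPropertyNames(properties)); // [ 'id', 'download' ]
```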
The DOM is strict in reflecting those properties, and HAST is not. Where the DOM treats <div hidden=no></div> as having a true (boolean) value for the hidden attribute, and <img width="yes"> as having a 0 (number) value for the width attribute, these should be reflected as 'no' and 'yes', respectively, in HAST.
The reason for this is to allow plug-ins and utilities to inspect these non-standard values.
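The behaviour described above can be sketched as a coercion step that falls back to the raw string when the value is not valid for the property's type. Everything here is an assumption made for illustration: the types table covers only the three properties used in the examples, toPropertyValue is a hypothetical helper, and a valueless attribute is assumed to arrive as an empty string.

```javascript
// Hypothetical, tiny type table; real data lives in property-information.
const types = {hidden: 'boolean', minLength: 'number', width: 'number'};

// Hypothetical helper: `raw` is the attribute value as written in HTML,
// or '' when the attribute has no value (as in `<div hidden>`).
function toPropertyValue(name, raw) {
  const type = types[name];
  if (type === 'boolean' && (raw === '' || raw.toLowerCase() === name.toLowerCase())) {
    return true;
  }
  if (type === 'number' && /^\d+$/.test(raw)) {
    return Number(raw);
  }
  // Non-standard value: keep the string so utilities can inspect it.
  return raw;
}

console.log(toPropertyValue('hidden', '')); // true
console.log(toPropertyValue('minLength', '5')); // 5
console.log(toPropertyValue('hidden', 'no')); // no
console.log(toPropertyValue('width', 'yes')); // yes
```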
The DOM also specifies comma-separated and space-separated lists as attribute values. In HAST, these should be treated as ordered lists. For example, <div class="alpha bravo"></div> is represented as ['alpha', 'bravo'].
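A simplified sketch of how such attribute values might be split into ordered lists. The two functions are hypothetical and gloss over edge cases; comma-separated-tokens and space-separated-tokens, listed above, are the packages intended for this job.

```javascript
// Hypothetical helper: splits on runs of white-space, ignoring padding.
function parseSpaceSeparated(value) {
  const trimmed = value.trim();
  return trimmed === '' ? [] : trimmed.split(/\s+/);
}

// Hypothetical helper: splits on commas, trims tokens, drops empty ones.
function parseCommaSeparated(value) {
  return value
    .split(',')
    .map(function (token) {
      return token.trim();
    })
    .filter(function (token) {
      return token !== '';
    });
}

console.log(parseSpaceSeparated('alpha bravo')); // [ 'alpha', 'bravo' ]
console.log(parseCommaSeparated('image/png, image/jpeg')); // [ 'image/png', 'image/jpeg' ]
```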
There’s no special format for style.
Doctype (Node) defines the type of the document.
interface Doctype <: Node {
  type: "doctype";
  name: string;
  public: string?;
  system: string?;
}
For example, the following HTML:
<!DOCTYPE html>
Yields:
{
  "type": "doctype",
  "name": "html",
  "public": null,
  "system": null
}
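Going the other way, a Doctype node could be turned back into markup roughly as follows. This is a sketch under the assumption that public and system hold the public and system identifiers; doctypeToString is a hypothetical helper, and hast-util-to-html is the utility listed above for real serialisation.

```javascript
// Hypothetical serializer for Doctype nodes, for illustration only.
function doctypeToString(node) {
  let result = '<!DOCTYPE ' + node.name;
  if (node.public != null) {
    result += ' PUBLIC "' + node.public + '"';
    if (node.system != null) {
      result += ' "' + node.system + '"';
    }
  } else if (node.system != null) {
    result += ' SYSTEM "' + node.system + '"';
  }
  return result + '>';
}

const doctype = {type: 'doctype', name: 'html', public: null, system: null};

console.log(doctypeToString(doctype)); // <!DOCTYPE html>
```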
Comment (Text) represents embedded information.
interface Comment <: Text {
  type: "comment";
}
For example, the following HTML:
<!--Charlie-->
Yields:
{
  "type": "comment",
  "value": "Charlie"
}
TextNode (Text) represents everything that is text. Note that its type property is text, but it is different from the abstract Unist interface Text.
interface TextNode <: Text {
  type: "text";
}
For example, the following HTML:
<span>Foxtrot</span>
Yields:
{
  "type": "element",
  "tagName": "span",
  "properties": {},
  "children": [{
    "type": "text",
    "value": "Foxtrot"
  }]
}
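To show how the node types fit together, here is a small recursive walk that collects the text in a tree. It is a sketch, not a reference implementation: textOf is a hypothetical helper, and hast-util-to-string is the utility listed above for getting the plain-text value of a node.

```javascript
// Hypothetical helper: concatenates the values of all text nodes in a tree.
// Comment and doctype nodes have no children and contribute nothing.
function textOf(node) {
  if (node.type === 'text') {
    return node.value;
  }
  if (!node.children) {
    return '';
  }
  return node.children.map(function (child) {
    return textOf(child);
  }).join('');
}

const tree = {
  type: 'root',
  children: [
    {type: 'comment', value: 'Charlie'},
    {
      type: 'element',
      tagName: 'span',
      properties: {},
      children: [{type: 'text', value: 'Foxtrot'}]
    }
  ]
};

console.log(textOf(tree)); // Foxtrot
```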
hast is built by people just like you! Check out contribute.md for ways to get started.
This project has a Code of Conduct. By interacting with this repository, organisation, or community you agree to abide by its terms.
Want to chat with the community and contributors? Join us in Gitter!
Have an idea for a cool new utility or tool? That’s great! If you want feedback, help, or just to share it with the world you can do so by creating an issue in the syntax-tree/ideas repository!
The initial release of this project was authored by @wooorm.
Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!
Thanks to @kthjm, @KyleAMathews, @rhysd, @Rokt33r, @s1n, @Sarah-Seo, @sethvincent, and @simov for contributing commits since!