Matching JSON nodes in Erlang.
A kind of regular expression language applied to JSON documents.
- Find if a JSON document has some structural properties, and possibly extract some information.
- Useful to extract small data pieces from large JSON documents.
- Efficient filtering of JSON nodes in real time.
Backends for jsone, jsx, jiffy and mochijson2.
Add ejpet, and optionally a supported JSON codec, to your project's dependencies.
- With `rebar3`, in the `rebar.config` file:
{deps, [
  %% ...
  {ejpet, ".*", {git, "git://github.com/nmichel/ejpet.git", {tag, "0.7.0"}}},
  {jsx, ".*", {git, "https://github.com/talentdeficit/jsx.git", {tag, "v2.8.3"}}},
  %% ...
]}.
- With `mix`, in the `mix.exs` file:
defmodule MyProject.Mixfile do
  use Mix.Project

  def project do
    [
      # ...
      deps: deps()
      # ...
    ]
  end

  defp deps() do
    [
      # ...
      {:ejpet, "~> 0.7.0"},
      {:jsx, "~> 2.8"},
      # ...
    ]
  end
end
Clone
$ git clone git@github.com:nmichel/ejpet.git
Build
$ cd ejpet
$ ./rebar get-deps
$ make && make test
Start Erlang shell
erl -pz ./ebin ./deps/*/ebin
Read some JSON data
1> {ok, Data} = file:read_file("./test/channels_list.json").
{ok,<<239,187,191,91,13,10,32,32,32,32,123,13,10,32,32,
32,32,32,32,32,32,34,110,117,109,98,101,...>>}
Decode JSON using, say, jsx (provided you have jsx in your load path)
2> Node = jsx:decode(Data).
[[{<<"number">>,1},
{<<"lcn">>,2},
{<<"name">>,<<"France 2">>},
{<<"sap_group">>,<<>>},
{<<"ip_multicast">>,<<"239.100.10.1">>},
{<<"port_multicast">>,1234},
{<<"num_clients">>,0},
{<<"scrambling_ratio">>,0},
{<<"is_up">>,1},
{<<"pcr_pid">>,120},
{<<"pmt_version">>,4},
{<<"unicast_port">>,0},
{<<"service_id">>,257},
{<<"service_type">>,
<<"Please report : Unknown service type doc : EN 30"...>>},
{<<"pids_num">>,7},
{<<"pids">>,
...
Ok. Now define what we are looking for, and what we want to get
Find, somewhere in a list, an object with
* a {"ip_multicast", "239.100.10.4"} pair,
* a key "pcr_pid", whatever its value, captured in variable "pcr",
* a key "pids", whose value is either a list or an object containing
  * an object with
    * a key "language" whose value matches regex "^fr",
    * a key "number", whatever its value, captured in variable "apid",
    * a key "type", whatever its value, captured in variable "acodec",
  * an object with
    * a key "type" whose value matches regex "Video", captured in variable "vcodec",
    * a key "number", whatever its value, captured in variable "vpid".
3> O = ejpet:compile("[*, {\"ip_multicast\":\"239.100.10.4\",
\"pcr_pid\":(?<pcr>_),
\"pids\":<{\"language\": #\"^fr\",
\"number\": (?<apid>_),
\"type\": (?<acodec>_)},
{\"type\": (?<vcodec>#\"Video\"),
\"number\": (?<vpid>_)}>}, *]", jsx).
{ejpet,jsx,#Fun<ejpet_jsx_generators.9.11467207>}
Run and seek ...
4> ejpet:run(Node, O).
Here you are !
{true,[{"vpid",520},
{"vcodec",[<<"Video (MPEG2)">>]},
{"acodec",[<<"Audio (MPEG1)">>]},
{"apid",530},
{"pcr",520}]}
Express what you want to match using a simple expression language.
pattern | match ? | Notes |
---|---|---|
`true` | `true` | |
`false` | `false` | |
`null` | `null` | |
`"string"` | the string "string" | UTF-8 encoded string (with escaping) |
`#"regex"` | any string matching regex "regex" | UTF-8 encoded string (no escaping) |
`number` | the number number, e.g. `42`, `3.14159`, `-3395.1264e-22` | |
`{ kv* }` | object for which all kv (key/value) patterns are matched | Order does not matter |
`[ item* (, *)?]` | list for which all item patterns are matched | Order DOES matter |
`< value* >` | value set (list, or object values) for which all value patterns are matched | Order does not matter |
`< value* >/g` | same as previous but search for ALL matches. Useful only when capturing | Order does not matter |
`<! value* !>` | same as `< value* >` but search deep. | |
`<! value* !>/g` | same as previous but search for ALL matches. Useful only when capturing | |
`(?<name>expr)` | capture expression expr in return value name | Every JSON expression may be captured |
`(!<name>type)` | match a JSON value of type type against the parameter named name | |
`kv` may take one of the forms
- `_:pattern`
- `"key":_`
- `"key":pattern`

`item` may take one of the forms
- `*`
- `pattern`

`value` is a `pattern`.

`kv`, `item` and `value` are separated by `,`.
In parameter injection, `type` may be `number`, `boolean`, `string` or `regex`.

`number` matching may be strict or loose, depending on an option passed at compile-time.
1> ejpet:match(<<"42.0">>, "42").
{true,<<"{}">>}
2> ejpet:match(<<"42.0">>, "42", [{number_strict_match, true}]).
{false,<<"{}">>}
`string` and `regex` are UTF-8 encoded byte streams.
They may contain escape sequences, as in `"\\b"` or `"\u00E9"`. When found in a `string`, these sequences are interpreted by default (but they may be left as-is with option `string_apply_escape_sequence` set to `false`). When found in a `regex`, they are not interpreted.
3> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, true}]).
{true,<<"{}">>}
4> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{false,<<"{}">>}
5> ejpet:match(<<"\"\\\\u00E9\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{true,<<"{}">>}
The codepoint produced by evaluating an escape sequence of the form `\uABCD` is NOT checked. One can insert any codepoint, valid or not, in a string or regex.
Every pattern `p` can be captured by simply substituting it with `(?<variable_name>p)`. Captures are returned as a JSON object, where each `variable_name` is a key, and the list of captures found for that variable is the value.
This JSON object is built with respect to the backend indicated when compiling the pattern.
Warning: if there are no captures to return, the empty JSON object `{}` is returned, but its actual form depends on the backend.
- jsx: `[{}]`
- jiffy: `{[]}`
- mochijson2: `{struct, []}`
- jsone: `#{}`
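
Because the empty capture set looks different for each backend, comparing a result against a literal is fragile. The following is a minimal sketch, pasteable into an Erlang shell, assuming ejpet and jsx are on the code path and that `empty_capture_set/1` (listed in the API reference of this document) is exported from the `ejpet` module. The pattern and the sample document are illustrative only.

```erlang
%% Sketch: detect "matched, but captured nothing" without hard-coding
%% the backend-specific form of the empty JSON object.
EPM = ejpet:compile("<42>", jsx),               % a pattern without any capture
Node = ejpet:decode(<<"[41, 42, 43]">>, jsx),
{Status, Captures} = ejpet:run(Node, EPM),
%% Assumption: ejpet:empty_capture_set/1 returns the same term that
%% ejpet:run/2 yields when the pattern captures nothing.
NoCapture = (Captures =:= ejpet:empty_capture_set(jsx)),
{Status, NoCapture}.
```

Asking the library for the empty set keeps the check valid if the pattern is later compiled for another backend, provided the backend atom is changed in both places.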
One may wonder why captures are returned as an encoded JSON object. There are two reasons:
- captured values are taken "as is" from the parsed document, i.e. in their backend-encoded form, so using the backend encoding for the result is more coherent;
- the capture JSON object can itself be pattern matched.
It is possible to provide some matching values at match-time, through parameter injection forms like `(!<param_name>param_type)`, where `param_type` may be `number`, `string`, `boolean` or `regex`.
At match-time, the generated matching functions look for an entry named `param_name` in the provided parameter list. See `ejpet:run/3` and `ejpet:match/4`.
Note that `string` values should be binaries, and `regex` values MUST be `mp()` opaque objects as returned by `re:compile/2`.
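
As an illustration of the `regex` case, here is a hedged sketch that injects a precompiled regular expression at match-time. The parameter name `re`, the capture name `lang` and the sample document are made up for the example; it assumes ejpet and its default backend are on the code path. No expected result is shown because it depends on the library version in use.

```erlang
%% Sketch: inject a precompiled regex through ejpet:match/4.
{ok, LangRe} = re:compile("^fr"),                % regex parameters must be re:mp() values
Doc = <<"[\"en-GB\", \"fr-FR\"]">>,              % illustrative JSON text
Pattern = "<(?<lang>(!<re>regex))>",             % capture whatever the injected regex matches
%% Parameter names are binaries; no compile options are passed here.
Result = ejpet:match(Doc, Pattern, [], [{<<"re">>, LangRe}]),
Result.
```

Compiling the regex once and passing it as a parameter lets the same pattern be reused with different regular expressions.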
backend() = jsx | jiffy | mochijson2 | jsone
epm() = {ejpet, term(), term()}
expr_src() = string()
compile_option() = {string_apply_escape_sequence, boolean()}
| {number_strict_match, boolean()}
json_input() = string() | binary()
json_src() = binary()
json_term() = jsx_term() | jiffy_term() | mochijson2_term() | jsone_term()
run_param_name() = binary()
run_param_value() = boolean() | number() | binary() | re:mp()
run_param() = {run_param_name(), run_param_value()}
run_res() = {match_stat(), json_term()}
match_res() = {match_stat(), json_src()}
match_stat() = true | false
ejpet:decode(JSONText, Backend) -> json_term()
JSONText = json_input()
Backend = backend()
ejpet:encode(JSONTerm, Backend) -> json_src()
JSONTerm = json_term()
Backend = backend()
ejpet:compile(Expr, Backend, Options) -> epm()
Expr = expr_src()
Backend = backend()
Options = [Option]
Option = compile_option()
ejpet:compile(Expr, Backend) -> epm()
Same as ejpet:compile(Expr, Backend, [])
ejpet:compile(Expr) -> epm()
Same as ejpet:compile(Expr, jsx, [])
ejpet:backend(EPM) -> backend()
EPM = epm()
ejpet:run(JSONTerm, EPM, Params) -> run_res()
EPM = epm()
JSONTerm = json_term()
Params = [Param]
Param = run_param()
ejpet:run(JSONTerm, EPM) -> run_res()
Same as ejpet:run(JSONTerm, EPM, [])
ejpet:match(JSONText, Expr, Options, Params) -> match_res()
JSONText = json_input()
Expr = expr_src() | epm()
Options = [Option]
Option = compile_option()
Params = [Param]
Param = run_param()
ejpet:match(JSONText, Expr, Options) -> match_res()
Same as ejpet:match(JSONText, Expr, Options, [])
ejpet:match(JSONText, Expr) -> match_res()
Same as ejpet:match(JSONText, Expr, [], [])
ejpet:get_status(Res) -> match_stat()
Res = run_res() | match_res()
get_captures(Res) -> json_term()
Res = run_res() | match_res()
get_capture(Res, Name) -> {ok, json_term()} | not_found
Same as get_capture(Res, Name, jsx)
get_capture(Res, Name, Backend) -> {ok, json_term()} | not_found
Res = run_res()
Name = string() | binary()
Backend = backend()
empty_capture_set() -> json_term()
Same as empty_capture_set(jsx)
empty_capture_set(Backend) -> json_term()
Backend = backend()
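
The functions above can be combined as in the following sketch: compile once, run against a decoded document with an injected parameter, then read the result through the accessors. It assumes ejpet and jsx are on the code path and that `get_status/1` and `get_capture/2` are exported from the `ejpet` module. The pattern and data are illustrative.

```erlang
%% Sketch: compile once, run, then inspect the result with the accessors
%% instead of destructuring the {Status, Captures} tuple by hand.
EPM = ejpet:compile("<(?<subnode>(!<what>number))>", jsx),
Node = ejpet:decode(<<"[41, 42, 43]">>, ejpet:backend(EPM)),
Res = ejpet:run(Node, EPM, [{<<"what">>, 42}]),   % inject the number to look for
case ejpet:get_status(Res) of
    true ->
        %% Assumption: get_capture/2 is in the ejpet module; it is documented
        %% to return {ok, Value} or not_found.
        ejpet:get_capture(Res, "subnode");
    false ->
        no_match
end.
```

Keeping the compiled `EPM` around and changing only the parameter list avoids recompiling the pattern for each value to look for.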
Expression | Match | No match | Code snippet |
---|---|---|---|
`42` | `42` | `"42"`, `[42]`, `{"key": 42}` | `ejpet:match(<<"42">>, "42").` |
`"42"` | `"42"` | `42`, `["42"]`, `{"key": "42"}` | `ejpet:match(<<"\"42\"">>, "\"42\"").` |
`true` | `true` | `"true"`, `[true]` | `ejpet:match(<<"true">>, "true").` |
`false` | `false` | `"false"`, `[false]` | `ejpet:match(<<"false">>, "false").` |
`null` | `null` | `"null"`, `[null]` | `ejpet:match(<<"null">>, "null").` |
`#"foo"` | `"foobar"`, `"barfoo"` | `"barfo"` | `ejpet:match(<<"\"foobar\"">>, "#\"foo\"").` |
`#"^foo"` | `"foobar"` | `"barfoo"` | `ejpet:match(<<"\"foobar\"">>, "#\"^foo\"").` |
`#"bar$"` | `"foobar"` | `"barfoo"` | `ejpet:match(<<"\"foobar\"">>, "#\"bar$\"").` |
Expression | Match | No match | Code snippet |
---|---|---|---|
`{_:42}` | `{"bar": 42}`, `{"bar": 47, "foo": 42}` | `{"bar": 47}`, `{"foo": "42"}` | `ejpet:match(<<"{\"foo\": 42}">>, "{_:42}").` |
`{"foo":_}` | `{"foo": 42}`, `{"bar": 42, "foo": {}}` | `{"bar": "foo"}` | `ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":_}").` |
`{"foo":42}` | `{"foo": 42}`, `{"bar": "42", "foo": 42}` | `{"bar": 42, "foo": "42"}` | `ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":42}").` |
`{_:{"foo": 42}, "bar": {_:#"bar"}}` | `{"neh": {"foo": 42}, "bar": {"nimp": "foobar"}}` | `{"neh": {"notfoo": 42}, "bar": {"nimp": "foobar"}}` | `ejpet:match(<<"{\"neh\": {\"foo\": 42}, \"bar\": {\"nimp\": \"foobar\"}}">>, "{_:{\"foo\": 42}, \"bar\": {_:#\"bar\"}}").` |
Expression | Match | No match | Code snippet |
---|---|---|---|
`["42"]` | `["42"]` | `{"bar": "42"}`, `{"foo": 42}`, `[42]`, `["42", "42"]` | `ejpet:match(<<"[\"42\"]">>, "[\"42\"]").` |
`[*, "42"]` | `["42"]`, `["42", "42"]`, `[true, "42"]` | `{"bar": "42"}`, `{"foo": 42}`, `[42]`, `["42", true]` | `ejpet:match(<<"[true, \"42\"]">>, "[*, \"42\"]").` |
`[*, "42", *]` | `["42"]`, `["42", "42"]`, `[true, "42"]`, `["42", true]`, `[{}, "42", true]` | `{"bar": "42"}`, `{"foo": 42}`, `[42]` | `ejpet:match(<<"[true, \"42\", {}]">>, "[*, \"42\", *]").` |
`[[42]]` | `[[42]]` | `[42]`, `[[42], 42]` | `ejpet:match(<<"[[42]]">>, "[[42]]").` |
`[*, [42]]` | `[[42]]`, `["42", [42]]` | `[[42], 42]` | `ejpet:match(<<"[\"42\", [42]]">>, "[*, [42]]").` |
`[[42], *]` | `[[42]]`, `[[42], 42]` | `["42", [42]]` | `ejpet:match(<<"[[42], \"42\"]">>, "[[42], *]").` |
Expression | Match | No match | Code snippet |
---|---|---|---|
`<42>` | `[42]`, `{"key": 42}` | `42`, `"42"` | `ejpet:match(<<"{\"key\": 42}">>, "<42>").` |
`<"42">` | `["42"]`, `{"bar": "42"}`, `[42, "42"]`, `["42", 42]` | `[42]`, `{"bar": 47}`, `{"foo": 42}` | `ejpet:match(<<"{\"bar\": \"42\"}">>, "<\"42\">").` |
`<!"42"!>` | `["42"]`, `[true, "42"]`, `["foo", ["42", true], {}]`, `[{}, {"foo": "42"}, true]`, `{"bar": "42"}`, `{"bar": {"foo": "42"}}` | `"42"`, `{"foo": 42}`, `[42]` | `ejpet:match(<<"[true, [null, {\"foo\": \"42\"}, \"bar\"], {}]">>, "<!\"42\"!>").` |
`<!<!"42"!>!>` | `[["42"]]`, `[{}, {"foo": "42"}, true]`, `{"bar": {"foo": "42"}}` | `["42"]`, `{"bar": "42"}` | `ejpet:match(<<"[{\"foo\":\"42\"}]">>, "<!<!\"42\"!>!>").` |
Expression | Test | Capture(s) | Code snippet |
---|---|---|---|
`<!(?<subnode>{_:42})!>` | `[{"foo": null}, {"foo": 42, "bar": {}}]` | `subnode: [{"foo":42,"bar":{}}]` | `ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "<!(?<subnode>{_:42})!>").` |
`(?<all><!(?<subnode>{_:42})!>)` | `[{"foo": null}, {"foo": 42, "bar": {}}]` | `all: [[{"foo":null},{"foo":42,"bar":{}}]]`, `subnode: [{"foo":42,"bar":{}}]` | `ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)").` |
Expression | Test | Capture(s) | Code snippet |
---|---|---|---|
`<(?<node>{"codec":_, "lang":(?<lang>_)})>/g` | `[{"codec": "audio", "lang": "fr"}, {"codec": "video", "lang": "en"}, {"codec": "foo", "lang": "it"}]` | `node: [{"codec":"audio","lang":"fr"}, {"codec":"video","lang":"en"}, {"codec":"foo","lang":"it"}]`, `lang: ["fr", "en", "it"]` | `ejpet:match(<<"[{\"codec\": \"audio\", \"lang\": \"fr\"}, {\"codec\":\"video\", \"lang\": \"en\"}, {\"codec\": \"foo\", \"lang\": \"it\"}]">>, <<"<(?<node>{\"codec\":_, \"lang\":(?<lang>_)})>/g">>).` |
Expression | Test | Parameters | Capture(s) | Code snippet |
---|---|---|---|---|
`<(?<subnode>(!<what>number))>` | `[41, 42, 43]` | `[{<<"what">>, 42}]` | `subnode: [42]` | `ejpet:match(<<"[41, 42, 43]">>, "<(?<subnode>(!<what>number))>", [], [{<<"what">>, 42}]).` |
In the tables above, captured values are shown as abstract JSON nodes, for illustration purposes. As explained previously, the actual capture result depends on the API function used, and may be:
- serialized JSON nodes (as in the "Code snippet" column), with `ejpet:match()`
1> ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)").
{true,<<"{\"all\":[[{\"foo\":null},{\"foo\":42,\"bar\":{}}]],\"subnode\":[{\"foo\":42,\"bar\":{}}]}">>}
- a (jsx | jiffy | mochijson2) JSON value, depending on the backend, for easier further processing, with `ejpet:run()`
1> JSX = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", jsx, []).
{ejpet,jsx,#Fun<ejpet_jsx_generators.19.98422695>}
2> ejpet:run((ejpet:backend(JSX)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), JSX).
{true,[{"all",
[[[{<<"foo">>,null}],[{<<"foo">>,42},{<<"bar">>,[{}]}]]]},
{"subnode",[[{<<"foo">>,42},{<<"bar">>,[{}]}]]}]}
39> Mochi = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", mochijson2, []).
{ejpet,mochijson2,
#Fun<ejpet_mochijson2_generators.19.110863078>}
40> ejpet:run((ejpet:backend(Mochi)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), Mochi).
{true,{struct,[{<<"all">>,
[[{struct,[{<<"foo">>,null}]},
{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]]},
{<<"subnode">>,
[{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]}]}}