Handle special characters such as '&' #297

Closed
5 of 6 tasks
ishong opened this issue Oct 23, 2020 · 9 comments
Labels
Feature-Request New features suggested by users v4

Comments

@ishong

ishong commented Oct 23, 2020

  • Are you running the latest version?
  • Have you included sample input, output, error, and expected output?
  • Have you checked if you are using correct configuration?
  • Did you try online tool?

Description

fast-xml-parser does not decode escaped special characters (entities such as &amp;) in tag values.

I can work around this with the tagValueProcessor option, but I think the default behavior should be changed.

tagValueProcessor: (tagValue) => {
            return tagValue.replace(/&lt;/g, '<').replace(/&amp;/g, '&').replace(/&gt;/g, '>');
        },

Input

<abc>a &amp; b</abc>

Code

parser.parse('<abc>a &amp; b</abc>', { trimValues: false, parseTrueNumberOnly: true });

Output

{
   "abc":"a &amp; b"
}

expected data

{
   "abc":"a & b"
}

Would you like to work on this issue?

  • Yes
  • No

Bookmark this repository for further updates.

@amitguptagwl amitguptagwl added Feature-Request New features suggested by users v4 labels Oct 23, 2020
@amitguptagwl
Member

HTML ampersand codes (entities) are currently not supported. This feature is planned, along with more HTML-related features such as empty tags.

@ishong
Author

ishong commented Oct 23, 2020

Thank you. @amitguptagwl
Three special characters must be escaped to meet the XML specification.
Ref. https://www.w3.org/TR/xml/#syntax
The XML validation function should also meet this spec.

@gwicksted

Note regarding the original implementation in the Description above: do not decode &amp; before the other entities, or it will incorrectly decode &amp;gt; to > when it should be &gt;. You will also want to add an attrValueProcessor to process &quot; and &apos;.

Also, depending on what you're interfacing with, you may wish to implement &#nnnn; and &#xhhhh;, where nnnn is the Unicode decimal representation of a character and hhhh is the Unicode hexadecimal representation of a character. These are variable-length up to 4 hexadecimal characters to support all of UTF-16. I don't believe there is an official spec for UTF-32 in XML, which can have up to 8 hexadecimal characters. That said, most XML processing libraries send these characters verbatim using a well-known encoding (typically UTF-8), a UTF-16 BOM (typically found in UTF-16 files to indicate little- or big-endian UTF-16), or via the XML encoding declaration.
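
The ordering problem in the note above can be avoided by decoding everything in a single pass, so that the output of one replacement is never scanned again. Below is a minimal sketch, assuming a hypothetical helper named `decodeXmlEntities` (it is not part of fast-xml-parser). It handles the five predefined XML entities plus decimal and hexadecimal character references. The character-reference syntax in the XML specification does not cap the number of digits, so the sketch accepts any code point up to U+10FFFF rather than stopping at four hexadecimal digits.

```javascript
// Hypothetical helper for illustration; not part of fast-xml-parser.
const NAMED_ENTITIES = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };

function decodeXmlEntities(value) {
  // Single pass: every match is replaced exactly once and the result is not
  // re-scanned, so "&amp;gt;" becomes "&gt;" and never ">".
  return value.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown named entities are left as-is.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

A function like this could presumably be passed as the tagValueProcessor shown in the Description, and as the attrValueProcessor mentioned above, in place of the chained replace calls.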

@amitguptagwl
Member

Please check with v4.0.0-beta.3

@FunctionDJ

@amitguptagwl

> builder.build({ f: { "@a": "a & b" } })
'<f><@a>a & b</@a></f>'

The builder doesn't encode the ampersand as &amp; in version 4.0.0-beta.5.

@amitguptagwl
Member

amitguptagwl commented Dec 5, 2021

@FunctionDJ The builder supports the following entities: >, <, ', and ". For example:

{
    "f": {
        "@a": "a > b"
    }
}
<f>
  <@a>
    a &gt; b
  </@a>
</f>

@FunctionDJ

Yeah, that's unfortunate. The software I'm generating XML for can't handle unescaped ampersands. I'll see if I can use the builder options to replace the ampersand, but right now I'm simply replacing all instances with a regex, which is kind of dirty.
xml2js generates escaped ampersands, but it's a lot slower.

@amitguptagwl
Member

A parser needs & to be escaped so that it can't form an entity. This is not the case with FXP, hence we've avoided the unnecessary replacement. However, I'll include it in the next version soon. Until then, you can achieve this using attribute or tag value processors, I believe.
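
As a rough illustration of the processor-based workaround suggested here, the following sketch escapes text in one pass. The name `escapeXml` is hypothetical and not part of fast-xml-parser, and how it gets wired into the builder's attribute or tag value processor options is an assumption to check against the documentation of the version in use.

```javascript
// Hypothetical helper for illustration; not part of fast-xml-parser.
const XML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

function escapeXml(value) {
  // Single pass over the input: the '&' introduced by "&lt;" and friends is
  // never escaped a second time, so the replacement order cannot go wrong.
  return String(value).replace(/[&<>"']/g, (ch) => XML_ESCAPES[ch]);
}
```

Because the builder is said above to escape >, <, ', and " already, a processor may only need the ampersand mapping. If a later version of the builder escapes & itself, a processor like this would need to be removed to avoid double escaping.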

@FunctionDJ

@amitguptagwl Thank you very much!! The regex replace works for my current case, but using a processor is definitely a better idea.
