Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Extra content at the end of the document #161

Merged
merged 2 commits into from
Jul 7, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Jul 4, 2024

Why?

XML with additional content at the end of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

[27]    Misc       ::=          Comment | PI | S

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

[16]    PI         ::=          '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

[17]    PITarget           ::=          Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

## Why?

XML with multiple root elements is invalid.

See: ruby#160 (comment)
lib/rexml/parsers/baseparser.rb Outdated Show resolved Hide resolved
lib/rexml/parsers/baseparser.rb Outdated Show resolved Hide resolved
lib/rexml/parsers/baseparser.rb Outdated Show resolved Hide resolved
test/parser/test_base_parser.rb Outdated Show resolved Hide resolved
test/parser/test_base_parser.rb Outdated Show resolved Hide resolved
test/parser/test_base_parser.rb Outdated Show resolved Hide resolved
test/parser/test_base_parser.rb Outdated Show resolved Hide resolved
@naitoh naitoh force-pushed the fix_multiple_root_elements branch from 641d9d1 to 4e9de51 Compare July 5, 2024 04:19
@naitoh naitoh requested a review from kou July 5, 2024 04:30
naitoh added a commit to naitoh/rexml that referenced this pull request Jul 6, 2024
## Why?
XML declaration must be the first item.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22]   prolog   ::=   	XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23]   XMLDecl  ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

See: ruby#161 (comment)
test/parse/test_processing_instruction.rb Outdated Show resolved Hide resolved
test/parse/test_comment.rb Outdated Show resolved Hide resolved
test/parse/test_element.rb Outdated Show resolved Hide resolved
test/parse/test_element.rb Outdated Show resolved Hide resolved
test/parse/test_text.rb Outdated Show resolved Hide resolved
return [ :start_element, tag, attributes ]
end
else
text = @source.read_until("<")
if text.chomp!("<")
@source.position -= "<".bytesize
end
if @tags.empty? and @have_root
if text.strip != ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip allocates a new string. Can we avoid it?
For example: /\A\s*\z/.match?(text)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.
I see.

## Why?

XML with additional content at the end of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

```
[27]   	Misc	   ::=   	Comment | PI | S
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

```
[16]   	PI	   ::=   	'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

```
[17]   	PITarget	   ::=   	Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
```
@naitoh naitoh force-pushed the fix_multiple_root_elements branch from 4e9de51 to c094825 Compare July 7, 2024 00:40
@naitoh naitoh requested a review from kou July 7, 2024 00:45
naitoh added a commit to naitoh/rexml that referenced this pull request Jul 7, 2024
## Why?
XML declaration must be the first item.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22]   prolog   ::=   	XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23]   XMLDecl  ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

See: ruby#161 (comment)
@kou kou merged commit eb45c8d into ruby:master Jul 7, 2024
61 checks passed
@kou
Copy link
Member

kou commented Jul 7, 2024

Thanks.

kou pushed a commit that referenced this pull request Jul 7, 2024
## Why?
XML declaration must be the first item.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22]   prolog   ::=   	XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23]   XMLDecl  ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

See: #161 (comment)
@naitoh naitoh deleted the fix_multiple_root_elements branch July 7, 2024 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants