-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue #306 #316
Fix issue #306 #316
Conversation
@piponazo Thanks for working on this. I will review this on Sunday evening. We're having a family gathering on Sunday for our son's partner's birthday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Luis
I haven't run the fuzzed file through the debugger, so I don't know the root issue. However, by inspecting the code, I believe you're modifying the code to say "if languageText.size()==0 then there's no translatedString".
I don't interpret the specification to say this and the original code appears to implement the specification:
4.2.3.3. iTXt International textual data
This chunk is semantically equivalent to the tEXt and zTXt chunks, but the textual data is in the UTF-8 encoding of the Unicode character set instead of Latin-1. This chunk contains:
Keyword: 1-79 bytes (character string)
Null separator: 1 byte
Compression flag: 1 byte
Compression method: 1 byte
Language tag: 0 or more bytes (character string)
Null separator: 1 byte
Translated keyword: 0 or more bytes
Null separator: 1 byte
Text: 0 or more bytes
I don't recall working on this code and don't know if we have a test file with an iTxT Chunk.
I think the "fuzzed file" has deliberately omitted a NULL separator and we should throw kerCorruptedMetadata
I've built the code with -fsanitize=address and reproduced the issue. I'll step the code in the debugger tomorrow and provide more feedback.
According to the specification:
This would be related to our
I debugged the code with the POC provided in #306 and what we have is:
Then the next line of code (the old one):
Was causing the crash. It's true that I made an assumption. The code now says: if the I actually tried with one valid PNG image with several text chunks that I got from this webpage: I think it would be also useful to include that small image in the test suite to check that we do not break this code in the future. I think we are not testing that part of the code right now. Anyways, let me know if you have more insights about this. The specification is not clear about what to do when the Language Tag is empty. |
Luis: You're always very persuasive. I promise to step this tomorrow in debugger and share my thoughts with you. I agree that this file should be in the test suite. For sure we shouldn't crash! |
@piponazo I've spent 90 minutes in the debugger with this puzzle. Interesting stuff. I haven't solved this, however I'm making progress. I'm using the following test file:
Here's the state of my code at the moment:
And the output from std::cout is as follows. It's XML, not compressed. And why does it start with
Ummm. I need more time to unpuzzle this. Alison and I will be out tomorrow afternoon+evening. I promise to look again at this on Wednesday morning. If you have a flash of inspiration, let me know. This is a fun puzzle and we'll be delighted when we solve it. |
@clanmills I think the crash does not come from
Maybe you were trying directly on my branch instead of doing it on the master one? After reading again the specification I am still not sure how this is happening. That line of code should correspond to:
I think the problem is that there is a missing null separator, that should be the one determining when to finish the reading of The only way I can think of fixing this issue is to go through all the data buffer searching for the right amount of null-separators. |
Thanks for the feedback on this. I'll look at this tomorrow morning and use your branch. What you've said about search the buffer for the NUL separators sounds correct to me. That's what I suggested on Sunday evening. If we don't find the correct number of NULs we should throw kerCorruptedMetadata Incidentally, we do have images in the test suite with iTxT chunks. They appear to be uncompressed XMP, so your thought that the compressed chunk code isn't commonly executed by the test suite is probably correct.
|
I have changed the code to check the number of null separators:
The reason to start from This implementation solves the problem described in #306 and keep the code working for other images with valid PNG chunks. |
I have run the POC file through a debugger and can confirm @piponazo's observation: the problem is that in line 175: std::string translatedKeyText((const char*)(data.pData_ + keysize + 3 + languageTextSize +1)); we construct a new std::string from a non-null terminated character array. Since the constructor of std::string searches for the first Btw, this problem can be verified quite easily with gdb, set a breakpoint before the problematic string and immediatly before the crashing line, run:
which will result in an ASAN crash. That usually indicates a not properly terminated C string, as strlen reads beyond its bounds when searching for the first Imho the best solution for this would be to use the following std::string constructor: basic_string( const CharT* s,
size_type count,
const Allocator& alloc = Allocator() ); and I think something like this could work (for the // keysize + 3 could be larger than the dataBuf, not sure whether the Safe::add is necessary
enforce(Safe::add(keysize, 3) < data.size_, kerCorruptedMetadata);
// find the first \0 after the start of languageText
const byte * languageTextTermination = std::find(data.pData_ + keysize + 3, data.pData_ + data.size_, 0);
// if find returned the end iterator => no \0 found
const size_t languageTextLen = languageTextTermination == data.pData + data.size_ ? data.size_ : data.pData + keysize + 3 - languageTextTermination;
assert(languageTextLen <= static_cast<size_t>(data.size_));
std::string languageText(reinterpret_cast<const char*>(data.pData_ + keysize + 3), languageTextLen); This should however probably become a function itself. Not really efficient, but better safe than sorry. |
@piponazo Sorry, I have not properly read your comment. Your method should if course work too, albeit it will reject not null terminated strings (but I guess that's fine, since the standard specifically mandates null termination). |
I do not see the point of using As you pointed out, the problem with the POC provided in #306 is that there is not a null character in the buffer at that point. The code that I introduced is checking for at least 2 null separators once we start analysing the Language Tag:
If you accept this solution I will squash the commits and try to simplify the current solution using some STL algorithm instead of writing pure C code directly. |
My idea with using Using my idea would however only make sense if we want to tolerate not null terminated strings in the png chunk. Don't know if that makes sense or not. @clanmills Do you happen to know if that's a common issue? @piponazo I'll review your solution. |
According to the specification LanguageTag and TranslatedKeyword must have a null separator. We should not change that. Later on, in the if/else blocks checking the compressionFlag and compressionMethod we are reading the Text that can contains or not null characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't think of a way how to circumvent the safety you added, so this is more than fine by me.
src/pngchunk_int.cpp
Outdated
@@ -162,6 +163,15 @@ namespace Exiv2 { | |||
} | |||
else if(type == iTXt_Chunk) | |||
{ | |||
int nullSeparators = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[suggestion] This can be also performed with std::count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I was thinking :). Now that we agreed on the solution I will use that algorithm.
Yeah, I have seen that in the specification. But image manipulation programs have bugs too, so I wouldn't be surprised if they'd produce such strings ;-) |
I don't remember looking at this code before Sunday. The specification is a "pure C" kind of thing, and I don't see anything wrong with the "pure C" search for NUL. We should also check that the Incidentally, I think we've both spotted what's wrong here. You guys have it more correct. The NUL bytes are essential to prevent the std::string constructor reading past the end of buffer. When I built for debugging in Xcode, I forgot to set the ASAN flags. So, all the stuff I discovered about treating the translated string as optional (and therefore the 3+ stuff is suspicious). My thoughts are both valid and bogus. The specification is clear. translated isn't optional. The NULs must be present. The value of Incidentally, exiftool can read the file. Phil Harvey is friendly and helpful. I could reach out and ask him for his opinion about this. However, I think we're confident of our analysis. We check the chunk and throw if it violates the spec. Using a "pure C" search seems OK to me. Another perfect day in England. The beautiful weather you brought is still here. I'm going out to run in the Californian Weather. |
@clanmills I will also add some checks for the values of the compression flag and method. Thanks for your feedback guys! |
…TChunk This commit fixes the heap-buffer-overflow in PngChunk::parseTXTChunk. According to the specification: http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html There must be 2 null separators when we start to analyze the language tag.
@@ -162,6 +164,9 @@ namespace Exiv2 { | |||
} | |||
else if(type == iTXt_Chunk) | |||
{ | |||
const int nullSeparators = std::count(&data.pData_[keysize+3], &data.pData_[data.size_-1], '\0'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, I think this should be &data.pData_[data.size_]
as the end iterator should point to the first element outside of the range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahg, you are right. I'll create a mini PR fixing that.
src/pngchunk_int.cpp
Outdated
@@ -165,22 +165,28 @@ namespace Exiv2 { | |||
else if(type == iTXt_Chunk) | |||
{ | |||
const int nullSeparators = std::count(&data.pData_[keysize+3], &data.pData_[data.size_-1], '\0'); | |||
enforce(nullSeparators >= 2, Exiv2::kerCorruptedMetadata); | |||
enforce(nullSeparators >= 2, Exiv2::kerCorruptedMetadata, "iTXt chunk: not enough null separators"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message you added will not be displayed, as kerCorruptedMetadata
does not include it in the output via e.what()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the message was not printed, but I thought it could be a good idea to have that string there in case we can add unit tests in the future around this code. I did not check though if we could get the string when catching the exception in the unit tests code.
Anyways, I think it does not hurt to have a small string in the enforce
calls giving some details of why we are placing the enforce calls in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will hurt the performance slightly in case the exception gets thrown, as then the Exiv2::Error
constructor will search for a place where to insert the string but won't be able to find one. Not really a big deal though. In case you want the error message printed, we you can use kerErrorMessage
(albeit that is a quite generic error).
According to the specification:
http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html
The language tag (languageText) can be empty. In that case the language
is unspecified and therefore there is not a translated keyword.
I still need to add a test for this case, but please, let me know if you think that my analysis is correct.