Fix issue #306 #316

Merged
merged 6 commits into from
May 23, 2018

Conversation

piponazo
Collaborator

According to the specification:
http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html

The language tag (languageText) can be empty. In that case the language
is unspecified and therefore there is no translated keyword.

I still need to add a test for this case, but please, let me know if you think that my analysis is correct.

@piponazo piponazo requested review from clanmills and D4N May 19, 2018 18:07
@clanmills
Collaborator

@piponazo Thanks for working on this. I will review this on Sunday evening. We're having a family gathering on Sunday for our son's partner's birthday.

@piponazo piponazo changed the title Fix issue306 Fix issue #306 May 20, 2018
Collaborator

@clanmills clanmills left a comment


Luis

I haven't run the fuzzed file through the debugger, so I don't know the root issue. However, by inspecting the code, I believe you're modifying the code to say "if languageText.size()==0 then there's no translatedString".

I don't interpret the specification to say this and the original code appears to implement the specification:

4.2.3.3. iTXt International textual data

This chunk is semantically equivalent to the tEXt and zTXt chunks, but the textual data is in the UTF-8 encoding of the Unicode character set instead of Latin-1. This chunk contains:

   Keyword:             1-79 bytes (character string)
   Null separator:      1 byte
   Compression flag:    1 byte
   Compression method:  1 byte
   Language tag:        0 or more bytes (character string)
   Null separator:      1 byte
   Translated keyword:  0 or more bytes
   Null separator:      1 byte
   Text:                0 or more bytes

I don't recall working on this code and don't know if we have a test file with an iTXt chunk.

I think the "fuzzed file" has deliberately omitted a NUL separator, and we should throw kerCorruptedMetadata.

I've built the code with -fsanitize=address and reproduced the issue. I'll step the code in the debugger tomorrow and provide more feedback.

@piponazo
Collaborator Author

According to the specification:

The language tag [RFC-1766] indicates the human language used by the translated keyword and the text. Unlike the keyword, the language tag is case-insensitive. It is an ASCII [ISO-646] string consisting of hyphen-separated words of 1-8 letters each (for example: cn, en-uk, no-bok, x-klingon). If the first word is two letters long, it is an ISO language code [ISO-639]. If the language tag is empty, the language is unspecified.

This would be related to our languageText variable. Then the specification continues with:

The translated keyword and text both use the UTF-8 encoding of the Unicode character set [ISO/IEC-10646-1], and neither may contain a zero byte (null character). The text, unlike the other strings, is not null-terminated; its length is implied by the chunk length.

I debugged the code with the POC provided in #306 and what we have is:

  • `compressionFlag == 1'
  • `compressionMethod == 0'
  • `languageText == ""'
  • `languageTextSize == 0'

Then the next line of code (the old one):

std::string translatedKeyText((const char*)(data.pData_ + keysize + 3 + languageTextSize +1));

That line was causing the crash. It's true that I made an assumption. The code now says: if the languageText (the Language Tag in the specification) is empty, we do not have a translated keyword, since we do not know which language the keyword is translated into. However, later on, the code still reads the Text.

I actually tried with one valid PNG image with several text chunks that I got from this webpage:
https://pmt.sourceforge.io/itxt/

I think it would be also useful to include that small image in the test suite to check that we do not break this code in the future. I think we are not testing that part of the code right now.

Anyways, let me know if you have more insights about this. The specification is not clear about what to do when the Language Tag is empty.

@clanmills
Collaborator

Luis: You're always very persuasive. I promise to step this tomorrow in debugger and share my thoughts with you. I agree that this file should be in the test suite. For sure we shouldn't crash!

@clanmills
Collaborator

@piponazo I've spent 90 minutes in the debugger with this puzzle. Interesting stuff. I haven't solved this, however I'm making progress. I'm using the following test file:

816 rmills@rmillsmm:~/Downloads $ ls -alt ~/Downloads/*.dms
-rw-r--r--+ 1 rmills staff 88125 May 20 20:19 /Users/rmills/Downloads/id-000004,sig-06,src-000036,op-havoc,rep-128.dms
817 rmills@rmillsmm:~/Downloads $ 
  1. The crash/exception is coming from zlibUncompress because the data is not compressed!
  2. If you're right about ignoring translatedText when languageTextSize == 0, then all the 3+this+that is not correct.

Here's the state of my code at the moment:

            // translated keyword string after the language description
            unsigned int translatedKeyTextSize = 0 ;
            if ( languageTextSize ) {
                std::string translatedKeyText((const char*)(data.pData_ + keysize + 3 + languageTextSize +1));
                translatedKeyTextSize = static_cast<unsigned int>(translatedKeyText.size());
            }

            if ( compressionFlag[0] == 0x00 )
            {
... unchanged ... although this includes the 3 + this + that code ...
            }
            else if ( compressionFlag[0] == 0x01 && compressionMethod[0] == 0x00 )
            {
                // then it's a zlib compressed iTXt chunk
#ifdef DEBUG
                std::cout << "Exiv2::PngChunk::parseTXTChunk: We found a zlib compressed iTXt field\n";
#endif
                unsigned int skip = 4;
                skip += languageTextSize ? languageTextSize : 0;
                skip += languageTextSize ? translatedKeyTextSize +1 : 0;
                // the compressed text comes after the translated keyword, but isn't null terminated
                const byte* compressedText = data.pData_ + keysize + skip;
                long compressedTextSize    = data.size_  - keysize - skip;
                std::cout << (const char*) compressedText << std::endl;

                zlibUncompress(compressedText, compressedTextSize, arr);
            }

And the output from std::cout is as follows. It's XML, not compressed. And why does it start with "egin= (that's a truncated begin)?

"egin='' id='W5M0MpCehiHzreSzNTczkc9d'?><x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9-9, framework 1.6'>
...

Ummm. I need more time to unpuzzle this.

Alison and I will be out tomorrow afternoon+evening. I promise to look again at this on Wednesday morning.

If you have a flash of inspiration, let me know. This is a fun puzzle and we'll be delighted when we solve it.

@piponazo
Collaborator Author

@clanmills I think the crash does not come from zlibUncompress. In my case it was crashing when running the line of code:

            std::string translatedKeyText((const char*)(data.pData_ + keysize + 3 + languageTextSize +1));

Maybe you were trying directly on my branch instead of doing it on the master one?

After reading again the specification I am still not sure how this is happening. That line of code should correspond to:

   Keyword:             1-79 bytes (character string)
   Null separator:      1 byte
   Compression flag:    1 byte                              <-- compressionFlag
   Compression method:  1 byte                              <-- compressionMethod
   Language tag:        0 or more bytes (character string)  <-- languageText (can be read even if it's empty)
   Null separator:      1 byte
   Translated keyword:  0 or more bytes                     <-- translatedKeyText. CRASHING HERE!
   Null separator:      1 byte
   Text:                0 or more bytes

I think the problem is that a null separator is missing: the one that should determine where the reading of translatedKeyText ends. Right now the code assumes that all these null separators exist.

The only way I can think of fixing this issue is to go through all the data buffer searching for the right amount of null-separators.
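A minimal sketch of that idea, walking the chunk data field by field and refusing to go on when a separator is missing. It is not the code from this PR: `ITXtFields` and `parseITXtFields` are hypothetical names, and a plain pointer plus size stands in for the Exiv2 data buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical holder for the fields that follow the keyword's null separator.
struct ITXtFields {
    unsigned char compressionFlag;
    unsigned char compressionMethod;
    std::string languageTag;
    std::string translatedKeyword;
    std::size_t textOffset;  // offset of the (not null-terminated) text in the chunk data
};

// data/size: chunk data; keysize: keyword length without its null separator.
// Returns false when a required null separator is missing or the chunk is too short.
bool parseITXtFields(const unsigned char* data, std::size_t size,
                     std::size_t keysize, ITXtFields& out)
{
    // keyword + null separator + compression flag + compression method must fit
    if (size < 3 || keysize > size - 3) return false;
    const unsigned char* end = data + size;

    // the language tag runs up to the first \0 inside the buffer
    const unsigned char* langBegin = data + keysize + 3;
    const unsigned char* langEnd = std::find(langBegin, end, static_cast<unsigned char>(0));
    if (langEnd == end) return false;  // language tag not terminated

    // the translated keyword runs up to the next \0 inside the buffer
    const unsigned char* keyBegin = langEnd + 1;
    const unsigned char* keyEnd = std::find(keyBegin, end, static_cast<unsigned char>(0));
    if (keyEnd == end) return false;  // translated keyword not terminated

    out.compressionFlag = data[keysize + 1];
    out.compressionMethod = data[keysize + 2];
    out.languageTag.assign(langBegin, langEnd);
    out.translatedKeyword.assign(keyBegin, keyEnd);
    out.textOffset = static_cast<std::size_t>(keyEnd + 1 - data);
    return true;
}
```

A caller would treat a `false` return as corrupted metadata. The text itself is left untouched, since its length is implied by the chunk length.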

@clanmills
Collaborator

Thanks for the feedback on this. I'll look at this tomorrow morning and use your branch.

What you've said about searching the buffer for the NUL separators sounds correct to me. That's what I suggested on Sunday evening. If we don't find the correct number of NULs, we should throw kerCorruptedMetadata.

Incidentally, we do have images in the test suite with iTXt chunks. They appear to be uncompressed XMP, so your thought that the compressed chunk code isn't commonly executed by the test suite is probably correct.

606 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/test/data $ for p in *.png ; do echo $p ; exiv2 -pS $p | grep -i itxt; done
ReaganLargePng.png
    9154 | iTXt  |    7156 | XML:com.adobe.xmp.....<?xpacke | 0x8d6d70ba
ReaganSmallPng.png
    9337 | iTXt  |    7117 | XML:com.adobe.xmp.....<?xpacke | 0x2ff025b8
exiv2-bug1074.png
    3360 | iTXt  |     596 | XML:com.adobe.xmp.....<x:xmpme | 0xba1fe0c1
exiv2-bug841.png
Exiv2 exception in print action for file exiv2-bug841.png:
Failed to read image data
exiv2-bug922.png
    9096 | iTXt  |    2524 | XML:com.adobe.xmp.....<?xpacke | 0x1df4a351
imagemagick.png
607 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/test/data $ 

@piponazo
Collaborator Author

I have changed the code to check the number of null separators:

else if(type == iTXt_Chunk)  {
            int nullSeparators = 0;
            for (int i = keysize + 3; i < data.size_; i++)
            {
                if (data.pData_[i] == '\0') {
                    nullSeparators++;
                }
            }
            enforce(nullSeparators >= 2, Exiv2::kerCorruptedMetadata);

            // Extract a deflate compressed or uncompressed UTF-8 text chunk

            // we get the compression flag after the key
            const byte* compressionFlag   = data.pData_ + keysize + 1;
            // we get the compression method after the compression flag
            const byte* compressionMethod = data.pData_ + keysize + 2;
            // language description string after the compression technique spec
            std::string languageText((const char*)(data.pData_ + keysize + 3));
            unsigned int languageTextSize = static_cast<unsigned int>(languageText.size());
            // translated keyword string after the language description
            std::string translatedKeyText((const char*)(data.pData_ + keysize + 3 + languageTextSize +1));
            unsigned int translatedKeyTextSize = static_cast<unsigned int>(translatedKeyText.size());

The reason to start from keysize + 3 is that compressionFlag and compressionMethod can also be \0. Therefore, from there on we need at least 2 null separators: one for the Language Tag and another for the Translated Keyword. Additional null separators might also appear in the text.

This implementation solves the problem described in #306 and keeps the code working for other images with valid PNG chunks.
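To illustrate why the starting offset matters, here is a small standalone sketch of the same count. `countNullSeparators` is a hypothetical helper, and a raw pointer with a size stands in for the Exiv2 buffer.

```cpp
#include <algorithm>
#include <cstddef>

// Counts the null bytes from the start of the language tag to the end of the chunk data.
// The compression flag and method (at keysize + 1 and keysize + 2) are skipped on purpose,
// because a zero there is a field value and not a separator.
std::ptrdiff_t countNullSeparators(const unsigned char* data, std::size_t size, std::size_t keysize)
{
    // keyword + null separator + flag + method must fit, otherwise there is nothing to count
    if (size < 3 || keysize > size - 3) return 0;
    return std::count(data + keysize + 3, data + size, static_cast<unsigned char>(0));
}
```

With a one-byte keyword, a flag and method of 0, an empty language tag and an empty translated keyword, the function reports 2, not 4.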

@D4N
Member

D4N commented May 22, 2018

I have run the POC file through a debugger and can confirm @piponazo's observation: the problem is that in line 175:

std::string translatedKeyText((const char*)(data.pData_ + keysize + 3 + languageTextSize +1));

we construct a new std::string from a character array that is not null-terminated. Since the constructor of std::string searches for the first \0, it reads beyond the bounds of the allocated array, which AddressSanitizer detects and aborts on.

Btw, this problem can be verified quite easily with gdb: set a breakpoint immediately before the crashing line and run:

p strlen(data.pData_ + keysize + 3 + languageTextSize +1)

which will result in an ASAN crash. That usually indicates a not properly terminated C string, as strlen reads beyond its bounds when searching for the first \0.

Imho the best solution for this would be to use the following std::string constructor:

basic_string( const CharT* s,
              size_type count, 
              const Allocator& alloc = Allocator() );

and std::string::find to search for the first \0 after the string start.

I think something like this could work (for the languageText string, and analogously for translatedKeyText):

// keysize + 3 could be larger than the dataBuf, not sure whether the Safe::add is necessary
enforce(Safe::add(keysize, 3) < data.size_, kerCorruptedMetadata);
const byte * languageTextStart = data.pData_ + keysize + 3;
const byte * dataEnd = data.pData_ + data.size_;
// find the first \0 after the start of languageText
const byte * languageTextTermination = std::find(languageTextStart, dataEnd, 0);
// if find returned the end iterator => no \0 found, and the length runs to the end of the buffer
const size_t languageTextLen = static_cast<size_t>(languageTextTermination - languageTextStart);

assert(languageTextLen <= static_cast<size_t>(data.size_));
std::string languageText(reinterpret_cast<const char*>(languageTextStart), languageTextLen);

This should however probably become a function itself. Not really efficient, but better safe than sorry.
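As a rough illustration of what such a function could look like, here is a sketch under the assumption that unterminated strings should be tolerated. `stringUpToNull` is a hypothetical name and not part of Exiv2.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: copies the bytes from begin up to the first \0,
// or up to end when no \0 is found. It never reads at or past end.
std::string stringUpToNull(const unsigned char* begin, const unsigned char* end)
{
    const unsigned char* terminator = std::find(begin, end, static_cast<unsigned char>(0));
    // (pointer, count) constructor: copies exactly count bytes, no search for \0
    return std::string(reinterpret_cast<const char*>(begin),
                       static_cast<std::size_t>(terminator - begin));
}
```

The caller stays responsible for making sure that begin does not lie beyond end, for example by checking keysize + 3 against the buffer size first.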

@D4N
Member

D4N commented May 23, 2018

@piponazo Sorry, I did not read your comment properly. Your method should of course work too, although it will reject strings that are not null-terminated (but I guess that's fine, since the standard specifically mandates null termination).

@piponazo
Collaborator Author

I do not see the point of using std::find, since the std::string constructor we are using (the one taking just a const char *) already finds the first null character for us.

As you pointed out, the problem with the POC provided in #306 is that there is not a null character in the buffer at that point. The code that I introduced is checking for at least 2 null separators once we start analysing the Language Tag:

   Keyword:             1-79 bytes (character string)
   Null separator:      1 byte
   Compression flag:    1 byte
   Compression method:  1 byte
   Language tag:        0 or more bytes (character string)    // <-------- check null separators from here
   Null separator:      1 byte                                // <---- 1
   Translated keyword:  0 or more bytes
   Null separator:      1 byte                                // <---- 2
   Text:                0 or more bytes

If you accept this solution I will squash the commits and try to simplify the current solution using some STL algorithm instead of writing pure C code directly.

@D4N
Member

D4N commented May 23, 2018

My idea with using std::find was that we would be able to extract strings that are not null terminated from the png file. The mentioned constructor of std::string does not search for the first null terminator but instead blindly copies count bytes from the source array (which I exploit via std::find to either copy until the first null terminator or until the end of the array).
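The difference between the two constructors can be seen in a small sketch. The two wrapper functions are hypothetical and exist only to make the comparison explicit.

```cpp
#include <cstddef>
#include <string>

// const char* constructor: scans for the first \0, so the array must contain one.
std::size_t lengthViaCString(const char* s)
{
    return std::string(s).size();
}

// (pointer, count) constructor: copies exactly count bytes and never looks for \0.
std::size_t lengthViaCount(const char* s, std::size_t count)
{
    return std::string(s, count).size();
}
```

For a five-byte array holding 'a', 'b', '\0', 'c', 'd', the first function sees a length of 2 and the second, called with a count of 5, a length of 5. Only the second one is safe on an array without any \0, provided the count stays within the array.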

Using my idea would however only make sense if we want to tolerate not null terminated strings in the png chunk. Don't know if that makes sense or not. @clanmills Do you happen to know if that's a common issue?

@piponazo I'll review your solution.

@piponazo
Collaborator Author

According to the specification, LanguageTag and TranslatedKeyword must have a null separator. We should not change that. Later on, in the if/else blocks checking the compressionFlag and compressionMethod, we are reading the Text, which may or may not contain null characters.

Member

@D4N D4N left a comment


I can't think of a way to circumvent the safety you added, so this is more than fine by me.

@@ -162,6 +163,15 @@ namespace Exiv2 {
}
else if(type == iTXt_Chunk)
{
int nullSeparators = 0;
Member

[suggestion] This can also be done with std::count.

Collaborator Author

Yes, this is what I was thinking :). Now that we agreed on the solution I will use that algorithm.

@D4N
Member

D4N commented May 23, 2018

According to the specification LanguageTag and TranslatedKeyword must have a null separator.

Yeah, I have seen that in the specification. But image manipulation programs have bugs too, so I wouldn't be surprised if they'd produce such strings ;-)

@clanmills
Collaborator

I don't remember looking at this code before Sunday. The specification is a "pure C" kind of thing, and I don't see anything wrong with the "pure C" search for NUL. We should also check that the Compression flag and Compression method fields are valid.

Incidentally, I think we've both spotted what's wrong here. You guys have it more correct. The NUL bytes are essential to prevent the std::string constructor reading past the end of the buffer. When I built for debugging in Xcode, I forgot to set the ASAN flags. That's where all the stuff I discovered about treating the translated string as optional (and therefore the 3+ stuff being suspicious) came from. My thoughts are both valid and bogus. The specification is clear: the translated keyword isn't optional, the NULs must be present, and the values of Compression flag and Compression method must be valid.

Incidentally, exiftool can read the file. Phil Harvey is friendly and helpful. I could reach out and ask him for his opinion about this. However, I think we're confident of our analysis. We check the chunk and throw if it violates the spec. Using a "pure C" search seems OK to me.

Another perfect day in England. The beautiful weather you brought is still here. I'm going out to run in the Californian Weather.

@piponazo
Collaborator Author

@clanmills I will also add some checks for the values of the compression flag and method.
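A sketch of what such a check might look like. It is not the code that was merged: `isValidITXtCompression` is a hypothetical name, and it assumes a reading of the PNG 1.2 specification in which the flag is 0 (uncompressed) or 1 (compressed), method 0 (zlib deflate) is the only defined method, and the method byte is ignored for uncompressed text.

```cpp
// Returns true when the flag/method pair is acceptable for an iTXt chunk.
// Assumption: the method byte only matters when the text is compressed.
bool isValidITXtCompression(unsigned char flag, unsigned char method)
{
    if (flag == 0) return true;          // uncompressed text, method is ignored
    return flag == 1 && method == 0;     // compressed text must use zlib deflate
}
```

A stricter variant could also require the method to be 0 for uncompressed text. Which of the two is preferable depends on how tolerant the parser should be towards sloppy encoders.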

Thanks for your feedback guys!

…TChunk

This commit fixes the heap-buffer-overflow in PngChunk::parseTXTChunk.

According to the specification:
http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html

There must be 2 null separators when we start to analyze the language tag.
@piponazo piponazo merged commit 3ad0050 into Exiv2:master May 23, 2018
@piponazo piponazo deleted the fixIssue306 branch May 23, 2018 08:57
@@ -162,6 +164,9 @@ namespace Exiv2 {
}
else if(type == iTXt_Chunk)
{
const int nullSeparators = std::count(&data.pData_[keysize+3], &data.pData_[data.size_-1], '\0');
Member

Uh, I think this should be &data.pData_[data.size_] as the end iterator should point to the first element outside of the range.
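The effect of the end iterator can be shown with a small standalone sketch. Both functions are hypothetical and take a raw pointer with a size in place of the Exiv2 buffer.

```cpp
#include <algorithm>
#include <cstddef>

// Half-open range [data + first, data + size): every remaining byte is examined.
std::ptrdiff_t countNullsFullRange(const unsigned char* data, std::size_t first, std::size_t size)
{
    return std::count(data + first, data + size, static_cast<unsigned char>(0));
}

// End iterator at data + size - 1: the last byte of the buffer is never examined.
// Precondition for this sketch: size >= 1 and first <= size - 1.
std::ptrdiff_t countNullsShortRange(const unsigned char* data, std::size_t first, std::size_t size)
{
    return std::count(data + first, data + size - 1, static_cast<unsigned char>(0));
}
```

For a four-byte buffer holding 'a', 0, 'b', 0, the full range counts two null bytes while the short range counts only one, so a chunk whose second separator is its last byte would be rejected by mistake.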

Collaborator Author

ahg, you are right. I'll create a mini PR fixing that.

@@ -165,22 +165,28 @@ namespace Exiv2 {
else if(type == iTXt_Chunk)
{
const int nullSeparators = std::count(&data.pData_[keysize+3], &data.pData_[data.size_-1], '\0');
-enforce(nullSeparators >= 2, Exiv2::kerCorruptedMetadata);
+enforce(nullSeparators >= 2, Exiv2::kerCorruptedMetadata, "iTXt chunk: not enough null separators");
Member

The message you added will not be displayed, as kerCorruptedMetadata does not include it in the output via e.what().

Collaborator Author

I noticed that the message was not printed, but I thought it could be a good idea to have that string there in case we can add unit tests in the future around this code. I did not check though if we could get the string when catching the exception in the unit tests code.

Anyways, I think it does not hurt to have a small string in the enforce calls giving some details of why we are placing the enforce calls in the code.

Member

It will hurt the performance slightly in case the exception gets thrown, as then the Exiv2::Error constructor will search for a place to insert the string but won't be able to find one. Not really a big deal though. In case you want the error message printed, you can use kerErrorMessage (albeit that is a quite generic error).
