Unify error handling of bad dictionary elements #16407

cmb69 · 2024-10-13T10:49:31Z

As is, we have separate error handling and reporting for empty strings and strings containing NUL bytes. For simpler error handling we merge both cases; while that is slightly worse regarding the error reporting, it shouldn't matter in practice, since users need to check the validity of the dictionary upfront anyway.

This is a follow-up to PR #16335, where @TimWolla and @Girgias made suggestions for further improvements. This is the "best" I could come up with.

Girgias

I don't mind, but I also don't understand why you don't use a goto to some clean-up section?

Girgias · 2024-10-13T13:08:28Z

ext/zlib/zlib.c

-						ZEND_ASSERT(*ptr);
-						if (ZSTR_LEN(*ptr) == 0 || EG(exception)) {
+						*++ptr = zval_try_get_string(cur);
+						if (*ptr == NULL || ZSTR_LEN(*ptr) == 0 || zend_str_has_nul_byte(*ptr)) {


Something else I'm noticing, is that if the option_buffer zval is a string we don't check if it doesn't have a nul byte AFAICS.

That is correct, because a dictionary string is a char array of zero terminated strings (otherwise you could only pass a single word).

cmb69 · 2024-10-13T14:03:24Z

I also don't understand why you don't use a goto to some clean-up section?

Something like:

 ext/zlib/zlib.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/ext/zlib/zlib.c b/ext/zlib/zlib.c
index 4221bceacc..4ece73ab46 100644
--- a/ext/zlib/zlib.c
+++ b/ext/zlib/zlib.c
@@ -808,25 +808,14 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 					zend_string **end, **ptr = strings - 1;
 
 					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
-						*++ptr = zval_get_string(cur);
-						ZEND_ASSERT(*ptr);
-						if (ZSTR_LEN(*ptr) == 0 || EG(exception)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
-							if (!EG(exception)) {
-								zend_argument_value_error(2, "must not contain empty strings");
-							}
-							return 0;
+						*++ptr = zval_try_get_string(cur);
+						if (*ptr == NULL || ZSTR_LEN(*ptr) == 0) {
+							zend_argument_value_error(2, "must not contain empty strings");
+							goto fail;
 						}
 						if (zend_str_has_nul_byte(*ptr)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
 							zend_argument_value_error(2, "must not contain strings with null bytes");
-							return 0;
+							goto fail;
 						}
 
 						*dictlen += ZSTR_LEN(*ptr) + 1;
@@ -842,6 +831,15 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 						zend_string_release_ex(*ptr, 0);
 					} while (++ptr != end);
 					efree(strings);
+					break;
+fail:
+					do {
+						if (*ptr != NULL) {
+							zend_string_release(*ptr);
+						}
+					} while (--ptr >= strings);
+					efree(strings);
+					return 0;
 				}
 			} break;

In my opinion, that's harder to understand, and error-prone (if you forget the break, you have double-frees). And actually we need a third error for the case where the array element cannot be converted to string; that's yet another leg.

Girgias · 2024-10-13T14:36:45Z

I also don't understand why you don't use a goto to some clean-up section?

Something like:

 ext/zlib/zlib.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/ext/zlib/zlib.c b/ext/zlib/zlib.c
index 4221bceacc..4ece73ab46 100644
--- a/ext/zlib/zlib.c
+++ b/ext/zlib/zlib.c
@@ -808,25 +808,14 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 					zend_string **end, **ptr = strings - 1;
 
 					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
-						*++ptr = zval_get_string(cur);
-						ZEND_ASSERT(*ptr);
-						if (ZSTR_LEN(*ptr) == 0 || EG(exception)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
-							if (!EG(exception)) {
-								zend_argument_value_error(2, "must not contain empty strings");
-							}
-							return 0;
+						*++ptr = zval_try_get_string(cur);
+						if (*ptr == NULL || ZSTR_LEN(*ptr) == 0) {
+							zend_argument_value_error(2, "must not contain empty strings");
+							goto fail;
 						}
 						if (zend_str_has_nul_byte(*ptr)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
 							zend_argument_value_error(2, "must not contain strings with null bytes");
-							return 0;
+							goto fail;
 						}
 
 						*dictlen += ZSTR_LEN(*ptr) + 1;
@@ -842,6 +831,15 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 						zend_string_release_ex(*ptr, 0);
 					} while (++ptr != end);
 					efree(strings);
+					break;
+fail:
+					do {
+						if (*ptr != NULL) {
+							zend_string_release(*ptr);
+						}
+					} while (--ptr >= strings);
+					efree(strings);
+					return 0;
 				}
 			} break;

In my opinion, that's harder to understand, and error-prone (if you forget the break, you have double-frees). And actually we need a third error for the case where the array element cannot be converted to string; that's yet another leg.

Basically, I need to pray for C2Y to give us defer :D but I don't mind either way.

cmb69 · 2024-10-13T17:16:41Z

Basically, I need to pray for C2Y to give us defer :D but I don't mind either way.

Would that work block scoped? Because I guess that is my actual issue with a goto here.

TimWolla · 2024-10-13T17:52:54Z

Untested, but the following should work. It does some redundant work in case of failure, but the failure case should be rare, making this acceptable:

This also gets rid of the annoying pointer arithmetic, which needlessly obfuscates the code. In fact the strings - 1 is technically undefined behavior, because creating (even without dereferencing) an OOB pointer is UB.

static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_t *dictlen) {
	zval *option_buffer;

	if (options && (option_buffer = zend_hash_str_find(options, ZEND_STRL("dictionary"))) != NULL) {
		ZVAL_DEREF(option_buffer);
		switch (Z_TYPE_P(option_buffer)) {
			case IS_STRING: {
				zend_string *str = Z_STR_P(option_buffer);
				*dict = emalloc(ZSTR_LEN(str));
				memcpy(*dict, ZSTR_VAL(str), ZSTR_LEN(str));
				*dictlen = ZSTR_LEN(str);

				return 1;
			}

			case IS_ARRAY: {
				HashTable *dictionary = Z_ARR_P(option_buffer);
				bool result = 1;

				if (zend_hash_num_elements(dictionary) > 0) {
					zend_string **strings = safe_emalloc(zend_hash_num_elements(dictionary), sizeof(zend_string *), 0);
					size_t total = 0;

					zval *cur;
					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
						zend_string *string = zval_get_string(cur);
						ZEND_ASSERT(string);
						if (EG(exception)) {
							result = 0;
							break;
						}
						if (ZSTR_LEN(string) == 0) {
							result = 0;
							zend_argument_value_error(2, "must not contain empty strings");
							break;
						}
						if (zend_str_has_nul_byte(string)) {
							result = 0;
							zend_argument_value_error(2, "must not contain strings with null bytes");
							break;
						}

						*dictlen += ZSTR_LEN(string) + 1;
						strings[total] = string;
						total++;
					} ZEND_HASH_FOREACH_END();

					char *dictptr = emalloc(*dictlen);
					*dict = dictptr;
					for (size_t i = 0; i < total; i++) {
						zend_string *string = strings[i];
						dictptr = zend_mempcpy(dictptr, ZSTR_VAL(string), ZSTR_LEN(string));
						*dictptr++ = 0;
						zend_string_release(string);
					}
					efree(strings);
					if (!result) {
						efree(*dict);
						*dict = NULL;
					}
				}

				return result;
			}

			default:
				zend_argument_type_error(2, "must be of type zero-terminated string or array, %s given", zend_zval_value_name(option_buffer));
				return 0;
		}
	}
}

@TimWolla

The patch has been contributed by @TimWolla.

cmb69 · 2024-10-14T17:22:57Z

Thank you @TimWolla! This is really nice! I have replaced my suggestion with yours, and added small commit on top, which @Girgias will like (I think).

The only problem is that there is still no goto … ;)

TimWolla · 2024-10-14T17:35:26Z

This is really nice! I have replaced my suggestion with yours,

Thanks. I only have one small request: Please do not @-mention me in commits. This annoyingly sends a notification email whenever the commit is pushed anywhere (including rebases and cherry-picks).

My preferred way of getting credit is adding this at the bottom of the commit message:

Co-authored-by: Tim Düsterhus <[email protected]>

cmb69 · 2024-10-14T17:40:27Z

My preferred way of getting credit is adding this at the bottom of the commit message:
Co-authored-by: Tim Düsterhus <[email protected]>

Okay, will change that. However, it is actually the other way round. :)

ext/zlib/zlib.c

TimWolla · 2024-10-14T17:51:05Z

One thing that I remembered exists after my initial suggestion: The smart_string API (that's the one working on char* instead of zend_string*) might be able to simplify this further. I have this wonderful piece of code, which also comes with the goto. I've previously used the if(0) trick in

php-src/Zend/zend_vm_def.h

Lines 4214 to 4215 in 8f1543a

    
           	if (0) { 
        
           ZEND_VM_C_LABEL(fcall_by_name_end):

Code untested:

			case IS_ARRAY: {
				HashTable *dictionary = Z_ARR_P(option_buffer);
				bool result = 1;

				if (zend_hash_num_elements(dictionary) > 0) {
					zend_string *string;

					zval *cur;
					smart_string out = {0};
					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
						string = zval_try_get_string(cur);
						if (string == NULL) {
							string = ZSTR_EMPTY_ALLOC();
							goto fail;
						}
						if (ZSTR_LEN(string) == 0) {
							zend_argument_value_error(2, "must not contain empty strings");
							goto fail;
						}
						if (zend_str_has_nul_byte(string)) {
							zend_argument_value_error(2, "must not contain strings with null bytes");
							goto fail;
						}

						smart_string_appendl(&out, ZSTR_VAL(string), ZSTR_LEN(string));
						smart_string_appendc(&out, '\0');

						zend_string_release(string);
					} ZEND_HASH_FOREACH_END();

					*dict = out.c;
					*dictlen = out.len;

					if (0) {
 fail:
						result = 0;
						zend_string_release(string);
						smart_string_free(&out);
					}
				}

				return result;
			}

cmb69 · 2024-10-15T16:54:14Z

The smart_string API (that's the one working on char* instead of zend_string*) might be able to simplify this further.

Might be a good idea, although for a long time I'm having an issue with the way the underlying buffer is resized; that doesn't look smart to me.

I have this wonderful piece of code, which also comes with the goto.

Fortunately, my therapist helped: "breathe – imagine a green field – breathe – imagine warm sunshine – breathe". That went on for hours, but finally I'm well again.

github-actions bot added the Extension: zlib label Oct 13, 2024

Girgias reviewed Oct 13, 2024

View reviewed changes

cmb69 added 2 commits October 14, 2024 19:20

Simplify zlib dictionary processing

5584811

The patch has been contributed by @TimWolla.

Minor fix and improvement

859f472

cmb69 force-pushed the cmb/zlib-dict branch from 3928d47 to 859f472 Compare October 14, 2024 17:20

TimWolla approved these changes Oct 14, 2024

View reviewed changes

TimWolla reviewed Oct 14, 2024

View reviewed changes

ext/zlib/zlib.c Show resolved Hide resolved

Fix leak

2990e3e

TimWolla approved these changes Oct 15, 2024

View reviewed changes

cmb69 merged commit b7fd773 into php:master Oct 16, 2024
9 of 10 checks passed

cmb69 deleted the cmb/zlib-dict branch October 16, 2024 10:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify error handling of bad dictionary elements #16407

Unify error handling of bad dictionary elements #16407

cmb69 commented Oct 13, 2024

Girgias left a comment

Girgias Oct 13, 2024

cmb69 Oct 13, 2024

cmb69 commented Oct 13, 2024

Girgias commented Oct 13, 2024

cmb69 commented Oct 13, 2024

TimWolla commented Oct 13, 2024 •

edited

Loading

cmb69 commented Oct 14, 2024

TimWolla commented Oct 14, 2024

cmb69 commented Oct 14, 2024

TimWolla commented Oct 14, 2024 •

edited

Loading

cmb69 commented Oct 15, 2024

Unify error handling of bad dictionary elements #16407

Unify error handling of bad dictionary elements #16407

Conversation

cmb69 commented Oct 13, 2024

Girgias left a comment

Choose a reason for hiding this comment

Girgias Oct 13, 2024

Choose a reason for hiding this comment

cmb69 Oct 13, 2024

Choose a reason for hiding this comment

cmb69 commented Oct 13, 2024

Girgias commented Oct 13, 2024

cmb69 commented Oct 13, 2024

TimWolla commented Oct 13, 2024 • edited Loading

cmb69 commented Oct 14, 2024

TimWolla commented Oct 14, 2024

cmb69 commented Oct 14, 2024

TimWolla commented Oct 14, 2024 • edited Loading

cmb69 commented Oct 15, 2024

TimWolla commented Oct 13, 2024 •

edited

Loading

TimWolla commented Oct 14, 2024 •

edited

Loading