Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify error handling of bad dictionary elements #16407

Merged
merged 3 commits into from
Oct 16, 2024
Merged

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Oct 13, 2024

As is, we have separate error handling and reporting for empty strings and strings containing NUL bytes. For simpler error handling we merge both cases; while that is slightly worse regarding the error reporting, it shouldn't matter in practice, since users need to check the validity of the dictionary upfront anyway.


This is a follow-up to PR #16335, where @TimWolla and @Girgias made suggestions for further improvements. This is the "best" I could come up with.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind, but I also don't understand why you don't use a goto to some clean-up section?

ext/zlib/zlib.c Outdated
ZEND_ASSERT(*ptr);
if (ZSTR_LEN(*ptr) == 0 || EG(exception)) {
*++ptr = zval_try_get_string(cur);
if (*ptr == NULL || ZSTR_LEN(*ptr) == 0 || zend_str_has_nul_byte(*ptr)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else I'm noticing, is that if the option_buffer zval is a string we don't check if it doesn't have a nul byte AFAICS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, because a dictionary string is a char array of zero terminated strings (otherwise you could only pass a single word).

@cmb69
Copy link
Member Author

cmb69 commented Oct 13, 2024

I also don't understand why you don't use a goto to some clean-up section?

Something like:

 ext/zlib/zlib.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/ext/zlib/zlib.c b/ext/zlib/zlib.c
index 4221bceacc..4ece73ab46 100644
--- a/ext/zlib/zlib.c
+++ b/ext/zlib/zlib.c
@@ -808,25 +808,14 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 					zend_string **end, **ptr = strings - 1;
 
 					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
-						*++ptr = zval_get_string(cur);
-						ZEND_ASSERT(*ptr);
-						if (ZSTR_LEN(*ptr) == 0 || EG(exception)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
-							if (!EG(exception)) {
-								zend_argument_value_error(2, "must not contain empty strings");
-							}
-							return 0;
+						*++ptr = zval_try_get_string(cur);
+						if (*ptr == NULL || ZSTR_LEN(*ptr) == 0) {
+							zend_argument_value_error(2, "must not contain empty strings");
+							goto fail;
 						}
 						if (zend_str_has_nul_byte(*ptr)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
 							zend_argument_value_error(2, "must not contain strings with null bytes");
-							return 0;
+							goto fail;
 						}
 
 						*dictlen += ZSTR_LEN(*ptr) + 1;
@@ -842,6 +831,15 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 						zend_string_release_ex(*ptr, 0);
 					} while (++ptr != end);
 					efree(strings);
+					break;
+fail:
+					do {
+						if (*ptr != NULL) {
+							zend_string_release(*ptr);
+						}
+					} while (--ptr >= strings);
+					efree(strings);
+					return 0;
 				}
 			} break;
 

In my opinion, that's harder to understand, and error-prone (if you forget the break, you have double-frees). And actually we need a third error for the case where the array element cannot be converted to string; that's yet another leg.

@Girgias
Copy link
Member

Girgias commented Oct 13, 2024

I also don't understand why you don't use a goto to some clean-up section?

Something like:

 ext/zlib/zlib.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/ext/zlib/zlib.c b/ext/zlib/zlib.c
index 4221bceacc..4ece73ab46 100644
--- a/ext/zlib/zlib.c
+++ b/ext/zlib/zlib.c
@@ -808,25 +808,14 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 					zend_string **end, **ptr = strings - 1;
 
 					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
-						*++ptr = zval_get_string(cur);
-						ZEND_ASSERT(*ptr);
-						if (ZSTR_LEN(*ptr) == 0 || EG(exception)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
-							if (!EG(exception)) {
-								zend_argument_value_error(2, "must not contain empty strings");
-							}
-							return 0;
+						*++ptr = zval_try_get_string(cur);
+						if (*ptr == NULL || ZSTR_LEN(*ptr) == 0) {
+							zend_argument_value_error(2, "must not contain empty strings");
+							goto fail;
 						}
 						if (zend_str_has_nul_byte(*ptr)) {
-							do {
-								zend_string_release(*ptr);
-							} while (--ptr >= strings);
-							efree(strings);
 							zend_argument_value_error(2, "must not contain strings with null bytes");
-							return 0;
+							goto fail;
 						}
 
 						*dictlen += ZSTR_LEN(*ptr) + 1;
@@ -842,6 +831,15 @@ static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_
 						zend_string_release_ex(*ptr, 0);
 					} while (++ptr != end);
 					efree(strings);
+					break;
+fail:
+					do {
+						if (*ptr != NULL) {
+							zend_string_release(*ptr);
+						}
+					} while (--ptr >= strings);
+					efree(strings);
+					return 0;
 				}
 			} break;
 

In my opinion, that's harder to understand, and error-prone (if you forget the break, you have double-frees). And actually we need a third error for the case where the array element cannot be converted to string; that's yet another leg.

Basically, I need to pray for C2Y to give us defer :D but I don't mind either way.

@cmb69
Copy link
Member Author

cmb69 commented Oct 13, 2024

Basically, I need to pray for C2Y to give us defer :D but I don't mind either way.

Would that work block scoped? Because I guess that is my actual issue with a goto here.

@TimWolla
Copy link
Member

TimWolla commented Oct 13, 2024

Untested, but the following should work. It does some redundant work in case of failure, but the failure case should be rare, making this acceptable:

This also gets rid of the annoying pointer arithmetic, which needlessly obfuscates the code. In fact the strings - 1 is technically undefined behavior, because creating (even without dereferencing) an OOB pointer is UB.

static bool zlib_create_dictionary_string(HashTable *options, char **dict, size_t *dictlen) {
	zval *option_buffer;

	if (options && (option_buffer = zend_hash_str_find(options, ZEND_STRL("dictionary"))) != NULL) {
		ZVAL_DEREF(option_buffer);
		switch (Z_TYPE_P(option_buffer)) {
			case IS_STRING: {
				zend_string *str = Z_STR_P(option_buffer);
				*dict = emalloc(ZSTR_LEN(str));
				memcpy(*dict, ZSTR_VAL(str), ZSTR_LEN(str));
				*dictlen = ZSTR_LEN(str);

				return 1;
			}

			case IS_ARRAY: {
				HashTable *dictionary = Z_ARR_P(option_buffer);
				bool result = 1;

				if (zend_hash_num_elements(dictionary) > 0) {
					zend_string **strings = safe_emalloc(zend_hash_num_elements(dictionary), sizeof(zend_string *), 0);
					size_t total = 0;

					zval *cur;
					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
						zend_string *string = zval_get_string(cur);
						ZEND_ASSERT(string);
						if (EG(exception)) {
							result = 0;
							break;
						}
						if (ZSTR_LEN(string) == 0) {
							result = 0;
							zend_argument_value_error(2, "must not contain empty strings");
							break;
						}
						if (zend_str_has_nul_byte(string)) {
							result = 0;
							zend_argument_value_error(2, "must not contain strings with null bytes");
							break;
						}

						*dictlen += ZSTR_LEN(string) + 1;
						strings[total] = string;
						total++;
					} ZEND_HASH_FOREACH_END();

					char *dictptr = emalloc(*dictlen);
					*dict = dictptr;
					for (size_t i = 0; i < total; i++) {
						zend_string *string = strings[i];
						dictptr = zend_mempcpy(dictptr, ZSTR_VAL(string), ZSTR_LEN(string));
						*dictptr++ = 0;
						zend_string_release(string);
					}
					efree(strings);
					if (!result) {
						efree(*dict);
						*dict = NULL;
					}
				}

				return result;
			}

			default:
				zend_argument_type_error(2, "must be of type zero-terminated string or array, %s given", zend_zval_value_name(option_buffer));
				return 0;
		}
	}
}

@cmb69
Copy link
Member Author

cmb69 commented Oct 14, 2024

Thank you @TimWolla! This is really nice! I have replaced my suggestion with yours, and added small commit on top, which @Girgias will like (I think).

The only problem is that there is still no goto … ;)

@TimWolla
Copy link
Member

This is really nice! I have replaced my suggestion with yours,

Thanks. I only have one small request: Please do not @-mention me in commits. This annoyingly sends a notification email whenever the commit is pushed anywhere (including rebases and cherry-picks).

My preferred way of getting credit is adding this at the bottom of the commit message:

Co-authored-by: Tim Düsterhus <[email protected]>

@cmb69
Copy link
Member Author

cmb69 commented Oct 14, 2024

My preferred way of getting credit is adding this at the bottom of the commit message:

Co-authored-by: Tim Düsterhus <[email protected]>

Okay, will change that. However, it is actually the other way round. :)

@TimWolla
Copy link
Member

TimWolla commented Oct 14, 2024

One thing that I remembered exists after my initial suggestion: The smart_string API (that's the one working on char* instead of zend_string*) might be able to simplify this further. I have this wonderful piece of code, which also comes with the goto. I've previously used the if(0) trick in

php-src/Zend/zend_vm_def.h

Lines 4214 to 4215 in 8f1543a

if (0) {
ZEND_VM_C_LABEL(fcall_by_name_end):

Code untested:

			case IS_ARRAY: {
				HashTable *dictionary = Z_ARR_P(option_buffer);
				bool result = 1;

				if (zend_hash_num_elements(dictionary) > 0) {
					zend_string *string;

					zval *cur;
					smart_string out = {0};
					ZEND_HASH_FOREACH_VAL(dictionary, cur) {
						string = zval_try_get_string(cur);
						if (string == NULL) {
							string = ZSTR_EMPTY_ALLOC();
							goto fail;
						}
						if (ZSTR_LEN(string) == 0) {
							zend_argument_value_error(2, "must not contain empty strings");
							goto fail;
						}
						if (zend_str_has_nul_byte(string)) {
							zend_argument_value_error(2, "must not contain strings with null bytes");
							goto fail;
						}

						smart_string_appendl(&out, ZSTR_VAL(string), ZSTR_LEN(string));
						smart_string_appendc(&out, '\0');

						zend_string_release(string);
					} ZEND_HASH_FOREACH_END();

					*dict = out.c;
					*dictlen = out.len;

					if (0) {
 fail:
						result = 0;
						zend_string_release(string);
						smart_string_free(&out);
					}
				}

				return result;
			}

@cmb69
Copy link
Member Author

cmb69 commented Oct 15, 2024

The smart_string API (that's the one working on char* instead of zend_string*) might be able to simplify this further.

Might be a good idea, although for a long time I'm having an issue with the way the underlying buffer is resized; that doesn't look smart to me.

I have this wonderful piece of code, which also comes with the goto.

Fortunately, my therapist helped: "breathe – imagine a green field – breathe – imagine warm sunshine – breathe". That went on for hours, but finally I'm well again.

@cmb69 cmb69 merged commit b7fd773 into php:master Oct 16, 2024
9 of 10 checks passed
@cmb69 cmb69 deleted the cmb/zlib-dict branch October 16, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants