-
Notifications
You must be signed in to change notification settings - Fork 514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade generated parse_line() to accept constant SBuf #1840
base: master
Are you sure you want to change the base?
Upgrade generated parse_line() to accept constant SBuf #1840
Conversation
These changes attempt to preserve existing squid.conf grammar idiosyncrasies. A lot of squid.conf parsing code needs to be upgraded; these changes focus on parse_line() because upcoming reconfiguration improvements require that particular function to accept SBuf. Also fixed ConfigParser SetCfgLine() method and CfgLine data member descriptions. These members are not about the whole squid.conf line, and not every squid.conf line goes through these members: These members are dedicated to tokens located after a directive name on a directive line. A lot of related code needs to be polished, but we fixed these docs because this change affects all SetCfgLine() callers and CfgLine users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This review annotates PR changes but does not request any adjustments.
} | ||
|
||
std::optional<SBuf> | ||
NextWordWhileRemovingDoubleQuotesAndBackslashesInsideThem(Parser::Tokenizer &tk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using this very long name because I assume that the underlying syntax here does not have a well-known name (it is effectively defined by existing strwordtok() implementation that this code is based on) and to automatically flag all callers for extra scrutiny: If the caller should use a well-known syntax instead, they should not call this function.
Just like with libcSpace(): We may (and probably should!) change callers grammar to some well-known alternative, but let's not do that in this behavior-preserving PR.
If my assumption about the lack of a well-known name for this syntax is wrong, please let me know what the correct name is, and I will rename this function in his PR, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, this is called "un-escaping a quoted string"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, this is called "un-escaping a quoted string"
Yes, it is. A lot of things can be (and are!) called that. This function name should be more precise IMO, for the reasons detailed earlier.
if 0=1 Our preprocessor grammar is not based on space-separated tokens!
Condition "00=000" will need to be replaced with "0=000" when int64() bug is fixed.
Better than nothing. We could add a stronger test that validates equality, but it would make this configuration file inapplicable to SMP Squids. IMO, we should not do that without also developing the corresponding instruction (and adding that to the instructions file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Also fix #include order for STL headers. Context: squid-cache#1840 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I believe you are dong the refactor "wrong", ie. in a way which makes the change steps more difficult and complex than other approaches.
There are a lot of "Also, ..." changes in src/cache_cf.cc
which are not documented in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yadij, I hope I have addressed all change requests. Please revisit the two change requests related to character sets ASAP. I hope those requests were based on the wrong assumption about the current official code, and the corresponding changes should be implemented in a followup PR instead. If you insist, I will address them in this PR, but I do not want this PR wait another ten days for that decision. Thank you.
/// extracts all leading space characters (if any) | ||
/// \returns whether at least one character was extracted | ||
static bool | ||
SkipOptionalSpace(Parser::Tokenizer &tk) | ||
{ | ||
SubstituteMacro(line, len, "${service_name}", service_name.c_str()); | ||
SubstituteMacro(line, len, "${process_name}", TheKidName.c_str()); | ||
SubstituteMacro(line, len, "${process_number}", xitoa(KidIdentifier)); | ||
return tk.skipAll(CharacterSet::WSP); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please inline this one-line function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should not be inlined because inlining it will duplicate its code and obscure its high-level intent/logic in many callers.
In most cases, we should not inline, regardless of the number of lines a function has. There are exceptions to that rule of thumb, of course, but this particular function is not exceptional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling a one-line function is just as much duplicate code as writing its contents in full.
This function should not be inlined because inlining it will duplicate its code
"duplicate code" arguments are only relevant to lines which are complex (eg math equations).
and obscure its high-level intent/logic in many callers.
The "high level intent" is clear from the syntax and parameters of skipAll(CharacterSet::WSP);
.
This being a separate function is redundant (as is the new documentation for it).
In most cases, we should not inline, regardless of the number of lines a function has. There are exceptions to that rule of thumb, of course, but this particular function is not exceptional.
I Disagree, one-line functions are all special-case for that rule of thumb. This one is also an exceptional in addition to that one-line nature:
FYI: One-line functions/methods that simply call another function/method (such as this one does) are called "trampolines" and best practice is to avoid use of trampolines whenever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling a one-line function is just as much duplicate code as writing its contents in full.
That assertion is false for at least two independent reasons:
- That encapsulated code line contains more details than a function name. It contains the choice of delimiters to use when skipping space.
- Using the same context-specific wrapper function (name) in multiple places carries more information than calling a low level skipAll() method would. Knowing that all these places are related and all of them skip optional space (rather than, say, required space) helps read and maintain code.
This function should not be inlined because inlining it will duplicate its code
"duplicate code" arguments are only relevant to lines which are complex (eg math equations).
That assertion is false because complexity is not the only thing we do not want to duplicate. Duplicating simple decisions is also a negative.
The "high level intent" is clear from the syntax and parameters of
skipAll(CharacterSet::WSP);
.
Does that call skip optional or required space? What other places should be updated if I want to update WSP in one particular place? Encapsulating that call helps answer these and other questions. Yes, those answers can usually be derived without encapsulation, but at a higher cost (sometimes significantly higher cost!).
In most cases, we should not inline, regardless of the number of lines a function has. There are exceptions to that rule of thumb, of course, but this particular function is not exceptional.
I Disagree, one-line functions are all special-case for that rule of thumb.
I see no good reason to make one-line functions a special case. The onus is on you to prove such an exception is needed (just stating that it is needed is not enough).
N.B. Absence of an exception to "do not inline unless you have a very good reason" rule of thumb does not mean that one-line functions are always better than duplicated inlined code, of course.
This one is also an exceptional in addition to that one-line nature:
FYI: One-line functions/methods that simply call another function/method (such as this one does) are called "trampolines" and best practice is to avoid use of trampolines whenever possible.
The term "trampoline function" means quite different things in various programming languages/contexts, but it is not used to describe "a function calling another function" in any relevant context I can find. The term is usually related to functions that define/contain (not just call!) other functions or receive other functions as parameters/state. None of that applies here, of course.
/// \returns std::nullopt if input does not look like an `include` statement | ||
/// \returns `include` parameters if input is an `include` statement | ||
static std::optional<SBuf> | ||
IsIncludeLine(Parser::Tokenizer tk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing reference leaves the keyword "include " in callers Tokenizer
to be interpreted as a filename:
IsIncludeLine(Parser::Tokenizer tk) | |
IsIncludeLine(Parser::Tokenizer &tk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing reference leaves the keyword "include " in callers Tokenizer to be interpreted as a filename
That reference is not missing here and adding it would introduce problems: This IsX() function is a test function. It does not and should not change its input/Tokenizer. The caller is fine because it gets filenames from the function return value. The caller also does not want any input consumed because it is running other tests against the same input line and does not want to worry about one test affecting the other.
char* lvalue = (char*)xmalloc(equation - expr + 1); | ||
xstrncpy(lvalue, expr, equation - expr + 1); | ||
trim_trailing_ws(lvalue); | ||
// grammar: space* keyword space* END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This grammar does not match the parser. The second space*
is prohibited from being >0
by RejectTrailingGarbage()
below:
// grammar: space* keyword space* END | |
// grammar: space* keyword 0#space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This grammar does not match the parser. The second
space*
is prohibited from being>0
byRejectTrailingGarbage()
below
The grammar matches: The second space*
is allowed/consumed right before RejectTrailingGarbage() is called:
if (SkipOptionalSpace(tk)) { // second `space*` is allowed/consumed
RejectTrailingGarbage("preprocessor keyword", keyword, tk); // throws if there is trailing garbage
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. right. The wrapper function naming and code structure makes it seem like any skipping done causes a rejection.
The problem this is the opposite. A line *without spaces after the found token is not rejected as garbage. For example;
- the old parser (as already mentioned) accepted "else if" and treated it as an else-statement with a nested if-statement
- this logic will accept "elseif" and do the same.
.. both of which are unintentional and broken/undefined behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A line *without spaces after the found token is not rejected as garbage.
That statement is misleading in this context because such a line is not accepted either: A line with garbage after letters matching a given keyword K results in IsIfStatementLine(K) returning false. That is correct behavior for that function and the grammar it defines/supports -- preprocessor keyword K was not found. After IsIfStatementLine(K) returns false, other parsing code (possibly including other IsIfStatementLine(X) calls) deals with that line, rejecting it as needed.
For example; ... this logic will accept "elseif" and [treat it as an else-statement with a nested if-statement]
That example is incorrect or misleading. PR code does not accept "elseif". It correctly rejects "elseif", first as not a preprocessor line and then as an unknown squid.conf directive:
2024/11/05 10:51:40| ERROR: unrecognized directive near 'elseif'
2024/11/05 10:51:41| FATAL: configuration failure: Found 1 unrecognized directive(s)
If we add a directive named like elsewhere_services ...
or iffy_clients_enabled ...
, PR preprocessor code will not need to be modified for Squid to correctly recognize that new directive despite its name starting with the same letters as the preprocessor keyword "else" or "if". Needless to say, the chances of us adding those exact directives are pretty much zero, but I am using them to illustrate that PR code structure is correct and can support future changes such as addition of new directives (or new preprocessor keywords that are more likely to match some directive name prefixes!).
if (SkipOptionalSpace(tk)) { | ||
RejectTrailingGarbage("preprocessor keyword", keyword, tk); | ||
return true; | ||
} | ||
// e.g., "elseif" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, "elseif" is not supported.
However, despite cf.data.pre
stating that "NOTE: An else-if condition is not supported." the configuration grammar:
// grammar: "else" space "if" space condition space* END
does actually work in current Squid. This new logic will break that. Please fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, "elseif" is not supported.
Yes, this comment provides an example of invalid (but realistic) input that would allow function code to reach this particular spot on a negative/"return false" code path.
However, despite
cf.data.pre
stating that "NOTE: An else-if condition is not supported." the configuration grammar:// grammar: "else" space "if" space condition space* END
does actually work in current Squid.
No, else if
does not work in the current official code. Our documentation is correct -- that and similar else-if expressions are not supported: Official code does not bother to check for leftovers and misinterprets else if ...
as unconditional else
! There are other problems related to if/else/endif handling in official code (that this PR does not make worse); please do not ask this "SBuf upgrade" PR to add proper else-if support to Squid.
This new logic will break that. Please fix.
PR code does not break non-existent else if
support (so there is nothing to "fix" in this change request scope), but this PR does fix detection of that still-unsupported case and provides useful diagnostics:
FATAL: configuration failure: found trailing garbage after parsing preprocessor keyword else: if false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yadij, I believe I have addressed all change requests. Please unblock this PR.
/// extracts all leading space characters (if any) | ||
/// \returns whether at least one character was extracted | ||
static bool | ||
SkipOptionalSpace(Parser::Tokenizer &tk) | ||
{ | ||
SubstituteMacro(line, len, "${service_name}", service_name.c_str()); | ||
SubstituteMacro(line, len, "${process_name}", TheKidName.c_str()); | ||
SubstituteMacro(line, len, "${process_number}", xitoa(KidIdentifier)); | ||
return tk.skipAll(CharacterSet::WSP); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should not be inlined because inlining it will duplicate its code and obscure its high-level intent/logic in many callers.
In most cases, we should not inline, regardless of the number of lines a function has. There are exceptions to that rule of thumb, of course, but this particular function is not exceptional.
/// \returns std::nullopt if input does not look like an `include` statement | ||
/// \returns `include` parameters if input is an `include` statement | ||
static std::optional<SBuf> | ||
IsIncludeLine(Parser::Tokenizer tk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing reference leaves the keyword "include " in callers Tokenizer to be interpreted as a filename
That reference is not missing here and adding it would introduce problems: This IsX() function is a test function. It does not and should not change its input/Tokenizer. The caller is fine because it gets filenames from the function return value. The caller also does not want any input consumed because it is running other tests against the same input line and does not want to worry about one test affecting the other.
char* lvalue = (char*)xmalloc(equation - expr + 1); | ||
xstrncpy(lvalue, expr, equation - expr + 1); | ||
trim_trailing_ws(lvalue); | ||
// grammar: space* keyword space* END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This grammar does not match the parser. The second
space*
is prohibited from being>0
byRejectTrailingGarbage()
below
The grammar matches: The second space*
is allowed/consumed right before RejectTrailingGarbage() is called:
if (SkipOptionalSpace(tk)) { // second `space*` is allowed/consumed
RejectTrailingGarbage("preprocessor keyword", keyword, tk); // throws if there is trailing garbage
return true;
}
if (SkipOptionalSpace(tk)) { | ||
RejectTrailingGarbage("preprocessor keyword", keyword, tk); | ||
return true; | ||
} | ||
// e.g., "elseif" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, "elseif" is not supported.
Yes, this comment provides an example of invalid (but realistic) input that would allow function code to reach this particular spot on a negative/"return false" code path.
However, despite
cf.data.pre
stating that "NOTE: An else-if condition is not supported." the configuration grammar:// grammar: "else" space "if" space condition space* END
does actually work in current Squid.
No, else if
does not work in the current official code. Our documentation is correct -- that and similar else-if expressions are not supported: Official code does not bother to check for leftovers and misinterprets else if ...
as unconditional else
! There are other problems related to if/else/endif handling in official code (that this PR does not make worse); please do not ask this "SBuf upgrade" PR to add proper else-if support to Squid.
This new logic will break that. Please fix.
PR code does not break non-existent else if
support (so there is nothing to "fix" in this change request scope), but this PR does fix detection of that still-unsupported case and provides useful diagnostics:
FATAL: configuration failure: found trailing garbage after parsing preprocessor keyword else: if false
Also applied recent commit 314e430 "only SP and HT" logic when parsing
filenames in preprocessor
include
directives and operands inpreprocessor
if
statements. Prior to this change, in these contexts,Squid also accepted ASCII VT and FF characters as well as any other
locale-specific characters classified as space by isspace(3).
Other changes attempt to preserve existing squid.conf grammar
idiosyncrasies. A lot of squid.conf parsing code needs to be upgraded;
these changes focus on parse_line() because upcoming reconfiguration
improvements require that particular function to accept SBuf.
Also fixed ConfigParser SetCfgLine() method and CfgLine data member
descriptions. These members are not about the whole squid.conf line, and
not every squid.conf line goes through these members: These members are
dedicated to tokens located after a directive name on a directive line.
A lot of related code needs to be polished, but we fixed these docs
because this change affects all SetCfgLine() callers and CfgLine users.