-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-99127: Allow some features of syslog to the main interpreter only #99128
Conversation
corona10
commented
Nov 5, 2022
•
edited by bedevere-bot
Loading
edited by bedevere-bot
- Issue: Port syslog module to use module state #99127
(.oss) ➜ cpython git:(gh-99127) ✗ ./python.exe -m test test_syslog -R 3:3
Raised RLIMIT_NOFILE: 256 -> 1024
0:00:00 load avg: 3.65 Run tests sequentially
0:00:00 load avg: 3.65 [1/1] test_syslog
beginning 6 repetitions
123456
......
== Tests result: SUCCESS ==
1 test OK.
Total duration: 704 ms
Tests result: SUCCESS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider: what happens when multiple sub interpreters use the syslog extension. The current implementation always calls openlog(3) at least once before calling syslog(3), see the check in syslog_syslog_impl.
That results in interference between sub interpreters when in one of them the python code explicitly calls syslog.openlog and the other doesn't.
IMHO the check in syslog_syslog_impl is not necessary because it results in a call to openlog(3) with parameters that match the default behaviour of the syslog library. Removing it is a fairly minor behaviour change though (syslog.syslog will currently override any calls to openlog that were done in C before the first call to syslog.syslog and will stop doing that when removing the check in syslog_syslog_impl).
Thanks that's why I got a headache which makes me close the issue but thanks to suggest the solution.
Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right? |
Or did you intended implicit call of openlog(3) while calling syslog(3)? |
Failure tests from buildbot/PPC64LE Fedora were not related to this change. |
Unfortunately I don't have time to review this right now. Maybe @serhiy-storchaka can have a look? He recently did some work on this module. |
At the C level calling openlog(3) before calling syslog(3) is not necessary, although the spec is not entirely clear about this. The linux manual page does clearly mention that openlog(3) is optional, and matches what I remember from older Unix systems. The call to openlog in syslog_syslog_impl basically resets the default system state (use the process name for the ident and log using the LOG_USER facility). Leaving that out should therefore be harmless, but this is technically a user-visible change of behaviour:
Currently the Python code resets the values set in the first step, with my proposal it would no longer do this. I'd consider this a good change, but there's bound to be someone affected by this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stateless syslog C API doesn't seem to be designed to be consumed by multiple interpreters. I'm not sure that making S_ident_o and S_log_open global varibles per interpreter is safe.
Modules/syslogmodule.c
Outdated
{ | ||
_syslog_state *state = get_syslog_state(module); | ||
if (state->S_log_open) { | ||
closelog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if interpreter A calls closelog() and interpreter B (same process!) still uses the Python syslog module, so after closelog() has been called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
POSIX syslog will transparently call openlog()
with default values.
Oh I missed this, I will push the documentation ASAP. |
@vstinner Updated! PTAL @ericsnowcurrently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'm not good to review English, but the code LGTM :-)
Doc/whatsnew/3.12.rst
Outdated
|
||
* :func:`syslog.openlog` and :func:`syslog.closelog` are only available from the main interpreter not subinterpreter. | ||
:func:`syslog.syslog` is only available to subinterpreters if :func:`syslog.openlog` was already called from the main interpreter. | ||
(Contributed by Dong-hee Na in :gh:`99127`.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I think that the "Porting to Python 3.12" is a better section for these changes. The "syslog" section is more for new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can syslog.syslog only be used when the main interpreter has called syslog.openlog? That's a restriction that is not present in the underlying library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ronaldoussoren cc @ericsnowcurrently
This might be the discussion part for this change, from the view of managing for open/close state, subinterpreter should not intervene in changing the state of open/close. And this is why syslog.openlog
and syslog.closelog
are not available to the subinterpreter side. so if the subinterpreter can call syslog.syslog()
without involving the main interpreter, it means that subinterpreter intervenes in the state of open/close so I think that it should be prevented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling openlog (and closelog) is optional in syslog(3) with sensible defaults. There is no need to call openlog at all unless a program wants to change some of the default settings, and that could be done outside of the Python world.
For a use-case like embedding Python in a web server (like mod_python it is more likely that the embedding program does the syslog setup than that this is done in Python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling openlog (and closelog) is optional in syslog(3) with sensible defaults. There is no need to call openlog at all unless a program wants to change some of the default settings, and that could be done outside of the Python world.
In summary for your suggestion might be:
- syslog.syslog(): Allow for both the main interpreter and subinterpreters at any condition.
- syslog.openlog()/closelog(): Allow only for the main interpreter
But if the subinterpreter calls syslog.syslog()
after the main interpreter calls the syslog.closelog()
, the subinterpreter will neutralize what the main interpreter did. Who will close the syslog.syslog()
that is opened by subinterpreter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two factors here:
- most process-global resources should be managed by the app
- we are storing a str object in a static variable (
S_ident_o
), which is tricky when multiple interpreters are involved
Regarding the first one, this is perhaps something that we need not be so strict about. In Python, "app" means the __main__
module, running in the main interpreter (starting in the main thread). This implies that the global resources modifed via syslog.openlog()
and syslog.closelog()
should only be managed in the main interpreter. This would be similar to how signal handlers are supported only in the main thread of the main interpreter. However, this could be an overly strict interpretation for syslog. It makes sense to me, but "consenting adults", etc.
Regarding the second factor, we have been (and still are) working hard to get interpreters isolated from one another, especially to avoid problems that arise when they step on each others' toes. [1] (Isolation also provides new implementation opportunities.) For example, an object created in one interpreter should never be modified (even just the refcount) by another interpreter. [2] In the case of syslog, the object in the static variable (set in syslog.openlog()
) should be managed only by the interpreter that owns it. Here are ways to enforce this:
- the interpreter that calls
openlog()
should be the only one allowed to callcloselog()
(simplest way is to restrict to the main interpreter) - if another interpreter calls
closelog()
, then it would callPy_AddPendingCall()
to have the owning interpreter decref the string
Obviously we went with the simplest form of the first one.
On top of the existing isolation concerns, there are additional ones that become more complicated under a per-interpreter GIL due to thread-safety. These are a non-issue if we restrict to a per-interpreter GIL.
[1] Such problems would be amplified by a per-interpreter GIL (see PEP 684, which has not been accepted yet), but they still exist even now.
[2] Except for a small set of objects managed by the process-global runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, if the concern is backward compatibility, we can easily apply the restriction only if the subinterpreter was created via Py_NewInterpreter()
(or, under a per-interpreter GIL, is otherwise sharing the GIL with the main interpreter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling openlog (and closelog) is optional in syslog(3) with sensible defaults. There is no need to call openlog at all unless a program wants to change some of the default settings, and that could be done outside of the Python world.
For a use-case like embedding Python in a web server (like mod_python it is more likely that the embedding program does the syslog setup than that this is done in Python code.
FWIW, both concerns are valid, but pre-date this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some recommendations for the docs.
When you're done making the requested changes, leave the comment: |
Co-authored-by: Eric Snow <[email protected]>
I have made the requested changes; please review again |
Thanks for making the requested changes! @ericsnowcurrently, @vstinner: please review the changes made to this pull request. |
The limitation comes from Python which stores a Python object in a global variable:
We must hold a strong reference because we pass a pointer into the internal UTF-8 encoded flavor of the Unicode string, PyUnicode_AsUTF8():
I understand that openlog() doesn't copy the ident string. Maybe this assumption is wrong and the whole change is not needed. I don't know. I don't want to check the implementation of openlog() on all supported platforms, and run a real test.
Ok, I'm curious and I looked into the GNU glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=misc/syslog.c;h=f67d4b58a448cabdf63f629f72a9df6e009286cf;hb=HEAD#l295 The ident string is not copied, only the pointer is copied ( |
One problem with sub-interpreters is that Python objects should not travel from one interpreter to another. But For closelog(), I proposed earlier a counter increased by syslog.openlog() and decreased by syslog.closelog(). The C function closelog() would only be called when the counter reach zero. One problem is if two interepreters use different identifiers, the latest call to syslog.openlog() wins: it overrides the previous ident (and log level). os.chdir() has a similar problem: it's a process-wide parameter affecting all threads and all interpreters. But it's a simpler problem. The current working directory is stored in the kernel, not in Python, and no resource have to be released explicitly at exit :-) |
Agree, keeping a copy of ident in the raw dynamically allocated array will resolve the issue. But I do not think we need a reference counter. Just call |
What about keeping a static buffer and copying the str object's C string into it, then giving the buffer to #define MAX_IDENT 256
static char S_ident[MAX_IDENT+1];
...
static PyObject *
syslog_openlog_impl(...)
{
....
const char *ident_str = NULL;
if (ident != NULL) {
Py_ssize_t size = PyUnicode_GET_LENGTH(ident);
if (size > MAX_IDENT) {
PyErr_SetString(...);
return NULL;
}
else if (size > 0) {
strncpy(S_ident, PyUnicode_AsUTF8(ident), size);
ident_str = S_ident;
}
}
openlog(ident_str, logopt, facility);
Py_RETURN_NONE;
}
static PyObject *
syslog_syslog_impl(...)
{
if (PySys_Audit("syslog.syslog", "is", priority, message) < 0) {
...
}
#ifdef __APPLE__
// gh-98178: On macOS, libc syslog() is not thread-safe
syslog(priority, "%s", message);
#else
Py_BEGIN_ALLOW_THREADS;
syslog(priority, "%s", message);
Py_END_ALLOW_THREADS;
#endif
Py_RETURN_NONE;
}
static PyObject *
syslog_closelog_impl(PyObject *module)
{
...
closelog();
Py_RETURN_NONE;
} There would still be a race under per-interpreter GIL, though, if subinterpreters are allowed to call |
@ronaldoussoren @vstinner @ericsnowcurrently As the first step for the subinterpreter project, I would like to suggest maintaining restrictions for subinterpreter as the current PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a good baseline. I'm okay with circling back with improvements (or loosening restrictions) afterward.
* main: (112 commits) pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895) pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613) Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917) pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916) pythongh-89189: More compact range iterator (pythonGH-27986) bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491) pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906) pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850) pythongh-99845: Use size_t type in __sizeof__() methods (python#99846) pythonGH-99877) Fix typo in exception message in `multiprocessing.pool` (python#99900) pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869) pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893) pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825) pythonGH-81057: remove static state from suggestions.c (python#99411) Improve zip64 limit error message (python#95892) pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591) pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128) pythongh-82836: fix private network check (python#97733) Docs: improve accuracy of socketserver reference (python#24767) ...