sympasoap oddity with utf-8 input #1862
Comments
Hi @dpc22 ,
Please provide a sample of the input data, the client script you used, and how you made sure the client encoded it correctly.
I attach my example Python script, which fails (.txt extension required by GitHub). The equivalent Perl script seems to work. The only obvious difference is:
I can't find a direct equivalent of "$soap->default_ns()" in the Zeep library that I am using in Python. There is "zeep.set_ns_prefix()", but that takes two arguments.
The following didn't help:
I'm afraid that I don't know what SOAP namespaces do, so I'm rather blundering around in the dark.
I'm pretty sure that my Python code was originally derived from https://pypi.org/project/sympasoap/. That doesn't seem to do anything with namespaces either. (Edit to add:) It also has a normalize method which just discards any non-ASCII characters in the GeCOS field before invoking the SOAP add method. Presumably the author ran into the same issue, but didn't come up with a more sensible fix.
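To illustrate what such a workaround costs, here is a minimal sketch of discarding non-ASCII characters. The function name `strip_non_ascii` and the sample value are hypothetical and are not taken from the sympasoap package; this only shows the general technique described above.

```python
def strip_non_ascii(value: str) -> str:
    """Drop every character that cannot be represented in ASCII.

    Hypothetical helper: it mimics the described behaviour, not the
    actual code of the sympasoap package.
    """
    return value.encode("ascii", "ignore").decode("ascii")


# Hypothetical GeCOS value containing two non-ASCII characters.
gecos = "Jos\u00e9 \u00a35"
cleaned = strip_non_ascii(gecos)
print(repr(cleaned))
```

The request no longer fails, but the accented letter and the pound sign are silently lost, which is why it is a workaround rather than a fix.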
https://docs.python-zeep.org/en/master/transport.html#debugging tells me how to dump the raw XML which is sent to the sympasoap server. The raw HTTP POST request was:
We have The
I have a dedicated test server if I can add useful debugging at the server end. The normal Sympa verbose logging didn't tell me anything.
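For anyone who wants to capture the same raw request on the client side, the following is a sketch along the lines of the zeep debugging page linked above. It assumes that zeep logs the HTTP request and response on the `zeep.transports` logger at DEBUG level; the formatter and handler names are arbitrary choices.

```python
import logging.config

# Route zeep's transport-level debug output (the raw SOAP XML that is
# sent and received) to the console. Run this before creating the client.
logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "verbose": {"format": "%(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "verbose",
        },
    },
    "loggers": {
        "zeep.transports": {
            "level": "DEBUG",
            "propagate": True,
            "handlers": ["console"],
        },
    },
})
```

With this in place, the bytes of the POST body can be checked for "c2 a3" before suspecting the server.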
Thank you. That seems to have fixed the problem on my test server. I did need to add a patch for src/lib/Makefile.in in order to backport your fix from the Git repository to the 6.2.72 release tarball, given: rename from src/lib/Sympa/WWW/SOAP/Transport.pm. I will apply the fix to the live system either tomorrow morning or Monday morning.
Duplicate of #1541.
Okay, that seems to have worked on the live system as well. Thanks for your help here!
Version
6.2.72
Installation method
My own rpm, derived from "official" RHEL 9 rpm.
Expected behavior
If someone calls the SOAP "add" method with a GeCOS value which contains non-ASCII characters, the data should be processed as UTF-8.
Actual behavior
The PostgreSQL database back end throws an exception:
Jul 8 09:19:27 lists-2 sympasoap[298198]: err main::#85 > Sympa::WWW::SOAP::Transport::handle#118 > SOAP::Transport::HTTP::CGI::handle#627 > SOAP::Transport::HTTP::Server::handle#459 > SOAP::Server::handle#2844 > (eval)#2878 > (eval)#2893 > Sympa::WWW::SOAP::add#812 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::add::_twist#80 > Sympa::List::add_list_member#3291 > Sympa::DatabaseDriver::PostgreSQL::do_prepared_query#112 > Sympa::Database::do_prepared_query#383 Unable to execute SQL statement "INSERT INTO subscriber_table (subscribed_subscriber, reception_subscriber, update_epoch_subscriber, number_messages_subscriber, date_epoch_subscriber, visibility_subscriber, user_subscriber, comment_subscriber, list_subscriber, robot_subscriber) SELECT ?, ?, ?, ?, ?, ?, ?, ?, ?, ? FROM dual WHERE NOT EXISTS ( SELECT 1 FROM subscriber_table WHERE user_subscriber = ? AND list_subscriber = ? AND robot_subscriber = ? )": (22021) ERROR: invalid byte sequence for encoding "UTF8": 0xa3
"0xa3" is the single-byte ISO-8859-1 encoding of the character "£".
My SOAP client encodes it correctly as the two-byte UTF-8 sequence "0xc2 0xa3".
Something has transcoded the UTF-8 to ISO-8859-1, but the database back end is expecting UTF-8.
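The byte values above can be checked with a few lines of Python. This is only an illustration of the two encodings; it does not reproduce the Sympa or SOAP::Lite code path.

```python
pound = "\u00a3"  # the "£" character

utf8_bytes = pound.encode("utf-8")         # what the SOAP client sends
latin1_bytes = pound.encode("iso-8859-1")  # what reached the database

print(utf8_bytes.hex())    # c2a3
print(latin1_bytes.hex())  # a3

# A lone 0xa3 byte is not a valid UTF-8 sequence, which matches the
# kind of input PostgreSQL rejects with "invalid byte sequence".
try:
    latin1_bytes.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)  # True
```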
Steps to reproduce
SOAP client script (written in Python) available on request.
Additional information
I have an unpleasant feeling that this is in some way related to:
#1407
"This behavior seems due to bug (or buggy behavior) of SOAP::Lite".
(We are using the version of SOAP-Lite which ships with RHEL 9, which is: perl-SOAP-Lite-1.27-8.el9.noarch).
If I add an "Encode::_utf8_off($gecos);" call to lib/Sympa/WWW/SOAP.pm:
Then things start to work in the way that I would expect. However it isn't clear to me whether this is a safe or sensible thing to do.