Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with client encoding for exception messages #37

Closed
mclei-asw opened this issue Apr 4, 2022 · 9 comments
Closed

Problem with client encoding for exception messages #37

mclei-asw opened this issue Apr 4, 2022 · 9 comments

Comments

@mclei-asw
Copy link

Using PostgreSQL 13.6 and current pg_background worker, all on Debian 10.

There is a problem with handling exception messages with special characters.

I have database in UTF-8 and I am using Czech special characters. It works when all my clients uses also UTF-8 encoding.
But when client uses different encoding and conversion from UTF-8 must be taken, it fails with:
ERROR: invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61

Testing using:

create or replace function test_bg(a text) returns text
language plpgsql as $$
begin
  raise exception 'This is ěščřžýáíé';
  return 'x';
end; $$;

If I run client with different encoding, then it fails:

#> LANG=cs_CZ.ISO88592 psql mydatabase
mydatabase=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61

The correct behavior is:

#> LANG=cs_CZ.UTF-8 psql mydatabase
mydatabase=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  This is ěščřžýáíé
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 21871
@vibhorkum
Copy link
Owner

This is expected behavior. To avoid the issue, you need to have a proper client encoding.

@mclei-asw
Copy link
Author

mclei-asw commented Apr 22, 2022

This is expected behavior. To avoid the issue, you need to have a proper client encoding.

What exactly do you mean by this? I have proper client encoding. Should I setup anything special when using pg_background?

The texts in SQL are in UTF-8. PostgreSQL is responsible for converting it into specified client encoding. It works well for normal connection, but fails when used from inside pg_background.

@rjuju
Copy link
Collaborator

rjuju commented Apr 26, 2022

I don't think that's the expected behavior, as it's not what happens if calling the function without pg_background. For instance:

=# set client_encoding to 'ISO88592';
SET

=# select test_bg(null);
ERROR:  P0001: This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
LOCATION:  exec_stmt_raise, pl_exec.c:3898

=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8
LOCATION:  report_invalid_encoding, mbutils.c:1669

AFAICS the error message is converted to the client encoding in the bgworker, and then the converted data is converted again when re-throwing it, which definitely cannot work.

@vibhorkum
Copy link
Owner

No, this is expected behavior. pg_background worker starts a process with all GUCs settings of the current session and keeps the process attached to the session in which the module was called. So, if your session has client_encoding set to ISO88592 background worker will take that same client_encoding for the launched process. This is intentional, see below:

/* Restore GUC values from launching backend. */

@mclei-asw
Copy link
Author

Sorry, I don't understand you. Expected behavior is to use wrong encoding when run through pg_background and right encoding when run directly? Something smells here.

I think rjuju is right.

@vibhorkum
Copy link
Owner

vibhorkum commented May 9, 2022

No. Expected behavior is to use encoding/GUCs defined in your main session. For example, if you have started a session with client_encoding ISO88592 and you call pg_background, the module will create a process with the same client_encoding ISO88592.

If you want to avoid this, you could either set the server-side encoding in your main session or use the following command:
ALTER FUNCTION public.test_bg SET client_encoding TO '<server side encoding>'; to set at the function level
Below is an example from your sample code:

edb=# show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)
edb=# set client_encoding to 'ISO88592';
SET
edb=# select test_bg(null);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE

edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8

edb=# ALTER FUNCTION public.test_bg SET client_encoding TO 'UTF8';
ALTER FUNCTION

edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123735
edb=#

Or you could do:

edb=# select result from pg_background_result(pg_background_launch($$set client_encoding TO 'UTF-8'; select test_bg(null)$$)) as (result text);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123750

@vibhorkum
Copy link
Owner

Closing this issue. If needed, it can be reopened.

@mclei-asw
Copy link
Author

Your solution is still only a workaround. The real problem is with double encoding into client charset.

Unfortunately your workaround does not work, when we want to use procedures with transaction control (commit/rollback), because "SET" clause block this.

From doc: a SET clause is attached to a procedure, then that procedure cannot execute transaction control statements (for example, COMMIT and ROLLBACK, depending on the language).
We have also tested the second variant, and it does not work either.

@joelhecht
Copy link

joelhecht commented Aug 23, 2024

I think I encountered the same problem here.
Strangely charset is OK for the text returned by a function called through pg_background, but It crashes when the same text is returned by a raise notice.
For details, just have a look at https://stackoverflow.com/questions/78906240/why-do-i-received-a-invalid-byte-sequence-when-calling-a-function-through-pg

To be short, this single instruction fails :

# select * from pg_background_result(pg_background_launch('do $$ begin raise notice ''Noël''; end; $$')) as t(r text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xeb 0x6c

joelhecht pushed a commit to joelhecht/pg_background that referenced this issue Aug 24, 2024
Some fields of ErrorData are translated back to server encoding before rethrowing the error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants