After migration russian letters are incorrectly encoded #5185

Closed
iddm opened this issue Oct 25, 2018 · 14 comments
Labels
type/question Issue needs no code to be fixed, only a description on how to fix it yourself.


@iddm
Contributor

iddm commented Oct 25, 2018

I recently migrated from the Gitea binary to Gitea in Docker. I made a dump and imported it into the database inside Docker as well, and now my issues look like this:

I also get a 500 Internal Server Error very often, and when I look for the problem in the logs I see this:

ERROR: invalid character '\n' in string literal

I have no idea what I have done wrong, could anyone help me please?

My docker-compose.yml:

version: "2"

services:
  db:
    image: mariadb:latest
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=gitea
      - MYSQL_USER=gitea
      - MYSQL_PASSWORD=gitea
      - MYSQL_DATABASE=gitea
    volumes:
      - ./mysql:/var/lib/mysql

  server:
    image: gitea/gitea:latest
    restart: always
    environment:
      - USER_UID=1006
      - USER_GID=1006
      - USER=gitea
    volumes:
      - ./gitea:/data
    ports:
      - "3000:3000"
      - "222:22"
    depends_on:
      - db
@zeripath
Contributor

zeripath commented Oct 25, 2018

I think your MariaDB has not been set up to use UTF-8; see docker-library/docs#613

Basically you need docker-compose.yml to read:

version: "2"

services:
  db:
    image: mariadb:latest
    command: ['--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=gitea
      - MYSQL_USER=gitea
      - MYSQL_PASSWORD=gitea
      - MYSQL_DATABASE=gitea
    volumes:
      - ./mysql:/var/lib/mysql

  server:
    image: gitea/gitea:latest
    restart: always
    environment:
      - USER_UID=1006
      - USER_GID=1006
      - USER=gitea
    volumes:
      - ./gitea:/data
    ports:
      - "3000:3000"
      - "222:22"
    depends_on:
      - db

@iddm
Contributor Author

iddm commented Oct 26, 2018

I don't quite understand when I should set this command: before importing the database dump from the old Gitea, or at any time, in which case it should help immediately even after the dump has been imported?

@zeripath
Contributor

I'm not a MariaDB expert, but I suspect you need it at least while importing the dump, and probably also whenever the database is running.

@zeripath
Contributor

Your data appears to be double UTF-8 encoded - if the above doesn't work, it might be worth taking a look at your dump to check whether it has been double encoded there. If that's the case, then there's likely a bug in the dumping. It should be possible to undo the double encoding with the recode command-line program.
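
If recode is not at hand, the same reversal can be sketched in Python. This is only an illustration under assumptions: the helper names `undo_double_utf8` and `repair_dump` are made up for this example, the whole dump is assumed to be uniformly double encoded, and the misreading is assumed to have gone through ISO-8859-1. MariaDB's latin1 is closer to Windows-1252, so a real dump may need a different codec.

```python
from pathlib import Path


def undo_double_utf8(data: bytes) -> bytes:
    """Reverse one extra UTF-8 encoding pass.

    Assumes the original UTF-8 bytes were misread as ISO-8859-1 (latin-1)
    and then encoded as UTF-8 a second time.
    """
    # Decoding gives one character per original byte; encoding those
    # characters as latin-1 turns them back into the original bytes.
    return data.decode("utf-8").encode("latin-1")


def repair_dump(src: Path, dst: Path) -> None:
    """Write a repaired copy of a dump file (hypothetical helper)."""
    dst.write_bytes(undo_double_utf8(src.read_bytes()))
```

If the file contains any character outside ISO-8859-1 (for example, text that was never double encoded), the `encode("latin-1")` step raises `UnicodeEncodeError`, which is a useful signal that the assumption does not hold for that dump.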

@iddm
Contributor Author

iddm commented Oct 26, 2018

> Your data appears to be double UTF-8 encoded - if the above doesn't work, it might be worth taking a look at your dump to check whether it has been double encoded there. If that's the case, then there's likely a bug in the dumping. It should be possible to undo the double encoding with the recode command-line program.

Thank you for your answer! But I have no idea how to re-encode it back, so I'm gonna google. And how did you find out that the data was encoded twice?

@zeripath
Contributor

The Ð ("D bar") characters in your screenshot told me that somewhere something was interpreting the high bytes of UTF-8 sequences as separate characters rather than as parts of a single encoded character.

There are two ways this can happen. Either the database is unaware that it holds UTF-8 data, so it outputs the individual bytes as characters, and the receiving program takes them for real characters and re-encodes each one as UTF-8, hence you see glyphs that match the high bytes. Or the data was put into the database as UTF-8 encoded bytes, but the database treated those bytes as characters and re-encoded them.
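
To make the mechanism concrete, here is a small Python sketch of how Cyrillic text ends up as a run of Ð and Ñ glyphs. It assumes the misreading goes through ISO-8859-1 (Python's `latin-1` codec); the sample word and the function name `misread_as_latin1` are only illustrative.

```python
def misread_as_latin1(utf8_bytes: bytes) -> str:
    """Interpret every byte as its own ISO-8859-1 character."""
    return utf8_bytes.decode("latin-1")


original = "Привет"
utf8_bytes = original.encode("utf-8")        # 2 bytes per Cyrillic letter
mojibake = misread_as_latin1(utf8_bytes)     # 1 character per byte
double_encoded = mojibake.encode("utf-8")    # each high byte becomes 2 bytes

# Cyrillic letters start with the lead byte 0xD0 or 0xD1, which are the
# ISO-8859-1 code points for Ð and Ñ. Some second bytes map to invisible
# control characters, so only the printable ones are shown here.
visible = "".join(ch for ch in mojibake if ch.isprintable())
print(visible)
print(len(utf8_bytes), len(double_encoded))
```

Every Cyrillic letter contributes one Ð or Ñ, which is why those glyphs dominate the garbled text.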

Now it's difficult to actually see these things because most things nowadays do utf8 properly. You really need to check the bytestream at each point.
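
One way to look at the raw bytes at a given point is sketched below in Python. The helpers `hex_bytes` and `looks_double_encoded` are hypothetical names for this example, and the check is only a heuristic that assumes the misreading went through ISO-8859-1; legitimate text can occasionally trigger it.

```python
def hex_bytes(data: bytes) -> str:
    """Return the bytes as space-separated hex, e.g. 'd0 9f'."""
    return data.hex(" ")


def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: True if undoing one latin-1 misreading still yields valid UTF-8.

    Assumes ISO-8859-1 was the intermediate charset; this is a guess, not a proof.
    """
    try:
        text = data.decode("utf-8")
        inner = text.encode("latin-1")   # fails if text has non-latin-1 chars
        repaired = inner.decode("utf-8") # fails if inner is not valid UTF-8
    except UnicodeError:
        return False
    # Pure ASCII survives unchanged and is not evidence of double encoding.
    return repaired != text
```

For example, a correctly stored "П" is the two bytes `d0 9f`, while the double encoded form is the four bytes `c3 90 c2 9f`; comparing such hex dumps from the dump file, the database, and the HTTP response shows where the extra encoding step happens.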

You should take a look at the wiki page for UTF-8 to learn how it works. File encoding is a surprisingly difficult and fiddly topic in general, and it's good to learn about it, especially if your native language is not written in plain old low-byte ASCII Latin.

@iddm
Contributor Author

iddm commented Oct 27, 2018

Okay, I have done what you asked me to do and I still get the same result. Could you recommend anything else?

@iddm
Contributor Author

iddm commented Oct 28, 2018

Okay, this is still an unanswered question. I have worked around it for myself: I just ignored the dump, installed a fresh instance, and manually migrated all 42 of my repositories from the old instance. Of course this is a painful way, but unfortunately I was not able to find a better one.

@zeripath
Contributor

Ugh. That's obviously not an ideal situation. Sorry to hear that.

If you're still interested in finding out how to fix this, could you give me some more information?

  1. Did you use the gitea dump command to dump the database, or did you dump from MariaDB directly?
  2. What were the settings of the MariaDB instance?
  3. If you dump your Docker database and reimport it into another Docker database, does that still foul up the encoding?

@iddm
Contributor Author

iddm commented Oct 28, 2018

Sad day: the old VDS instance that hosted my old Gitea has just been deleted, so I can't tell you exactly which version of MariaDB was there. I do remember checking it when I migrated, so the versions should be the same on the new VDS. I don't recall any special settings; I just installed it via something like apt-get install mariadb and that's all. I created the dump via the gitea-bin commands, as described in the documentation. Everything was restored correctly except for this encoding issue; everything else was fine afaik.

And perhaps you forgot my problem: I migrated from gitea-bin on the old VDS instance to gitea-docker on the new VDS instance, and I had not used gitea-docker before the migration :)

@zeripath
Contributor

I hadn't forgotten about the change to Docker; I was just checking whether dumping works in your new setup. If it doesn't, there's a problem with Gitea's dumping in general, rather than something specific to your setup.

Basically you've just been bitten by a backup and restore problem, so you should make sure your backups work now and, if they don't, fix that before you need to restore again in the future. This is one of the benefits of Docker: spinning up duplicate instances should be relatively cheap.

@iddm
Contributor Author

iddm commented Nov 7, 2018

It is no longer an issue for me. I did the work manually by cloning all the repositories into a new instance with a fresh Gitea, without importing the old dumps, so I probably can't provide any more information on this.

@lafriks
Member

lafriks commented Nov 7, 2018

@vityafx can the issue be closed then?

@lafriks lafriks added the type/question Issue needs no code to be fixed, only a description on how to fix it yourself. label Nov 7, 2018
@iddm
Contributor Author

iddm commented Nov 7, 2018

Yes, but only because of that. :)

@iddm iddm closed this as completed Nov 7, 2018
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020