
Segmentation violations with Ubuntu version #9

Open
pielambr opened this issue Jul 31, 2023 · 17 comments
Labels
bug Something isn't working

Comments

@pielambr

Describe the bug

Since updating to the latest version, which bumps the Ubuntu base image version, we are repeatedly getting segmentation violations, around one every 6 to 10 minutes:
[screenshot of the segmentation violation output]

This goes away when reverting to an earlier version.

Steps to Reproduce

  • Use the latest tag
  • Send a bunch of requests
  • The service starts emitting segmentation violations every so often (see the sketch below)
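
A minimal sketch of these steps, assuming the container is run locally and that the service exposes a /parse endpoint on port 4400 (both are assumptions about this particular setup):

docker run -d -p 4400:4400 pelias/libpostal-service:latest
# keep sending free-form text until the service starts logging segmentation violations
while true; do
  curl -s --get 'http://localhost:4400/parse' --data-urlencode 'address=some longer paragraph of free-form text' > /dev/null
done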

Expected behavior

No segmentation violations

Environment (please complete the following information):

The container is running inside of a Kubernetes cluster on Google Cloud Services

Pastebin/Screenshots

[screenshot of the segmentation violation output, as above]

Additional context

References

@pielambr pielambr added the bug Something isn't working label Jul 31, 2023
@orangejulius
Member

Hi @pielambr, we've noticed this as well. I wouldn't be surprised if it's related to pelias/docker-libpostal_baseimage#12. There have been some changes lately in libpostal that cause segfaults; while some may be fixed, it's very likely that some issues still remain.

We can try reverting to an older commit of libpostal again, stay tuned.

@missinglink
Member

I think we might be able to move back up to HEAD since openvenues/libpostal#632 was merged. I'll try building and releasing a new Docker image tomorrow.

@missinglink
Member

@pielambr do you have an example query that caused the fault, which I can use to confirm the fix?

@pielambr
Author

pielambr commented Aug 2, 2023

@missinglink I'm afraid not; we just observed in production that the pod went down quite often, usually with larger paragraphs of text.

@missinglink
Member

It seems the latest docker image already includes code from the PR I linked above.

@pielambr can you please tell me which version of the docker image you are running?

docker images
REPOSITORY                 TAG       IMAGE ID       CREATED      SIZE
pelias/libpostal-service   latest    846cd5bdb6db   9 days ago   2.3GB

@missinglink
Member

Could you please add some instrumentation to capture the query causing the segfault if possible?

From what I'm seeing here, it's difficult to resolve this issue without knowing which version(s) and which queries are causing it.
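
A minimal sketch of that kind of client-side instrumentation, assuming the client calls a /parse endpoint on port 4400 (adjust the host and path to whatever your client actually uses):

# log every query before it is sent, so the last line of queries.log is the
# input that was in flight when the service segfaulted
send_query() {
  echo "$(date -u +%FT%TZ) $1" >> queries.log
  curl -s --get 'http://localhost:4400/parse' --data-urlencode "address=$1"
}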

@missinglink
Member

After some trial and error I was able to get 846cd5bdb6db to segfault by increasing the input query length. This is the query which finally caused it to fail on my machine:

30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny
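
For anyone who wants to check their own image, a rough sketch of replaying that query against a locally running container; the /parse endpoint and port 4400 are assumptions (use whichever endpoint your client calls), and <container-id> stands in for the ID of your running container:

Q='30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny'
for i in $(seq 1 10); do
  # a failed request usually means the worker just segfaulted
  curl -s --get 'http://localhost:4400/parse' --data-urlencode "address=$Q" > /dev/null || break
done
docker logs --tail 20 <container-id>  # look for the segfault in the service logs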

@missinglink
Member

What I'll do is revert to the last known stable version and write up an issue on the libpostal repo to make them aware. It seems to affect HEAD, so maybe a regression was introduced.

@missinglink
Member

missinglink commented Aug 2, 2023

Okay, bad news: I rebuilt this image pinned to an older version of our libpostal baseimage and I was still able to trigger the segfault by sending 5 to 10 long, ugly queries like the one above.

diff --git a/Dockerfile b/Dockerfile
index c91a18c..5c85161 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,5 +1,5 @@
 # build the libpostal-server binary separately
-FROM pelias/libpostal_baseimage as builder
+FROM pelias/libpostal_baseimage:pin-to-version-that-builds-2023-07-04-5f89119a11fbcce5df475eba9a3f337181d2d8ad as builder

 RUN apt-get update && apt-get install -y make pkg-config build-essential
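
If anyone wants to try the same pin locally, building and running the patched Dockerfile is just the usual (the local image tag here is only an illustrative name):

docker build -t libpostal-service:pinned-baseimage .
docker run -d -p 4400:4400 libpostal-service:pinned-baseimage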

@missinglink
Member

missinglink commented Aug 2, 2023

It's not clear exactly when the regression was introduced, but I checked out an old version from 2021-11-03 and it isn't affected, so that can provide a bookend for the bisect.

I don't have loads more time to spend on this today, but if someone could provide more information about which versions between master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e and master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872 work or don't work, that would be super useful in getting this resolved 🙏

docker run -d -p 4400:4400 pelias/libpostal-service:master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e
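
A rough sketch of the kind of loop that would help with the bisect; the two tags below are just the current bookends (substitute whichever intermediate tags you want to test), and the /parse endpoint on port 4400 is an assumption about how you send requests:

for TAG in master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e \
           master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872; do
  docker rm -f libpostal-bisect > /dev/null 2>&1
  docker run -d --name libpostal-bisect -p 4400:4400 "pelias/libpostal-service:$TAG"
  sleep 60  # give libpostal time to load its data files
  for i in $(seq 1 1000); do
    curl -sf --get 'http://localhost:4400/parse' \
      --data-urlencode 'address=30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny' > /dev/null \
      || { echo "$TAG: request $i failed, likely a segfault"; break; }
  done
done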

@missinglink
Member

In fact there have been fairly few releases since 2021, owing to limited activity on the upstream repos:

[screenshot from 2023-08-02 showing the list of recent releases]

@pielambr
Author

pielambr commented Aug 2, 2023

If I have some spare time I'll have a look at which version introduced it for us, but that might be a while. We've currently reverted all the way back to version ca4ffcc just to make sure, because it was blocking production.

@mreid-exiger

Hi folks, I am seeing this issue as well. I've tried the following images:

master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872 <- crash
master-2023-07-16-d6483672db70596a2ee0d97782567b12917c6ae6 <- crash
master-2023-07-04-b02f6f14cfe2dbf2dfee9e458a372f0aca13caa4 <- no crash
master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e <- no crash

I haven't done a huge amount of testing, but the crash is pretty easy to reproduce, occurring after roughly 500 requests. The 2023-07-04 image appears to be the latest one that holds up over thousands of requests in my environment (Kubernetes with a 4Gi memory limit).

@missinglink
Member

Thanks for the continued reports; they are helpful for discovering which versions are affected.

These memory issues are being discussed over on the main libpostal issue tracker and we hope to adopt the patches as soon as they are available.

We would be happy to accept some code in this repo which could reliably cause the CI to crash (and therefore prevent Docker images from being created), so that no new releases could be generated until this is fixed upstream.
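
A rough sketch of what such a CI check could look like, as a shell step run against the freshly built image before it is pushed; the IMAGE_UNDER_TEST variable, the request count, and the /parse endpoint are all assumptions rather than anything that exists in this repo today:

# hammer the candidate image with the long query that triggers the fault,
# then fail the build (blocking the release) if the container died
docker run -d --name crash-check -p 4400:4400 "$IMAGE_UNDER_TEST"
sleep 60  # wait for libpostal to load its data files
Q='30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny'
for i in $(seq 1 500); do
  curl -s --get 'http://localhost:4400/parse' --data-urlencode "address=$Q" > /dev/null
done
docker inspect -f '{{.State.Running}}' crash-check | grep -q true \
  || { echo 'libpostal-service crashed during the smoke test'; exit 1; }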

@mreid-exiger

@missinglink perhaps my message was formatted a little confusingly. The crashing images that I've tested are:

  • master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872
  • master-2023-07-16-d6483672db70596a2ee0d97782567b12917c6ae6

The images which I've tested that appear stable are:

  • master-2023-07-04-b02f6f14cfe2dbf2dfee9e458a372f0aca13caa4
  • master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e

@missinglink
Member

Got it, thanks 👍
