
CreateHome fails #2995

Closed
kulmann opened this issue Jan 20, 2022 · 15 comments
Labels
Priority:p2-high Escalation, on top of current planning, release blocker Storage:OCIS Type:Bug

Comments

@kulmann (Member)

kulmann commented Jan 20, 2022

Describe the bug

Sometimes ™️ the CreateHome gRPC request fails on first user login. As a result, the logged-in user sees a "Resource not found" page, caused by a 404 response to the PROPFIND on the user's home.
What I noticed in the filesystem is that the following folder structure was created, with a symlink to a non-existing folder:

.
├── blobs
├── nodes
│   └── root
│       └── ddc2004c-0977-11eb-9d3f-a793888cd0f8 -> ../ddc2004c-0977-11eb-9d3f-a793888cd0f8
├── spaces
│   ├── personal
│   └── share
├── trash
└── uploads
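For the record, dangling node symlinks like the one above can be spotted with a small script. This is only a sketch: the helper name is mine, the default path matches this report, and `-xtype l` assumes GNU find.

```shell
# Sketch: list dangling symlinks under the decomposedfs nodes tree.
# STORAGE_ROOT defaults to the location from this report; override as needed.
STORAGE_ROOT="${STORAGE_ROOT:-$HOME/.ocis/storage/users}"

list_dangling() {
    # GNU find: -xtype l matches symlinks whose target does not resolve
    find "$1" -xtype l 2>/dev/null || true
}

list_dangling "$STORAGE_ROOT/nodes"
```

Any path printed by the helper is a symlink whose target is missing, i.e. the broken state shown in the tree above.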

Steps to reproduce

Steps to reproduce the behavior:

  1. unknown, seemingly random

A hunch I had, though I still couldn't reproduce the issue in a stable way:

  1. delete `~/.ocis`
  2. start the oCIS single binary (in my case with `WEB_ASSET_PATH` pointing to a locally built web): `WEB_ASSET_PATH=/path/to/web/dist OCIS_INSECURE=true ./bin/ocis server`
  3. try to log in via https://localhost:9200 (failing)
  4. observe `2022/01/20 10:23:35 [ERR] mdns: Failed to bind to udp port: listen udp4 0.0.0.0:0: socket: too many open files listen udp6 [::]:0: socket: too many open files`
  5. restart oCIS with the same command as above
  6. try to log in again (succeeding)
  7. see the "Resource not found" page as described above
  8. see the oCIS error log as quoted below

Again, I couldn't reproduce this reliably.

Expected behavior

User home gets provisioned reliably

Actual behavior

User home sometimes doesn't get provisioned

Setup

oCIS single binary from current master at fdb2f55b77532c1ea52452664d8f7577e2bf8805

Additional context

oCIS error log

2022-01-18T10:48:44+01:00 ERR failed to create storage space error="Decomposedfs: root has no parent" pkg=rgrpc request={"name":"Admin","opaque":{"map":{"space_id":{"decoder":"plain","value":"ZGRjMjAwNGMtMDk3Ny0xMWViLTlkM2YtYTc5Mzg4OGNkMGY4"}}},"owner":{"display_name":"Admin","gid_number":30000,"groups":["users"],"id":{"idp":"https://localhost:9200","opaque_id":"ddc2004c-0977-11eb-9d3f-a793888cd0f8","type":1},"mail":"[email protected]","mail_verified":true,"opaque":{"map":{"roles":{"decoder":"json","value":"WyI3MTg4MTg4My0xNzY4LTQ2YmQtYTI0ZC1hMzU2YTJhZmRmN2YiXQ=="}}},"uid_number":20004,"username":"admin"},"type":"personal"} service=storage status={"code":15,"message":"error listing spaces","trace":"00000000000000000000000000000000"} traceid=00000000000000000000000000000000
2022-01-18T10:48:44+01:00 ERR error when calling Createhome error="gateway: grpc failed with code CODE_INTERNAL" service=proxy

OS

macOS 11.5.2
I only have 256 file descriptors configured, but then again, I'm not even sure the `too many open files` mdns error is related in any way. Of course oCIS is not fully functional while that error occurs, but I'm not sure whether it is what causes CreateHome to fail.
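For what it's worth, the soft descriptor limit can be checked and raised per shell session before starting oCIS. A sketch (4096 is an arbitrary value; raising it fails if it exceeds the hard limit, which `ulimit -Hn` shows):

```shell
# Show the current soft limit on open file descriptors
echo "soft limit: $(ulimit -n)"

# Try to raise it for this shell session only; this fails if 4096
# exceeds the hard limit (check with `ulimit -Hn`)
ulimit -n 4096 2>/dev/null || echo "could not raise limit above hard cap"
echo "new soft limit: $(ulimit -n)"
```

The change only affects processes started from this shell, so run it in the same session as the `ocis server` command.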

@wkloucek (Contributor)

wkloucek commented Jan 20, 2022

Also happens on continuously deployed demo instances (e.g. https://ocis.ocis-web.latest.owncloud.works/).

I did a dump of the user storage (storage-users.zip) that is similar to @kulmann's example and can be used to reproduce the state oCIS is in after CreateHome didn't succeed... though it doesn't bring us any further toward answering why this happens at all.

tree ~/.ocis/storage/users/
/home/kloucek/.ocis/storage/users/
├── blobs
├── nodes
│   ├── 9bde6eee-1537-4228-a699-0a2afb6accc7
│   ├── ddc2004c-0977-11eb-9d3f-a793888cd0f8
│   └── root
│       ├── 4c510ada-c86b-4815-8820-42cdf82c3d51 -> ../4c510ada-c86b-4815-8820-42cdf82c3d51
│       └── ddc2004c-0977-11eb-9d3f-a793888cd0f8 -> ../ddc2004c-0977-11eb-9d3f-a793888cd0f8
├── spaces
│   ├── personal
│   │   └── ddc2004c-0977-11eb-9d3f-a793888cd0f8 -> ../../nodes/ddc2004c-0977-11eb-9d3f-a793888cd0f8
│   └── share
├── trash
└── uploads

12 directories, 1 file

@wkloucek (Contributor)

@butonic can you please have a look again?

@kulmann (Member, Author)

kulmann commented Jan 20, 2022

@wkloucek did you see any `too many open files` log lines on the demo deployments? I'd like to get a feeling for whether that's related or unrelated.

@wkloucek (Contributor)

Possible fix: cs3org/reva#2457

@pascalwengerter (Contributor)

Side note: Started the WOPI deployment example today and ran into this issue multiple times

@wkloucek (Contributor)

> Side note: Started the WOPI deployment example today and ran into this issue multiple times

which version of oCIS did you use? (if latest, could you please post your docker image digest?)

@pascalwengerter (Contributor)

> > Side note: Started the WOPI deployment example today and ran into this issue multiple times
>
> which version of oCIS did you use? (if latest, could you please post your docker image digest?)

sha256:55f2b4d71249d06d45241b83780a396eea83aa6c1d906fbbd1e5f90bf9e373e9

@wkloucek (Contributor)

> 55f2b4d71249d06d45241b83780a396eea83aa6c1d906fbbd1e5f90bf9e373e9

did you start from scratch (no volumes present before) or did you have pre-existing volumes? You can check that by running `docker inspect ocis_web_ocis-data` (which should also be the name of the volume on your side). It will state the creation date in the CreatedAt field.

Background: This was potentially fixed in cs3org/reva#2457. The fix is already included in your image, but if you have an old volume, the bug will be persisted forever...
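The volume-age check above can be scripted. A sketch, assuming the volume name `ocis_web_ocis-data` from this comment (adjust it to your compose project); it degrades gracefully when docker or the volume is unavailable:

```shell
# Print the creation date of the ocis data volume, if docker and the volume exist.
# Volume name taken from the comment above; adjust for your compose project.
if command -v docker >/dev/null 2>&1; then
    docker volume inspect --format '{{ .CreatedAt }}' ocis_web_ocis-data 2>/dev/null \
        || echo "volume not found"
else
    echo "docker not available"
fi
```

A CreatedAt older than the date you pulled the fixed image suggests the broken state was carried over from the old volume.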

@pascalwengerter (Contributor)

> > 55f2b4d71249d06d45241b83780a396eea83aa6c1d906fbbd1e5f90bf9e373e9
>
> did you start from scratch (no volumes present before) or did you have pre-existing volumes? You can check that by running `docker inspect ocis_web_ocis-data` (which should also be the name of the volume on your side). It will state the creation date in the CreatedAt field.
>
> Background: This was potentially fixed in cs3org/reva#2457. The fix is already included in your image, but if you have an old volume, the bug will be persisted forever...

I thought I had removed the volumes before starting the newly pulled images, but `"CreatedAt": "2021-10-14T09:50:18+02:00"` suggests otherwise 🤔 weirddd

@pascalwengerter (Contributor)

> > 55f2b4d71249d06d45241b83780a396eea83aa6c1d906fbbd1e5f90bf9e373e9
>
> did you start from scratch (no volumes present before) or did you have pre-existing volumes? You can check that by running `docker inspect ocis_web_ocis-data` (which should also be the name of the volume on your side). It will state the creation date in the CreatedAt field.
>
> Background: This was potentially fixed in cs3org/reva#2457. The fix is already included in your image, but if you have an old volume, the bug will be persisted forever...

Had some issues because the volumes weren't deleted the way I expected them to be... now they're all fresh, but the problem persists, yes

@pascalwengerter (Contributor)

Just set up a new machine (and both pulled fresh docker images & compiled oCIS locally), still an issue

@rpocklin (Contributor)

rpocklin commented Feb 13, 2022

I have the issue with the current owncloud/ocis:latest build (image id: 70d9f96b99e6). It seems to occur intermittently (e.g. every 3rd or 4th time you provision a new docker container, even when deleting volumes).

After creating the container and logging in (I was using admin:admin), every 3rd or 4th time the API request fails with a 404 (the proxy makes a request to storage-frontend, which fails).

Screenshot

Error Log (in container)

{"level":"debug","service":"storage","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"grpc-go/1.44.1-dev","from":"tcp://127.0.0.1:48818","uri":"/cs3.storage.registry.v1beta1.RegistryAPI/GetStorageProviders","start":"11/Feb/2022:01:30:07 +0000","end":"11/Feb/2022:01:30:07 +0000","time_ns":103300,"code":"OK","time":"2022-02-11T01:30:07Z","message":"unary"}

{"level":"error","service":"storage","pkg":"rgrpc","traceid":"00000000000000000000000000000000","error":"Decomposedfs: root has no parent","status":{"code":15,"message":"error creating space","trace":"00000000000000000000000000000000"},"request":{"opaque":{"map":{"space_id":{"decoder":"plain","value":"ZGRjMjAwNGMtMDk3Ny0xMWViLTlkM2YtYTc5Mzg4OGNkMGY4"}}},"owner":{"id":{"idp":"https://ocis.owncloud.test","opaque_id":"ddc2004c-0977-11eb-9d3f-a793888cd0f8","type":1},"username":"admin","mail":"[email protected]","mail_verified":true,"display_name":"Admin","groups":["users"],"opaque":{"map":{"roles":{"decoder":"json","value":"WyI3MTg4MTg4My0xNzY4LTQ2YmQtYTI0ZC1hMzU2YTJhZmRmN2YiXQ=="}}},"uid_number":20004,"gid_number":30000},"type":"personal","name":"Admin"},"time":"2022-02-11T01:30:07Z","message":"failed to create storage space"}

{"level":"debug","service":"storage","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"grpc-go/1.44.1-dev","from":"tcp://127.0.0.1:43778","uri":"/cs3.storage.provider.v1beta1.ProviderAPI/CreateStorageSpace","start":"11/Feb/2022:01:30:07 +0000","end":"11/Feb/2022:01:30:07 +0000","time_ns":127600,"code":"OK","time":"2022-02-11T01:30:07Z","message":"unary"}

{"level":"debug","service":"storage","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"grpc-go/1.44.1-dev","from":"tcp://127.0.0.1:48750","uri":"/cs3.gateway.v1beta1.GatewayAPI/CreateHome","start":"11/Feb/2022:01:30:07 +0000","end":"11/Feb/2022:01:30:07 +0000","time_ns":1073000,"code":"OK","time":"2022-02-11T01:30:07Z","message":"unary"}

2022-02-11T01:30:07Z ERR error when calling Createhome error="gateway: grpc failed with code CODE_INTERNAL" service=proxy

2022-02-11T01:30:07Z DBG director found path=/remote.php/webdav/ policy=ocis prefix=/remote.php/ routeType=prefix service=proxy

{"level":"debug","service":"storage","pkg":"rhttp","traceid":"00000000000000000000000000000000","time":"2022-02-11T01:30:07Z","message":"access token is already provided"}

I did not see any error relating to too many open files in my case (running in Ubuntu under WSL2 in Windows 10).

See #3152 for original (probable duplicate) bug report.

@nealey

nealey commented Jun 7, 2022

I seem to be able to reproduce this 100% of the time if the backing filesystem is a tmpfs.

For example, with the following argument to `docker run`:

`-v /tmp/ocis/data:/var/lib/ocis`
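If it helps triage: the filesystem type backing the bind-mounted host path can be checked with GNU coreutils `stat`. A sketch, using the path from the `docker run` argument above and falling back to `/tmp` if it doesn't exist:

```shell
# Print the filesystem type backing the host path (GNU coreutils stat).
# /tmp/ocis/data is the path from the docker run argument above; fall back
# to /tmp if it does not exist on this machine.
stat -f -c %T /tmp/ocis/data 2>/dev/null || stat -f -c %T /tmp
```

An output of `tmpfs` would match the 100% reproduction described in this comment.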

@micbar added the Priority:p2-high Escalation, on top of current planning, release blocker label on Jun 14, 2022
@butonic (Member)

butonic commented Jun 20, 2022

@nealey IIRC tmpfs does not support extended attributes in the user.* namespace, which are required by the decomposed filesystem (decomposedfs) that oCIS uses by default: https://cateee.net/lkddb/web-lkddb/TMPFS_XATTR.html Try using a different filesystem, or use a docker volume.
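A quick way to verify whether a given mount accepts `user.*` attributes is to try setting one. A sketch, assuming `setfattr` from the attr package is installed (the helper name, attribute name, and probe path are illustrative):

```shell
# Probe: can we set a user.* extended attribute on files in this directory?
probe_xattr() {
    dir="$1"
    f=$(mktemp "$dir/xattrprobe.XXXXXX") || return 1
    if setfattr -n user.ocis.probe -v 1 "$f" 2>/dev/null; then
        echo "user xattrs supported on $dir"
    else
        echo "user xattrs NOT supported on $dir"
    fi
    rm -f "$f"
}

probe_xattr /tmp
```

Running this against the directory you intend to bind-mount into the container should tell you up front whether decomposedfs can work there.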

@butonic (Member)

butonic commented Jun 20, 2022

Reading this thread, it seems to be either the (low) default ulimit on macOS or other conservative OSes, or a filesystem without extended attribute support. There have also been a lot of fixes with regard to file locking and atomicity, so this should no longer be a problem. Closing. Please open a new issue if you think you are still encountering this.
