Configured share_folder is removed from filecache when storage is unavailable #33485

Open
mdusher opened this issue Nov 12, 2018 · 9 comments

@mdusher
Contributor

mdusher commented Nov 12, 2018

We've been experiencing an issue where our "Shared" directory is removed from the filecache for some (not all) users when our underlying storage becomes unavailable (i.e. we take it offline for an upgrade, or there is an unplanned outage).

This causes the affected users to also lose all their current shares (I suspect a background job is cleaning them up when the folder no longer exists in the filecache), and it also appears in the activity log as the user deleting the folder (which is not the case).

I've been unable to pin down exactly what causes this, as it occurs irregularly and is hard to replicate in our test environment.

My suspicion is that one of the following may be part of the cause, but I have been unable to confirm it:

  1. Having 'share_folder' configured in config.php
  2. ownCloud cron jobs are still running when the storage becomes unavailable and removes the share folder as it

Any suggestions to troubleshoot this are welcome!

Steps to reproduce

Unknown, the only consistent symptom is that it occurs when our file system becomes unavailable.

Expected behaviour

Shared directory is not removed from the filecache when the storage becomes unavailable.

Actual behaviour

Shared directory is removed from the filecache when the storage becomes unavailable.

Server configuration

Operating system: RHEL7

Web server: Apache 2.4.6

Database: MariaDB 10.0.28

PHP version: PHP-FPM 7.0.30

ownCloud version: 10.0.3

Updated from an older ownCloud or fresh install: Updated

Where did you install ownCloud from: TAR on the ownCloud website

Signing status (ownCloud 9.0 and above): Integrity checker has been disabled. Integrity cannot be verified.

The content of config/config.php:

{
    "system": {
        "instanceid": "5230042dc1897",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": {
            "0": "cloudstor.aarnet.edu.au"
        },
        "datadirectory": "\/cloudstor\/data\/owncloud\/data",
        "version": "10.0.3.3",
        "dbtype": "mysql",
        "dbname": "owncloudstable82",
        "dbhost": "127.0.0.1:6033",
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "dbtableprefix": "",
        "installed": true,
        "operation.mode": "clustered-instance",
        "default_language": "en_GB",
        "defaultapp": "files",
        "knowledgebaseenabled": true,
        "enable_avatars": false,
        "allow_user_to_change_display_name": false,
        "session_lifetime": 86400,
        "session_keepalive": true,
        "token_auth_enforced": false,
        "mail_domain": "aarnet.edu.au",
        "mail_from_address": "cloudstor-noreply",
        "mail_smtpmode": "php",
        "overwriteprotocol": "https",
        "overwrite.cli.url": "https:\/\/cloudstor.aarnet.edu.au\/plus",
        "htaccess.RewriteBase": "\/plus",
        "trashbin_retention_obligation": "30, 60",
        "appcodechecker": false,
        "updatechecker": false,
        "has_internet_connection": true,
        "check_for_working_webdav": false,
        "check_for_working_htaccess": true,
        "log_type": "owncloud",
        "logfile": "\/cloudstor\/logs\/owncloud\/owncloud.log",
        "loglevel": 2,
        "logtimezone": "UTC",
        "log_query": false,
        "customclient_desktop": "https:\/\/cloudstor.aarnet.edu.au\/client-download\/",
        "customclient_android": "https:\/\/play.google.com\/store\/apps\/details?id=au.edu.aarnet.cloudstor.android",
        "customclient_ios": "https:\/\/itunes.apple.com\/au\/app\/cloudstor\/id1215476371?mt=8",
        "cron_log": true,
        "appstore.experimental.enabled": false,
        "apps_paths": [
            {
                "path": "\/cloudstor\/www\/owncloud\/apps",
                "url": "\/apps",
                "writable": true
            },
            {
                "path": "\/cloudstor\/www\/owncloud\/3rdparty-apps",
                "url": "\/3rdparty-apps",
                "writable": true
            }
        ],
        "enable_previews": true,
        "enabledPreviewProviders": [
            "OC\\Preview\\PNG",
            "OC\\Preview\\JPEG",
            "OC\\Preview\\GIF",
            "OC\\Preview\\BMP",
            "OC\\Preview\\XBitmap",
            "OC\\Preview\\TXT",
            "OC\\Preview\\MarkDown",
            "OC\\Preview\\Illustrator",
            "OC\\Preview\\Postscript",
            "OC\\Preview\\Photoshop",
            "OC\\Preview\\Movie"
        ],
        "maintenance": false,
        "singleuser": false,
        "memcache.local": "\\OC\\Memcache\\APCu",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "redis.cluster": {
            "seeds": [
                "127.0.0.1:6379"
            ],
            "timeout": 0,
            "read_timeout": 0,
            "failover_mode": 2
        },
        "memcached_servers": [
            [
                "127.0.0.1",
                11211
            ]
        ],
        "blacklisted_files": [
            ".htaccess"
        ],
        "share_folder": "\/Shared",
        "cipher": "AES-256-CFB",
        "minimum.supported.desktop.version": "2.4.2",
        "quota_include_external_storage": false,
        "filesystem_check_changes": 0,
        "filesystem_cache_readonly": false,
        "forwarded_for_headers": [
            "HTTP_X_FORWARDED",
            "HTTP_FORWARDED_FOR"
        ],
        "filelocking.enabled": false,
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "upgrade.disable-web": true,
        "upgrade.automatic-app-update": false,
        "integrity.check.disabled": true,
        "cache_path": "\/cloudstor\/data\/tmp",
        "tempdirectory": "\/cloudstor\/data\/tmp",
        "mail_smtpdebug": false,
        "mail_smtphost": "smtp.aarnet.edu.au",
        "mail_smtpport": "25",
        "mail_smtptimeout": 10,
        "preview_office_cl_parameters": "",
        "preview_max_scale_factor": 10,
        "preview_max_filesize_image": 100,
        "openssl": [],
        "activity_expire_days": 365
    }
}

List of activated apps:

Enabled:
  - activity: 2.3.4
  - cloudstortheme: 1.0.0
  - collections: 1.1.1
  - comments: 0.3.0
  - configreport: 0.1.1
  - dav: 0.3.0
  - dicomviewer: 0.0.6
  - federatedfilesharing: 0.3.1
  - federation: 0.1.0
  - files: 1.5.1
  - files_clipboard: 0.6.4
  - files_external: 0.7.1
  - files_jmol: 0.0.1
  - files_pdfviewer: 0.8.2
  - files_sharing: 0.10.1
  - files_texteditor: 2.2
  - files_thingiview: 0.0.1
  - files_trashbin: 0.9.1
  - files_versions: 1.3.0
  - files_videoplayer: 0.9.8
  - filescan: 0.0.1
  - filesenderapp: 1.0
  - firstrunwizard: 1.1
  - gallery: 16.1.0
  - impersonate: 0.1.0
  - market: 0.2.2
  - music: 0.9.2
  - notifications: 0.3.1
  - onlyoffice: 1.3.0
  - password_policy: 2.0.0
  - provisioning_api: 0.5.0
  - security: 0.0.2
  - updatenotification: 0.2.1
  - user_saml: 0.4
Disabled:
  - encryption
  - external
  - files_antivirus
  - systemtags
  - templateeditor
  - user_external

Are you using external storage, if yes which one: No

Are you using encryption: No

Are you using an external user-backend, if yes which one: No

@ownclouders
Contributor

GitMate.io thinks the contributors most likely able to help are @ownclouders, and @PVince81.

Possibly related issues are #240 (--- Removed ---), #3260 (- Removed -), #31165 (remove duplicated storages while checking the filecache corruption), and #29708 (Case sensitive usernames when logging in with an app password via webdav).

@PVince81
Contributor

I suspect a background job is cleaning them up when the folder no longer exists in the filecache

Yes, there's one.

Not sure why the filecache would clean itself.

The question is also "how" the storage is not available. Is it an NFS mount that is missing?

ownCloud has some checks in place to find out if the data folder is missing (when ".ocdata" does not exist) or, for user homes, if the "files" subdir is missing and the user has already logged in before. In both these cases, it would throw a "StorageNotAvailableException" (mapped to 503), which tells the clients to go away when accessing over WebDAV.

Maybe in your setup the outage looks different at the FS level, in a way that makes OC unable to detect it with the current mechanisms?
Assuming you're not talking about external storages but the regular home storages.
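
As a rough illustration of the two checks described above, here is a minimal sketch in Python rather than ownCloud's PHP. The function `check_storage`, the `StorageNotAvailable` class and the `user_logged_in_before` flag are hypothetical stand-ins, not ownCloud's actual API; only the ".ocdata" marker, the "files" subdir and the 503 mapping come from the description.

```python
import os
import tempfile


class StorageNotAvailable(Exception):
    """Stand-in for ownCloud's StorageNotAvailableException (mapped to HTTP 503)."""
    http_status = 503


def check_storage(data_dir, user, user_logged_in_before):
    """Raise StorageNotAvailable if the data directory or user home looks missing."""
    # Check 1: the marker file in the data directory root must exist.
    if not os.path.exists(os.path.join(data_dir, ".ocdata")):
        raise StorageNotAvailable("data directory marker .ocdata is missing")
    # Check 2: a returning user must still have a "files" subdirectory.
    files_dir = os.path.join(data_dir, user, "files")
    if user_logged_in_before and not os.path.isdir(files_dir):
        raise StorageNotAvailable("home 'files' directory is missing for " + user)


def demo():
    """Return the outcome of the checks for a healthy and an unavailable data dir."""
    results = []
    with tempfile.TemporaryDirectory() as data_dir:
        open(os.path.join(data_dir, ".ocdata"), "w").close()
        os.makedirs(os.path.join(data_dir, "alice", "files"))
        check_storage(data_dir, "alice", True)  # passes silently
        results.append("ok")
        os.remove(os.path.join(data_dir, ".ocdata"))  # simulate the outage
        try:
            check_storage(data_dir, "alice", True)
        except StorageNotAvailable as exc:
            results.append(exc.http_status)
    return results
```

Note that both checks are point-in-time probes: they say nothing about whether the storage is still there a few seconds later.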

@mdusher ^

@mdusher
Contributor Author

mdusher commented Nov 15, 2018

@PVince81

In terms of storage, I'm referring to the user's home directories. We're running CERN's EOS as our underlying storage and interact with it via a FUSE mount. In terms of defining "unavailable", I mean when we've experienced an issue and the storage has crashed or we've taken it down on purpose for maintenance.

I've definitely encountered the StorageNotAvailableException before and narrowed it down to the .ocdata file being unavailable, so that part definitely works! My suspicion is that something is running that checks for .ocdata at the start of its run and then goes through the rest of its logic with the assumption that the storage is there (which is why my suspicion was a cron job).

I'm wondering if there is a way to possibly exclude the directory set in "share_folder" from being removed from the filecache?

@PVince81
Contributor

PVince81 commented Nov 16, 2018

In this specific scenario about unavailability, I think the filecache should stay untouched.

It is also likely that you ran into a race condition where some PHP request already went past the "ocdata" check and continued processing. Or maybe the cron job was already running and the test had already been done at the time the storage was suddenly gone.

This would only happen for a single cron job or single PHP request, so I'd find it strange if subsequent PHP requests would also bypass the check for whatever reason and result in empty folders.
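
The race described above is a check-then-act problem. A minimal sketch of it in Python (not ownCloud's PHP; `FakeStorage`, `run_job` and the `fail_after` parameter are hypothetical names used only for illustration) could look like this:

```python
class FakeStorage:
    """In-memory stand-in for the mounted data directory."""

    def __init__(self, files):
        self.files = set(files)
        self.available = True

    def exists(self, path):
        # An unavailable mount answers "no" for every path, like a failed stat.
        return self.available and path in self.files


def run_job(storage, cache, fail_after):
    """Check the marker once, then prune cache entries whose files seem missing."""
    if not storage.exists(".ocdata"):
        raise RuntimeError("storage not available")  # the up-front check
    # sorted() copies the entries, so the set can be modified inside the loop.
    for step, path in enumerate(sorted(cache)):
        if step == fail_after:
            storage.available = False  # the outage begins mid-run
        if not storage.exists(path):
            cache.discard(path)  # wrongly treated as deleted
    return cache
```

With `fail_after=1`, the first entry survives and everything processed after the outage is pruned, even though the up-front check passed. That would also be consistent with only some entries (or some users) being affected.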

Did everything disappear from the file cache, or really just that one "share_folder"?

Things to verify:

  • whether cron jobs, especially "cleanup orphaned shares", properly trigger the "ocdata" check
  • what happens if such a cron job is running while, in parallel, the storage becomes unavailable
  • what other operations might behave badly if storage becomes unavailable after the "ocdata" check
  • investigate "share_folder" code paths to find out what could cause that one folder alone to disappear

@mdusher
Contributor Author

mdusher commented Nov 18, 2018

@PVince81
I completely agree that it's quite likely it's already past the "ocdata" check. Due to the size of our installation, we are taking advantage of the ability to run cron.php multiple times asynchronously (we get terribly behind on the jobs otherwise).

As far as I can tell, it is just the "share_folder" that goes missing from the filecache (it's the only one that gets reported by our users).

@mdusher
Contributor Author

mdusher commented Mar 8, 2019

Are there any updates on this issue?

@micbar
Contributor

micbar commented Mar 11, 2019

@pvince @mdusher
I just had a customer case where files disappeared in the /Shared folder while the customer was having network issues with their primary storage (NFS mount).

I wonder if the file_exists() check on .ocdata can get a cached result when NFS caching is on (which it is by default).

The customer reported that the NFS mount disappeared and the sync clients were able to upload files to a new folder at the same location where the NFS mount usually is.
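
If the mount point silently turns back into a plain local directory, a marker-file check alone may not be enough. As a sketch only (Python, not ownCloud code; `storage_ready` is a hypothetical helper and nothing like it is confirmed to exist in ownCloud), one could additionally verify that the path is still a mount point:

```python
import os
import tempfile


def storage_ready(mount_point, marker=".ocdata"):
    """Return True only if mount_point is a live mount that contains the marker."""
    # A directory left behind after an unmount is not a mount point, so writes
    # to it would land on the local disk instead of the NFS/FUSE backend.
    if not os.path.ismount(mount_point):
        return False
    return os.path.exists(os.path.join(mount_point, marker))


def demo():
    """A freshly created plain directory must be rejected, marker or not."""
    with tempfile.TemporaryDirectory() as plain_dir:
        open(os.path.join(plain_dir, ".ocdata"), "w").close()
        return storage_ready(plain_dir)
```

This only helps when the data directory is itself the mount point. It would not catch a mount that is still present but unresponsive.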

@cdamken
Contributor

cdamken commented Jul 8, 2019

@micbar I'm able to reproduce the problem, please contact me

@mdusher
Contributor Author

mdusher commented Nov 26, 2019

I've been chasing problems with the Shared folder and shares being removed for a while, and I think I have a fairly reliable theory about what is happening on our side.

We are using CERN's EOS storage as our backend provider via a FUSE mount provided by eosd.

After some testing, it seems that when that FUSE mount is under high I/O load or is unavailable, file_exists() returns false (I'm guessing because it doesn't get a timely response). This also happens when you stop the eosd service (which stops providing the FUSE mount).

My theory is that when the OC\Files\ScanFiles background job runs to clean up the filecache table, it encounters this edge case: file_exists() returns false, and so the file metadata is removed from the filecache table. Following that, the OC\Files_Sharing\DeleteOrphanedShares job runs and deletes the shares that point to file ids that no longer exist in the filecache table.
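
To make the suspected chain of events concrete, here is a small simulation in Python with an in-memory SQLite database. It is a sketch under loud assumptions: the two tables are reduced to the columns needed here, the SQL is illustrative rather than the queries the real jobs run, and `simulate` and the inner `file_exists` are hypothetical names.

```python
import sqlite3


def simulate(storage_up):
    """Run a simplified scan and orphan cleanup; return (cache rows, share rows)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, path TEXT)")
    db.execute("CREATE TABLE share (id INTEGER PRIMARY KEY, file_source INTEGER)")
    db.execute("INSERT INTO filecache VALUES (1, 'files/Shared')")
    db.execute("INSERT INTO filecache VALUES (2, 'files/Shared/report.pdf')")
    db.execute("INSERT INTO share VALUES (10, 2)")

    on_disk = {"files/Shared", "files/Shared/report.pdf"}

    def file_exists(path):
        # While the mount is down, every probe answers False.
        return storage_up and path in on_disk

    # Step 1: the scan drops cache rows whose files appear to be gone.
    for fileid, path in db.execute("SELECT fileid, path FROM filecache").fetchall():
        if not file_exists(path):
            db.execute("DELETE FROM filecache WHERE fileid = ?", (fileid,))

    # Step 2: the cleanup drops shares whose file id is no longer cached.
    db.execute(
        "DELETE FROM share WHERE NOT EXISTS "
        "(SELECT 1 FROM filecache WHERE filecache.fileid = share.file_source)"
    )
    cache_rows = db.execute("SELECT COUNT(*) FROM filecache").fetchone()[0]
    share_rows = db.execute("SELECT COUNT(*) FROM share").fetchone()[0]
    db.close()
    return cache_rows, share_rows
```

With the storage up, `simulate(True)` leaves both tables intact. With it down, `simulate(False)` empties the cache and then the shares, although nothing was actually deleted on disk.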

Now, my understanding is that when cron.php (or index.php) is run, it checks for the existence of .ocdata in the data directory. However, because the background tasks executed via cron.php can run for up to 15 minutes - it's entirely possible that the storage might become unresponsive during that time and it will continue to perform tasks assuming that the storage is still there.

As a test, I've modified remove() and removeChildren() in lib/private/Files/Cache/Cache.php to run \OC_Util::checkDataDirectoryValidity() before performing the DELETE query, and to exit quietly if it reports any errors (see: mdusher@e4dcf56).

While this is technically a performance hit when a user deletes a file or folder, we are willing to take a hit in delete performance rather than having a user lose access to their data due to an edge case.
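
The shape of that guard, sketched in Python rather than the actual PHP patch (the class `StorageGuardedCache` and its `storage_is_valid` callable are hypothetical; see the linked commit for the real change):

```python
class StorageGuardedCache:
    """Toy cache whose remove() re-validates the storage before deleting."""

    def __init__(self, entries, storage_is_valid):
        self.entries = set(entries)
        # Hypothetical callable standing in for the data directory validity check.
        self.storage_is_valid = storage_is_valid

    def remove(self, path):
        """Delete path and its children, or do nothing if storage is not valid."""
        if not self.storage_is_valid():
            return False  # exit quietly, leaving the cache untouched
        prefix = path + "/"
        self.entries = {
            e for e in self.entries if e != path and not e.startswith(prefix)
        }
        return True
```

The check runs on every delete, which is where the performance cost comes from. It narrows the window between check and delete but does not close it entirely.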
