fs load succeeded but no file processed #18043

Closed
wwq2333 opened this issue Aug 22, 2023 · 3 comments
Labels
type-bug This issue is about a bug

Comments

@wwq2333

wwq2333 commented Aug 22, 2023

Alluxio Version:
2.9.3

Describe the bug

On a preheated Alluxio cluster, after a worker restarts due to OOM, I use fs loadmeta and fs load --verify to warm up the cache again:

./bin/alluxio fs load / --submit --partial-listing --verify
./bin/alluxio fs load / --progress
Progress for loading path '/':
	Settings:	bandwidth: unlimited	verify: true
	Job State: SUCCEEDED
	Files Processed: 0
	Bytes Loaded: 0B
	Throughput: 0B/s
	Block load failure rate: 0.00%
	Files Failed: 0

However, fs ls still shows that no files are cached, even though the Alluxio cluster has enough capacity:

./bin/alluxio fs ls /write_newtrain_facepoints_temp.py
           1507       PERSISTED 07-26-2023 12:54:49:659   0% /write_newtrain_facepoints_temp.py


fsadmin report
Total Capacity: 3120.00GB
        Tier: SSD  Size: 3120.00GB
    Used Capacity: 777.11GB
        Tier: SSD  Size: 777.11GB
    Free Capacity: 2342.89GB
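
A condition like this (job reports SUCCEEDED but zero files were processed) can be caught automatically by parsing the `--progress` output. The following is only a minimal sketch, assuming the output keeps the "Key: value" layout shown above; the function names `parse_load_progress` and `is_suspicious` are hypothetical helpers, not part of Alluxio.

```python
import re

# Keys taken from the --progress output quoted in this issue.
_KEYS = r"(Job State|Files Processed|Bytes Loaded|Files Failed)"


def parse_load_progress(text):
    """Extract selected 'Key: value' fields from fs load --progress output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(rf"^{_KEYS}:\s*(.+)$", line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def is_suspicious(fields):
    """True when the job claims success but processed no files at all."""
    succeeded = fields.get("Job State", "").startswith("SUCCEEDED")
    return succeeded and fields.get("Files Processed") == "0"


# Sample copied from the report above (tabs written as \t).
SAMPLE = (
    "Progress for loading path '/':\n"
    "\tSettings:\tbandwidth: unlimited\tverify: true\n"
    "\tJob State: SUCCEEDED\n"
    "\tFiles Processed: 0\n"
    "\tBytes Loaded: 0B\n"
    "\tThroughput: 0B/s\n"
    "\tBlock load failure rate: 0.00%\n"
    "\tFiles Failed: 0\n"
)

if __name__ == "__main__":
    print(is_suspicious(parse_load_progress(SAMPLE)))
```

A check like this only flags the symptom; it says nothing about why the listing came back empty.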

The master and worker logs show no errors. I then tried warming up a specific file instead of /:

./bin/alluxio fs ls /write_newtrain_facepoints_temp.py
           1507       PERSISTED 07-26-2023 12:54:49:659   0% /write_newtrain_facepoints_temp.py


./bin/alluxio fs load /write_newtrain_facepoints_temp.py --submit --partial-listing --verify
Load '/write_newtrain_facepoints_temp.py' is successfully submitted. JobId: c2facde4-d3f4-4f1d-be0a-4a26aa9ca78e


./bin/alluxio fs load /write_newtrain_facepoints_temp.py --progress
Progress for loading path '/write_newtrain_facepoints_temp.py':
	Settings:	bandwidth: unlimited	verify: true
	Job State: FAILED (alluxio.exception.runtime.InternalRuntimeException: Job failed because it's not healthy.)
	Files Processed: 2052
	Bytes Loaded: 7.36KB
	Block load failure rate: 99.76%
	Files Failed: 1


./bin/alluxio fs ls /write_newtrain_facepoints_temp.py
           1507       PERSISTED 07-26-2023 12:54:49:659 100% /write_newtrain_facepoints_temp.py

The worker log shows no errors. The master log shows these warnings:

2023-08-17 06:18:36,790 WARN  [master-rpc-executor-TPE-thread-297](DefaultBlockMaster.java:1133) - Rejecting attempt to change block length from 1507 to 0
2023-08-17 06:18:36,792 WARN  [master-rpc-executor-TPE-thread-333](DefaultBlockMaster.java:1133) - Rejecting attempt to change block length from 1507 to 0
2023-08-17 06:18:36,796 WARN  [master-rpc-executor-TPE-thread-402](DefaultBlockMaster.java:1133) - Rejecting attempt to change block length from 1507 to 0
2023-08-17 06:18:36,798 WARN  [master-rpc-executor-TPE-thread-405](DefaultBlockMaster.java:1133) - Rejecting attempt to change block length from 1507 to 0

Additional context

@wwq2333 wwq2333 added the type-bug This issue is about a bug label Aug 22, 2023
@ssz1997
Contributor

ssz1997 commented Aug 25, 2023

@jja725 Could you take a look?

@jja725
Contributor

jja725 commented Aug 25, 2023

As we discussed in the meeting, this looks like a master metadata issue: the master thinks the data is already in the cluster, but the new worker does not actually have it (no persistence). It is not reproducible in the test environment.
In the new architecture this should not happen, since we check file cache status on the worker side.

alluxio-bot pushed a commit that referenced this issue Sep 14, 2023
### What changes are proposed in this pull request?

Check for more results when the filtered result is null when loading data.

### Why are the changes needed?
Fix #18043.


			pr-link: #18133
			change-id: cid-a05d66db230971ba6585b732c7bb2990ba02f7f7
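
The commit message describes a general class of bug: a paginated listing is filtered page by page, and a page that filters down to nothing is mistaken for the end of the listing. The sketch below illustrates that pattern only. It is not the code from #18133, and `list_page`, `needs_load`, and the sample pages are hypothetical stand-ins.

```python
# Hypothetical listing split into two pages; the first page holds only
# entries that the filter rejects.
PAGES = [["dir1/", "dir2/"], ["a.py", "b.py"]]


def list_page(token):
    """Return (entries, next_token); next_token is None on the last page."""
    index = 0 if token is None else token
    next_token = index + 1 if index + 1 < len(PAGES) else None
    return PAGES[index], next_token


def needs_load(entry):
    """Hypothetical filter: only regular files are candidates for loading."""
    return not entry.endswith("/")


def collect_buggy():
    """Stops as soon as one filtered page is empty."""
    selected, token = [], None
    while True:
        entries, token = list_page(token)
        batch = [e for e in entries if needs_load(e)]
        if not batch:  # bug: empty filtered page treated as end of listing
            return selected
        selected.extend(batch)
        if token is None:
            return selected


def collect_fixed():
    """Keeps asking for more results until the listing itself is exhausted."""
    selected, token = [], None
    while True:
        entries, token = list_page(token)
        selected.extend(e for e in entries if needs_load(e))
        if token is None:
            return selected


if __name__ == "__main__":
    print(collect_buggy())
    print(collect_fixed())
```

With the buggy variant the job would finish cleanly with nothing selected, which is consistent with a SUCCEEDED state and zero files processed.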
@jja725
Contributor

jja725 commented Sep 15, 2023

Fixed by #18133

@jja725 jja725 closed this as completed Sep 15, 2023
maobaolong pushed a commit to maobaolong/alluxio that referenced this issue Jan 3, 2024
### What changes are proposed in this pull request?

Check for more results when the filtered result is null when loading data.

### Why are the changes needed?
Fix Alluxio#18043.

			pr-link: Alluxio#18133
			change-id: cid-a05d66db230971ba6585b732c7bb2990ba02f7f7