Constantly restarts when started, fails to start (ZFS issue?) #135

Closed
Xynonners opened this issue Nov 13, 2023 · 8 comments

Comments

@Xynonners

Nov 12 23:52:55 shadowlands systemd[1]: nohang.service: Scheduled restart job, restart counter is at 5.
Nov 12 23:52:55 shadowlands systemd[1]: nohang.service: Start request repeated too quickly.
Nov 12 23:52:55 shadowlands systemd[1]: nohang.service: Failed with result 'exit-code'.
Nov 12 23:52:55 shadowlands systemd[1]: Failed to start Sophisticated low memory handler.

After setting StartLimitBurst=0

Nov 12 23:53:37 shadowlands nohang[197674]:   File "/usr/bin/nohang", line 1918, in check_mem_swap_ex
Nov 12 23:53:37 shadowlands nohang[197674]:     mem_available, swap_total, swap_free = check_mem_and_swap()
Nov 12 23:53:37 shadowlands nohang[197674]:                                            ^^^^^^^^^^^^^^^^^^^^
Nov 12 23:53:37 shadowlands nohang[197674]:   File "/usr/bin/nohang", line 1344, in check_mem_and_swap
Nov 12 23:53:37 shadowlands nohang[197674]:     ma += arcstats()
Nov 12 23:53:37 shadowlands nohang[197674]:           ^^^^^^^^^^
Nov 12 23:53:37 shadowlands nohang[197674]:   File "/usr/bin/nohang", line 183, in arcstats
Nov 12 23:53:37 shadowlands nohang[197674]:     elif n == arc_meta_min_index:
Nov 12 23:53:37 shadowlands nohang[197674]:               ^^^^^^^^^^^^^^^^^^
Nov 12 23:53:37 shadowlands nohang[197674]: NameError: name 'arc_meta_min_index' is not defined. Did you mean: 'arc_meta_used_index'?

Unsurprisingly, the restart counter then skyrockets:

Nov 12 23:53:45 shadowlands systemd[1]: nohang.service: Scheduled restart job, restart counter is at 274.
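
The traceback above is consistent with index names that are assigned only when a matching line is found during a startup scan of arcstats, and then read unconditionally later. A minimal sketch of that pattern follows. It assumes a simplified scan; the functions `find_indexes`, `read_stats`, and `reproduce` are hypothetical and this is not the actual nohang source.

```python
# Hypothetical reduction of the failure; the index names mirror the
# traceback, but this is not the actual nohang source.
SAMPLE = (
    "name                            type data\n"
    "size                            4    231617760\n"
    "arc_meta_used                   4    193707744\n"
)


def find_indexes(text):
    """Record line numbers, but only for fields that are present."""
    global arc_meta_used_index, arc_meta_min_index
    for n, line in enumerate(text.split("\n")):
        if line.startswith("arc_meta_used "):
            arc_meta_used_index = n
        elif line.startswith("arc_meta_min "):
            arc_meta_min_index = n  # never runs: the field is absent


def read_stats(text):
    """Read values back by line number, assuming both names exist."""
    values = {}
    for n, line in enumerate(text.split("\n")):
        if n == arc_meta_used_index:
            values["arc_meta_used"] = int(line.rpartition(" ")[2])
        elif n == arc_meta_min_index:  # raises NameError here
            values["arc_meta_min"] = int(line.rpartition(" ")[2])
    return values


def reproduce():
    """Return the NameError message raised by the lookup, or None."""
    find_indexes(SAMPLE)
    try:
        read_stats(SAMPLE)
    except NameError as e:
        return str(e)
    return None
```

Calling `reproduce()` on an arcstats sample without an `arc_meta_min` row returns a message naming `arc_meta_min_index`, which matches the error in the journal.
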
@hakavlad
Owner

hakavlad commented Nov 13, 2023

Thanks for the report. What Linux distribution and kernel are you using?

I probably won't be able to fix this quickly.

@hakavlad
Owner

Please show the output of cat /proc/spl/kstat/zfs/arcstats

@Xynonners
Author

@hakavlad

9 1 0x01 147 39984 10456012352 15310549183646
name                            type data
hits                            4    432253
iohits                          4    1400
misses                          4    2605
demand_data_hits                4    64066
demand_data_iohits              4    1
demand_data_misses              4    1010
demand_metadata_hits            4    365773
demand_metadata_iohits          4    579
demand_metadata_misses          4    809
prefetch_data_hits              4    0
prefetch_data_iohits            4    0
prefetch_data_misses            4    1
prefetch_metadata_hits          4    2414
prefetch_metadata_iohits        4    820
prefetch_metadata_misses        4    785
mru_hits                        4    59344
mru_ghost_hits                  4    0
mfu_hits                        4    372909
mfu_ghost_hits                  4    0
uncached_hits                   4    0
deleted                         4    31
mutex_miss                      4    0
access_skip                     4    0
evict_skip                      4    3
evict_not_enough                4    0
evict_l2_cached                 4    0
evict_l2_eligible               4    599552
evict_l2_eligible_mfu           4    131072
evict_l2_eligible_mru           4    468480
evict_l2_ineligible             4    4096
evict_l2_skip                   4    0
hash_elements                   4    1289489
hash_elements_max               4    1289492
hash_collisions                 4    99253
hash_chains                     4    89577
hash_chain_max                  4    4
meta                            4    1073741824
pd                              4    2147483648
pm                              4    2147483648
c                               4    2100819456
c_min                           4    2100819456
c_max                           4    33613111296
size                            4    231617760
compressed_size                 4    36570624
uncompressed_size               4    75461120
overhead_size                   4    63649280
hdr_size                        4    692928
data_size                       4    37670912
metadata_size                   4    62548992
dbuf_size                       4    1758512
dnode_size                      4    3501072
bonus_size                      4    1664128
anon_size                       4    0
anon_data                       4    0
anon_metadata                   4    0
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    48380416
mru_data                        4    329216
mru_metadata                    4    48051200
mru_evictable_data              4    0
mru_evictable_metadata          4    9293824
mru_ghost_size                  4    0
mru_ghost_data                  4    0
mru_ghost_metadata              4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    51839488
mfu_data                        4    37341696
mfu_metadata                    4    14497792
mfu_evictable_data              4    0
mfu_evictable_metadata          4    20480
mfu_ghost_size                  4    0
mfu_ghost_data                  4    0
mfu_ghost_metadata              4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
uncached_size                   4    0
uncached_data                   4    0
uncached_metadata               4    0
uncached_evictable_data         4    0
uncached_evictable_metadata     4    0
l2_hits                         4    772
l2_misses                       4    1669
l2_prefetch_asize               4    356007936
l2_mru_asize                    4    446973075968
l2_mfu_asize                    4    61340277248
l2_bufc_data_asize              4    506269115392
l2_bufc_metadata_asize          4    2400245760
l2_feeds                        4    14936
l2_rw_clash                     4    0
l2_read_bytes                   4    5741056
l2_write_bytes                  4    6633984
l2_writes_sent                  4    83
l2_writes_done                  4    83
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    587133774336
l2_asize                        4    508669361152
l2_hdr_size                     4    123542112
l2_log_blk_writes               4    0
l2_log_blk_avg_asize            4    12941
l2_log_blk_asize                4    23464960
l2_log_blk_count                4    1260
l2_data_to_meta_ratio           4    82398
l2_rebuild_success              4    1
l2_rebuild_unsupported          4    0
l2_rebuild_io_errors            4    0
l2_rebuild_dh_errors            4    0
l2_rebuild_cksum_lb_errors      4    0
l2_rebuild_lowmem               4    0
l2_rebuild_size                 4    587130350080
l2_rebuild_asize                4    508664224768
l2_rebuild_bufs                 4    1287720
l2_rebuild_bufs_precached       4    51
l2_rebuild_log_blks             4    1260
memory_throttle_count           4    0
memory_direct_count             4    0
memory_indirect_count           4    0
memory_all_bytes                4    67226222592
memory_free_bytes               4    53826768896
memory_available_bytes          3    45236834304
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    193707744
arc_dnode_limit                 4    3361311129
async_upgrade_sync              4    10
predictive_prefetch             4    4002
demand_hit_predictive_prefetch  4    159
demand_iohit_predictive_prefetch 4    71
prescient_prefetch              4    18
demand_hit_prescient_prefetch   4    14
demand_iohit_prescient_prefetch 4    4
arc_need_free                   4    0
arc_sys_free                    4    8589934592
arc_raw_size                    4    16457728
cached_only_in_progress         4    0
abd_chunk_waste_size            4    239104

@flaviut
Contributor

flaviut commented Nov 22, 2023

arc_meta_min has been gone since openzfs/zfs@a8d83e2

Is this much simpler implementation correct?

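if ZFS:
    log('WARNING: ZFS found. Available memory will not be calculated '
        'correctly (issue#89)')
    try:
        # find the line index of the "size" field in arcstats
        size_index = None
        with open(arcstats_path, 'rb') as f:
            a_list = f.read().decode().split('\n')
        for n, line in enumerate(a_list):
            if line.startswith('size '):
                size_index = n
                break
        if size_index is None:
            # without this check a missing field would surface later
            # as a NameError/TypeError inside arcstats()
            raise KeyError('size not found in arcstats')
    except Exception as e:
        log(e)
        ZFS = False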

…

def arcstats():
    """Return the current ARC size in KiB, read from arcstats."""
    with open(arcstats_path, 'rb') as f:
        a_list = f.read().decode().split('\n')

    # the "size" value is in bytes; convert it to KiB
    size = int(a_list[size_index].rpartition(' ')[2]) / 1024

    # all ARC bytes are available
    return size
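
The proposal above still locates the `size` row by line number. An alternative is to key the values by field name, so that a field dropped by a newer ZFS release is simply absent rather than an undefined index. The sketch below is one possible shape, not code from nohang or from the referenced commit; `parse_arcstats` and `arc_size_kib` are hypothetical helpers, and treating a missing `size` as 0 is an assumption.

```python
def parse_arcstats(text):
    """Map each arcstats field name to its integer value.

    Data rows look like "<name> <type> <value>". The two header rows
    do not fit that shape and are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            stats[parts[0]] = int(parts[2])
        except ValueError:
            # e.g. the "name type data" header row
            continue
    return stats


def arc_size_kib(stats):
    """Return the ARC size in KiB, or 0 if the field is missing (assumption)."""
    return stats.get("size", 0) / 1024
```

With this shape, a lookup such as `stats.get("arc_meta_min")` returns `None` on ZFS versions that no longer export the field, and the caller can decide how to fall back.
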

@stevleibelt

@hakavlad

Thanks for your wonderful software.

I'm having the same issues.

ArchLinux 6.6.2-arch1-1
zfs-2.2.1-1
zfs-kmod-2.2.1-1
zpool upgraded to latest version

Since then, the reported message appears in the journal whenever I try to start nohang.

Best regards,
Stev

@WildPenquin

Hi,

Same issue here.

local/zfs-dkms-2.2.2-1
local/zfs-utils-2.2.2-1
Linux-6.6.7-zen1-1-zen

Moreover, there are no ZFS filesystems mounted and no pools imported at the moment (there will be in the near future, and I've been testing ZFS with files / images).

@hakavlad
Owner

hakavlad commented Dec 17, 2023

Possible hotfix: disable arcstats monitoring by replacing this line

ZFS = os.path.exists(arcstats_path)

with

ZFS = False

in the nohang source.
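
The hotfix above turns ZFS accounting off unconditionally. A softer variant would keep the existence check but also require that the fields the code depends on are actually present, so that hosts with the ZFS module loaded but an unexpected arcstats layout fall back to `ZFS = False` on their own. This is only a sketch under assumptions: `zfs_usable` and its `required` parameter are hypothetical, and the choice of `size` as the only required field is illustrative.

```python
import os

ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"  # path quoted in this thread


def zfs_usable(path=ARCSTATS_PATH, required=("size",)):
    """Return True only if the file exists and lists every required field."""
    if not os.path.exists(path):
        return False
    try:
        with open(path, "rb") as f:
            text = f.read().decode()
    except (OSError, UnicodeDecodeError):
        return False

    # Collect the first column (the field name) of every non-empty row.
    names = set()
    for line in text.splitlines():
        parts = line.split()
        if parts:
            names.add(parts[0])
    return all(name in names for name in required)
```

A caller could then write `ZFS = zfs_usable()` in place of the bare `os.path.exists` check.
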

flaviut added a commit to flaviut/nohang that referenced this issue Dec 31, 2023
- more reliable and readable stats file parsing
- uses original logic for older versions of zfs
  - I'm not convinced the original logic is right after reading
  https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSARCItsVariousSizes,
  but I don't want to potentially break things
@mauromorales

I'm hitting the same issue on Ubuntu 23.10, and I also have no ZFS on any of my devices.


6 participants