Empty EA metadata leads to assert in ad_header_read_ea() #368
Comments
@PuffyRainbowCloud Please share a few hundred lines of syslog from when you're attempting to start up netatalk. For example, do this on the terminal: The lockfile message is a red herring, I think. If I'm not mistaken, it's a temporary state that resolves itself. If you see it repeatedly while netatalk is running, you may want to check the permissions of that lockfile and make sure it's readable by everyone. |
So, I have an interesting update: I believe I managed to comply with your request by running The output consists mainly of some error from deluged. This is everything about Netatalk: |
One way to test if the AFP server is running is to do
Also, please note that if you went through the guide for installing from source, the resulting netatalk daemon will be reading from configuration files in a different location than the deb netatalk. This is most likely why you're not seeing any AFP shares on your Mac. To check which afp.conf file it's reading you can do
Then make sure that your configurations are reflected in |
Alright, running that gives me this: AFP reply from localhost:548 via IPv4 Network address: 192.168.1.3 (IPv4 address) Which looks promising, right? |
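As a rough cross-check of the reply above, the sketch below tests only whether something accepts TCP connections on the AFP port (548, as shown in the output). It is not the command used in this thread, and the function name `afp_port_open` is made up for illustration. An open port shows that a listener exists, not that it speaks AFP correctly.

```python
import socket


def afp_port_open(host: str = "localhost", port: int = 548, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `afp_port_open()` on the server itself should return `True` while afpd is listening, and `False` once the service is stopped.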
Hold several horses! I removed the semicolons from the lines in afp.conf and now I do see shares when I connect manually via Go > Connect to server... and I can successfully connect to and mount them via AFP. However, the shares that show up in Finder are still SMB rather than AFP. |
Is Zeroconf support enabled in netatalk? |
I'm not sure. How would I find that out? |
If you look at my earlier comment you can see the sample output for |
Hm. Indeed, despite following the guide, which should set that up, I get this: $ netatalk -V This program is free software; you can redistribute it and/or modify it under netatalk has been compiled with support for these features:
|
Well there's your problem right there. So either Zeroconf support wasn't built into netatalk (i.e. libraries not found at configuration time) or is dynamically disabled (i.e. turned off in afp.conf). First you can try to explicitly enable Zeroconf in afp.conf ( As a next step run the configure script again with the same parameters you used previously, and then observe the summary output at the end. Under "Options" you should see something like:
|
I figured out what happened. I made a typo so I literally didn't install libavahi-client-dev. I can't believe I missed the error message on that. Note to self: copy package names, even when they're listed instead of being a copy-paste-able command. Double check apt. A configure now shows these options:
I'm gonna go ahead and install this and see how that goes. |
That did it. I feel immensely silly now. I guess I will forever wonder why my original install stopped working but at least I don't have to worry about it for now. If the problem returns in the future I'll have to make a new issue and hopefully troubleshoot it properly. Is there anything else I should think about or know before closing the issue? |
Well I hope it was a learning experience for you at least! I'm a little bit concerned about your original issue. My initial guess is that it was the same as #236 but if updating to 3.1.15 proper solved the problem then it may have been something else. |
It certainly was! I have very little experience when it comes to compiling software so every opportunity to mess around with that is good. I'm going to keep my eyes peeled for further issues, definitely. Thank you for all of your help! |
Alright. So, some further information that's arisen since I thought this was fixed yesterday: Aug 09 19:26:10 servercluwub afpd[2860]: ad_header_read_ea("/media/alice/Filserver"): invalid metadata EA this is now being treated as a fatal error. if you see this log entry, please file a bug ticket with your upstream vendor and attach the generated core file. I was aware of this error message and tried to research it yesterday but didn't come up with a solution. I've tried to rebuild the CNID databases for my shares but that didn't make a difference. What's weird is that the share seems to work now, despite the PID error message returning after further restarts of the service as well as these EA errors. Not quite sure what to make of it. |
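When the journal is long, it can help to pull out just the paths named in these messages. The sketch below is a small helper that assumes the log lines look like the one quoted above; the function name `count_invalid_ea_paths` is hypothetical, and the pattern is taken only from that one sample line, so other netatalk versions may word the message differently.

```python
import re
from collections import Counter

# Pattern based on the single sample line quoted in this thread.
INVALID_EA = re.compile(r'ad_header_read_ea\("([^"]+)"\): invalid metadata EA')


def count_invalid_ea_paths(lines):
    """Count how often each path appears in 'invalid metadata EA' messages."""
    counts = Counter()
    for line in lines:
        match = INVALID_EA.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Any iterable of lines works as input, for example an open file holding a saved journalctl export.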
This is what I expected. It means your issue is a duplicate of #236 as I first thought. If you have some time to spare for troubleshooting, I have a PR at #363 with additional logging to help pinpoint which metadata is causing trouble. I have not been able to reproduce the issue myself. It seems to be caused by the root of the volume having EA metadata that was created with an earlier version of netatalk, which is now treated as invalid. Also, please run |
Fascinating! I have all of the time in the world but very little knowledge or experience so I'm gonna need some handholding. When you say you have a PR, what does that mean and what am I expected to do to help? I don't seem to have that command nor can I find a package by that name using apt search. Is it part of another package? |
Also, I have again lost the ability to stay connected and access files. When trying to log on it fails and I now get messages like these in journalctl: Aug 09 19:45:23 servercluwub afpd[3748]: afp_disconnect: primary reconnect failed |
Great, thanks for offering to help out! If you look at the PR that I linked to, it says that the git branch name is "rdmark-issue-236". Therefore you want to check out this branch. Do something like |
I assume that I need to run |
That is not necessary. However, you do want to make sure netatalk is not running when you install the new version. So if you're running it through the systemd service, do |
Alright. I should have your version set up now. What do we need to do next? |
The same steps as before: Have journalctl running to see the logs (or use any other means to collect syslog) and then go through the steps to start up and try to use netatalk which leads to the error. |
Alright. I'm running it now and actively logging everything from the netatalk unit. I'm not seeing anything different in the log so far (it hasn't stopped working yet). Is that normal? |
You're saying that you're not getting the internal error with stacktrace again? How about invalid metadata EA messages? Please share your logs so that I can see what they say. |
I'll leave the exported terminal log here. |
Hmm something must have gone wrong when you installed it. This log still has the "please file a bug ticket with your upstream vendor" error message which I removed in the branch with added logging. You should not be seeing this anymore, but rather a similar message that tells us exactly which type of metadata is affected. Are you sure you stopped all running instances of netatalk before you installed? |
I executed |
Wait a minute, can you please hold off on testing for a moment. I realized an inconsistency in the code. |
I will stand by until further notice. |
Ready to test now! What I spotted was that 0 entry length is checked in ad_entry_check_size() and treated as an allowed case. After that, the execution returns to ad_entry() where it checks for 0 entry length again where it's treated as a failure and triggers the assert later on. I think allowing 0 entry length is correct. So I changed the latter check to only check for 0 offset. |
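The inconsistency described above can be illustrated with a simplified model. This is not the netatalk source, which is written in C. The `Entry` class and the three function names are hypothetical stand-ins for the size check and the two versions of the lookup, and the bounds logic is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    offset: int  # hypothetical: where the entry starts in the metadata buffer
    length: int  # hypothetical: how many bytes the entry occupies


def entry_size_ok(entry: Entry, buf_len: int) -> bool:
    """Size check: a zero-length entry is explicitly allowed."""
    if entry.length == 0:
        return True
    return entry.offset + entry.length <= buf_len


def lookup_before(entry: Entry, buf_len: int) -> Optional[int]:
    """Old behaviour: rejects zero length even though the size check allowed it."""
    if not entry_size_ok(entry, buf_len):
        return None
    if entry.offset == 0 or entry.length == 0:
        return None  # contradicts entry_size_ok for empty entries
    return entry.offset


def lookup_after(entry: Entry, buf_len: int) -> Optional[int]:
    """Changed behaviour: only a zero offset counts as a failure."""
    if not entry_size_ok(entry, buf_len):
        return None
    if entry.offset == 0:
        return None
    return entry.offset
```

With an entry such as `Entry(offset=100, length=0)`, the old lookup returns `None` (which, in the real code, is what later trips the assert), while the changed lookup returns the offset.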
Good and bad news! |
Alright, now you have run into the other related bug which is reported in #357! This bug is essentially the inverse: it's trying to write metadata headers but gets a zero buffer size. This one I can reproduce on my end so it should only be a matter of time (hopefully). BTW I pushed a small change to silence the noisy "got_len entry present but empty" message. It's now maxdebug level so should only show up when you absolutely need it. |
Good luck! Let me know if you need anything else. Since we've now solved the issue in the title, should I close this issue or leave it open? |
Let's keep it open until the fix is merged. |
A tentative fix for the latter issue now in the same branch. Please give it a whirl when you have a chance! There may be unintended consequences from the fix, so please try moving / copying / executing various files! Basically, the nature of the fix is that I now check for the buffer in the destination object before attempting to copy EA metadata, and skip the process if the length is zero. |
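The shape of that guard can be sketched as follows. This is a minimal model rather than the actual change in the branch: `copy_metadata` is a hypothetical name, and Python byte buffers stand in for the C structures involved.

```python
def copy_metadata(src: bytes, dst: bytearray) -> int:
    """Copy src into dst and return the number of bytes copied.

    If the destination has no room at all, skip the copy instead of
    treating it as an error.
    """
    if len(dst) == 0:
        return 0  # nothing to write into; skip quietly
    n = min(len(src), len(dst))
    dst[:n] = src[:n]
    return n
```

The point of the early return is that an empty destination becomes a no-op rather than a failure path, which is why exercising moves, copies, and file launches is a reasonable way to look for side effects.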
I'll leave it running today and do some copying and executing whenever I would anyway, logging all of it with journalctl. So far it has spammed these messages when playing back a symlinked video file:
|
Apart from the log messages, are there any interruptions to the service at all? No crashes with stack traces? From what I've learned, netatalk handling of symlinks has a long history of bugs and edge cases. There's an option in afp.conf "follow symlinks" where you can turn this on and off. In fact, looking at this recent change 477af53 we should not attempt to read AD headers through symlinks in the first place, so the proper fix might be to add some more conditions to stop this from happening here. |
I haven't experienced any issues. I decided a decent method for stress testing was batch conversion of video files since that involves both reading and writing a lot of data, so I've been running ffmpeg on folders of files for hours and it hasn't crashed yet. While I'm not very well versed in this type of work, what you're suggesting sounds reasonable to me. |
I've pushed some major changes to the branch now. When you have a moment to spare, can you please see what the logs say now about that symlinked video and malformed AppleDouble? Also, are you able to share details on how your symlinks are set up? I tried playing around with them locally but couldn't reproduce your issue. |
Absolutely! I've included build output and journalctl output as before. As for the symlinks, I created them using a script that ChatGPT wrote for me to automate the process. I've included a copy of this script. The symlink and the original file are in separate folders on the same drive. The symlink has been renamed. That's all I really know to tell. Regrettably, I cannot test the exact same symlink as it no longer exists. However, I did try a bunch of other symlinks that were created using the same tool, from the same parent folder. I also created a symlink of that same file, using the same tool, renamed it, and played it back. As you can see, there is no error in the log. I wish I had that original symlink but regrettably it was absentmindedly overwritten as part of the converting I did to stress test earlier. I want to add that that was the only symlink that threw that error during my testing earlier. Every other one, even in the same folder, showed no errors. |
Thanks for this. Looking at the script, you basically create symlinks in a parallel directory. And then, you say you launch the symlink to play back the video? |
Basically, yes. I do this so I can rename them for Jellyfin's sake without having to make a copy and without having to modify original file names which another application needs to read as they originally were. |
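For anyone trying to reproduce this setup, the layout described above (renamed symlinks in a parallel directory, originals left untouched) can be approximated with the sketch below. It is not the script attached to this issue; the function name `mirror_as_symlinks` and its `rename` parameter are made up for illustration, and it assumes a filesystem that supports symlinks.

```python
from pathlib import Path


def mirror_as_symlinks(source_dir: Path, link_dir: Path, rename=lambda name: name) -> list[Path]:
    """Create one symlink in link_dir for each regular file in source_dir.

    rename maps the original file name to the link name, so the links can be
    renamed freely while the original files keep their names.
    """
    link_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories and other non-file entries
        link = link_dir / rename(src.name)
        if link.exists() or link.is_symlink():
            continue  # never overwrite an existing file or link
        link.symlink_to(src.resolve())  # absolute target, independent of cwd
        created.append(link)
    return created
```

Run on the server, with both directories on the same drive, this produces links whose names differ from the originals, which matches the situation described in this thread.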
How are the video files encoded, and which client application do you use for playback? (in case the client is silently writing something back to the file meta data) |
I was playing the file using mpv. ffprobe on the original file returns:
However, the error no longer appears on that or any other file so far. Edit: I apologise for extra text appearing within the code box. GitHub is refusing to respect my ending characters for it. |
Thanks! For some reason I just can't get macOS to follow symlinks on the afp shared volume. The symlinks are there and recognized as "alias" but macOS can't find the original file. Not sure what I'm doing wrong. Doesn't help if I enable the I'll take a break from trying to reproduce this for now, until I think of some other approach. |
That's interesting! It's worth noting that I created the symlinks on the server as opposed to on the Mac as that never worked in the past. |
Yep I'm doing the same. Just running
(BTW I'm not a Scooter fan. It was just the first result on archive.org when searching for mkv files ^^; ) |
That's fascinating! I have no idea why it wouldn't work on your end. Are you also running an Ubuntu server? Which version of macOS are you using? I'm on Ubuntu Server 22.04.3 LTS and macOS Ventura 13.4 (22F66). |
This is with a Debian Bookworm server (which should be 99% identical to Ubuntu) and macOS Ventura. Are you able to share your afp.conf, just to check if there's some option that you're enabling that I'm not? |
Absolutely:
|
Thank you! If you encounter the symlink issue again, please add a comment to #270. Now for the original zero length AD entry issue that started this issue ticket, I've merged a fix through #178 so let me go ahead and close this ticket. If you encounter the "invalid metadata EA this is now being treated as a fatal error" errors again, please file a new ticket so that we can look into what metadata is acting up this time! |
I ran my server with Netatalk for over a year with no issue. Suddenly, after an update, it stopped working. At the time I didn't have much knowledge (I'm still a beginner) about running Ubuntu Server and after a reinstall of Netatalk didn't help I left it.
Today, I decided to look into it again. So far, I've been able to run
systemctl status netatalk
and saw the following in the log: netatalk.service: Can't open PID file /var/lock/netatalk (yet?) after start: Operation not permitted
This is the only thing I know how to see that looks like an error. Any ideas?