Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TR should not rename/recreate log files on rollover #6533

Closed
rawlinp opened this issue Feb 1, 2022 · 9 comments · Fixed by #7234
Closed

TR should not rename/recreate log files on rollover #6533

rawlinp opened this issue Feb 1, 2022 · 9 comments · Fixed by #7234
Assignees
Labels
logging related to logging infrastructure regression bug a bug in existing functionality introduced by a new version Traffic Router related to Traffic Router

Comments

@rawlinp
Copy link
Contributor

rawlinp commented Feb 1, 2022

This Improvement request (usability, performance, tech debt, etc.) affects these Traffic Control components:

  • Traffic Router

Current behavior:

The current log4j2 config causes TR to rename access.log to access.log.2022-01-30-30 (for example) then recreate the access.log file. This can cause data loss problems for log forwarding tools that expect to tail the same log file without having it change out from underneath them.

New behavior:

TR's log4j2.xml config should use a rollover strategy that doesn't cause the log file to be renamed then recreated. I think this can be accomplished with a DirectWriteRolloverStrategy which is documented here. However, in order to limit the number of rolled-over log files on disk, we may need to combine that with a custom delete action which is documented here.

@rawlinp rawlinp added Traffic Router related to Traffic Router improvement The functionality exists but it could be improved in some way. labels Feb 1, 2022
@zrhoffman zrhoffman added the logging related to logging infrastructure label Sep 15, 2022
@zrhoffman zrhoffman added regression bug a bug in existing functionality introduced by a new version and removed improvement The functionality exists but it could be improved in some way. labels Sep 22, 2022
@villajo
Copy link
Contributor

villajo commented Oct 29, 2022

Can this be assigned to me?

@zrhoffman
Copy link
Member

Assigned!

@villajo
Copy link
Contributor

villajo commented Nov 11, 2022

So, after reviewing the issue, the code and then reviewing the default configuration used on Traffic Router. I feel that a different approach would be better.

The problem with the directwriterolloverstrategy and the delete function is that the access.log is renamed/moved instead of copied. This changes the inode on access.log and other log forwarding applications will retain the lock to the file with the original inode.. Which is renamed. This is why we're seeing this problem.

After reviewing the log4j2 rotation strategies there is nothing available in log4j2 that does what we need, which would be a copy to new file retaining the inode, then truncate access.log. Now, I could create a plugin, but that leads me back to a philosophy idea that the goal of Traffic Router in this situation should be just getting the logs to a location and from there, it should be the job of the operating system or admin of said system to figure out how the logs are sliced up and rotated using an outside application. In this case, I feel logrotate does this reliably.

I simplified the configuration of our log4j2.xml to only use one file for both the traffic_router.log and access.log. Then I used logrotate with the configuration to copytruncate daily.

I'm still waiting for word from Zach whether this is acceptable or not, but I feel this is probably cleanest way to fix this issue. Thoughts?

@rawlinp
Copy link
Contributor Author

rawlinp commented Nov 11, 2022

That sounds reasonable to me, but the trade-off is that the log storage requirements will be increased since logrotate is only run 1x/day whereas log4j can rollover whenever a threshold is reached. We just need to be aware of that when changing this.

@villajo
Copy link
Contributor

villajo commented Nov 11, 2022

We can change the size requirements and when we want it to rotate as well. By adding logrotate to a cron outside of cron.daily, we can add it to anywhere we want to search for a condition we'd like. EG: traffic_router.log being over 100m. I really just created what I had to make sure it fit the requirements enough to the point to we could talk about refinement.

@zrhoffman
Copy link
Member

I feel logrotate does this reliably.

Using logrotate for TR logs sounds good to me, too

I simplified the configuration of our log4j2.xml to only use one file for both the traffic_router.log and access.log.

What is the advantage to using only a single log file for both traffic_router.log and access.log?

@villajo
Copy link
Contributor

villajo commented Nov 14, 2022

So, they would both have separate log files still. That portion would still be controlled by log4j.

It's the log rotation that would be removed from the log4j2.xml and then handled by logrotate.

I took one of our production servers and tested the configuration. This is how the log folder would look with daily rotations. The 11th is small because I did this later on in the day.

drwxr-xr-x 4 root root 4.0K Nov 14 04:02 .
drwxr-xr-x 4 root root 4.0K Nov 11 18:22 ..
-rw-r----- 1 root root 231M Nov 14 19:12 access.log
-rw-r----- 1 root root 13K Nov 11 16:22 access.log-11112022.gz
-rw-r----- 1 root root 9.1M Nov 12 04:01 access.log-12112022.gz
-rw-r----- 1 root root 20M Nov 13 04:01 access.log-13112022.gz
-rw-r----- 1 root root 19M Nov 14 04:01 access.log-14112022.gz
-rw-r----- 1 root root 3.7M Nov 14 19:12 traffic_router.log
-rw-r----- 1 root root 2.0K Nov 11 16:22 traffic_router.log-11112022.gz
-rw-r----- 1 root root 73K Nov 12 04:01 traffic_router.log-12112022.gz
-rw-r----- 1 root root 143K Nov 13 04:02 traffic_router.log-13112022.gz
-rw-r----- 1 root root 145K Nov 14 04:01 traffic_router.log-14112022.gz

The advantage would be that we would keep the same inode for the log file allowing it to be copied instead of moved then truncated. The log forwarding application would be able to continue reading logs and we won't have issues related to open rotated files that we have been seeing on other nodes. I have looked and compared two of our nodes in one location and have verified this takes care of all of the extra open files we were seeing open via lsof after a rotation occured.

@zrhoffman
Copy link
Member

That all sounds great

@srijeet0406
Copy link
Contributor

Excellent, thanks @villajo !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logging related to logging infrastructure regression bug a bug in existing functionality introduced by a new version Traffic Router related to Traffic Router
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants