
[core][state] Task log - Improve log tailing from log_client and support tailing from offsets [2/4] #28188

Merged
merged 26 commits into from
May 5, 2023

Conversation

rickyyx
Contributor

@rickyyx rickyyx commented Aug 30, 2022

Why are these changes needed?

With verbose logging, the log file size can grow significantly. This PR prevents gRPC buffer overflow when tailing with a large number of lines specified:

  • Instead of reading the last X lines into memory, it finds the start offset of the last X lines and reads from there.
  • Always streams log data in chunks.
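The chunked-streaming idea can be sketched as follows (a minimal illustration, not the PR's actual implementation; `stream_file_in_chunks` and the 1 MiB chunk size are assumptions):

```python
import io

def stream_file_in_chunks(f: io.BufferedIOBase, start: int, end: int,
                          chunk_size: int = 1 << 20):
    """Yield the bytes in f[start:end) in fixed-size chunks so that no
    single message exceeds the gRPC buffer limit."""
    f.seek(start)
    remaining = end - start
    while remaining > 0:
        data = f.read(min(chunk_size, remaining))
        if not data:
            break  # file was truncated underneath us
        remaining -= len(data)
        yield data
```

Each yielded chunk would map to one message on the streaming gRPC response, so memory use stays bounded regardless of how many lines are requested.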

Related issue number

Closes #27009

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests

@rickyyx
Contributor Author

rickyyx commented Sep 16, 2022

@rkooo567 🏓

@rickyyx
Contributor Author

rickyyx commented Sep 16, 2022

TODO: rerun the release test after review

@rkooo567
Contributor

Let me review this asap

Signed-off-by: rickyyx <[email protected]>
Signed-off-by: rickyyx <[email protected]>
@@ -276,7 +276,7 @@ def write_log(self, log_file_size_byte: int):

time_taken = 0
t_start = time.perf_counter()
-for s in get_log(actor_id=actor._actor_id.hex(), tail=-1):
+for s in get_log(actor_id=actor._actor_id.hex(), tail=1000000000):
I believe this was the original failure case

@rickyyx
Contributor Author

rickyyx commented Feb 25, 2023

It was a bit over-engineered for the original tailing use cases, so I was planning to simplify it. But with the recent requirement of streaming logs from offsets (begin -> end) for task logs, the added complexity is actually needed, so I decided to just revive this PR.

cc @rkooo567

@rickyyx rickyyx changed the title [Core][State Observability] Improve log tailing from log_client [Core][State Observability] Improve log tailing from log_client and support tailing from offsets [2/4] Mar 20, 2023
@rickyyx rickyyx changed the title [Core][State Observability] Improve log tailing from log_client and support tailing from offsets [2/4] [Core][State Observability] Task log - Improve log tailing from log_client and support tailing from offsets [2/4] Mar 20, 2023
@rickyyx rickyyx changed the title [Core][State Observability] Task log - Improve log tailing from log_client and support tailing from offsets [2/4] [core][state] Task log - Improve log tailing from log_client and support tailing from offsets [2/4] Mar 20, 2023

@rkooo567 rkooo567 left a comment


Should I assume that most of the logic has changed when reviewing the PR? It looks like there are a lot of removals.

@rickyyx
Contributor Author

rickyyx commented Mar 21, 2023

Should I assume that most of the logic has changed when reviewing the PR? It looks like there are a lot of removals.

Yes, the end behavior should be the same.
The change is in how we find the last X lines to tail:

  • Before, we read chunk by chunk, collected the chunks into a single string in memory, split it on newlines to find the last X lines, and sent them in a single gRPC message.
  • Now we still read chunk by chunk, but only keep track of offsets (similar to how Linux's tail is implemented) rather than keeping the contents of all chunks in memory. We then find the start and end offsets of the range to stream, and stream the file through the streaming gRPC API.

I added quite a few tests to make sure the offset bookkeeping is correct.
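A minimal sketch of the offset-only backward scan described above (illustrative only; the PR's actual helper is `find_start_offset_last_n_lines_from_offset` and differs in details):

```python
import io

def find_tail_start(f, num_lines, block_size=4096):
    """Return the byte offset where the last `num_lines` lines of `f` begin,
    scanning backwards block by block while keeping only offsets in memory.
    `f` must be opened in binary mode."""
    f.seek(0, io.SEEK_END)
    file_end = f.tell()
    if num_lines <= 0 or file_end == 0:
        return file_end
    pos = file_end
    # A single trailing newline terminates the last line; don't count it.
    f.seek(pos - 1)
    if f.read(1) == b"\n":
        pos -= 1
    lines_needed = num_lines
    while pos > 0:
        read_from = max(0, pos - block_size)
        f.seek(read_from)
        block = f.read(pos - read_from)
        idx = len(block)
        while True:
            idx = block.rfind(b"\n", 0, idx)
            if idx == -1:
                break
            lines_needed -= 1
            if lines_needed == 0:
                # The tail starts right after this newline.
                return read_from + idx + 1
        pos = read_from
    return 0  # the file has fewer than num_lines lines
```

Once the start offset is known, the file can be streamed from there in chunks without ever materializing the tail in memory.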

@rkooo567
Contributor

Btw, is this API sufficient to implement the following logic?

  1. Print the last 1000 lines
  2. Print the previous 1000 lines
  3. Repeat until the beginning of the file

Can you tell me how we can achieve this?

@rickyyx
Contributor Author

rickyyx commented Mar 22, 2023

Btw, is this API sufficient to implement the following logic?

  1. Print the last 1000 lines
  2. Print the previous 1000 lines
  3. Repeat until the beginning of the file

Can you tell me how we can achieve this?

So, for pagination by lines, we will probably need some code (another PR on top of this) to translate a line count into specific offsets. I think the frontend needs to keep track of the line count. An example API could be:

  1. StreamLog (REST equivalent): start_lines=-0, end_lines=-1000
  2. StreamLog: start_lines=-1000, end_lines=-2000
    ...

This will probably require finding the offsets corresponding to start_lines each time. Alternatively, we could return the offsets together with the file content:

  1. PrepStreamLog: start_offset=-0, lines=1000 -> end_offset
  2. StreamLog: start_offset=-0, end_offset=<end_offset>

@rkooo567
Contributor

Hmm, maybe the -1000 approach could return duplicated logs depending on the situation? Maybe we can make the API return the absolute offset, so we can do

start = prev_absolute_offset - 1000, end = prev_absolute_offset

But it makes sense that we can build on top of this as a follow-up!
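The absolute-offset pagination idea could look roughly like this (a sketch under assumptions: `fetch_page` is a hypothetical API, not one this PR adds, that returns a page's content together with its absolute start offset):

```python
def paginate_backwards(fetch_page, file_size, page_lines=1000):
    """Yield pages of a log from the end towards the beginning with no
    duplicates: each request ends exactly where the previous page started.
    `fetch_page(end_offset, lines)` is a hypothetical call returning
    (content, absolute_start_offset) for the `lines` lines that end at
    `end_offset`."""
    end = file_size
    while end > 0:
        content, start = fetch_page(end, page_lines)
        yield content
        if start >= end:
            break  # no progress; avoid looping forever on a malformed page
        end = start
```

Because each page's end offset is the previous page's absolute start offset, no line is ever fetched twice, which is the duplication concern raised above.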

def tail(f: io.TextIOBase, lines: int):
"""Tails the given file (in 'rb' mode)
# Default stream entire file
start_offset = 0
This should be configurable if we want to make it work with task log right?

Yeah, will add fields when supporting it.

dashboard/modules/log/log_agent.py (resolved thread)
Return:
Async generator of StreamReply
"""
assert "b" in file.mode, "Only binary file is supported."
Is this correct? What are the other file formats? (Is a regular log also a binary file?)

We opened it in binary mode for offset bookkeeping. Added comments.

cur_offset = start_offset

# Until gRPC is done
while not context.done():
sorry I forgot the semantics lol... When is the context set to be "done"?


I guess it just means the streaming context has not been closed.

dashboard/modules/log/log_agent.py (outdated; resolved thread)
@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Mar 23, 2023
@rickyyx
Contributor Author

rickyyx commented Apr 26, 2023

  • Refactored find_start_offset_last_n_lines_from_offset to be simpler.
  • Added find_end_offset_next_n_lines_from_offset to support pagination in the future.
  • Added better testing.

@rickyyx rickyyx removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 26, 2023

@rkooo567 rkooo567 left a comment


Generally LGTM! I will look into it one more time tomorrow

return end


def find_end_offset_next_n_lines_from_offset(

is it test only?


Not used for now, but I suppose it will be used to support pagination.

old_pos = file.tell()  # store the current position
file.seek(0, io.SEEK_END)  # move the file pointer to the end of the file
end = file.tell()  # end-of-file offset
file.seek(old_pos, io.SEEK_SET)  # restore the original position
consider making it a context manager?


To guarantee the API semantics.


wdym here?


Hmm, I thought we could make it seek back if an exception occurs. But I think it should be fine in this case, because if this raises an exception the whole operation will just fail.
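The context-manager suggestion could look like this (a sketch; `preserve_position` and `file_size` are illustrative names, not helpers the PR adds):

```python
import contextlib
import io

@contextlib.contextmanager
def preserve_position(f):
    """Save the file's current offset and restore it on exit, even if the
    body raises, so callers always get back the offset they started with."""
    old_pos = f.tell()
    try:
        yield f
    finally:
        f.seek(old_pos, io.SEEK_SET)

def file_size(f):
    """Example user: find the end-of-file offset without disturbing the
    caller's position."""
    with preserve_position(f):
        f.seek(0, io.SEEK_END)
        return f.tell()
```

The `finally` block is what gives the seek-back-on-exception behavior discussed above.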

dashboard/modules/log/log_agent.py (outdated; resolved thread)
# in the block.
# Use `split` here to split away the leading lines that do not belong in the tail
lines = block_data.split(b"\n", num_lines - lines_more)
# len(lines[0]) + 1 accounts for the newline character consumed by the split

Hmm, not sure I understood this comment. Also, isn't this supposed to be sum(len(line) for line in lines)?


Oh, since it's tailing, only the last lines in the block need to be accounted for in nbytes_from_end.

line 1 \n line 2 \n line 3 \n line 4\n

If lines_more = 1, only line 4 needs to be included in the tail, so we split with num_lines - lines_more = 3 as the maxsplit, producing 4 parts:

  1. line 1
  2. line 2
  3. line 3
  4. line 4

and we include only line 4's length in the tail result.
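The maxsplit arithmetic above can be checked with a tiny example (values are illustrative, not the PR's actual variables):

```python
block_data = b"line 1\nline 2\nline 3\nline 4\n"
num_lines, lines_more = 4, 1

# Split at most num_lines - lines_more = 3 times: the first three elements
# are the lines we do NOT need, and the final element holds everything
# after the third newline, i.e. the bytes that belong in the tail.
parts = block_data.split(b"\n", num_lines - lines_more)
tail_bytes = len(parts[-1])  # bytes of this block that count towards the tail
```

Note that the final element keeps its trailing newline, since `split` with a maxsplit leaves the remainder of the data untouched.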

@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 26, 2023
Signed-off-by: Ricky Xu <[email protected]>
Signed-off-by: Ricky Xu <[email protected]>
@rickyyx rickyyx removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label May 3, 2023
@rkooo567 rkooo567 merged commit 0a15649 into ray-project:master May 5, 2023
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request May 16, 2023
…ort tailing from offsets [2/4] (ray-project#28188)

Successfully merging this pull request may close these issues.

[Core][State Observability][Log] gRPC max size limit when ray logs --tail 1000000