Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix grid logs for large logs #29390

Merged
merged 6 commits into from
Feb 6, 2023

Conversation

bbovenzi
Copy link
Contributor

@bbovenzi bbovenzi commented Feb 6, 2023

If a task instance an very large log file, the UI would have difficulty rendering it. This PR fixes this on the UI side but truncating lines in the logs to less than 1 million characters and including a warning to the user.

Screenshot 2023-02-06 at 10 00 04 AM

Example DAG:

import time
import pendulum

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="bad_dag",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
):

    @task()
    def one():
        print("Hello DAG is starting")
        file_contents = """
        2022-08-29T07:19:55.933Z [LogLevel = Information] XXXXXX.SetGameState.SetGameStateFunction: Entering handler  
        [AwsRequestId = c4e55d4c-718e-41bc-857e-15cbdf07751b, Region = us-east-1, AmazonTraceId = 
        Root=1-630c689b-713bc72374fa8913774ea82e;Parent=00dea2e84ddcf28d;Sampled=0, ExecutionEnv = 
        AWS_Lambda_dotnetcore3.1, FunctionName = int-set-game-state, FunctionVersion = $LATEST, MemoryLimitInMB = 1024]
        2022-08-29T07:19:56.119Z [LogLevel = Information] XXXXXX.Serverless.Shared.Repositories.Dynamo.LowLevelDynamoClient: 
        Sending request "(attribute_not_exists(#N0)) OR (#N1 = :v0)" [{"#N0":"PK","#N1":"ActiveDeviceId"}] 
        [{":v0":{"BOOL":false,"IsBOOLSet":false,"BS":[],"L":[],"IsLSet":false,"M":{},"IsMSet":false,"NS":[],"NULL":false,
        "S":"cde4646da573124b61d843d2afe48b74","SS":[]}}]  
        [AwsRequestId = c4e55d4c-718e-41bc-857e-15cbdf07751b, Region = us-east-1, AmazonTraceId = 
        Root=1-630c689b-713bc72374fa8913774ea82e;Parent=00dea2e84ddcf28d;Sampled=0, ExecutionEnv = 
        AWS_Lambda_dotnetcore3.1, FunctionName = int-set-game-state, FunctionVersion = $LATEST, MemoryLimitInMB = 1024]
        2022-08-29T07:19:56.200Z [LogLevel = Information] XXXXXX.SetGameState.SetGameStateFunction: Operation complete  
        [AwsRequestId = c4e55d4c-718e-41bc-857e-15cbdf07751b, Region = us-east-1, AmazonTraceId = 
        Root=1-630c689b-713bc72374fa8913774ea82e;Parent=00dea2e84ddcf28d;Sampled=0, ExecutionEnv = 
        AWS_Lambda_dotnetcore3.1, FunctionName = int-set-game-state, FunctionVersion = $LATEST, MemoryLimitInMB = 1024]
        2022-08-29T07:19:56.202Z END RequestId: c4e55d4c-718e-41bc-857e-15cbdf07751b
        """ * 5000
        re_pattern = r"(\d{4}-\d{2}-\d{2}.*?)\s+?\n\s+?(\[.*?\])"
        all_records = re.findall(re_pattern, file_contents, flags=re.S)
        print(f"All records {all_records}")
        print(all_records)
        print("All Done!")
        return

    @task()
    def two():
        time.sleep(5)
        print("two")

    one() >> two()

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@bbovenzi bbovenzi added this to the Airflow 2.5.2 milestone Feb 6, 2023
@boring-cyborg boring-cyborg bot added area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues labels Feb 6, 2023
@eladkal
Copy link
Contributor

eladkal commented Feb 6, 2023

mmm we had a PR that added the "Full logs" checkbox as far as I remember the PR show part (tail) of the logs to avoid loading all of it and allow the user the option to show the full log so I thought we no longer have this problem?

I find the message confusing with the box:
Screenshot 2023-02-06 at 17 46 22

If we know we can't load the log in the UI we should disable the box.

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Feb 6, 2023

Thanks Brent for this fix.

'Full Logs' impacts only how we request logs to the rest API. Most of the time, without requesting full logs, reading by chunk/tail will be enough to avoid most of UI issues due to large file. Unfortunatly if a log chunk contains a really really long line we can still have a huge chunk, Brent sample shows 1 line of log that is 5Mb of text by itself and this can cause rendering + performance issues.

I agree that the wording between these two elements are a bit confusing.

Note: Looking at the get_log implementation, log_reader, and log_file_task_handler, we use streaming capabilities, with read_log_stream but aggregate everything before returning the payload in the view. I must be missing something, but I am even wondering if this 'Full Logs' button really makes sense for us here.

@bbovenzi
Copy link
Contributor Author

bbovenzi commented Feb 6, 2023

There is a log_pos and offset metadata in read_log_stream but none of that is exposed to the REST API. We should probably fix that. But in the meantime, the UI should have some guardrails. Also, I don't think the Full Logs buttons really does anything for the UI, so I'll remove it entirely.

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Feb 6, 2023

Yep I agree, this isn't something I realised when implementing it. We can remove it for now, and add it back when we actually have support for it.

@bbovenzi bbovenzi merged commit 1ed0802 into apache:main Feb 6, 2023
@bbovenzi bbovenzi deleted the fix-grid-logs-for-large-logs branch February 6, 2023 21:50
@pierrejeambrun pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Feb 27, 2023
pierrejeambrun pushed a commit that referenced this pull request Mar 8, 2023
* slice extra long file lines, remove full content

* add warning if logs are truncated

* Add warning for `.split()` errors

* remove extraneous useTaskLog change

* remove Show Full Content checkbox

* remove extraneous console.error

(cherry picked from commit 1ed0802)
@eladkal eladkal mentioned this pull request Aug 22, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants