feat: extend component run info #8436

tstadel · 2024-10-02T15:57:49Z

Related Issues

Fixes logging of component run info: it is now emitted via logging extra in addition to tracing, together with additional information (e.g. input and output lengths).

Background:
Currently there is little to no information about a component's run invocation in our logs. We get the log message "Running component writer" without any further information. Additional information is only available via tracing. Traces are not always available or suitable, for example in user-facing logs.

Proposed Changes:

emit component run info via logging extra in addition to tracing
add input lengths
add output types
add output lengths
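
For illustration, a minimal sketch of what emitting run info through the standard library's `extra` parameter could look like. The helper `log_component_run` and the field names (`component_name`, `input_types`, `input_lengths`) are hypothetical and chosen for this example; they are not taken from this PR's diff.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("pipeline_sketch")


def log_component_run(name: str, inputs: Dict[str, Any]) -> None:
    """Log a component invocation with structured run info attached as `extra`."""
    extra = {
        # Hypothetical field names; they must not clash with LogRecord attributes.
        "component_name": name,
        "input_types": {k: type(v).__name__ for k, v in inputs.items()},
        # Only sized values (lists, strings, dicts, ...) have a length.
        "input_lengths": {k: len(v) for k, v in inputs.items() if hasattr(v, "__len__")},
    }
    logger.info("Running component %s", name, extra=extra)
```

Each key in `extra` becomes an attribute on the resulting `LogRecord`, so a structured formatter or handler can pick the fields up without parsing the message.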

How did you test it?

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

silvanocerza · 2024-10-02T16:12:52Z

Are inputs and outputs lengths really necessary? Which use case would stop people from calculating it themselves? 🤔

tstadel · 2024-10-02T17:07:31Z

> Are inputs and outputs lengths really necessary? Which use case would stop people from calculating it themselves? 🤔

@silvanocerza Not sure what you mean by "calculating it themselves" exactly.
Input and output lengths can vary heavily throughout the pipeline and additionally can depend heavily on the actual input.
For example, take the following pipeline in which you:

start with one file (e.g. a pdf) -> 1 path
split the pdf into 1 image per page -> n_pages images
run OCR using an llm per page -> n_pages documents
build strides of 3 pages -> n_pages-2 strides
extract some information for each stride using llm -> n_pages-2 information
join all information -> n_information
write the information to document_store

To actually monitor progress and to decide whether an invocation takes too long or is within bounds, you need to know the input and output lengths of the individual components. OCR might be quick for 5 pages, but take an hour for 1k pages.
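
As a rough sketch, the item counts in the pipeline above can be written down as a function of the page count. The function and key names are hypothetical, and the sketch assumes the strides are a sliding window that advances one page at a time, which is what the `n_pages-2` figure for strides of 3 implies. Clamping at zero for very short documents is also an assumption.

```python
from typing import Dict


def expected_item_counts(n_pages: int, stride: int = 3) -> Dict[str, int]:
    """Number of items each stage emits for one PDF with `n_pages` pages."""
    # Sliding window of `stride` pages, advancing one page at a time.
    n_strides = max(n_pages - stride + 1, 0)
    return {
        "paths": 1,
        "images": n_pages,
        "ocr_documents": n_pages,
        "strides": n_strides,
        "extracted_information": n_strides,
    }
```

The same single input path leads to very different workloads downstream, which is why the per-component counts matter for judging run times.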

coveralls · 2024-10-04T09:54:39Z

Pull Request Test Coverage Report for Build 11178179903

Details

0 of 0 changed or added relevant lines in 0 files are covered.
17 unchanged lines in 1 file lost coverage.
Overall coverage increased (+0.01%) to 90.312%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| core/pipeline/pipeline.py | 17 | 81.63% |

Totals
Change from base Build 11144701295:	0.01%
Covered Lines:	7467
Relevant Lines:	8268

💛 - Coveralls

shadeMe

I'm afraid this change falls outside the purview of the Pipeline class. The current tracing hooks expose the inputs of each component run invocation, and that's the extent of the class' involvement; it's up to the consumer of the tracing instrumentation to house the logic that reasons about those inputs.

We can additionally log the parameters that we are currently passing to the tracing instrumentation (and add the output types to that), but the logic to infer lengths doesn't belong here.

tstadel · 2024-10-07T12:18:12Z

> I'm afraid this change falls outside the purview of the Pipeline class. The current tracing hooks expose the inputs of each component run invocation, and that's the extent of the class' involvement; it's up to the consumer of the tracing instrumentation to house the logic that reasons about those inputs.
>
> We can additionally log the parameters that we are currently passing to the tracing instrumentation (and add the output types to that), but the logic to infer lengths doesn't belong here.

@shadeMe So how would I log the number of input and output items (which I believe is an important number for debugging/monitoring pipelines)?

shadeMe · 2024-10-08T09:28:00Z

> @shadeMe So how would I log the number of input and output items (which I believe is an important number for debugging/monitoring pipelines)?

Not sure what your constraints are, but perhaps by creating a custom tracer that inspects the inputs and outputs and writes their characteristics to the logs? The haystack.tracing module should have the required scaffolding.

On second thought, such a tracer would be a useful tool for Haystack users too.
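
A minimal sketch of the idea, for illustration only. It deliberately does not use the `haystack.tracing` classes: `describe` and `traced_run` are hypothetical names, and a real tracer would have to implement Haystack's own tracer interface, which is not shown here. The sketch only demonstrates inspecting inputs and outputs and writing their types and lengths to the logs.

```python
import contextlib
import logging
from typing import Any, Dict, Iterator

logger = logging.getLogger("run_info_sketch")


def describe(values: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Summarise each value by type name and, where defined, length."""
    summary: Dict[str, Dict[str, Any]] = {}
    for key, value in values.items():
        info: Dict[str, Any] = {"type": type(value).__name__}
        if hasattr(value, "__len__"):
            info["length"] = len(value)
        summary[key] = info
    return summary


@contextlib.contextmanager
def traced_run(component_name: str, inputs: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Log input characteristics on entry and output characteristics on exit."""
    outputs: Dict[str, Any] = {}
    logger.info("Component %s started", component_name, extra={"run_inputs": describe(inputs)})
    try:
        # The caller stores the component's outputs in this dict.
        yield outputs
    finally:
        logger.info("Component %s finished", component_name, extra={"run_outputs": describe(outputs)})
```

A caller would wrap each component invocation in `with traced_run(name, inputs) as outputs:` and fill `outputs` inside the block, which keeps the length-inference logic outside the pipeline class itself.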

tstadel · 2024-10-09T12:19:10Z

> > @shadeMe So how would I log the number of input and output items (which I believe is an important number for debugging/monitoring pipelines)?
>
> Not sure what your constraints are, but perhaps by creating a custom tracer that inspects the inputs and outputs and writes their characteristics to the logs? The haystack.tracing module should have the required scaffolding.
>
> On second thought, such a tracer would be a useful tool for Haystack users too.

@shadeMe If the LoggingTracer is the way to go, do you think adding the extra to the existing log lines in this PR still makes sense?

shadeMe · 2024-10-16T10:28:19Z

> @shadeMe If the LoggingTracer is the way to go, do you think adding the extra to the existing log lines in this PR still makes sense?

I don't see an issue with logging the input and output types, if only for completeness' sake.

silvanocerza · 2024-10-24T14:52:55Z

LoggingTracer introduced with #8447 should solve the problem this PR is trying to solve. Closing.

feat: extend component run info

e3680d0

tstadel requested a review from a team as a code owner October 2, 2024 15:57

tstadel requested review from julian-risch and removed request for a team October 2, 2024 15:57

github-actions bot added topic:core type:documentation Improvements on the docs labels Oct 2, 2024

silvanocerza requested review from silvanocerza and shadeMe and removed request for julian-risch October 2, 2024 16:14

fix tests

2dd43bf

github-actions bot added the topic:tests label Oct 4, 2024

tstadel and others added 4 commits October 4, 2024 12:01

add tests for lengths

05a2f59

extend logging test

0526ae6

Merge branch 'main' into feat/extend_component_run_info

5c3e11e

fix merge

7fdc3ec

shadeMe requested changes Oct 4, 2024

anakin87 mentioned this pull request Oct 9, 2024

feat: Logging Tracer #8447

Merged

silvanocerza closed this Oct 24, 2024
