-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More accurate log processing. #643
More accurate log processing. #643
Conversation
The Docker logs can come in non normalized form. The line breaks can be not at end of line, the frame's boundary can split multibyte unicode symbols in the middle and so on. This patch address this problems and normalize logs so, that they can be processed strictly line by line. To achive this, the extra base class BaseConsumer have been added. That class normalize the incoming in method accept() logs and forward it to method process() for it's child classes. Other *Consumer classes have been reworked to be child to the BaseConsumer class and only do own work. Adititionally, BaseConsumer have new withRemoveAnsiCodes(boolean) method for ability to disable ANSI color codes removing (true by default).
public abstract void process(OutputFrame outputFrame); | ||
|
||
@Override | ||
public final synchronized void accept(OutputFrame outputFrame) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid making it final
breaks the binary compatibility because there might be some strategies based on i.e. Slf4jLogConsumer
but overriding accept
(I think I even have one)
import java.util.List; | ||
import java.util.function.Consumer; | ||
|
||
@SuppressWarnings("Duplicates") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what does it duplicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh... It's just a refactoring artifact. Will remove this.
public abstract class BaseConsumer<SELF extends BaseConsumer<SELF>> implements Consumer<OutputFrame> { | ||
private StringBuilder logString = new StringBuilder(); | ||
private OutputFrame brokenFrame; | ||
private boolean removeColorCodes = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also add Lombok's @Setter
to this field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plus @Getter
@vektory79 please add a line to |
It's help to not ruin the API of *Consumer classes.
…or codes now. Added record to the changelog.
CHANGELOG.md
Outdated
|
||
### Changed | ||
- Support multiple HTTP status codes for HttpWaitStrategy ([\#630](https://github.com/testcontainers/testcontainers-java/issues/630)) | ||
- Standard log consumers is are subclasses of `BaseConsumer` class. Other consumers can still only implement a `Consumer` interface, but in `BaseConsumer` class added new `removeColorCodes` property and `withRemoveAnsiCodes(boolean)` method, that can control the log preprocessor to remove/not remove ANSI color codes from log frame before send it to the `Consumer`. By default, preprocessor remove the color codes ([PR \#643](https://github.com/testcontainers/testcontainers-java/pull/643)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- it should be either "Changed" or "Fixed"
- no need to go deeply into the details of the change, just something short and simple (check other entries)
} else { | ||
consumer.accept(outputFrame); | ||
if (frame.getStreamType() == StreamType.RAW) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both methods have if (outputFrame != null) { if(utf8String != null && !utf8String.isEmpty()) { ... } }
, I would move it here
|
||
/** | ||
* Created by rnorth on 26/03/2016. | ||
*/ | ||
public class ToStringConsumer implements Consumer<OutputFrame> { | ||
public class ToStringConsumer extends BaseConsumer<ToStringConsumer> { | ||
private static final byte[] ENTER_BYTES = "\n".getBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming: "NEW_LINE" or "LINE_BREAK"
|
||
import java.util.function.Consumer; | ||
|
||
public abstract class BaseConsumer<S extends BaseConsumer<S>> implements Consumer<OutputFrame> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question, why S
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I named this like SELF
, but Codacity told me, that type parameters should be length one character. And I have trim this name to S
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this has the same semantics than the SELF
used in GenericContainer
, I'd prefer SELF
(and ignore the warning 🙂).
} | ||
// Logger chunks can break the string in middle of multibyte unicode character. | ||
// Backup the bytes to reconstruct proper char sequence with bytes from next frame. | ||
if (Character.getType(utf8String.charAt(utf8String.length() - 1)) == Character.OTHER_SYMBOL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like Character.getType(utf8String.charAt(utf8String.length() - 1))
in a variable.
|
||
private void normalizeLogLines(String utf8String, Consumer<OutputFrame> consumer) { | ||
// Reformat strings to normalize enters. | ||
List<String> lines = new ArrayList<>(Arrays.asList(utf8String.split("((\\r?\\n)|(\\r))"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move split regex into a constant.
What's the effect of the current implementation when faced with non-normalized logs? |
@kiview For example: instead of receiving this log lines:
I have receive:
|
One more example:
Instead of:
|
Thanks for the clarification, maybe #455 could be related to this as well (I think we never analyzed, if we are missing the log lines, or if the lines have been formatted in strange ways). |
…t for it. If StreamType is STDERR or STDOUT the log always have newline at line end. Therefore preprocessor should trim it to be consistent with RAW type processing.
…ng directly in Slf4jLogConsumer class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks excellent to me - thank you for contributing this @vektory79 !
The Docker logs can come in non normalized form. The line breaks can be not at end of line, the frame's boundary can split multibyte unicode symbols in the middle and so on.
This patch address this problems and normalize logs so, that they can be processed strictly line by line.
To achive this, the extra base class BaseConsumer have been added. That class normalize the incoming in method accept() logs and forward it to method process() for it's child classes. Other *Consumer classes have been reworked to be child to the BaseConsumer class and only do own work.
Adititionally, BaseConsumer have new withRemoveAnsiCodes(boolean) method for ability to disable ANSI color codes removing (true by default).