Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More accurate log processing. #643

Merged
merged 14 commits into from
Apr 17, 2018
Merged

More accurate log processing. #643

merged 14 commits into from
Apr 17, 2018

Conversation

vektory79
Copy link
Contributor

The Docker logs can come in non normalized form. The line breaks can be not at end of line, the frame's boundary can split multibyte unicode symbols in the middle and so on.

This patch address this problems and normalize logs so, that they can be processed strictly line by line.

To achive this, the extra base class BaseConsumer have been added. That class normalize the incoming in method accept() logs and forward it to method process() for it's child classes. Other *Consumer classes have been reworked to be child to the BaseConsumer class and only do own work.

Adititionally, BaseConsumer have new withRemoveAnsiCodes(boolean) method for ability to disable ANSI color codes removing (true by default).

The Docker logs can come in non normalized form. The line breaks can be not at end of line, the frame's boundary can split multibyte unicode symbols in the middle and so on.

This patch address this problems and normalize logs so, that they can be processed strictly line by line.

To achive this, the extra base class BaseConsumer have been added. That class normalize the incoming in method accept() logs and forward it to method process() for it's child classes. Other *Consumer classes have been reworked to be child to the BaseConsumer class and only do own work.

Adititionally, BaseConsumer have new withRemoveAnsiCodes(boolean) method for ability to disable ANSI color codes removing (true by default).
public abstract void process(OutputFrame outputFrame);

@Override
public final synchronized void accept(OutputFrame outputFrame) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid making it final breaks the binary compatibility because there might be some strategies based on i.e. Slf4jLogConsumer but overriding accept (I think I even have one)

import java.util.List;
import java.util.function.Consumer;

@SuppressWarnings("Duplicates")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what does it duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh... It's just a refactoring artifact. Will remove this.

public abstract class BaseConsumer<SELF extends BaseConsumer<SELF>> implements Consumer<OutputFrame> {
private StringBuilder logString = new StringBuilder();
private OutputFrame brokenFrame;
private boolean removeColorCodes = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add Lombok's @Setter to this field

Copy link
Member

@bsideup bsideup Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plus @Getter

@bsideup
Copy link
Member

bsideup commented Apr 13, 2018

@vektory79 please add a line to CHANGELOG.md about your improvement.
Also, maybe you have an idea how to test it? At least with a Unit test?

CHANGELOG.md Outdated

### Changed
- Support multiple HTTP status codes for HttpWaitStrategy ([\#630](https://github.com/testcontainers/testcontainers-java/issues/630))
- Standard log consumers is are subclasses of `BaseConsumer` class. Other consumers can still only implement a `Consumer` interface, but in `BaseConsumer` class added new `removeColorCodes` property and `withRemoveAnsiCodes(boolean)` method, that can control the log preprocessor to remove/not remove ANSI color codes from log frame before send it to the `Consumer`. By default, preprocessor remove the color codes ([PR \#643](https://github.com/testcontainers/testcontainers-java/pull/643))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. it should be either "Changed" or "Fixed"
  2. no need to go deeply into the details of the change, just something short and simple (check other entries)

} else {
consumer.accept(outputFrame);
if (frame.getStreamType() == StreamType.RAW) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both methods have if (outputFrame != null) { if(utf8String != null && !utf8String.isEmpty()) { ... } }, I would move it here


/**
* Created by rnorth on 26/03/2016.
*/
public class ToStringConsumer implements Consumer<OutputFrame> {
public class ToStringConsumer extends BaseConsumer<ToStringConsumer> {
private static final byte[] ENTER_BYTES = "\n".getBytes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming: "NEW_LINE" or "LINE_BREAK"


import java.util.function.Consumer;

public abstract class BaseConsumer<S extends BaseConsumer<S>> implements Consumer<OutputFrame> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, why S?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I named this like SELF, but Codacity told me, that type parameters should be length one character. And I have trim this name to S 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this has the same semantics than the SELF used in GenericContainer, I'd prefer SELF (and ignore the warning 🙂).

}
// Logger chunks can break the string in middle of multibyte unicode character.
// Backup the bytes to reconstruct proper char sequence with bytes from next frame.
if (Character.getType(utf8String.charAt(utf8String.length() - 1)) == Character.OTHER_SYMBOL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like Character.getType(utf8String.charAt(utf8String.length() - 1)) in a variable.


private void normalizeLogLines(String utf8String, Consumer<OutputFrame> consumer) {
// Reformat strings to normalize enters.
List<String> lines = new ArrayList<>(Arrays.asList(utf8String.split("((\\r?\\n)|(\\r))")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move split regex into a constant.

@kiview
Copy link
Member

kiview commented Apr 16, 2018

What's the effect of the current implementation when faced with non-normalized logs?

@vektory79
Copy link
Contributor Author

vektory79 commented Apr 16, 2018

@kiview For example: instead of receiving this log lines:

13:17:56.189 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_All = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms. 
13:17:56.190 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_InOperation = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms. 
13:17:56.190 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_MyInOperation = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms. 
13:17:56.190 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_MyOnCheck = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms.

I have receive:

13:17:56.189 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: 
13:17:56.189 INFO  - [DOCKER] STDOUT:  FD_ReportingSubject_All = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms. 
13:17:56.190 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_InOperation = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms. 
13:17:56.190 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.ut 
13:17:56.190 INFO  - [DOCKER] STDOUT: ils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_MyInOperation = ru.krista.consolidation.generators.reporting.ReportTreeCollectionForms. 
13:17:56.190 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:17:56,134 INFO  [ru.krista.common.gui.utils.FormControllersManager] (default task-32) Подключение контроллера к форме: FD_ReportingSubject_MyOnCheck = ru.krista.consolidation.generators.reporting.Repor 
13:17:56.190 INFO  - [DOCKER] STDOUT: tTreeCollectionForms.

@vektory79
Copy link
Contributor Author

One more example:

13:16:51.753 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:16:51,741 INFO  [ru.krista.jpa.inflatable.InflatablePersistenceProvider] (default task-1) Для генерации DDL используе� 
13:16:51.753 INFO  - [DOCKER] STDOUT: �ся провайдер [ru.krista.jpa.inflatable.InflatablePersistenceProvider] 

Instead of:

13:16:51.753 INFO  - [DOCKER] STDOUT:   [wildfly] 2018-04-09 13:16:51,741 INFO  [ru.krista.jpa.inflatable.InflatablePersistenceProvider] (default task-1) Для генерации DDL используется провайдер [ru.krista.jpa.inflatable.InflatablePersistenceProvider] 

@kiview
Copy link
Member

kiview commented Apr 16, 2018

Thanks for the clarification, maybe #455 could be related to this as well (I think we never analyzed, if we are missing the log lines, or if the lines have been formatted in strange ways).

Victor Verbitsky and others added 4 commits April 16, 2018 14:30
…t for it.

If StreamType is STDERR or STDOUT the log always have newline at line end. Therefore preprocessor should trim it to be consistent with RAW type processing.
Copy link
Member

@rnorth rnorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks excellent to me - thank you for contributing this @vektory79 !

@bsideup bsideup added this to the next milestone Apr 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants