Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep field name for csv timestamp column (don't implicitly convert) #8440

Merged
merged 1 commit into from
Nov 20, 2020

Conversation

stephanie-engel
Copy link
Contributor

@stephanie-engel stephanie-engel commented Nov 19, 2020

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Root Cause Analysis:
Within the csv parser, the field name for the csv_timestamp_column is, initially, converted from a string into an int. This causes the field name (for example: 202011131605) to be interpreted as a Unix format (instead of the Go reference time 200601021504).

The Fix:

  • Prevent implicit type conversion on the timestamp column, so that parseTimestamp can correctly parse the time as unix or UTC
  • Added a unit test to verify TimestampFormat: "200601021504"

closes #7288

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @stephanie-engel, nice work! However, I think we also have to handle the cases where the column-type is given explicitly. This is the section above your change. How about handling both cases by adding

		if fieldName == p.TimestampColumn {
				recordFields[fieldName] = value
                                continue
                }

directly before line 208

			if len(p.ColumnTypes) > 0 {

This way you can leave the lines 241--250 unchanged.

@srebhan srebhan self-assigned this Nov 20, 2020
@stephanie-engel
Copy link
Contributor Author

Hey @stephanie-engel, nice work! However, I think we also have to handle the cases where the column-type is given explicitly. This is the section above your change. How about handling both cases by adding

		if fieldName == p.TimestampColumn {
				recordFields[fieldName] = value
                                continue
                }

directly before line 208

			if len(p.ColumnTypes) > 0 {

This way you can leave the lines 241--250 unchanged.

Thanks for the great feedback, @srebhan ! I just went ahead and made the changes you requested. I confirmed that the timestamp is still parsed correctly and the unit tests still pass 😄

Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, congratulations. 🥇

@stephanie-engel stephanie-engel merged commit 247230c into master Nov 20, 2020
@stephanie-engel stephanie-engel deleted the se-csv-7288 branch November 20, 2020 15:52
ssoroka pushed a commit that referenced this pull request Dec 1, 2020
@Hipska Hipska added the area/csv csv parser/serialiser related label Feb 15, 2022
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/csv csv parser/serialiser related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

timestamp format parsing is confusing yyyyMMddHHmm for a unix timestamp
4 participants