
Switch csv and tsv method 'sv' from ReadAll() to stream each record with Read() #355

Merged


@splashing-atom (Contributor) commented Dec 16, 2022

This should reduce memory usage when parsing large files to check whether they match CSV/TSV.

fixes #354
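The change can be sketched with Go's standard `encoding/csv` package: instead of collecting every record into a `[][]string` with `ReadAll()`, the reader is drained one record at a time with `Read()`, so records do not accumulate in memory. This is a minimal sketch, not mimetype's actual code; `detectSV` is a hypothetical name, and its final check mirrors the `r.FieldsPerRecord > 1 && lines > 1` condition visible in the merged diff.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"fmt"
	"io"
)

// detectSV is a hypothetical helper (not mimetype's API): it reports whether
// in looks like delimiter-separated values, ',' for CSV or '\t' for TSV.
// Records are streamed with Read() instead of being collected by ReadAll().
func detectSV(in []byte, comma rune) bool {
	r := csv.NewReader(bytes.NewReader(in))
	r.Comma = comma

	lines := 0
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			break // every record parsed without error
		}
		if err != nil {
			return false // one malformed record rejects the whole input
		}
		lines++
	}

	// When FieldsPerRecord starts at 0, Read sets it from the first record,
	// so this requires at least two columns and at least two records.
	return r.FieldsPerRecord > 1 && lines > 1
}

func main() {
	fmt.Println(detectSV([]byte("a,b\n1,2\n"), ','))      // true
	fmt.Println(detectSV([]byte("a\tb\n1\t2\n"), '\t'))   // true
	fmt.Println(detectSV([]byte("just one line\n"), ',')) // false
}
```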

viprerk and others added 18 commits December 17, 2022 11:15
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3.1.0 to 3.1.1.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](codecov/codecov-action@v3.1.0...v3.1.1)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3.3.0 to 3.5.0.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@v3.3.0...v3.5.0)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.2 to 3.3.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3.0.2...v3.3.0)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.22 to 2.2.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@v2.1.22...v2.2.4)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…e#360)

Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3.2.0 to 3.4.0.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](golangci/golangci-lint-action@v3.2.0...v3.4.0)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.4 to 2.2.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@v2.2.4...v2.2.5)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.7.0 to 0.8.0.
- [Release notes](https://github.com/golang/net/releases)
- [Commits](golang/net@v0.7.0...v0.8.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore: ioutil deprecated and add ignore file

* fix: delete ignore file

Signed-off-by: guoguangwu <[email protected]>

---------

Signed-off-by: guoguangwu <[email protected]>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.8.0 to 0.10.0.
- [Commits](golang/net@v0.8.0...v0.10.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Revert ioutil deprecation commit

While Go 1.16 is 2 years old, I feel like I should not go out of my way
to break the library for users on Go versions lower than 1.16. This
commit reverts part of the changes from gabriel-vasile#392.

* fix some doc comments for go v1.19 rules

* Fix wrong total number of supported mimes

The mistake was introduced in a merge commit.
* Remove old travis build status link from readme

* Remove unneeded go-version tag in CI
It is not supported inside golangci-lint-action and triggers a warning
One group is for gomod and one is for github actions PRs
Bumps the gomod group with 1 update: [golang.org/x/net](https://github.com/golang/net).

- [Commits](golang/net@v0.10.0...v0.12.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: gomod
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps the github-actions group with 5 updates:

| Package | Update |
| --- | --- |
| [actions/checkout](https://github.com/actions/checkout) | 3.3.0 to 3.5.3 |
| [github/codeql-action](https://github.com/github/codeql-action) | 2.2.5 to 2.20.4 |
| [actions/setup-go](https://github.com/actions/setup-go) | 3.5.0 to 4.0.1 |
| [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) | 3.4.0 to 3.6.0 |
| [codecov/codecov-action](https://github.com/codecov/codecov-action) | 3.1.1 to 3.1.4 |


Updates `actions/checkout` from 3.3.0 to 3.5.3
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3.3.0...v3.5.3)

Updates `github/codeql-action` from 2.2.5 to 2.20.4
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@v2.2.5...v2.20.4)

Updates `actions/setup-go` from 3.5.0 to 4.0.1
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@v3.5.0...v4.0.1)

Updates `golangci/golangci-lint-action` from 3.4.0 to 3.6.0
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](golangci/golangci-lint-action@v3.4.0...v3.6.0)

Updates `codecov/codecov-action` from 3.1.1 to 3.1.4
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v3.1.1...v3.1.4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <[email protected]>
These files were introduced when changing csv detection from allocating
all results at once to streaming each line. This change improves
performance but the functionality remains exactly the same. Since the
functionality is unchanged, existing test cases should suffice.
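The claim that behaviour is unchanged can be checked with a small sketch that runs a `ReadAll()`-based check and a streaming `Read()` loop over the same inputs. Both `svReadAll` and `svStream` are hypothetical, simplified stand-ins rather than the library's code. They agree because, in the standard library, `ReadAll()` repeatedly reads records with the same parser as `Read()` and stops at the first error.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// svReadAll is a hypothetical "before" version: ReadAll materialises every
// record as a [][]string before any decision is made.
func svReadAll(in string, comma rune) bool {
	r := csv.NewReader(strings.NewReader(in))
	r.Comma = comma
	records, err := r.ReadAll()
	if err != nil {
		return false
	}
	return r.FieldsPerRecord > 1 && len(records) > 1
}

// svStream is a hypothetical "after" version: Read returns one record at a
// time, so parsed records are not accumulated in memory.
func svStream(in string, comma rune) bool {
	r := csv.NewReader(strings.NewReader(in))
	r.Comma = comma
	lines := 0
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return false
		}
		lines++
	}
	return r.FieldsPerRecord > 1 && lines > 1
}

func main() {
	inputs := []string{
		"a,b\n1,2\n",          // valid two-column CSV
		"a,b\n1,2,3\n",        // field-count mismatch
		"single\n",            // one record, one column
		"a,\"unterminated\n", // unclosed quoted field
		"",                    // empty input
	}
	for _, in := range inputs {
		fmt.Printf("%q readAll=%v stream=%v\n", in, svReadAll(in, ','), svStream(in, ','))
	}
}
```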
@gabriel-vasile (Owner) left a comment

Thank you @splashing-atom and sorry for the late response.
Considering your change improves performance without affecting the functionality, I will remove the additional test cases.

@gabriel-vasile merged commit 9df6903 into gabriel-vasile:master on Oct 10, 2023
7 checks passed
```go
lines++
}

return r.FieldsPerRecord > 1 && lines > 1
```
Given that this only checks `lines > 1`, the loop above could simply break out early. Is there any point in exhausting the reader, which could still hold millions of lines?

@splashing-atom (Contributor Author)

The ReadAll() method would return an error if any line in the file was not valid CSV. For this reason, the loop calls Read() on each line and returns false overall if any line is not valid CSV.
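The point can be seen directly with Go's `encoding/csv`: a malformed record near the end of the input is only detected once the reader reaches it, and the error is reported for that record. In this sketch, `firstBadLine` is a hypothetical helper that streams records with `Read()` and returns the line number carried by the standard library's `*csv.ParseError`.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// firstBadLine is a hypothetical helper: it streams records with Read and
// returns the input line of the first record that fails to parse, 0 if every
// record is valid, or -1 for a non-parse error from the underlying reader.
func firstBadLine(in string) int {
	r := csv.NewReader(strings.NewReader(in))
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			return 0
		}
		var perr *csv.ParseError
		if errors.As(err, &perr) {
			return perr.Line
		}
		if err != nil {
			return -1
		}
	}
}

func main() {
	// The first two records are valid; only the third has the wrong field count,
	// so the problem is invisible until the reader gets that far.
	fmt.Println(firstBadLine("a,b\n1,2\n3\n"))
	fmt.Println(firstBadLine("a,b\n1,2\n"))
}
```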

@gabriel-vasile (Owner)

Breaking out of the loop when `lines > 1` means input like this would wrongly pass as CSV:

```
val1, val2, val3
val4, val5, val6

some text that is not related to the csv above
```
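The counterexample can be reproduced with a hypothetical detector, `looksLikeCSV`, that optionally stops after two records as the review suggested. With the early exit, the trailing prose line is never parsed and the input is accepted. The full scan reaches it; `encoding/csv` skips the blank line, then reports a field-count error (one field instead of three), and the input is rejected. This is an illustrative sketch, not mimetype's code.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// looksLikeCSV is a hypothetical detector. With earlyExit set it stops as soon
// as two records have parsed (the shortcut suggested in review); otherwise it
// reads every record before deciding.
func looksLikeCSV(in string, earlyExit bool) bool {
	r := csv.NewReader(strings.NewReader(in))
	lines := 0
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return false
		}
		lines++
		if earlyExit && lines > 1 {
			break // the rest of the input is never validated
		}
	}
	return r.FieldsPerRecord > 1 && lines > 1
}

func main() {
	// The counterexample from the review thread.
	in := "val1, val2, val3\nval4, val5, val6\n\nsome text that is not related to the csv above\n"
	fmt.Println("early exit:", looksLikeCSV(in, true))  // early exit: true (wrongly accepted)
	fmt.Println("full scan:", looksLikeCSV(in, false)) // full scan: false
}
```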


Thanks for checking and replying!

kodiakhq bot referenced this pull request in cloudquery/plugin-pb-go Nov 1, 2023
…#149)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/gabriel-vasile/mimetype](https://togithub.com/gabriel-vasile/mimetype) | indirect | patch | `v1.4.2` -> `v1.4.3` |

---

### Release Notes

<details>
<summary>gabriel-vasile/mimetype (github.com/gabriel-vasile/mimetype)</summary>

### [`v1.4.3`](https://togithub.com/gabriel-vasile/mimetype/releases/tag/v1.4.3)

[Compare Source](https://togithub.com/gabriel-vasile/mimetype/compare/v1.4.2...v1.4.3)

#### What's Changed

-   Switch csv and tsv method 'sv' from ReadAll() to stream each record with Read() by [@splashing-atom](https://togithub.com/splashing-atom) in [https://github.com/gabriel-vasile/mimetype/pull/355](https://togithub.com/gabriel-vasile/mimetype/pull/355)
-   Bump golang.org/x/net from 0.8.0 to 0.17.0 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/gabriel-vasile/mimetype/pull/441](https://togithub.com/gabriel-vasile/mimetype/pull/441)
-   enable reusing records in csv/tsv detection by [@gabriel-vasile](https://togithub.com/gabriel-vasile) in [https://github.com/gabriel-vasile/mimetype/pull/443](https://togithub.com/gabriel-vasile/mimetype/pull/443)

#### New Contributors

-   [@splashing-atom](https://togithub.com/splashing-atom) made their first contribution in [https://github.com/gabriel-vasile/mimetype/pull/355](https://togithub.com/gabriel-vasile/mimetype/pull/355)

**Full Changelog**: gabriel-vasile/mimetype@v1.4.2...v1.4.3

</details>
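The release notes above also mention record reuse in csv/tsv detection (PR #443). In Go's `encoding/csv` this most likely refers to the `Reader.ReuseRecord` field, which lets `Read()` return a slice sharing its backing array with the previous call's result instead of allocating a new one. The sketch assumes the detector only counts records and never keeps a returned slice, which is what makes reuse safe; `countRecords` is a hypothetical helper, not code from that PR.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// countRecords is a hypothetical helper that streams records with
// ReuseRecord enabled. Read may return a slice that shares its backing array
// with the previous call's result, which reduces allocations; the returned
// slice is therefore never stored across iterations.
func countRecords(in string, comma rune) (records, fields int, ok bool) {
	r := csv.NewReader(strings.NewReader(in))
	r.Comma = comma
	r.ReuseRecord = true
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			return records, r.FieldsPerRecord, true
		}
		if err != nil {
			return records, r.FieldsPerRecord, false
		}
		records++
	}
}

func main() {
	n, f, ok := countRecords("a\tb\n1\t2\n3\t4\n", '\t')
	fmt.Println(n, f, ok) // 3 2 true
}
```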

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
Linked issue: Reduce memory consumption of CSV/TSV detectors
5 participants