typos #1025

Closed
musvaage opened this issue Sep 14, 2022 · 4 comments
Comments

@musvaage
Contributor

$ grep -nr Homogenity CSV.jl
CSV.jl/test/testfiles/norwegian_data.csv:1084:177;4;0;1001;1;Sneisvatn;Aktiv;1916-09-20 12:00:00.0000000;2018-08-11 12:00:00.0000000;0;NULL;NULL;NULL;ja;NULL;open;68,40743256;15,70544243;528971;7588456;0;0;0;0;2047;29,25;29,25;CCR. Homogenity break 1972/73.;NULL;2011-12-09 00:00:00.0000000;NULL;21,90999985;NULL;18,60000038;18;88;138;191;235;302;387;457;556;624;970;7,28000021;9,680000305;NULL;NULL;0;1,710000038;NULL;2,269999981;19,96999931;0;10,05000019;54,5;0;NULL;NULL;NULL;NULL;NULL;NULL;NULL;Y;Y;Y;Y;Y;NULL;NULL;Y;NULL;Y;Y;Y;Y;NULL;Y;Y;NULL;NULL
$

Incidentally, norwegian_data.csv is not rendered correctly in a browser.

The file utility reports that it is UTF-8 text.

Downloaded and opened in an editor set to UTF-8 encoding, norwegian_data.csv likewise does not render correctly.
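
One way to narrow this down is to look for the Unicode replacement character U+FFFD (bytes EF BF BD), which is itself valid UTF-8 and so would be consistent with what the file utility reports. That the file contains U+FFFD is only an assumption, not something confirmed in this issue. The function name `count_replacement_lines` is hypothetical, and `grep -a` is assumed to be available (GNU and BSD grep both provide it).

```shell
# Hypothetical helper: print the number of lines in a file that contain
# the UTF-8 encoding of U+FFFD (octal 357 277 275, hex EF BF BD).
count_replacement_lines() {
    # LC_ALL=C: match raw bytes; -a: treat the file as text; -c: count matching lines.
    # grep exits with status 1 when the count is 0, so "|| true" keeps the function's status at 0.
    LC_ALL=C grep -a -c "$(printf '\357\277\275')" "$1" || true
}

# Example invocation (commented out so sourcing this file has no side effects):
# count_replacement_lines CSV.jl/test/testfiles/norwegian_data.csv
```

A result of 0 would point to a different cause, such as mojibake introduced by an earlier conversion.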

$ grep -nr "concatenante " CSV.jl
CSV.jl/docs/src/examples.md:32:# of the inputs together and vertically concatenante them into one "long" table.
$ grep -nr concatenanted CSV.jl
CSV.jl/docs/src/index.md:25:  * [`CSV.read`](@ref): a convenience function identical to `CSV.File`, but used when a `CSV.File` will be passed direclty to a sink function, like a `DataFrame`. In some cases, sinks may make copies of incoming data for their own safety; by calling `CSV.read(file, DataFrame)`, no copies of the parsed `CSV.File` will be made, and the `DataFrame` will take direct ownership of the `CSV.File`'s columns, which is more efficient than doing `CSV.File(file) |> DataFrame` which will result in an extra copy of each column being made. Keyword arguments are identical to `CSV.File`. Any valid Tables.jl sink function/table type can be passed as the 2nd argument. Like `CSV.File`, a vector of data inputs can be passed as the 1st argument, which will result in a single "long" table of all the inputs vertically concatenanted. Each input must have identical schemas (column names and types).
$ grep -nr delimted CSV.jl
CSV.jl/src/README.md:21:By providing the `pool` keyword argument, users can control how this optimization will be applied to individual columns, or to all columns of the delimted text being read.
$ grep -nr describign CSV.jl
CSV.jl/docs/src/examples.md:319:# row describign second row of data
$ grep -nr determing CSV.jl
CSV.jl/src/rows.jl:8:# use the same inference logic used in `CSV.File` for determing a cell's typed value
$ grep -nr direclty CSV.jl
CSV.jl/docs/src/index.md:25:  * [`CSV.read`](@ref): a convenience function identical to `CSV.File`, but used when a `CSV.File` will be passed direclty to a sink function, like a `DataFrame`. In some cases, sinks may make copies of incoming data for their own safety; by calling `CSV.read(file, DataFrame)`, no copies of the parsed `CSV.File` will be made, and the `DataFrame` will take direct ownership of the `CSV.File`'s columns, which is more efficient than doing `CSV.File(file) |> DataFrame` which will result in an extra copy of each column being made. Keyword arguments are identical to `CSV.File`. Any valid Tables.jl sink function/table type can be passed as the 2nd argument. Like `CSV.File`, a vector of data inputs can be passed as the 1st argument, which will result in a single "long" table of all the inputs vertically concatenanted. Each input must have identical schemas (column names and types).
$ grep -nr emtpy CSV.jl/{src,test}
CSV.jl/src/detection.jl:507:        # emtpy file, use column names if provided
CSV.jl/test/testfiles.jl:369:    ("transposed_emtpy.csv", (transpose=true,),
$ grep -nr expressiong CSV.jl
CSV.jl/src/utils.jl:571:    ex isa Expr || throw(ArgumentError("must pass an expressiong to @refargs"))
$ grep -nr footrpint CSV.jl
CSV.jl/docs/src/index.md:26:  * [`CSV.Rows`](@ref): an alternative approach for consuming delimited data, where the input is only consumed one row at a time, which allows "streaming" the data with a lower memory footrpint than `CSV.File`. Supports many of the same options as `CSV.File`, except column type handling is a little different. By default, every column type will be essentially `Union{Missing, String}`, i.e. no automatic type detection is done, but column types can be provided manually. Multithreading is not used while parsing. After constructing a `CSV.Rows` object, rows can be "streamed" by iterating, where each iteration produces a `CSV.Row2` object, which operates similar to `CSV.File`'s `CSV.Row` type where individual row values can be accessed via `row.col1`, `row[:col1]`, or `row[1]`. If each row is processed individually, additional memory can be saved by passing `reusebuffer=true`, which means a single buffer will be allocated to hold the values of only the currently iterated row. `CSV.Rows` also supports the Tables.jl interface and can also be passed to valid sink functions.
$ grep -nr homogenous CSV.jl
CSV.jl/README.md:49:  This returns a `Matrix` rather than a [Tables.jl](https://github.com/JuliaData/Tables.jl)-style container, thus works best for files of homogenous element type. 
$ 
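
The individual searches above could be collected into a single loop. This is a minimal sketch: the function name `find_typos` is hypothetical, and it uses fixed-string matching (`-F`), so `concatenante` also matches `concatenanted`.

```shell
# Hypothetical helper: search a directory tree for each misspelling in turn.
# Usage: find_typos DIR WORD [WORD...]
find_typos() {
    dir=$1
    shift
    for word in "$@"; do
        # -n: show line numbers, -r: recurse, -F: literal (non-regex) match,
        # --: end of options so a word starting with "-" is not read as a flag.
        # grep exits non-zero when nothing matches; "|| true" keeps the loop going.
        grep -nrF -- "$word" "$dir" || true
    done
}

# Example invocation (commented out so sourcing this file has no side effects):
# find_typos CSV.jl Homogenity concatenante delimted describign determing \
#     direclty emtpy expressiong footrpint homogenous
```

Note that the `emtpy` search in the listing above was limited to the `src` and `test` directories, so searching all of `CSV.jl` for it may return additional matches.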
@quinnj
Member

quinnj commented Sep 15, 2022

Is there..........an issue here? I'm not quite sure what the goal here is or what you're trying to do or what problem you're running into. If you can help clarify, we can help diagnose what is going on.

@musvaage
Contributor Author

Whether repository members are interested in a PR that fixes the typos identified in this issue's .jl and .md files is one matter. A separate matter is whether the cited mixed-language .csv file, with an English typo and incorrectly rendered characters, is a good choice to include as a test file.

Perhaps it's possible to locate the original Norwegian file where the characters are properly rendered.

These are from the Norwegian alphabet.

Æ æ Ø ø Å å

@quinnj
Member

quinnj commented Sep 15, 2022

I'm not personally worried about the rendering of test files. They're just for tests. They usually come from reported issues where we fix whatever was wrong and the author gives the ok to use the file as a test file. This one is more about the extremely wide dataset (something like ~20K columns) that is really useful for testing wide table scenarios.

@musvaage
Contributor Author

#1026

On request, I may have time to fix the Norwegian text in a separate PR.
