readFrame :: using string as column data type #834

Open
auge opened this issue Sep 27, 2024 · 1 comment
Labels
feature (missing/requested features), IO (Input/Output related issues, e.g., reading and writing files)

Comments

@auge
Contributor

auge commented Sep 27, 2024

Is it currently possible to load a frame with string as a column type, and if so, how?

According to https://daphne-eu.github.io/daphne/FileMetaDataFormat/, only numeric value types appear to be supported.

Can we have support for valueType: string?

data.csv

Algiers,3.4
St. John's,4.3
Dodoma,26.3
Toliara,17.0
Yellowknife,4.0
Batumi,24.5
Istanbul,31.9
Tampa,41.6
Gjoa Haven,-1.3
Paris,18.2

data.csv.meta

{
  "numRows": 10,
  "numCols": 2,
  "schema": [
    {
      "label": "city",
      "valueType": "string"
    },
    {
      "label": "temperature",
      "valueType": "f64"
    }
  ]
}

DAPHNE script:

path = "data.csv";
data = readFrame(path);
print(data);

pdamme added the feature and IO labels on Sep 27, 2024
@pdamme
Collaborator

pdamme commented Sep 27, 2024

Hi @auge, it is currently not possible to read string data from files, but this feature is already WIP and will be added soon.

PR #797, which is about to be finalized and merged, will bring support for reading CSV files into matrices of string value type. These matrices (or individual columns) can then be processed with some basic string operations (e.g., concatenation, lower/upper case) or converted to numerical data (e.g., through number parsing, dictionary coding, or one-hot encoding).

As a follow-up, we're already working on support for frames with string columns and reading CSV files with string columns into a frame directly.

In the meantime, a workaround is to convert the string data to numbers with an external tool or script (e.g., through dictionary coding) and to read only the numerical data into DAPHNE.
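
As a rough illustration of that workaround, the following is a minimal Python sketch of dictionary coding applied to the first rows of the data.csv from this issue. The helper names `dictionary_encode` and `make_meta` are hypothetical, and using `si64` as the valueType of the encoded column is an assumption that should be checked against the file metadata documentation. The sketch is not part of DAPHNE.

```python
import csv
import io
import json


def dictionary_encode(rows, col):
    """Replace the strings in column `col` by integer codes.

    Codes are assigned in first-seen order, so equal strings get equal codes.
    Returns the encoded rows and the string -> code dictionary.
    """
    codes = {}
    encoded = []
    for row in rows:
        # len(codes) is evaluated before a new key is inserted,
        # so a new string receives the next unused code.
        code = codes.setdefault(row[col], len(codes))
        encoded.append(row[:col] + [str(code)] + row[col + 1:])
    return encoded, codes


def make_meta(num_rows, labels, value_types):
    """Build a metadata dictionary in the layout of data.csv.meta above."""
    return {
        "numRows": num_rows,
        "numCols": len(labels),
        "schema": [
            {"label": label, "valueType": value_type}
            for label, value_type in zip(labels, value_types)
        ],
    }


if __name__ == "__main__":
    # First three rows of data.csv from this issue.
    raw = "Algiers,3.4\nSt. John's,4.3\nDodoma,26.3\n"
    rows = list(csv.reader(io.StringIO(raw)))

    encoded, codes = dictionary_encode(rows, col=0)

    # Assumption: "si64" is an accepted valueType for the integer codes.
    meta = make_meta(len(encoded), ["city", "temperature"], ["si64", "f64"])

    # Render the purely numeric CSV and its metadata as text.
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(encoded)
    print(out.getvalue())
    print(json.dumps(meta, indent=2))
    print(json.dumps(codes, indent=2))  # keep this to decode the codes later
```

Saving the two printed texts as, say, data_encoded.csv and data_encoded.csv.meta should give a purely numeric file that the readFrame script above can load after adjusting the path. The dictionary has to be stored separately if the city names are needed again.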
