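using ReadStat
# Download the example Stata dataset to a local path
file = download("http://www.stata-press.com/data/r15/fullauto.dta",
                "data/ologit.dta")
# Read it with ReadStat directly; the result includes the value-label metadata
data = read_dta(file)
using StatFiles, DataFrames
# Load the same file through StatFiles and convert it to a DataFrame
output = load(file) |> DataFrame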
If you look at data, you will see that categorical variables carry a mapping from values to labels, given by val_labels_keys and val_label_dict. The default behavior shown here ignores that mapping and yields the values instead of the labels (e.g., rep77 gives [1, 2, 3, 4, 5] instead of ["Poor", "Fair", "Average", "Good", "Excellent"]). The same may be true for other file formats, but I have only confirmed it for Stata's dta.
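To make the expected result concrete, here is a minimal sketch in plain Julia of applying a value-to-label mapping to a column of codes. The names codes, value_labels, and to_label are hypothetical stand-ins, not part of ReadStat or StatFiles, and the data is hard-coded rather than read from the dta file. Falling back to the stringified code for unlabeled values is an assumption, not a documented behavior of either package.

```julia
# Hypothetical stand-in for a raw column such as rep77 (missing values allowed)
codes = [3, 1, 5, missing, 2, 4]

# Hypothetical stand-in for the value-label mapping stored alongside the data
value_labels = Dict(1 => "Poor", 2 => "Fair", 3 => "Average",
                    4 => "Good", 5 => "Excellent")

# Map one code to its label; keep missing as missing, and fall back to the
# code written as a string when no label is defined (an assumption)
to_label(c) = ismissing(c) ? missing : get(value_labels, c, string(c))

labels = to_label.(codes)
```

With this mapping the column comes back as labels rather than integer codes, which is the behavior I would expect by default.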
Do you think it would be appropriate to return a CategoricalArray for such cases? Something that I've been wondering recently is whether CategoricalArrays should be able to preserve the original value code in addition to the label. In R the fact that you can represent those either as factors or as labelled numeric vectors creates a divide which isn't optimal IMO.
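As a rough sketch of what returning a CategoricalArray could look like, assuming the CategoricalArrays package and the same hypothetical codes and value_labels as stand-ins for the file contents: the levels are ordered by the original value code, but only their position is kept, not the code itself.

```julia
using CategoricalArrays

# Hypothetical stand-ins for a raw column and its value-label mapping
codes = [3, 1, 5, 2, 4, 3]
value_labels = Dict(1 => "Poor", 2 => "Fair", 3 => "Average",
                    4 => "Good", 5 => "Excellent")

# Order the levels by the original value code so that "Poor" < ... < "Excellent"
ordered_codes = sort(collect(keys(value_labels)))
level_names = [value_labels[c] for c in ordered_codes]

# Build an ordered categorical column from the labels
rep77 = categorical([value_labels[c] for c in codes];
                    levels = level_names, ordered = true)

# levelcode gives each value's position in levels(rep77), not the original code
positions = levelcode.(rep77)
```

Here the positions happen to match the original codes because they run from 1 to 5. With codes such as 10, 20, 30 the positions would still be 1, 2, 3, which is the information loss the question above is about.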
FWIW ReadStatTables.jl supports value labels via a special LabeledArray type. There is currently no way to convert these to a CategoricalArray while preserving the ordering of levels, though.