Consider support yaml aliases #93

Closed
Yablon opened this issue Dec 3, 2019 · 8 comments

Comments

@Yablon

Yablon commented Dec 3, 2019

Some YAML files are organized as follows. Could you please consider supporting that?
The following is from the official documentation, https://pyyaml.org/wiki/PyYAMLDocumentation

Aliases
Note that PyYAML does not yet support recursive objects.

Using YAML you may represent objects of arbitrary graph-like structures. If you want to refer to the same object from different parts of a document, you need to use anchors and aliases.

Anchors are denoted by the & indicator while aliases are denoted by *. For instance, the document

left hand: &A
  name: The Bastard Sword of Eowyn
  weight: 30
right hand: *A
expresses the idea of a hero holding a heavy sword in both hands.

PyYAML now fully supports recursive objects. For instance, the document

&A [ *A ]
will produce a list object containing a reference to itself.

@omry
Owner

omry commented Dec 3, 2019

Hi Yablon, can you show a more concrete example of a yaml file that does not work right now with OmegaConf?

I am not sure what you are trying to represent.

@Yablon
Author

Yablon commented Dec 3, 2019

@omry Yes, thank you for your reply !
I have 2 yaml files.

test_1.yaml

sample_rate: &SR !!int "4000"
sample_rate_test: *SR

test_2.yaml

sample_rate: &SR !!int "8000"

But when I merge the 2 files using OmegaConf,

a = OmegaConf.load('test_1.yaml')
b = OmegaConf.load('test_2.yaml')
c = OmegaConf.merge(a, b)
print(c)

I got this

{'sample_rate': 8000, 'sample_rate_test': 4000}

What I expected is that every variable sharing the same alias would change, so sample_rate_test would also become 8000.

I think that since you support the merge method, it is reasonable to support the above usage.

Thank you!

@omry
Owner

omry commented Dec 3, 2019

The problem is that PyYAML resolves the aliases when you load each individual file, so by the time the configs are merged the alias is already a plain value.
Luckily, OmegaConf does support this functionality through interpolations:

file1.yaml:

a: 10
b:
   a: 20

file2.yaml:

a1 : ${a}
b1 : ${b.a}

c1 = OmegaConf.load('file1.yaml')
c2 = OmegaConf.load('file2.yaml')
c = OmegaConf.merge(c1, c2)
# will print as is
print(c.pretty())

# will print with values resolved.
print(c.pretty(resolve=True))
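
As a rough illustration of why the anchors in test_1.yaml did not follow the merge, here is a toy model in plain Python. It does not use PyYAML or OmegaConf: `load_with_alias` and `shallow_merge` are hypothetical helpers that only mimic a loader expanding `*SR` into a plain value while parsing a single file.

```python
# Toy model only: plain dicts stand in for the parsed YAML documents.
# load_with_alias and shallow_merge are hypothetical helpers, not library APIs.

def load_with_alias(value):
    """Mimic a loader that expands the alias while parsing one file."""
    anchor = value                        # plays the role of &SR
    return {
        "sample_rate": anchor,            # the anchored node
        "sample_rate_test": anchor,       # *SR, already copied as a plain value
    }


def shallow_merge(base, override):
    """Return a new dict in which keys from `override` win."""
    merged = dict(base)
    merged.update(override)
    return merged


a = load_with_alias(4000)       # stands in for test_1.yaml
b = {"sample_rate": 8000}       # stands in for test_2.yaml
c = shallow_merge(a, b)
print(c)  # {'sample_rate': 8000, 'sample_rate_test': 4000}
```

Once the alias has been expanded, nothing links `sample_rate_test` back to `sample_rate`, so the merge can only overwrite the key it is given.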

@Yablon
Author

Yablon commented Dec 3, 2019

@omry That is great !
However, I think maybe I didn't express myself well.

For now, I usually use a YAML config file as a replacement for tf.contrib.training.HParams.

The YAML config file is usually a little bit complicated and contains many items.

For simplicity, I usually write two YAML files when training. One is the big YAML config file; the other is a small file that may contain many nodes. When training, I read the big YAML config file first and then replace the variables in the big file with the variables in the small file.

In that way, my training history is saved without much effort.

But the problem is that aliases in the big file can't be changed all at once by the variables in the small file. For a long time I have been thinking about how to solve this, until I saw your awesome OmegaConf. That's why I gave the examples above.

In the end, I want to replace a variable and all of its aliases with another value. Is that possible with OmegaConf?

By the way, I found that list variables read using OmegaConf don't support all list operations,
like the following:

a = OmegaConf.load('config.yaml')
b = a.some_list + [1]  # that's ok
b = [1] + a.some_list  # raises an error

The second form raises an error. Can that be fixed?

Thank you!

@omry
Owner

omry commented Dec 3, 2019

Hi Yablon,
Thanks for sharing more context. I am happy to hear you are using OmegaConf for machine learning. It was actually created with a machine learning use case in mind.

Firstly, since you already know about OmegaConf, spend some time going through the documentation of what OmegaConf can do.
I then strongly suggest that you look at Hydra which builds on OmegaConf to make it even more powerful and good to use for complex use cases like ML.

Hydra makes it easy to compose configurations with OmegaConf in a way that is most likely powerful enough to do what you need to do.
It also offers other useful features (parameter sweeps, tab completion and more).

Your explanation of the specific problem you are facing is not good enough for me to understand yet. It's best to show what the problem is with a small example.

About your other problem with lists: it is a known issue. An OmegaConf list is not really a list, and a primitive list does not support being added to one.
You can work around it by doing:

a = OmegaConf.load('config.yaml')
b = OmegaConf.create([1]) + a.some_list
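
A minimal sketch of the underlying Python behaviour, using a hypothetical `ListLike` class as a stand-in for a list-like container (it is not OmegaConf's implementation, and whether it matches OmegaConf's internals is an assumption). When the left operand is a built-in list and the right operand defines no `__radd__`, Python raises `TypeError`; converting to a plain list, or defining `__radd__`, avoids that.

```python
class ListLike:
    """Hypothetical list-like container; not OmegaConf's actual class."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __add__(self, other):
        # Handles `ListLike + iterable`.
        return type(self)(self._items + list(other))


class ListLikeWithRadd(ListLike):
    def __radd__(self, other):
        # Handles `iterable + ListLike` when the left operand cannot.
        return type(self)(list(other) + self._items)


some_list = ListLike([2, 3])

ok = some_list + [1]                 # works: ListLike.__add__ is used

error = None
try:
    [1] + some_list                  # a plain list cannot concatenate a ListLike
except TypeError as exc:
    error = str(exc)

workaround = [1] + list(some_list)   # convert to a plain list first
fixed = [1] + ListLikeWithRadd([2, 3])

print(list(ok), workaround, list(fixed))
```

The `list(...)` conversion returns a plain Python list, while the `__radd__` variant keeps the container type.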

Feel free to ask follow-up questions. You can also join the chat (see the chat link in the README for this project); I can answer more questions about both OmegaConf and Hydra there.

@Yablon
Author

Yablon commented Dec 4, 2019

Thank you @omry

Sorry that my poor English confused you.
I wrote an example like example

Hydra seems to be a big repository, and I will go through it later.

@omry
Owner

omry commented Dec 4, 2019

A few notes about your repo:

  1. Stop using YAML anchors with OmegaConf; they do not work well when you combine different configs.
  2. You don't need to declare the type, as in max_wav_value: !!float "32768.0". This would be just fine: max_wav_value: 32768.0

Try something like this (with OmegaConf interpolation):

all_config.yaml:

sampling_rate: 8000.0

feature_extract:
  sample_rate: ${sampling_rate}

train.yaml:

sampling_rate: 16000.0

base = OmegaConf.load("all_config.yaml")
traincfg = OmegaConf.load("train.yaml")
config = OmegaConf.merge(base, traincfg) # first base, then traincfg
print(config.pretty(resolve=True))

This should print something like (I didn't test):

sampling_rate: 16000.0
feature_extract:
  sample_rate: 16000.0
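
To make the "merge first, resolve afterwards" idea concrete, here is a toy resolver in plain Python. It is only a sketch under stated assumptions: `deep_merge`, `lookup`, and `resolve` are hypothetical helpers rather than OmegaConf functions, they handle only values that consist entirely of a single `${dotted.key}` reference, and there is no cycle detection.

```python
import re

# Matches a value that is exactly one ${dotted.key} reference.
_PATTERN = re.compile(r"\$\{([^}]+)\}")


def deep_merge(base, override):
    """Return a new dict; nested dicts are merged, other values overridden."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def lookup(root, dotted_key):
    """Follow a dotted path such as 'b.a' from the root of the config."""
    node = root
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(node, root):
    """Replace references using the values in the already merged root."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, str):
        match = _PATTERN.fullmatch(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


base = {
    "sampling_rate": 8000.0,
    "feature_extract": {"sample_rate": "${sampling_rate}"},
}
traincfg = {"sampling_rate": 16000.0}

config = deep_merge(base, traincfg)   # the reference is still unresolved here
resolved = resolve(config, config)
print(resolved)
# {'sampling_rate': 16000.0, 'feature_extract': {'sample_rate': 16000.0}}
```

Because the reference is stored as text until `resolve` runs, overriding `sampling_rate` in the second config changes every value that points at it.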

And really, please spend 20 minutes going through the Hydra tutorial. Switching to it will make your life easier.

@Yablon
Author

Yablon commented Dec 4, 2019

@omry Thank you for your example, it is very helpful. I will learn Hydra.
