Make ObjectSpec hashable if an arg value is a list #1771

Merged
merged 6 commits into from
Sep 12, 2023

Conversation

bidyapati-p
Contributor

@bidyapati-p bidyapati-p commented Aug 4, 2023

When tags is passed as an args parameter to any Scenario object, creating the hash fails because the value is a list.

Fixes #1768

Issue : stanford-crfm#1768
@yifanmai
Collaborator

Couple of questions:

  • Do you have an example in the code where we hash ObjectSpec or a subclass of it, thus triggering this bug?
  • Is there a more general way to fix this i.e. recursively descend and try to convert Dicts and Lists into immutable counterparts? _canonicalize_key() does something similar. The current code still fails if args contains a nested list as an arg value.
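
The recursive approach raised in the second question could look roughly like the sketch below. It is only an illustration: `to_hashable` and `hash_spec` are hypothetical names, not HELM functions, and the sketch assumes that dict keys are mutually sortable (for example, all strings).

```python
from collections.abc import Hashable
from typing import Any


def to_hashable(value: Any) -> Hashable:
    """Recursively convert lists, dicts and sets into hashable equivalents (hypothetical helper)."""
    if isinstance(value, dict):
        # Sort by key so that insertion order does not affect the result.
        return tuple((key, to_hashable(value[key])) for key in sorted(value))
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(item) for item in value)
    if isinstance(value, set):
        return frozenset(to_hashable(item) for item in value)
    # Assumption: anything else (int, float, str, None, ...) is already hashable.
    return value


def hash_spec(class_name: str, args: dict) -> int:
    """Hash a (class_name, args) pair the way an ObjectSpec-like __hash__ might."""
    return hash((class_name, to_hashable(args)))
```

Because the conversion descends into every level, a value such as `[1, 2, 3, [4, 5, 6, [7, 8, 9]]]` or `{"aa": {"bb": [1, 2, 3]}}` becomes a nested tuple before `hash()` is called.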

@yifanmai
Collaborator

Another request - could you use an if-else branch instead of a ternary expression, for readability?

@bidyapati-p
Contributor Author

I have created a MyGenericScenario class that takes tags as a list.

I see there is another class, GrammarScenario, which takes tags as a comma-separated string and splits it into a list inside its __init__(). Tags are really list data.

class MyGenericScenario(Scenario):

    # name = "my_generic"
    # description = "my Generic dataset"
    # tags = ["question_answering"]

    def __init__(self, name, description, tags):
        super().__init__()

        self.name = name
        self.description = description
        self.tags = tags

Here are the run_specs.py changes (note the tags here):

scenario_args = {"name": f"my_{task}", "description": f"My Org {task}", "tags": [task]}

scenario_spec = ScenarioSpec(
    class_name="helm.benchmark.scenarios.my_generic_scenario.MyGenericScenario", args=scenario_args
)

Because ScenarioSpec is a subclass of ObjectSpec, hashing it fails when an arg value is a list.
I think the args may contain integers, floats, strings, lists, or dictionaries.

I guess the best fix would be:
if the value is not a primitive type like int/float/str, we can call its __str__() method.
That way the code is more generic and can handle future code that passes other kinds of objects.

Let me know your thoughts on it.

from collections.abc import Hashable

def __hash__(self):
    # use the value itself when it is hashable, otherwise fall back to its string form
    return hash((self.class_name, tuple((k, self.args[k]) if isinstance(self.args[k], Hashable) else (k, str(self.args[k])) for k in sorted(self.args.keys()))))

@bidyapati-p
Contributor Author

If this looks more generic, I will commit the changes with if-else branches

@yifanmai changed the title from "Fix for issue 1768" to "Make ObjectSpec hashable if an arg value is a list" on Aug 24, 2023
Collaborator

@yifanmai yifanmai left a comment

Thank you for the clarification. This looks like a reasonable use case.

I have two requests for changes, and then we can merge the code after they are addressed.

src/helm/common/object_spec.py (outdated, resolved)
@yifanmai
Collaborator

yifanmai commented Sep 5, 2023

Hi @bidyapati-p, are you still working on this? Otherwise, I will make the requested corrections to this PR and merge it.

-        return hash((self.class_name, tuple((k, self.args[k]) for k in sorted(self.args.keys()))))
+        t = tuple()
+        for k in sorted(self.args.keys()):
+            t = t + ((k, tuple(self.args[k])),)
Collaborator

Unfortunately this now breaks on non-list values because tuple(self.args[k]) doesn't work if self.args[k] is not a list:

from helm.common.object_spec import ObjectSpec

o = ObjectSpec("myclass", {"myarg": 42})
hash(o)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yifanmai/oss/helm/src/helm/common/object_spec.py", line 18, in __hash__
    t = t + ((k, tuple(self.args[k])),)
TypeError: 'int' object is not iterable

Could we do something like this instead?

    def __hash__(self):
        def get_arg_value(key: str) -> Any:
            value = self.args[key]
            # lists are not hashable, so convert them to tuples
            if isinstance(value, list):
                return tuple(value)
            return value

        args_tuple = tuple((k, get_arg_value(k)) for k in sorted(self.args.keys()))
        return hash((self.class_name, args_tuple))
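
For illustration, the suggestion above can be tried on a stand-in class. `SpecSketch` below is a hypothetical frozen dataclass that assumes only the two fields used in the snippet (`class_name` and `args`); it is not the HELM `ObjectSpec` itself.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for ObjectSpec, used only to exercise the suggested __hash__."""

    class_name: str
    args: Dict[str, Any]

    def __hash__(self) -> int:
        def get_arg_value(key: str) -> Any:
            value = self.args[key]
            # Lists are not hashable, so convert them to tuples (top level only).
            if isinstance(value, list):
                return tuple(value)
            return value

        # Sorting the keys makes the hash independent of dict insertion order.
        args_tuple = tuple((k, get_arg_value(k)) for k in sorted(self.args.keys()))
        return hash((self.class_name, args_tuple))


int_spec = SpecSketch("myclass", {"myarg": 42})
list_spec = SpecSketch("myclass", {"myarg": [10, 20, 30]})
# Both specs can now be used as set members or dict keys.
seen = {int_spec, list_spec}
```

In this sketch both the integer and the flat-list arguments hash without error. Values nested more deeply than one list are not converted.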

Contributor Author

@bidyapati-p bidyapati-p Sep 10, 2023

My bad, I should have run a few tests.

The new code you suggest is again specific to lists and primitive data types, which is good enough for our use case: the value will be either a list of strings or a single string.

It works for these two data types; however, it fails for a dict:
o = ObjectSpec("myclass", {"myarg": {"a":1}})
hash(o)

Do you think the following would be a better, more future-proof solution that also covers other data types? I suggested the same approach earlier.

from typing import Any, Hashable

def __hash__(self):
    def get_arg_value(key: str) -> Any:
        value = self.args[key]
        # convert all non-hashable objects into strings
        if not isinstance(value, Hashable):
            return str(value)
        return value

    args_tuple = tuple((k, get_arg_value(k)) for k in sorted(self.args.keys()))
    return hash((self.class_name, args_tuple))
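
Two caveats of the string-fallback idea can be checked with plain Python. The sketch below does not use HELM at all, and the variable names are only illustrative: the `Hashable` check inspects the type rather than the contents, and `str()` of a dict depends on insertion order.

```python
from collections.abc import Hashable

flat_list = [1, 2, 3]
tuple_with_list = (1, [2, 3])

# A list defines no __hash__, so the check reports it as unhashable.
list_passes_check = isinstance(flat_list, Hashable)

# A tuple defines __hash__, so the check passes even though hashing
# fails at runtime because one of its elements is a list.
tuple_passes_check = isinstance(tuple_with_list, Hashable)

try:
    hash(tuple_with_list)
    tuple_hash_ok = True
except TypeError:
    tuple_hash_ok = False

# Equal dicts built in a different order produce different strings,
# so a str()-based hash could differ for specs that compare equal.
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}
dicts_equal = first == second
same_dicts_same_str = str(first) == str(second)
```

So a tuple argument that contains a list would slip past the `isinstance` check and still raise `TypeError`, and two equal dict arguments may end up with different hashes.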

@bidyapati-p
Contributor Author

It still fails for:
o = ObjectSpec("myclass", {"myarg": [1,2,3,[4,5,6,[7,8,9]]]})
hash(o)
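
A likely reason for this failure, assuming the one-level list-to-tuple conversion is in use, is that `tuple()` is shallow: it converts only the outer list and leaves the inner lists in place. A minimal check in plain Python, without HELM:

```python
nested = [1, 2, 3, [4, 5, 6, [7, 8, 9]]]

# tuple() converts only the outer list; the inner lists are left untouched.
shallow = tuple(nested)
inner_is_still_list = isinstance(shallow[3], list)

try:
    hash(shallow)
    error_message = None
except TypeError as exc:
    # Hashing a tuple hashes each element, and the inner list is unhashable.
    error_message = str(exc)
```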

@bidyapati-p
Contributor Author

Test cases:

o = ObjectSpec("myclass", {"myarg": 10})
hash(o)

o = ObjectSpec("myclass", {"myarg": [10, 20, 30]})
hash(o)

o = ObjectSpec("myclass", {"myarg": {"aa": [1,2,3]}})
hash(o)

o = ObjectSpec("myclass", {"myarg": {"aa": {"bb":[1,2,3]}}})
hash(o)

o = ObjectSpec("myclass", {"myarg": [1,2,3,[4,5,6,[7,8,9]]]})
hash(o)

Collaborator

@yifanmai yifanmai left a comment

Looks good, thanks again!

@yifanmai yifanmai merged commit 18000c6 into stanford-crfm:main Sep 12, 2023
3 checks passed
@bidyapati-p bidyapati-p deleted the fix_issue_1768 branch September 15, 2023 16:20
Successfully merging this pull request may close these issues.

When tags list is passed in Scenario args , it fails at ObjectSpec.__hash__()
2 participants