Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update main to define arguments directly #118

Closed
wants to merge 3 commits into from

Conversation

snova-zoltanc
Copy link
Contributor

Summary

Change main to accept arguments so it can be called from python (not just CLI). This is necessary for Studio to be able to call main

PR Checklist

  • My PR is less than 500 lines of code
  • I have added sufficient comment as docstrings in my code
  • I have made corresponding changes to the documentation

@snova-zoltanc snova-zoltanc added the enhancement New feature or request label Sep 17, 2024
@snova-zoltanc snova-zoltanc requested a review from a team as a code owner September 17, 2024 21:09
@snova-zoltanc
Copy link
Contributor Author

Example of it working

>>> from generative_data_prep import main
>>> main("hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: main() missing 32 required positional arguments: 'output_path', 'overwrite_output_path', 'log_file_path', 'pretrained_tokenizer', 'tokenizer_class', 'vocab_file', 'merges_file', 'special_tokens_dict', 'categories_path', 'input_path', 'disable_space_separator', 'keep_prompt_only_sequences', 'ignore_input_format_error', 'prompt_keyword', 'completion_keyword', 'shuffle', 'num_workers', 'do_not_balance_hdf5', 'keep_split_jsonls', 'max_seq_length', 'input_packing_config', 'packing_boundary', 'attention_boundary', 'num_training_splits', 'num_dev_splits', 'num_test_splits', 'dev_ratio', 'test_ratio', 'prompt_prefix', 'prompt_postfix', 'apply_chat_template', and 'silent

@snova-jerrym
Copy link
Collaborator

Looks like some tests are still failing. Is it possible to do something like a proxy function? Like the old way of calling main is kept and create a new jumper function that takes args and expands it to call the new main.

@snova-edwardm
Copy link

snova-edwardm commented Sep 18, 2024

It seems likely that this line causes the error. vars can be used for a class but not for a dict.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants