Clean up more the output of convert_docx_to_markdown #1068

Open
3 of 4 tasks
samarth9008 opened this issue Jul 1, 2024 · 18 comments
@samarth9008
Collaborator

samarth9008 commented Jul 1, 2024

The doc for the flow is here
docs/documentation_meta/all.gdocs.how_to_guide.md

The script is here
dev_scripts/documentation/convert_docx_to_markdown.py

Remove - `[<span class="underline">Google Authenticator</span>]`
Remove - `[<u>Google Authenticator</u>]`
… (one char) -> …

Remove the style `<img src="figs/setup_vpn_and_dev_server_access/image1.png" style="width:5.43272in;height:4.52727in" />`
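
A minimal sketch of how two of the clean-ups listed above could be done with regular expressions. It assumes the intent is to unwrap the underline tags while keeping the link text, and to drop only the `style` attribute from `<img>` tags. The function names `unwrap_underline` and `strip_img_style` are hypothetical and are not taken from `convert_docx_to_markdown.py`.

```python
import re


def unwrap_underline(text: str) -> str:
    """Remove underline wrappers but keep the text inside them."""
    # Handle the `<span class="underline">...</span>` form.
    text = re.sub(
        r'<span class="underline">(.*?)</span>', r"\1", text, flags=re.DOTALL
    )
    # Handle the `<u>...</u>` form.
    text = re.sub(r"<u>(.*?)</u>", r"\1", text, flags=re.DOTALL)
    return text


def strip_img_style(text: str) -> str:
    """Drop the `style="..."` attribute from `<img>` tags only."""
    # Group 1 captures everything in the tag before the style attribute.
    return re.sub(r'(<img\b[^>]*?)\s+style="[^"]*"', r"\1", text)


if __name__ == "__main__":
    print(unwrap_underline('[<u>Google Authenticator</u>]'))
    print(
        strip_img_style(
            '<img src="figs/setup_vpn_and_dev_server_access/image1.png" '
            'style="width:5.43272in;height:4.52727in" />'
        )
    )
```

Keeping each rule in its own small function matches the suggestion to tackle one thing at a time, since each rule can then be reviewed and unit tested in a separate PR.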

I would focus on one thing at a time. Start with something easy and split the work into multiple PRs to make the review easier.

FYI @gpsaggese @sonaalKant

@mayank922
Contributor

Hi @samarth9008

Could you please provide access to the Gdoc file mentioned above?

@samarth9008
Collaborator Author

@surbhi498 and @mayank922

Given the complexity of the task, could you please coordinate and work on it together? It might take too long for one person to work on it alone. Consider it one of the teamwork test tasks.

FYI @DanilYachmenev @gpsaggese @sonaalKant

@surbhi498
Contributor

Hi @samarth9008

Could you please provide access to the Gdoc file mentioned above?

@surbhi498 surbhi498 self-assigned this Jul 7, 2024
@gpsaggese
Contributor

Shared.

@mayank922
Contributor

Hi @samarth9008

I am unable to find dev_scripts/lint_md.sh. Has the name of this file changed, or is there another way to run the linter on md files?

@samarth9008
Collaborator Author

@mayank922
Contributor

Hi @samarth9008

I followed the instructions in the docs for my previous PR, but docs/documentation_meta/all.gdocs.how_to_guide.md describes the linter command incorrectly.

We could correct this line.

@samarth9008
Collaborator Author

samarth9008 commented Jul 8, 2024

This was the old way of running the linter specifically on md files. We have since merged it into our Python linter, so md files can also be linted using `i lint --files ....`.

Feel free to create a PR for it.

@mayank922
Contributor

Hi @samarth9008

The second task will be done by you, right?

@samarth9008
Collaborator Author

Yes

@surbhi498
Contributor

surbhi498 commented Jul 18, 2024

Hi @samarth9008,
When we run the test cases against the function, they fail because of the interactive mode used by the function in the script. As a workaround, we are running the test cases locally with "pytest -s -v". Is there another approach that would not fail once we create a PR for this?

@samarth9008
Collaborator Author

Instead of testing the whole script, can we test only the functions used in the script?

@mayank922
Contributor

Do you mean that we should test only the rest of this function's functionality and not the docker command, which uses the interactive mode?

docker_cmd = f"docker run --rm --user $(id -u):$(id -g) -it --workdir {work_dir} --mount {mount} {docker_container_name} {convert_docx_to_markdown_cmd}"
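
One possible way to make the command above testable, sketched under assumptions: the `-it` flags are added only when a terminal is attached, so the same code path can run under plain `pytest` without `-s`. The helper `build_docker_cmd` and its `interactive` parameter are hypothetical and are not part of the script; the flag order mirrors the command quoted above.

```python
import sys
from typing import Optional


def build_docker_cmd(
    work_dir: str,
    mount: str,
    docker_container_name: str,
    convert_docx_to_markdown_cmd: str,
    interactive: Optional[bool] = None,
) -> str:
    """Build the `docker run` command, adding `-it` only when requested."""
    if interactive is None:
        # Default: request an interactive TTY only if stdin is a terminal.
        interactive = sys.stdin.isatty()
    flags = ["--rm", "--user $(id -u):$(id -g)"]
    if interactive:
        flags.append("-it")
    flags += [f"--workdir {work_dir}", f"--mount {mount}"]
    return (
        f"docker run {' '.join(flags)} "
        f"{docker_container_name} {convert_docx_to_markdown_cmd}"
    )
```

With a helper like this, a unit test can assert on the returned string with `interactive=False` and never needs to start a container.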

@mayank922
Contributor

Hi @samarth9008

How do you want us to test the _move_media() function?

We can write a test case where, if no media directory is found, the function produces no output and only logs a message.

To test the case where the directory exists, do you want us to create a test directory?

@samarth9008
Collaborator Author

For now, let's do the following.

Try to find test cases similar to this one, i.e. tests for scripts, and see how they are written.

Look around the code to understand how a temp dir can be created; reading other code can give you more ideas. If a function seems too complex, or you have no idea how to move forward, leave a TODO and we will address it separately in a different issue.
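
A minimal sketch of the temp-dir approach using only the standard library. Since the real `_move_media()` signature is not shown in this thread, `move_media_sketch` is a hypothetical stand-in that moves the files from a `media` directory into a figures directory and returns whether anything was moved. The directory layout is also an assumption.

```python
import logging
import shutil
import tempfile
from pathlib import Path

_LOG = logging.getLogger(__name__)


def move_media_sketch(md_dir: Path, figs_dir: Path) -> bool:
    """Hypothetical stand-in for `_move_media()`."""
    media_dir = md_dir / "media"
    if not media_dir.is_dir():
        # Nothing to move: only log, produce no other output.
        _LOG.info("No media directory found in '%s'", md_dir)
        return False
    figs_dir.mkdir(parents=True, exist_ok=True)
    for item in media_dir.iterdir():
        shutil.move(str(item), str(figs_dir / item.name))
    media_dir.rmdir()
    return True


def check_move_media() -> None:
    """Exercise both cases inside a throwaway temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        figs = root / "figs"
        # Case 1: no media directory exists.
        assert move_media_sketch(root, figs) is False
        assert not figs.exists()
        # Case 2: a media directory with one fake image.
        (root / "media").mkdir()
        (root / "media" / "image1.png").write_bytes(b"fake")
        assert move_media_sketch(root, figs) is True
        assert (figs / "image1.png").read_bytes() == b"fake"
        assert not (root / "media").exists()
```

In a pytest-based suite, the built-in `tmp_path` fixture could replace `tempfile.TemporaryDirectory()`; the repository may also have its own helper for scratch directories, which is worth checking first.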

@mayank922
Contributor

I think we can leave a TODO for the _convert_docx_to_markdown function.

We will try working on the _move_media() function again and start working on the _clean_up_artifacts function.

@surbhi498
Contributor

surbhi498 commented Jul 23, 2024

I have raised a PR for the script. The link is given below: #1092 (comment)

@surbhi498
Contributor

surbhi498 commented Jul 31, 2024

Hi @samarth9008,
I have raised a PR for the script enhancement. The link is enclosed here: PR_For_Script_Enhancement
