New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Update bulk image conversion script to support multiple formats #226

Open

Anushlinux wants to merge 2 commits into Udayraj123:dev from Anushlinux:Anushlinux-dev-convertor_hook

+130 −88

Anushlinux commented Oct 3, 2024 •

edited

Loading

I have updated the conversion code and implemented my function to the convert_image_hook as you required

Pls review the code
(ps: sorry for deleting the previous pr i did it bimistake)


          Update image conversion script to support multiple formats

65ad6ae

Udayraj123 requested changes

View reviewed changes

scripts/hooks/convert_images_hook.py Outdated Show resolved Hide resolved

scripts/hooks/convert_images_hook.py Outdated Show resolved Hide resolved


          changes on the basis of hooks and bulk conversion

84ddb53

Anushlinux requested a review from Udayraj123

October 5, 2024 21:07

Udayraj123 requested changes

View reviewed changes

Owner

Udayraj123 left a comment

@Anushlinux you have understood and handled a lot of code flows(in local scripts) to OMRChecker. Great work on your first PR! 🎉
I've finished my review and I think we will need some discussion to make it even more extensible. Let's discuss async in the discord channel once you go through my comments.

scripts/local/utils/bulk_ops_common.py

@@ @@ -3,43 +3,20 @@ @@
               import glob
               import operator
               import os
+              from PIL import Image
+              from pdf2image import convert_from_path

Owner

Udayraj123 Oct 7, 2024

new packages in local scripts need to be added to requirements.dev.txt

scripts/local/utils/bulk_ops_common.py

-              # Usual pre-processing commands for speedups (useful during re-runs)
-              # python3 scripts/local/convert_images.py -i inputs/ --replace [--filter-ext png,jpg] --output-ext jpg
-              # python3 scripts/local/resize_images.py -i inputs/ -o outputs --max-width=1500

Owner

Udayraj123 Oct 7, 2024

Let's keep all these comments as a guide for the contributors to figure out what needs to be done here

scripts/local/utils/bulk_ops_common.py

                   extracted_files = []
                   for _dir, _subdir, _files in os.walk(input_dir):
                       matching_globs = [
-                          glob(os.path.join(_dir, f"*.{file_extension}"))
+                          glob.glob(os.path.join(_dir, f"*.{file_extension}"))

Owner

Udayraj123 Oct 7, 2024

let's change the import to from glob import glob instead

Owner

Udayraj123 Oct 7, 2024

Also, thanks for adding the code comment

scripts/local/utils/bulk_ops_common.py

@@ @@ -72,29 +52,79 @@ def get_local_argparser(): @@
                   local_argparser.add_argument(
                       "-r",
                       "--recursive",
-                      required=True,
-                      type=bool,
+                      action='store_true',

Owner

Udayraj123 Oct 7, 2024

why should we have this default? It's better to take recursive updates as an explicit input

scripts/local/utils/bulk_ops_common.py

Comment on lines +85 to +88

+                  pages = convert_from_path(input_path)
+                  for i, page in enumerate(pages):
+                      output_path = os.path.join(output_dir, f"page_{i + 1}.jpg")
+                      page.save(output_path, 'JPEG')

Owner

Udayraj123 Oct 7, 2024

let's also include file name here

Suggested change

      
                pages = convert_from_path(input_path)
          
                for i, page in enumerate(pages):
          
                    output_path = os.path.join(output_dir, f"page_{i + 1}.jpg")
          
                    page.save(output_path, 'JPEG')
          
                pages = convert_from_path(input_path)
          
                input_path = Path(PurePosixPath(input_path).as_posix())
          
                file_name = input_path.name
          
                for i, page in enumerate(pages):
          
                    output_path = os.path.join(output_dir, f"{file_name}_page_{i + 1}.jpg")
          
                    print(f"Saving page: {output_path}")
          
                    page.save(output_path, 'JPEG')

Owner

Udayraj123 Oct 7, 2024

Also convert_image and convert_pdf_to_jpg should come from image_utils instead of bulk_ops_common.py.
I am thinking of keeping this file only for operations that are common across all sorts of bulk scripts (bulk resize, convert, watermark, etc)

scripts/local/utils/bulk_ops_common.py

+                  """
+                  pages = convert_from_path(input_path)
+                  for i, page in enumerate(pages):
+                      output_path = os.path.join(output_dir, f"page_{i + 1}.jpg")

Owner

Udayraj123 Oct 7, 2024

I have a feature extension idea here, see if you can implement -
Let's add a support to extract only the first page based on a flag (in that case the image's output_dir shouldn't be created and the single image should be output directly with same filename without the page_ prefix.

scripts/local/utils/bulk_ops_common.py

+                  os.makedirs(output_dir, exist_ok=True)
+                  extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf']
+                  filepaths = walk_and_extract_files(input_dir, extensions)

Owner

Udayraj123 Oct 7, 2024

snake case

Suggested change

      
                filepaths = walk_and_extract_files(input_dir, extensions)
          
                file_paths = walk_and_extract_files(input_dir, extensions)

scripts/local/utils/bulk_ops_common.py

+                  Bulk converts images and PDFs to the specified format.
+                  """
+                  os.makedirs(output_dir, exist_ok=True)
+                  extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf']

Owner

Udayraj123 Oct 7, 2024

Let's keep pdf handling in a separate script (check other comments)

Suggested change

      
                extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf']
          
                extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif']

scripts/local/utils/bulk_ops_common.py

Comment on lines +100 to +104

+                  for input_path in filepaths:
+                      relative_path = os.path.relpath(os.path.dirname(input_path), input_dir)
+                      output_subdir = os.path.join(output_dir, relative_path) if not in_place else os.path.dirname(input_path)
+                      os.makedirs(output_subdir, exist_ok=True)

Owner

Udayraj123 Oct 7, 2024

The in_place flag handling can be a common bulk ops utility. I propose we define a function bulk_apply_on_files(func, input_dir, file_paths, in_place)

The first arg func can be convert_image or convert_pdf_to_image
The bulk_apply_on_files applies this passed function with the same arguments
- if in_place is True and the conversion was successful, the input file gets removed (pdf or image) and the output files are added in the same directory
- if in_place is False and the conversion was successful, the output files are added in a relative directory output_dir = f"{input_dir}/outputs/"
- If the conversion was unsuccessful, no file changes happen

Let's discuss more about this in the discord group if needed.

scripts/local/convert_images.py

-                  converted_count = convert_images_in_tree(args)
-                  trigger_size = args["trigger_size"]
+                  converted_count = convert_images(args.filenames, args.format.upper())

Owner

Udayraj123 Oct 7, 2024

Here we can call the function bulk_apply_on_files(convert_image, input_dir, file_paths, in_place) or something similar based on our discussion (check other comments below)

Udayraj123 added the hacktoberfest label

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels