Transcribed subtitle file paths are not returned when skip_trans=True is specified #54

MaleicAcid · 2024-08-18T22:21:37Z

People who use cloud GPU compute usually run the transcription task separately first. But currently, when skip_trans=True is set in openlrc, the paths of the transcribed subtitle files are not returned.

openlrc/openlrc/openlrc.py

Lines 193 to 219 in d55fb5b

    
    if not skip_trans and bilingual_sub:
        bilingual_subtitle = BilingualSubtitle.from_preprocessed(
            transcribed_path.parent, audio_name.replace('_preprocessed', '')
        )
        bilingual_optimizer = SubtitleOptimizer(bilingual_subtitle)
        bilingual_optimizer.extend_time()
        # TODO: consider the edge case (audio file name contains _preprocessed)
        getattr(bilingual_subtitle, f'to_{subtitle_format}')()
        bilingual_lrc_path = bilingual_subtitle.filename.with_suffix(bilingual_subtitle.suffix)
        shutil.move(bilingual_lrc_path, result_path.parent / bilingual_lrc_path.name)
        non_translated_subtitle = transcribed_opt_sub
        optimizer = SubtitleOptimizer(non_translated_subtitle)
        optimizer.extend_time()  # Extend 0.5s like what translated do
        getattr(non_translated_subtitle, f'to_{subtitle_format}')()
        non_translated_lrc_path = non_translated_subtitle.filename.with_suffix(non_translated_subtitle.suffix)
        shutil.move(
            non_translated_lrc_path,
            result_path.parent / subtitle_path.name.replace(
                f'_preprocessed.{subtitle_format}',
                f'_nontrans.{subtitle_format}'
            )
        )
        logger.info(f'Translation fee til now: {self.api_fee:.4f} USD')
        self.transcribed_paths.append(result_path)

 [2024-08-18 22:14:37] INFO     [ThreadPoolExecutor-1_1] Optimized json file saved to /data/jid-4qk9Ku89-data/home/user00/gitspace/video_tools/data/preprocessed/1. Section 2_Introduction_preprocessed_transcribed_optimized.json
 [2024-08-18 22:14:37] INFO     [ThreadPoolExecutor-1_1] File saved to /data/jid-4qk9Ku89-data/home/user00/gitspace/video_tools/data/preprocessed/1. Section 2_Introduction_preprocessed.srt
...
2024-08-18 22:14:37.817 | INFO     | transcriber:invoke:73 - transcribe end, result: []
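
For illustration, here is a minimal sketch of the behaviour I would expect instead of the empty result above. The function name collect_subtitle_paths, its parameters, and the "_translated" file naming are all hypothetical and not part of openlrc; the only point is that the transcribed subtitle path is still reported when translation is skipped.

```python
from pathlib import Path
from typing import List


def collect_subtitle_paths(transcribed: List[Path], skip_trans: bool,
                           subtitle_format: str = 'srt') -> List[Path]:
    """Hypothetical stand-in for the result-collection step (not openlrc's code).

    `transcribed` holds the subtitle files written by the transcription step.
    """
    results = []
    for path in transcribed:
        if skip_trans:
            # Translation is skipped: report the transcribed file itself.
            results.append(path)
        else:
            # Placeholder for the translated output; this naming is illustrative only.
            results.append(path.with_name(f'{path.stem}_translated.{subtitle_format}'))
    return results
```

With something along these lines, a caller that only transcribes on a cloud GPU would get back the list of .srt files to hand to a later translation step, rather than an empty list.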

Another request: it would help if openlrc provided a separate translation module that does not depend on PyTorch.
I built a translation application based on https://github.com/zh-plus/openlrc/issues/34 and compiled it with Nuitka. The compiled application still has to bundle PyTorch in order to run, although this seems unnecessary for translation alone.

openlrc/openlrc/utils.py

Line 17 in d55fb5b

import torch
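
One possible direction, sketched under assumptions: defer the torch import until a function actually needs it, so that importing the module for translation only does not pull in PyTorch. The helper names optional_import and release_gpu_memory below are hypothetical and are not openlrc functions; I have not checked how utils.py uses torch beyond the import shown above.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def release_gpu_memory() -> bool:
    """Hypothetical helper: free cached CUDA memory only if torch is present."""
    torch = optional_import('torch')
    if torch is None:
        return False  # translation-only environment: nothing to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return True
```

Because torch is only looked up inside release_gpu_memory, a translation-only build would never import it at module load time. Whether Nuitka then leaves it out of the compiled binary would still need to be verified.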


zh-plus · 2024-08-20T03:05:32Z

People who use cloud GPU compute usually run the transcription task separately first. But currently, when skip_trans=True is set in openlrc, the paths of the transcribed subtitle files are not returned.

I will fix it soon in the next minor version.

Another request: it would help if openlrc provided a separate translation module that does not depend on PyTorch.
I built a translation application based on https://github.com/zh-plus/openlrc/issues/34 and compiled it with Nuitka. The compiled application still has to bundle PyTorch in order to run, although this seems unnecessary for translation alone.

I have looked into solutions that would enable users to install only the translation-related dependencies using pip install openlrc[trans]. However, openlrc[trans] in Poetry installs additional dependencies rather than entirely different ones (see Poetry documentation). If I classify PyTorch-related dependencies as "extras," it could confuse regular users who want both transcription and translation features.

I'm still exploring other options. Any recommendations would be appreciated.

MaleicAcid · 2024-08-20T12:44:50Z

People who use cloud GPU compute usually run the transcription task separately first. But currently, when skip_trans=True is set in openlrc, the paths of the transcribed subtitle files are not returned.

I will fix it soon in the next minor version.

Another request: it would help if openlrc provided a separate translation module that does not depend on PyTorch.
I built a translation application based on https://github.com/zh-plus/openlrc/issues/34 and compiled it with Nuitka. The compiled application still has to bundle PyTorch in order to run, although this seems unnecessary for translation alone.

I have looked into solutions that would enable users to install only the translation-related dependencies using pip install openlrc[trans]. However, openlrc[trans] in Poetry installs additional dependencies rather than entirely different ones (see Poetry documentation). If I classify PyTorch-related dependencies as "extras," it could confuse regular users who want both transcription and translation features.

I'm still exploring other options. Any recommendations would be appreciated.

Is it possible to publish different variants of the Python package, in the same way torch does? For example:

# A and B are two completely independent packages
torch==2.2.2+cu121(openlrc==1.5.0+full)
torch==2.2.2+cpu(openlrc==1.5.0+translate-only)

People who install openlrc==1.5.0+translate-only can run translation tasks without PyTorch.

or

# package B,C depend on A
torch==2.2.2(openlrc-base==1.5.0)
torchaudio==2.2.2(openlrc-translate==1.5.0)
torchvision==1.7.0(openlrc-transcribe==1.5.0)

People who install openlrc-base and openlrc-translate can run translation tasks without PyTorch.

zh-plus · 2024-09-10T08:37:09Z

Fixed in v1.5.2

zh-plus closed this as completed Sep 10, 2024
