0 secs "No markers found" analysis #13

Open
civanovich-senck opened this issue Nov 18, 2020 · 16 comments

Comments

@civanovich-senck

civanovich-senck commented Nov 18, 2020

Dear Developers,

After running my 20 genomes through the Markers Discovery pipeline around 100 times (trying different parameter combinations), I always end up with the same result: "No markers identified".
Even with genomes belonging to the same species, the pipeline finds nothing. Another odd thing is that the whole analysis takes between 0 and 3 seconds, so I guess no mapping step is happening. I also always get this flag:

%%%%%%%%%%%%%%%%%%%%%%%%%%%% Get FASTQ files of the contigs generated %%%%%%%%%%%%%%%%%%%%%%%%%%%
This doesn't make sense, as I started from the FASTA assemblies produced by SPAdes. I also tried converting these FASTA files to .fq, but I still get that message.
An example of the parameters I'm using:

-VD -1 -CL 1 -VL 10 -CD 50 -SLCD 1e-05 -MCT 1 -rdgopen 5 -rdgexten 3 -mp 4 -max_SoftClipping 10 -MPA 0.1 -SI 50 -p 28

output example :

#################################################################################################
#################################### Mapping Process started ####################################
#################################################################################################


[ System Call: perl /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl /lustre/scratch/grp/fslg_Lecanomics/DOMINO_outputs 1600790935 ]


%%%%%%%%%%%%%%%%%%%%%%%%%%%% Get FASTQ files of the contigs generated %%%%%%%%%%%%%%%%%%%%%%%%%%%

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++ Analysis of Molecular Markers started +++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


+ Markers Directory: /lustre/scratch/grp/fslg_Lecanomics/DOMINO_outputs/202009221008_DM_markers ...OK
[ Tue Sep 22 10:08:56 2020 ]	Step took 00 hours, 00 minutes, and 01 seconds




#################################################################################################
##################################### No markers identified #####################################
#################################################################################################
+ Termination of DOMINO marker identification
+ No markers were identified using these parameters...


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++ ANALYSIS FINISHED +++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ Tue Sep 22 10:08:56 2020 ] Whole process took 00 hours, 00 minutes, and 01 seconds


 Early termination, exiting the script

please help!

Cristóbal

@JFsanchezherrero
Member

JFsanchezherrero commented Nov 20, 2020

Hi there Cristobal,

Definitely, something is going wrong here. Let's work it out.

I am sorry, but I cannot understand exactly what you mean by: "Which doesnt make sense as started the fasta assemblies from SPAdes, and also tried converting these fastas to .fq, but I still get that."

Do you mean that you provide the DOMINO marker script with FASTA assemblies (assembled elsewhere) and FASTQ reads, or something else?

Please provide the full log details and the full command call. You can mask the full paths of the files if desired, or send everything to me via email if you prefer.

I have just seen that you previously opened an issue (#12). Check the details there on how to provide reads and assembled contigs.

Thanks

@civanovich-senck
Author

Hi José,

What I meant was that I'm using the clean reads coming out of DOMINO plus assemblies I made from these clean reads with SPAdes. As you saw in my previous issue, I was never truly able to run the assembly portion of the pipeline. What I also did was force a conversion of the assemblies from .fasta to .fq in order to bypass the "Get FASTQ files of the contigs generated" flag.

a command example:

perl "/fslgroup/fslg_Lecanomics/DOMINO/bin/DM_MarkerScan_v1.1.pl" -option user_assembly_contigs -type_input pair_end -o "/zhome/fslcollab260/DOMINO_output_MarkerDisc/" -taxa_names berm9,bermCTAB,cadu255,cadu255B,carp385,disp377,intm388,lec391,mdeus387,pmur380,poly381,rupi384,sarc11,sarcC,sarcJ,subcar389,subint237,subint237B,var239 -VD -1 -CL 1 -VL 10 -CD 50 -SLCD 1e-05 -MCT 1 -rdgopen 5 -rdgexten 3 -mp 4 -max_SoftClipping 10 -MPA 0.1 -SI 50 -p 28 -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_berm9_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_bermCTAB_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_cadu255_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_cadu255B_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_carp385_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_disp377_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_intm388_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_lec391_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_mdeus387_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_pmur380_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_poly381_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_rupi384_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_sarc11_contigs.fasta" -user_contig_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_sarcC_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_sarcJ_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_subcar389_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_subint237_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_subint237B_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_var239_contigs.fasta" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R1.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R2.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R2.fq" -user_cleanRead_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R1.fq" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R1.fq" 
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R2.fq" -DM discovery

and its output:
slurm-39087745.txt

Right now I'm testing the assembly pipeline again on another supercomputing server; let's see what comes out of that.

Thanks,
Cristóbal

@JFsanchezherrero
Member

Hi Cristobal,

I still cannot get what you mean by "force a conversion of the assemblies from .fasta to .fq in order to bypass...".

What I guess is happening here is a misunderstanding of terms. Basically, DOMINO maps sequencing reads (R1 & R2, or single end) to a reference: either the closest genome provided by the user or the assemblies previously generated.

To be clear, let's work through an example. I will keep it simple and use just 3 samples: Dmelanogaster, Dsimulans and Dyakuba.
As sequencing read files, I would have:

  • Dmelanogaster_R1.fq & Dmelanogaster_R2.fq
  • Dyakuba_R1.fq & Dyakuba_R2.fq
  • Dsimulans_R1.fq & Dsimulans_R2.fq

Once cleaned, these reads would be renamed to:

  • reads_id-Dmelanogaster.clean.R1.fastq & reads_id-Dmelanogaster.clean.R2.fastq
  • reads_id-Dsimulans.clean.R1.fastq & reads_id-Dsimulans.clean.R2.fastq
  • reads_id-Dyakuba.clean.R1.fastq & reads_id-Dyakuba.clean.R2.fastq

Imagine we do not have a close, well-assembled reference. We would need to create an assembly for each taxon. After the assembly, I would have:

  • clean_assembly_id-Dmelanogaster.contigs.fasta
  • clean_assembly_id-Dsimulans.contigs.fasta
  • clean_assembly_id-Dyakuba.contigs.fasta

Now, for the marker discovery, we need to provide DOMINO with clean sequencing reads AND assembled contigs. Both are mandatory (there is one specific circumstance in which they are not, but I will not go into the details for this example).

This would be the command:

 perl DM_MarkerScan_v1.1.pl -option user_assembly_contigs -type_input pair_end -o test/ \
 -taxa_names Dmelanogaster,Dsimulans,Dyakuba -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 \
 -user_contig_files path_to_file1/clean_assembly_id-Dmelanogaster.contigs.fasta \
 -user_contig_files path_to_file2/clean_assembly_id-Dsimulans.contigs.fasta \
 -user_contig_files path_to_file3/clean_assembly_id-Dyakuba.contigs.fasta \
 -user_cleanRead_files reads_id-Dmelanogaster.clean.R1.fastq -user_cleanRead_files reads_id-Dmelanogaster.clean.R2.fastq \
 -user_cleanRead_files reads_id-Dsimulans.clean.R1.fastq -user_cleanRead_files reads_id-Dsimulans.clean.R2.fastq \
 -user_cleanRead_files reads_id-Dyakuba.clean.R1.fastq -user_cleanRead_files reads_id-Dyakuba.clean.R2.fastq \
 -DM discovery
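
With many taxa, typing every flag by hand invites copy-paste slips. Below is a minimal Python sketch that assembles the same argument list from a list of taxa. It assumes the file naming used in the example above; `build_markerscan_args` and its directory parameters are hypothetical helper names, not part of DOMINO, and the marker parameters (-VD, -CL, etc.) are left out and would need to be appended.

```python
from pathlib import Path


def build_markerscan_args(taxa, contig_dir, reads_dir, out_dir):
    """Assemble a DM_MarkerScan_v1.1.pl argument list for paired-end input.

    Assumes files are named as in the example above:
    clean_assembly_id-<taxon>.contigs.fasta and
    reads_id-<taxon>.clean.R1.fastq / .R2.fastq.
    """
    args = [
        "perl", "DM_MarkerScan_v1.1.pl",
        "-option", "user_assembly_contigs",
        "-type_input", "pair_end",
        "-o", str(out_dir),
        "-taxa_names", ",".join(taxa),
    ]
    # One contig file per taxon.
    for taxon in taxa:
        contig = Path(contig_dir) / f"clean_assembly_id-{taxon}.contigs.fasta"
        args += ["-user_contig_files", str(contig)]
    # Two read files (R1 and R2) per taxon.
    for taxon in taxa:
        for mate in ("R1", "R2"):
            reads = Path(reads_dir) / f"reads_id-{taxon}.clean.{mate}.fastq"
            args += ["-user_cleanRead_files", str(reads)]
    args += ["-DM", "discovery"]
    return args


if __name__ == "__main__":
    taxa = ["Dmelanogaster", "Dsimulans", "Dyakuba"]
    print(" ".join(build_markerscan_args(taxa, "assemblies", "reads", "test/")))
```

Because every flag is generated in a loop, a reads file cannot end up under the wrong flag by accident.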

So, I can see your command is correct, but I am afraid you misunderstood something and are providing renamed contigs instead of reads.
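
A quick way to rule this out is to look at the first record of each file: FASTA records start with ">" and FASTQ records start with "@". The following Python sketch does only that; the function names are hypothetical, it handles uncompressed text files only, and it is not how DOMINO itself checks its input.

```python
def sniff_format(first_line):
    """Guess the sequence format from the first non-empty line of a file."""
    first_line = first_line.lstrip()
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return "unknown"


def sniff_file(path):
    """Return the guessed format of a plain-text (uncompressed) file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return sniff_format(line)
    return "unknown"
```

Files passed with -user_contig_files should come back as "fasta", and files passed with -user_cleanRead_files as "fastq".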

If this is not the case and you proceeded correctly, send me the file DOMINO_dump_information.txt from the mapping folder that was generated. No mapping is being done, either because this folder is empty or because something else is happening.

Thanks

@civanovich-senck
Author

civanovich-senck commented Nov 20, 2020

Hi José

> I still cannot get what you mean with "force a conversion of the assemblies from .fasta to .fq in order to bypass...".

What I did (after failing so hard with the normal files) was to use a script to transform some of my contigs.fasta files into contigs.fq by giving them fake quality scores, in order to bypass the "Get FASTQ files of the contigs generated" flag.

> What I guess is happening here is a misunderstanding with terms. Basically, what DOMINO does is to map sequencing reads (R1 & R2 or single end) to a reference, either the closest genome provided (by user) or the assemblies previously generated

No. I am using the clean reads generated by DOMINO and the assemblies I generated with SPAdes from these same clean reads.
For example, for taxon berm9:

-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_berm9_contigs.fasta" -> contigs assembled with SPAdes from the cleaned reads output by DOMINO

-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq" -> pair of reads cleaned by DOMINO

some files generated in the mapping folder:
202011200742_Mapping-Parameters.txt
DOMINO_dump_param.txt

The file DOMINO_dump_information for this run is empty!

Another thing I want to bring to your attention regarding DM_Clean is an issue with the number-of-cores option -p: despite my flag requesting 28 cores, the pipeline seems to default to using 2. The IT person at the supercomputer brought this problem to my attention. Quoting him:

> It's possible that the problem just doesn't scale well, or perhaps multiple cores are only used in a small part of the workload, in which case just using a few cores is probably the way to go.

Any clues on this issue?

Cristóbal

@JFsanchezherrero
Member

JFsanchezherrero commented Nov 21, 2020

Hi there Cristobal,

I guess I found a bug! DOMINO is not correctly processing external assemblies and reads. I will come back with a solution in a few days and will let you know as soon as possible.

Regarding the topic of transforming contigs.fasta to contigs.fq:

> what I did (after failing so hard with normal files), was that through a script, I transformed some of my contigs.fasta to contigs.fq by giving them some fake scores, in order to bypass the Get FASTQ files of the contigs generated flag.

I can see what you did, but I cannot understand the purpose. What was the point? Are you actually using them in the end? Was it just a desperate reaction to the message "Get FASTQ files of the contigs generated"? Maybe that message is badly worded. It might be more appropriate to rewrite it as "Get FASTQ files that assembled the contigs".

Finally, regarding CPU usage: as the IT person mentioned, some steps of the process may not fully use all the CPUs provided. In some cases it is difficult to implement threading, especially where the third-party tools offer no parallel option, and in other cases there are limitations in data processing.

I can assure you that threads are implemented for most of the mapping and marker discovery steps and are fully used most of the time. In any case, it may be appropriate to set the number of CPUs according to your system's availability and other users' workload.
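
As a rough illustration of picking that number, the Python sketch below caps a requested thread count at what the job is actually allowed to use. It assumes a Slurm cluster (suggested by the slurm-*.txt logs in this thread), where `SLURM_CPUS_PER_TASK` is set only if the job was submitted with `--cpus-per-task`; `choose_threads` is a hypothetical helper, and the result would be the value passed to -p.

```python
import os


def choose_threads(requested, environ=None):
    """Cap the requested thread count at what the job is allowed to use."""
    environ = os.environ if environ is None else environ
    slurm = environ.get("SLURM_CPUS_PER_TASK")
    if slurm and slurm.isdigit() and int(slurm) > 0:
        # Allocation reported by Slurm for this task.
        available = int(slurm)
    elif hasattr(os, "sched_getaffinity"):
        # Linux: CPUs this process may actually run on.
        available = len(os.sched_getaffinity(0))
    else:
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


if __name__ == "__main__":
    print(choose_threads(28))
```

Requesting more threads than the allocation provides does not make a run faster, so capping the value keeps the job within what the scheduler granted.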

I will come back with a solution.
Thanks for the detailed information provided and comments.

Have a nice day

@civanovich-senck
Author

Hi José

> Was it just a desperate action for the message "Get FASTQ files of the contigs generated".

Yep, basically.

> Anyhow, it might be appropriate to set a number of CPU according to your system disponibility and other users workload

For me it has basically been a matter of eyeballing the number of cores and RAM. I had to migrate from our supercomputers here in Frankfurt to the computing resources in Utah because I was overusing disk space, RAM and computing time with DOMINO (which also led to some very stern calls and emails from the former IT researcher). The situation is odd: a DM_Clean run of 4 reads, plus reference genomes added as a database for mapping, took approx. 10 hrs on 12 cores here, whereas on the American server my tests took 14 to 16 hrs on 28 cores.

Stay safe!

@JFsanchezherrero
Member

Hi Cristobal,

Sorry for the delay.

I have been working on this issue today, but it works for me. I tried something similar to what you did: I used DOMINO to clean and trim the reads, used SPAdes externally to assemble them, renamed the files, and then used the DOMINO marker script to identify markers. The command was:

perl ../bin/DM_MarkerScan_v1.1.pl -option user_assembly_contigs -type_input single_end -o test/ -taxa_names sp1,sp2,sp3,sp4 -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp1.contigs.fasta -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp2.contigs.fasta -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp3.contigs.fasta -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp4.contigs.fasta -user_cleanRead_files ./test/202012081052_DM_clean_data/QC-filtered_id-sp1.fastq -user_cleanRead_files ./test/202012081052_DM_clean_data/QC-filtered_id-sp2.fastq -user_cleanRead_files ./test_example/test/202012081052_DM_clean_data/QC-filtered_id-sp3.fastq -user_cleanRead_files ./test_example/test/202012081052_DM_clean_data/QC-filtered_id-sp4.fastq -DM discovery

I tried this with the 4 FASTQ read files provided with the example. I haven't tested paired-end input, but I doubt the problem is there. Can you try cleaning out all the old folders first? That means all files and folders generated, such as 2020...Mapping/Mapping_old_xx/Markers, etc.

Then try using just a couple of assemblies and 4-5 samples. Let me know what happens.

As for the performance difference, I am afraid this is a common issue between different compute nodes. Each node has its own RAM, CPU type and capacity, and these differences produce differences in performance (not only for DOMINO). For example, I have recently seen differences of up to 50x in run time for a simple awk process (on 2 million gunzipped lines), ranging from 10 minutes on one node to 12 hours on others.

Best,

@JFsanchezherrero
Member

Ok, I might have found where the problem is. Check this out.

This is a summary of your previous command

perl "/fslgroup/fslg_Lecanomics/DOMINO/bin/DM_MarkerScan_v1.1.pl" -option user_assembly_contigs -type_input pair_end -o "/zhome/fslcollab260/DOMINO_output_MarkerDisc/" -taxa_names berm9,bermCTAB,cadu255,cadu255B,carp385,disp377,intm388,lec391,mdeus387,pmur380,poly381,rupi384,sarc11,sarcC,sarcJ,subcar389,subint237,subint237B,var239 -VD -1 -CL 1 -VL 10 -CD 50 -SLCD 1e-05 -MCT 1 -rdgopen 5 -rdgexten 3 -mp 4 -max_SoftClipping 10 -MPA 0.1 -SI 50 -p 28
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_berm9_contigs.fasta"
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_bermCTAB_contigs.fasta"
...
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq"
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R1.fastq"
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R2.fastq"
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R1.fq"
...
-DM discovery

First of all, contig file names should follow the pattern xxx_id-name.contigs.fasta. Pay attention to the characters "_" and "-". You did not use them correctly.

The clean read names are OK.

In the middle of the clean reads, you included a user_contig_files entry pointing to a reads file. This one might be causing the problem!

Take this into account!
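
Both points can be checked mechanically before launching a run. The Python sketch below scans an already split argument list and reports -user_contig_files values that look like read files or that do not match the xxx_id-name.contigs.fasta pattern. The regular expression takes that pattern literally (including the xxx_ prefix), which is an assumption, and `check_contig_args` is a hypothetical helper rather than anything shipped with DOMINO.

```python
import re
from pathlib import PurePosixPath

READ_SUFFIXES = (".fq", ".fastq")
# Literal reading of the rule stated above: xxx_id-name.contigs.fasta
CONTIG_NAME = re.compile(r"^.+_id-[^.]+\.contigs\.fasta$")


def check_contig_args(argv):
    """Return human-readable problems found among -user_contig_files values."""
    problems = []
    # Walk over (flag, value) pairs of consecutive arguments.
    for flag, value in zip(argv, argv[1:]):
        if flag != "-user_contig_files":
            continue
        name = PurePosixPath(value).name
        if name.endswith(READ_SUFFIXES):
            problems.append(f"{name}: read file passed to -user_contig_files")
        elif not CONTIG_NAME.match(name):
            problems.append(f"{name}: does not match xxx_id-name.contigs.fasta")
    return problems
```

Run on the command summarized above, this would flag id_berm9_contigs.fasta for its name and clnreads_id-mdeus387_R2.fq for being a reads file under the contigs flag.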

Let me know what happens with these corrected settings.

Thanks,

Best

@civanovich-senck
Author

Hi José,

OK, things are advancing. I completely missed that underscore and also the underscore in _contigs.fasta. The output still shows no markers, but I have now seen that bowtie is not being called correctly:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Aligning Reads Individually %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • User clean reads files would be mapped
  • Obtain information of the reference sequences
  • Mapping process would be divided into 19 parts using up to 0/12 CPUs provided
    • Mapping reads (bermCTAB) vs reference (berm9): [1/19]
      Undefined subroutine &Pod::Usage::pod2usage called at /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../lib/DOMINO.pm line 298.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ERROR !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Exiting the script. Some error happened when calling bowtie for mapping the file /lustre/scratch/grp/fslg_Lecanomics/clnreads_id-bermCTAB_R1.fastq...

Try 'perl /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl -h|--help or -man' for more information.
Exit program.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[ Wed Dec 9 06:14:57 2020 ]

Try perl /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl -man for more information

Also, in one of my tests, the output shows basically the same bowtie problem, plus about 50 MB worth of this message:

###################### Fetching information from all the PROFILEs generated #####################

  • Checking profiles of variation for each contig and merging information...
  • Using a sliding window approach...
  • Using parallel threads (12 CPUs)...
  • Dataset would be splitted for speeding computation into 400 subsets...
    Use of uninitialized value in concatenation (.) or string at /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerDiscovery.pl line 310.

This message appears after mothur is called.

Thanks for taking the time on this.

Cristóbal

@JFsanchezherrero
Member

JFsanchezherrero commented Dec 9, 2020

Hi there,

Can you provide me with the full command you sent?

Make sure you clean all previous markers and mapping folders generated.
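
To see what is left over from earlier runs, a short Python sketch like the following can list candidate folders before anything is removed by hand. It assumes the run folders follow the names visible in this thread (a 12-digit timestamp followed by _DM_mapping or _DM_markers); that pattern and the name `find_previous_runs` are assumptions, and the function deliberately deletes nothing.

```python
import re
from pathlib import Path

# Assumption: folders look like 202009221008_DM_markers or
# <timestamp>_DM_mapping, as seen in the logs and paths in this thread.
RUN_DIR = re.compile(r"^\d{12}_DM_(mapping|markers)$")


def find_previous_runs(output_dir):
    """List leftover mapping/marker folders directly under output_dir."""
    root = Path(output_dir)
    return sorted(
        p for p in root.iterdir() if p.is_dir() and RUN_DIR.match(p.name)
    )


if __name__ == "__main__":
    import sys

    for folder in find_previous_runs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(folder)
```

Clean-data folders are not matched, so the cleaned reads from a previous DM_Clean run would not appear in the list.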

Set the option --debug and provide me with the log and error details; use a txt file if necessary.

Let's see what is going on and fix it!

Regards

@civanovich-senck
Author

Hi José,

Command:

perl "/fslgroup/fslg_Lecanomics/DOMINO/bin/DM_MarkerScan_v1.1.pl" -option user_assembly_contigs -type_input pair_end -o "/zhome/fslcollab260/DOMINO_output_MarkerDisc/" -taxa_names berm9,bermCTAB,cadu255,cadu255B,carp385,disp377,intm388,lec391,mdeus387,pmur380,poly381,rupi384,sarc11,sarcC,sarcJ,subcar389,subint237,subint237B,var239 -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 -p 12 --debug -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-berm9.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-bermCTAB.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-cadu255.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-cadu255B.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-carp385.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-disp377.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-intm388.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-lec391.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-mdeus387.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-pmur380.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-poly381.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-rupi384.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-sarc11.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-sarcC.contigs.fasta" 
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-sarcJ.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-subcar389.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-subint237.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-subint237B.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-var239.contigs.fasta" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R1.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R2.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R1.fq" -user_cleanRead_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R1.fq" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R1.fq" -user_cleanRead_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R2.fq" -DM discovery

Attached are some of the output files

202012100555_Mapping_ERROR.txt
202012100555_Markers-Parameters.txt
DOMINO_dump_information.txt

slurm-39264041.txt

Cheers,
Cristóbal

@JFsanchezherrero
Member

Hi there,

I have checked the DOMINO logs and the additional information provided. There seems to be a problem with the call to bowtie.

We are going to check the version you are using within DOMINO. Can you execute the following script and send me the output?

perl bin/scripts/DM_DOMINO_dependencies.pl

Separately, one comment on your command: fix the flag used for the R2 input of sample mdeus387:

-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R1.fq" 
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" 

Change user_contig_files to user_cleanRead_files. I don't think this is the cause of the problem, at least not this early in the process.
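A mix-up like this is easy to miss in a command with dozens of files. As a rough illustration only, the following Python sketch scans an argument list for read-like files passed under -user_contig_files. The helper name and the `_R1`/`_R2` naming pattern are assumptions made for this example; nothing here is part of DOMINO.

```python
import re

# Assumed naming convention for paired-end read files (hypothetical).
READ_PATTERN = re.compile(r"_R[12]\.(fq|fastq)$")


def suspicious_contig_args(argv):
    """Return paths given to -user_contig_files that look like read files."""
    hits = []
    # Walk over (flag, value) neighbours in the argument list.
    for flag, value in zip(argv, argv[1:]):
        path = value.strip('"')
        if flag == "-user_contig_files" and READ_PATTERN.search(path):
            hits.append(path)
    return hits


if __name__ == "__main__":
    example = [
        "-user_cleanRead_files", "compute/clnreads_id-mdeus387_R1.fq",
        "-user_contig_files", "compute/clnreads_id-mdeus387_R2.fq",
    ]
    for path in suspicious_contig_args(example):
        print("Possibly wrong flag for:", path)
```

Any path it reports should be checked by hand, since a contig file could legitimately match the pattern.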

Can you re-run the perl DOMINO command (--debug mode ON) with this issue fixed and send the same files as before (*_Mapping_ERROR.txt, *_Markers-Parameters.txt, DOMINO_dump_information.txt, slurm-*.txt)?

Also, please send me any of the files generated within /zhome/fslcollab260/DOMINO_output_MarkerDisc/**_DM_mapping/berm9, as it seems to be the first sample analyzed and the one causing problems.

Thanks in advance!

Best,

@civanovich-senck
Author

Hi,

Here is the output of DM_DOMINO_dependencies.pl:

`#################################################################################################
############################################ MODULES ############################################
#################################################################################################

Checking perl module dependencies...

    Checking module: Getopt::Long....................Getopt/Long.pm [OK]
    Checking module: Pod::Usage....................Pod/Usage.pm [OK]
    Checking module: Data::Dumper....................Data/Dumper.pm [OK]
    Checking module: POSIX....................POSIX.pm [OK]
    Checking module: FindBin....................FindBin.pm [OK]
    Checking module: DOMINO....................DOMINO.pm [OK]
    Checking module: File::Copy....................File/Copy.pm [OK]
    Checking module: File::Find;.................... [X]

ATTENTION: File/Find; is missing but DOMINO might still work appropiate...]

    Checking module: List::Uniq....................List/Uniq.pm [OK]
    Checking module: File::Path....................File/Path.pm [OK]
    Checking module: Cwd....................Cwd.pm [OK]
    Checking module: Parallel::ForkManager....................Parallel/ForkManager.pm [OK]
    Checking module: Spreadsheet::WriteExcel....................Spreadsheet/WriteExcel.pm [OK]
    Checking module: Time::HiRes....................Time/HiRes.pm [OK]
    Checking module: List::Util....................List/Util.pm [OK]

#################################################################################################
############################################ BINARIES ###########################################
#################################################################################################

Checking binary dependencies from other sources...

    Checking BLAST:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../NCBI_BLAST/
    Checking bowtie2 v2.2.9:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../bowtie2-2.2.9/
    Checking samtools v1.3.1:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../samtools-1.3.1/samtools
    Checking mothur v1.32:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../MOTHUR_v1.32.0/mothur
    Checking CAP3:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../cap3/bin/cap3
    Checking MIRA:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../mira_v4.0/bin/mira

#################################################################################################
############################################# UTILS #############################################
#################################################################################################

Checking perl scripts...

    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_GeneratePileup.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_DOMINO_dependencies.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerSliding.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_PrintExcel.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerValidate.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerClusterize.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerOverlap.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_runSPAdes.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_runMIRA.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_ContigStats.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerDiscovery.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../DM_Assembly_v1.1.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../DM_Clean_v1.1.pl syntax OK
    Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../DM_MarkerScan_v1.1.pl syntax OK

and the files requested:

202012110739_Mapping_ERROR.txt
202012110739_Markers-Parameters.txt
DOMINO_dump_information.txt
slurm-39273687.txt
reference_berm9.rev.1.bt2.gz
reference_berm9.1.bt2.gz
reference_berm9.4.bt2.gz
reference_berm9.rev.2.bt2.gz
reference_berm9.2.bt2.gz
contigs_berm9_length.txt.gz
reference_berm9.3.bt2.gz
reads_bermCTAB-reference_berm9_mapping_logfile.txt.gz
reads_bermCTAB-reference_berm9_logfile.txt.gz
index_genome_reference_berm9.success.gz
reference_berm9-taxa_bermCTAB.sam.gz
mapping_bermCTAB.failed.gz
mapping_ref_berm9.success.gz

I'm also in contact with the IT staff at the supercomputing center. It seems I also have a perl5 issue, and I told them about bowtie misbehaving.

Cheers,
Cristóbal

`

@JFsanchezherrero
Member

Hi there,

I think I have found it and fixed it.

The problem was not related to bowtie. It came from the way Perl rounds floating-point numbers within DOMINO. To make the best use of resources, DOMINO splits the CPUs provided across the mapping jobs, computing the CPUs per job as number_CPUs/number_species_to_map. We did not account for the case where this value is below 0.5: it was then rounded down to 0, so 0 CPUs were assigned to each job.
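The effect can be reproduced with a minimal sketch. This is a Python illustration, not the actual Perl code in DM_MappingReads.pl. The function names, the example numbers, and the choice to clamp the result to at least 1 are assumptions made for the example.

```python
def cpus_per_job_rounded(total_cpus: int, n_jobs: int) -> int:
    """Naive split: round the ratio to the nearest integer.

    Any ratio below 0.5 becomes 0, so every job gets no CPU at all.
    """
    return int(total_cpus / n_jobs + 0.5)


def cpus_per_job_safe(total_cpus: int, n_jobs: int) -> int:
    """Split that never gives a job fewer than one CPU (assumed fix)."""
    return max(1, total_cpus // n_jobs)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the reported run.
    for total, jobs in [(28, 20), (8, 20)]:
        print(
            f"{total} CPUs / {jobs} jobs:",
            "rounded =", cpus_per_job_rounded(total, jobs),
            "safe =", cpus_per_job_safe(total, jobs),
        )
```

With a lower bound of 1, more jobs than CPUs may run at once than the machine can serve in parallel, so a real fix may also need to limit the number of concurrent jobs.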

I have updated the code for DM_MappingReads.pl script. You should update your version too by doing:

git pull https://github.com/molevol-ub/DOMINO.git
cp src/perl/scripts/DM_MappingReads.pl bin/scripts/DM_MappingReads.pl

Give it a try and let me know how it works.

P.S. If you need further help to update the code, let me know.

@civanovich-senck
Author

Hi José,
After updating the script, I ran DOMINO over the weekend. Bowtie seems to be working fine now, but I see error flags when samtools is called, and still no markers were discovered.

Attached are some files from the last run and a screenshot of the CPU usage stats. It seems that DOMINO doesn't handle dividing jobs across nodes?

ss_uso_cpu

Cheers,
Cristóbal

202012121443_Mapping_ERROR.txt
202012121443_Markers-Parameters.txt
DOMINO_dump_information.txt
slurm-39296113.txt

@JFsanchezherrero
Member

Dear Cristobal,

Sorry for the delay. I have been very busy lately and I haven't checked it yet.

We have limited funds for this project so far, which makes it difficult to solve issues. I will try to have a look in the coming days.

Best wishes
