bilille/miRkwood

 
 

SYNOPSIS

    miRkwood is an application that allows for the fast and easy identification of microRNAs. It is specifically designed for plant microRNAs.


INSTALL

    See file miRkwood_installation.md.


USAGE

    miRkwood comes in two distinct pipelines, according to the input data type.

    -mirkwood.pl (ab initio pipeline): scans a genomic sequence and finds all potential microRNA precursors.
        Input: a FASTA file.

    -mirkwood-bed.pl (smallRNAseq pipeline): analyses small RNA deep sequencing data and finds all potential microRNAs.
        Input: a BED file.


OPTIONS

    -mirkwood.pl: perl -I/{miRkwood_path}/cgi-bin/lib/ mirkwood.pl [options]
          Mandatory options:
            --input
                Path to the fasta file.

            --output
                Output directory. If it does not exist, it will be created. If
                it already exists, it must be empty.

          Additional options:
            --both-strands
                Scan both strands.

            --species-mask
                Mask coding regions against the given organism.

            --shuffles
                Compute thermodynamic stability (shuffled sequences).

            --filter-mfei
                Select only sequences with MFEI < -0.6.

            --filter-rrna
                Filter out ribosomal RNAs (using RNAmmer).

            --filter-trna
                Filter out tRNAs (using tRNAscan-SE).

            --align
                Flag conserved mature miRNAs (alignment with miRBase + miRdup).

            --varna
                Enable the generation of structure images with VARNA.

            --help
                Print a brief help message and exit.

            --man
                Print the manual page and exit.
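
    As an illustration, the following Python sketch assembles an ab initio
    command line from the options listed above without running it. The helper
    names (check_output_dir, build_abinitio_command), the install path and the
    file names are hypothetical. The sketch assumes that mirkwood.pl is called
    from its own directory, as in the synopsis line, and that --input and
    --output take their value as the next argument. --species-mask appears to
    take an organism name and is left out here.

```python
import os
import shlex

# Boolean switches documented for mirkwood.pl (no value expected).
ABINITIO_FLAGS = {
    "both-strands", "shuffles", "filter-mfei", "filter-rrna",
    "filter-trna", "align", "varna",
}


def check_output_dir(path):
    """Return True if `path` is absent or is an empty directory."""
    if not os.path.exists(path):
        return True  # per the README, miRkwood creates a missing directory
    return os.path.isdir(path) and not os.listdir(path)


def build_abinitio_command(mirkwood_path, fasta, output_dir, flags=()):
    """Return the argv list for an ab initio run (nothing is executed)."""
    unknown = set(flags) - ABINITIO_FLAGS
    if unknown:
        raise ValueError("undocumented option(s): " + ", ".join(sorted(unknown)))
    command = [
        "perl", "-I" + mirkwood_path.rstrip("/") + "/cgi-bin/lib/",
        "mirkwood.pl", "--input", fasta, "--output", output_dir,
    ]
    command += ["--" + flag for flag in flags]
    return command


if __name__ == "__main__":
    # Hypothetical install path and file names.
    cmd = build_abinitio_command(
        "/opt/miRkwood", "sequences.fa", "results_dir",
        flags=["both-strands", "filter-mfei"],
    )
    if check_output_dir("results_dir"):
        print(shlex.join(cmd))
```

    Checking the output directory first mirrors the requirement that it must
    be empty; the printed line can then be pasted into a shell.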


    -mirkwood-bed.pl: perl -I/{miRkwood_path}/cgi-bin/lib/ mirkwood-bed.pl [options]
          Mandatory options:
            --input
                Path to the BED file (created with our script mirkwood-bam2bed.pl).

            --genome
                Path to the genome (fasta format).

            --output
                Output directory. If it does not exist, it will be created. If
                it already exists, it must be empty.

          Additional options:
            --shuffles
                Compute thermodynamic stability (shuffled sequences).

            --align
                Flag conserved mature miRNAs (alignment with miRBase + miRdup).

            --no-filter-mfei
                Don't filter out sequences with MFEI >= -0.6. Default: only keep
                sequences with MFEI < -0.6.

            --mirbase
                If you have a gff file containing known miRNAs for this assembly,
                use this option to give the path to this file.

            --gff
                List of annotation files (GFF or GFF3 format). Reads that match
                a feature in one of these files are filtered out. For instance,
                you can filter out CDS by providing a suitable GFF file.

            --no-filter-bad-hairpins
                By default the candidates with a quality score of 0 and no
                conservation are discarded from results and are stored in a BED
                file. Use this option to keep all results.

            --min-read-positions-nb
                Minimum number of mapping positions for a read to be kept.
                Default: 0.

            --max-read-positions-nb
                Maximum number of mapping positions for a read to be kept.
                Default: 5 (reads that map at more than 5 positions are
                filtered out).

            --varna
                Enable the generation of structure images with VARNA.

            --help
                Print a brief help message and exit.

            --man
                Print the manual page and exit.
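
    The MFEI threshold behind --filter-mfei and --no-filter-mfei can be
    illustrated with a short Python sketch. It assumes the definition commonly
    used in the miRNA literature, MFEI = (100 * MFE / length) / (G+C)%, with
    MFE in kcal/mol. This README does not state how miRkwood computes the
    value, and the function names and example numbers are made up for
    illustration.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence (0 to 100)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def mfei(seq, mfe):
    """Minimal free energy index, under the assumed definition above."""
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    amfe = 100.0 * mfe / len(seq)  # adjusted MFE: energy per 100 nt
    return amfe / gc


def passes_mfei_filter(seq, mfe, threshold=-0.6):
    """True if the sequence would be kept by an 'MFEI < threshold' filter."""
    return mfei(seq, mfe) < threshold


if __name__ == "__main__":
    hairpin = "GCAU" * 25  # toy 100-nt sequence with 50% G+C
    print(mfei(hairpin, -40.0))
    print(passes_mfei_filter(hairpin, -40.0))
    print(passes_mfei_filter(hairpin, -20.0))
```

    With this toy sequence, an MFE of -40 kcal/mol gives an MFEI of -0.8 and
    passes the filter, while an MFE of -20 kcal/mol gives -0.4 and is rejected.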


OUTPUT

    For both pipelines:

        alignments: folder containing all alignment files
            (only if option --align is on).

        images: folder containing images created by VARNA
            (only if option --varna is on).

        results: folder containing all result files, in several
            formats (csv, fa, gff, html and txt).

        sequences: folder containing sequences for each candidate
            in fasta and dotbracket format, alternative sequences
            if they exist and optimal structure if it is different
            from the stem-loop structure.

        YML: folder containing all candidates data in YAML format.

        basic_candidates.yml: contains a summary of all candidates
            with basic information (this file is needed to create
            the result files).

        log.log: log file (hey, what did you expect?)

        run_options.cfg: config file with the chosen options.

    ab initio pipeline only:

        masks: folder containing results of BlastX, rnammer and tRNAscan-SE.

        input_sequences.fas: your sequences.

    smallRNAseq pipeline only:

        read_clouds: folder containing the read cloud text files of the
            candidates.

        bed_sizes.txt: tab-delimited file with the number of reads in each BED file.

        summary.txt: contains a summary of your options and of the results.


        Depending on the options you chose for your job you may find 
        some of the following files:

            your_bed_your_GFF.tar.gz: a compressed BED file containing all reads that
                match features in your GFF file (one archive for each GFF file
                that you provided).

            your_bed_multimapped.tar.gz: a compressed BED file containing all reads from
                your input BED file that map at fewer than --min-read-positions-nb
                positions or more than --max-read-positions-nb positions.

            your_bed_miRNAs.tar.gz: a compressed BED containing all reads from your 
                input BED file corresponding to miRNAs present in miRBase.

            your_bed_orphan_clusters.tar.gz: a compressed BED containing all reads from your 
                input BED file that fall into a peak but that don't correspond to
                a valid miRNA candidate.

            your_bed_orphan_hairpins.tar.gz: a compressed BED file containing all candidates
                with a quality score of 0 and no conservation. By default
                these candidates are excluded from the final results, but you can
                change this behaviour with the --no-filter-bad-hairpins option.

            your_bed_filtered.bed: a BED containing all reads from your 
                input BED file that have not been filtered out in one of the
                previous categories.
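
    The split between reads that are kept and reads that end up in
    your_bed_multimapped.tar.gz can be sketched in Python as follows. The
    column layout is an assumption: each BED line is taken to be one mapping
    position, with the read identifier in the fourth column. The layout
    actually written by mirkwood-bam2bed.pl may differ, and the function name
    is hypothetical.

```python
from collections import Counter


def split_by_position_count(bed_lines, min_positions=0, max_positions=5):
    """Split BED lines into (kept, rejected) by mapping positions per read.

    Assumes one line per mapping position and the read id in column 4.
    """
    records = [line.rstrip("\n").split("\t") for line in bed_lines if line.strip()]
    # Number of lines (mapping positions) seen for each read identifier.
    counts = Counter(fields[3] for fields in records)
    kept, rejected = [], []
    for fields in records:
        n_positions = counts[fields[3]]
        if min_positions <= n_positions <= max_positions:
            kept.append("\t".join(fields))
        else:
            rejected.append("\t".join(fields))
    return kept, rejected


if __name__ == "__main__":
    # Toy data: read_a maps at 2 positions, read_b at 6 positions.
    lines = ["chr1\t%d\t%d\tread_a\t1\t+" % (s, s + 21) for s in (100, 900)]
    lines += ["chr2\t%d\t%d\tread_b\t1\t-" % (s, s + 21) for s in range(0, 600, 100)]
    kept, rejected = split_by_position_count(lines)
    print(len(kept), len(rejected))
```

    With the default bounds (0 and 5), the two read_a lines are kept and the
    six read_b lines are set aside as multimapped.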

