Umesh-JNU/DiscountPy

DiscountPy: k-mer counting tool

Requirements

  1. DiscountPy is written entirely in Python. Running it requires Python >= 3.9.
  2. Poetry, a dependency management and packaging tool for Python, must be installed.
  3. Run the commands in PowerShell where possible.

Setup

  1. Installation: Download the repository as a zip file or clone it.

  2. Setting up: Install the dependencies and create the virtual environment by running the following commands in the workspace terminal.

    poetry install
    poetry update
    
  3. Configure the Python interpreter. This requires the path of the virtual environment. To see the environment information, run the following command.

    poetry env info
    
    • To print only the path, run

      poetry env info --path
      

To learn more about Poetry, see the Poetry documentation.

DiscountPy is now ready to run.

Command-line options

DiscountPy is a k-mer counting tool that offers three minimizer orderings for counting k-mers: lexicographic, frequency, and universal frequency. It accepts the following options:

  • -k : Length of the k-mer
  • -m : Width of the minimizers
  • -f : Input dataset (.fasta)
  • -o : Order (lex | freq)
  • --minimizers : Universal minimizer set
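
To make the roles of -k, -m, and -o concrete, here is a minimal sketch of how a read can be split into super-mers, that is, runs of consecutive k-mers sharing the same minimizer. It is an illustration only, not DiscountPy's actual implementation: the function names (`kmers`, `minimizer`, `super_mers`, `lex_rank`, `freq_rank_for`) are hypothetical, the frequency ordering is assumed to prefer the rarest m-mer, and no hashing of the super-mers is performed.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minimizer(kmer, m, rank):
    """Return the m-mer inside kmer with the smallest rank (leftmost on ties)."""
    return min(kmers(kmer, m), key=rank)


def super_mers(seq, k, m, rank):
    """Split seq into (minimizer, super-mer) pairs.

    A super-mer is the substring covered by a maximal run of consecutive
    k-mers that all share the same minimizer.
    """
    out = []
    start, current = 0, None
    for i, kmer in enumerate(kmers(seq, k)):
        mz = minimizer(kmer, m, rank)
        if current is None:
            current = mz
        elif mz != current:
            # k-mers start..i-1 share `current`; together they span seq[start:i-1+k]
            out.append((current, seq[start:i - 1 + k]))
            start, current = i, mz
    if current is not None:
        out.append((current, seq[start:]))
    return out


def lex_rank(mmer):
    """Lexicographic ordering (assumed meaning of -o lex)."""
    return mmer


def freq_rank_for(seq, m):
    """Frequency ordering (assumed meaning of -o freq): rarest m-mer first,
    ties broken lexicographically. Frequencies are taken from seq itself."""
    counts = Counter(kmers(seq, m))
    return lambda mmer: (counts[mmer], mmer)


print(super_mers("TTGACAGT", 4, 2, lex_rank))
# → [('GA', 'TTGA'), ('AC', 'TGACAG'), ('AG', 'CAGT')]
```

Passing `freq_rank_for(reads, m)` instead of `lex_rank` switches the ordering without touching the splitting logic, which mirrors how -o selects an ordering while -k and -m stay fixed.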

How to use?

  1. To get the hashed super-mers with minimizers

    • With lexicographic ordering

      discount -k 28 -o lex -f data/SRR094926.fasta 
      

      or

      discount -k 28 -m 10 -o lex -f data/SRR094926.fasta
      
    • With frequency ordering

      discount -k 28 -o freq -f data/SRR094926.fasta
      

      or omit -o, since freq is the default order

      discount -k 28 -f data/SRR094926.fasta
      
    • With universal frequency ordering

      discount -k 28 -f data/SRR094926.fasta --minimizers PASHA/pasha_all_28_10.txt
      

      or

      discount -k 28 -o freq -f data/SRR094926.fasta --minimizers PASHA/pasha_all_28_10.txt
      
  2. To write the hashed super-mers to a file:

    discount -k 28 -o freq -f data/SRR094926.fasta --minimizers PASHA/pasha_all_28_10.txt --output output/xxx.txt
    
  3. Finally, to get the counts of the k-mers:

    • Sort the file generated in step 2 with an external tool, then pass the sorted file as input.

      discount -k 28 -f sortedXYZ.txt --count directory/to/counted-kmer-file
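
The reason for sorting before counting can be sketched in a few lines: once equal records are adjacent, a single pass that counts runs is enough. The example below is a hedged illustration under the assumption of one plain k-mer per record; the real file holds hashed super-mers whose format is not described here, and `count_sorted` is a hypothetical helper, not part of DiscountPy.

```python
from itertools import groupby


def count_sorted(records):
    """Count runs of equal records in a single pass.

    The counts are only correct if `records` is already sorted, because
    groupby merges adjacent equal items and nothing else.
    """
    return [(key, sum(1 for _ in run)) for key, run in groupby(records)]


sample = ["ACG", "CGT", "ACG", "GTA", "ACG", "CGT"]  # made-up records

print(count_sorted(sorted(sample)))
# → [('ACG', 3), ('CGT', 2), ('GTA', 1)]
```

Running `count_sorted(sample)` on the unsorted list reports every record with a count of 1, which is why the sorting step cannot be skipped.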
      

Query

Ask me anything!