Short 2

Andrea Telatin edited this page Oct 5, 2020 · 8 revisions

Text files and bioinformatics

The cat command (short for concatenate) prints the content of one or more text files.

Let's start by locating some ".txt" files. First, we move to the "course" directory we created in our home:

cd ~/course

From there, we can use find to locate files under a given path (here the learn_bash directory, also in our home). find accepts different search criteria; here, -name with a wildcard filters by filename:

find ../learn_bash/ -name "*.txt"
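As a self-contained sketch of the same idea, the snippet below builds a throwaway directory with made-up file names (placeholders, not the course files) and searches it. Quoting the pattern matters: it lets find, rather than the shell, expand the wildcard.

```shell
# Create a temporary directory tree (file names are hypothetical placeholders)
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/files" "$demo_dir/phage"
touch "$demo_dir/files/notes.txt" "$demo_dir/files/readme.md" "$demo_dir/phage/sample.faa"

# -type f restricts the search to regular files; -name filters by filename
txt_files=$(find "$demo_dir" -type f -name "*.txt")
echo "$txt_files"

# The same search for a different extension
faa_files=$(find "$demo_dir" -type f -name "*.faa")
echo "$faa_files"
```

Each search should report only the single file with the requested extension.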

Choose one of those files and try cat. For example:

cat ../learn_bash/files/introduction.txt

If the file is long, we don't want to flood our terminal with the whole thing. Often a preview of the first or last lines is enough. The head and tail commands do this, printing 10 lines by default (use -n NUMBER to change that):

head ~/learn_bash/files/introduction.txt
tail -n 3 ~/learn_bash/files/introduction.txt
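The two commands can also be chained to pull out a range of lines from the middle of a file. Here is a minimal sketch on a generated 20-line file (the file is a temporary placeholder, not one of the course files):

```shell
# Create a 20-line sample file, one number per line
demo_file=$(mktemp)
seq 1 20 > "$demo_file"

first_three=$(head -n 3 "$demo_file")
last_two=$(tail -n 2 "$demo_file")

# Keep the first 5 lines, then the last 2 of those: lines 4 and 5
lines_4_5=$(head -n 5 "$demo_file" | tail -n 2)
echo "$lines_4_5"
```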

Extracting matching lines

The grep command prints only the lines that match a pattern. For example:

grep Darwin ~/learn_bash/files/introduction.txt

will print one line, while:

grep Newton ~/learn_bash/files/introduction.txt

will print none.
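A few common grep options are worth trying as well. The sketch below uses a small made-up file (its content is invented for illustration) to show -c (count matching lines), -i (ignore case) and -v (print the lines that do not match):

```shell
# Write a three-line sample file (invented content)
demo_file=$(mktemp)
printf '%s\n' \
  'Charles Darwin described finches.' \
  'Gregor Mendel studied peas.' \
  'darwin is also a city in Australia.' > "$demo_file"

# -c counts matching lines instead of printing them (grep is case-sensitive by default)
exact_count=$(grep -c Darwin "$demo_file")

# -i makes the match case-insensitive
nocase_count=$(grep -c -i darwin "$demo_file")

# -v inverts the match: print the lines that do NOT contain the pattern
others=$(grep -v -i darwin "$demo_file")

echo "$exact_count $nocase_count"
echo "$others"
```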

Counting lines

The wc command reports how many lines, words and bytes a text file contains (in plain ASCII text, one byte is one character):

wc ~/learn_bash/files/introduction.txt

To only print the number of lines:

wc -l ~/learn_bash/files/introduction.txt
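When the count is needed inside a script, it helps to feed the file through standard input, so that wc prints the number without the filename. A minimal sketch on a temporary sample file:

```shell
# Write a three-line sample file
demo_file=$(mktemp)
printf 'first line\nsecond line\nthird line\n' > "$demo_file"

# With a filename argument, wc prints the count followed by the name
wc -l "$demo_file"

# Reading from standard input leaves only the number
line_count=$(wc -l < "$demo_file")
word_count=$(wc -w < "$demo_file")
echo "lines: $line_count, words: $word_count"
```

Note that wc -l counts newline characters, so a file whose last line lacks a trailing newline is reported as one line shorter.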

Extract sequence headers

Try this exercise on your own:

  1. Locate the files in the learn_bash directory whose names end with "faa" (FASTA amino acid, i.e. protein sequences)
  2. Use cat to print the content of one of those files
  3. Use wc to count its lines

Finally, to extract just the sequence headers (in a FASTA file, the lines starting with ">"):

grep '>' ~/learn_bash/phage/vir_cds_from_genomic.fna
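Since every FASTA record has exactly one header line, the same pattern can also count the sequences in a file. The sketch below uses a tiny made-up FASTA file (names and sequences are invented). The quotes around the pattern are essential: without them the shell would treat > as output redirection and could overwrite the file.

```shell
# Write a two-record FASTA file (invented names and sequences)
demo_fasta=$(mktemp)
printf '%s\n' \
  '>seq1 hypothetical protein' \
  'MKTAYIAKQR' \
  'QISFVKSHFS' \
  '>seq2 hypothetical protein' \
  'MADEEKLPPG' > "$demo_fasta"

# Counting header lines counts sequences; ^ anchors the match to the line start
n_seqs=$(grep -c '^>' "$demo_fasta")
echo "$n_seqs"

# Print the headers without the leading ">"
names=$(grep '^>' "$demo_fasta" | cut -c 2-)
echo "$names"
```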

Interactive viewer

The less command opens a text file in an interactive viewer and behaves like the man command we used before (scroll with the arrow keys, press q to quit). Try:

less ~/learn_bash/phage/vir_genomic.gff

If you want to disable line wrapping and keep each line on a single row:

less -S ~/learn_bash/phage/vir_genomic.gff

If you also want wider tab stops, so that tab-separated columns are easier to tell apart:

less -S -x 20 ~/learn_bash/phage/vir_genomic.gff

Homework

Install, or ask to have installed, the program Visual Studio Code.
