Command-line basics

Linux command-line basics for bioinformatics analysis

  • $ pwd (print working directory) – shows which directory you are in
  • $ ls (list) – lists the files and directories in the working directory; $ ll is an alias for $ ls -l on many systems
    • $ ls -l – lists in long format
    • $ ls *.fastq – lists all FASTQ files
  • $ history – prints the commands saved in your shell history (how many depends on your shell settings, often the last 1000)
    • $ history > history.txt – saves the history to a file
  • $ cd (change directory) – changes the working directory
    • $ cd <location> – takes you into the directory at <location>
    • $ cd .. – takes you up one directory
      • $ cd ../../.. – takes you up three directories
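  • $ mkdir – makes a new directory in the current directory
    • $ mkdir new_dir
  • $ touch – creates a new empty file in the working directory (or updates the timestamp of an existing file)
    • $ touch new_file
  • $ cp – copies files
    • $ cp -r <source_dir> <destination_dir> – copies a directory and its contents
    • $ cp *.fastq <destination_dir> – copies all FASTQ files into <destination_dir>
  • * – wildcard, matches any string of characters in a file name
    • *.gb – matches all files ending in .gb (GenBank files)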
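  • > – redirects output to a file, overwriting the file if it already exists
  • >> – appends output to the end of a file
  • | – pipes the output of one command into the next command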
  • $ mv – moves or renames a file or directory
    • $ mv old_name new_name
  • $ rm – removes files
    • $ rm -r – removes a directory and everything in it
  • $ rmdir – removes an empty directory
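  • -r – recursive; applies the command to a directory and its contents (cp, rm)
  • -a – lists all files, including hidden ones (ls)
  • -l – lists everything in long format (ls)
  • -t – orders files and directories by the time they were last modified (ls)
  • -alt – combines -a, -l and -t (ls)
  • -c – count; with grep, prints the number of matching lines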
  • $ ssh – lets you log into and work on a remote computer, which can be located anywhere in the world
  • $ scp (secure copy) – copies files or directories between your computer and a remote computer
    • $ scp my_file.txt gizemlevent@example.com:/scratch/user/transfered_files
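  • $ cut – extracts the specified columns (fields) from each line
    • $ cut -d "." -f4 <file> | sort | uniq | wc -l – counts the unique values in the fourth "."-delimited field
    • $ cut -f2,3,4 example.txt > example_subset.txt – saves tab-delimited columns 2 to 4 to a new file
    • $ cut -d " " -f1-6 table.txt – prints the first six space-delimited fields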
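  • $ head – prints the beginning of the file (the first 10 lines by default)
    • $ head -n 200 – prints the first 200 lines
  • $ tail – prints the end of the file (the last 10 lines by default)
    • $ tail -n 1000 – prints the last 1000 lines
  • $ cat – prints the whole contents of the file
    • $ cat hello.txt – prints the contents of hello.txt
    • $ cat R1.fastq | tail -n 10000000 > last10mil_R1.fastq – saves the last 10 million lines of R1.fastq
  • $ zcat – prints the contents of a gzip-compressed file and leaves the compressed file unchanged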
  • stdin – standard input
  • stderr – standard error
  • stdout – standard output
  • $ sort – sorts the lines of a file
    • $ sort hello.csv > sorted_hello.csv
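  • $ grep – searches for lines that match a pattern
    • $ grep "linux" <file> – prints every line in the file that contains "linux"
    • $ grep -c hello hello.fasta – counts the lines that contain "hello"
      • $ grep -c ">" contigs.fasta – counts sequences
    • $ grep -e hello -e Hello hello.fasta – matches either pattern
    • $ grep -v ">" gene.fasta | head -n 7 > edited_gene.fasta – saves the first 7 non-header lines
    • $ grep -v ">" genes.fasta | tail -n 8 – prints the last 8 non-header lines
    • $ grep ">" genes.fasta | sed 's/>//' > gene_names.txt – saves the header lines without the ">"
    • $ grep -A 1 ">geneA" genes.fasta > geneA.fasta – saves the geneA header and the line after it
    • $ grep -B 5 ">geneA" genes.fasta > geneA.fasta – saves the geneA header and the 5 lines before it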
  • $ uniq – removes duplicate lines only if they are adjacent, so sort the input first
  • $ wc (word count) – counts lines, words and bytes; $ wc -l counts only lines
  • $ nano – opens the file in the nano text editor
  • $ sed – finds and replaces words or characters in the file
    • $ sed 's/<old_name>/<new_name>/g' this_file > edited_file – the quotes keep the shell from interpreting < and > as redirects
    • $ sed 's/<old_name>/<new_name>/g' example.fasta > edited_example.fasta
  • $ gunzip – decompresses gzip (.gz) files
    • $ gunzip file.gz
    • $ gunzip -c my_sequence.fasta.gz > new_my_sequence.fasta – writes the decompressed contents to a new file and keeps the original .gz file
  • $ gzip – compresses files
    • $ gzip file.fastq
  • $ module – manages the software modules installed on a cluster
    • $ module spider – searches for a module
      • $ module spider bwa
    • $ module load – loads a module
    • $ module unload – unloads a module
    • $ module avail – shows the available modules
  • $ wget – downloads a file from a web link
  • $ bsub – submits a job
    • $ bsub < <job_file>
  • $ bjobs – lists the IDs and other information for your unfinished (pending and running) jobs
  • $ bkill – kills the job with the specified ID
    • $ bkill <job_id>
  • $ bhist – shows how long a job spent waiting in the queue, running, etc.
  • $ nohup – keeps the process running even after you log out
    • $ nohup scp ..
    • $ nohup gunzip ..
  • $ showquota – shows the storage space (disk quota) available to you
  • $ dos2unix <file> – converts the CRLF line endings of a .fasta or .txt file to UNIX format
  • $ awk 'BEGIN{sum=0;}{if(NR%4==2){sum+=length($0);}}END{print sum;}' sequences.fastq – counts the total number of bases in an uncompressed FASTQ file (awk cannot read a .gz file directly; pipe it through zcat first)
  • $ for file in *.fastq; do echo -n "$file,"; awk 'BEGIN{sum=0;}{if(NR%4==2){sum+=length($0);}}END{print sum;}' "$file"; done – prints each FASTQ file name and its base count; redirect with > after done to save the counts in a file
  • $ for file in *.fastq.gz; do echo -n "$file,"; zcat "$file" | awk 'BEGIN{sum=0;}{if(NR%4==2){sum+=length($0);}}END{print sum;}'; done – does the same for FASTQ.gz files
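
The base-counting loops at the end of the list can be tried on toy data before running them on real reads. The sketch below is only an illustration: the file names (sample_A.fastq, sample_B.fastq, base_counts.csv, base_counts_gz.csv), the reads and the helper function count_bases are all made up for this example. It uses gzip -dc, which does the same job as zcat, because zcat behaves differently on some systems.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny made-up FASTQ files (4 lines per read: header, bases, "+", qualities).
printf '@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n' > sample_A.fastq
printf '@read1\nGGCC\n+\nIIII\n' > sample_B.fastq

# Compressed copies; -c writes to stdout and leaves the original file in place.
gzip -c sample_A.fastq > sample_A.fastq.gz
gzip -c sample_B.fastq > sample_B.fastq.gz

# Hypothetical helper: sums the length of line 2 of every 4-line record on stdin.
count_bases() {
    awk 'NR % 4 == 2 { sum += length($0) } END { print sum + 0 }'
}

# Uncompressed files: one "file,bases" row per file, saved with >.
for file in *.fastq; do
    printf '%s,%s\n' "$file" "$(count_bases < "$file")"
done > base_counts.csv

# Compressed files: decompress to stdout and pipe into the same counter.
for file in *.fastq.gz; do
    printf '%s,%s\n' "$file" "$(gzip -dc "$file" | count_bases)"
done > base_counts_gz.csv

cat base_counts.csv base_counts_gz.csv
```

Putting the redirect after done writes all rows to one file; putting it inside the loop would overwrite the file on every pass.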

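The grep, sed, sort and uniq entries above combine naturally when working with FASTA files. The following is a minimal sketch on made-up data: the file genes.fasta and its three records are invented for illustration, and it assumes each record has exactly one sequence line after its ">" header.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Made-up FASTA file; geneA and geneC deliberately share the same sequence.
printf '>geneA\nATGCCC\n>geneB\nATGAAA\n>geneC\nATGCCC\n' > genes.fasta

# Count sequences by counting header lines.
n_sequences=$(grep -c ">" genes.fasta)

# Save the header lines with the leading ">" removed.
grep ">" genes.fasta | sed 's/>//' > gene_names.txt

# Extract one record: the header plus the 1 line after it.
grep -A 1 ">geneB" genes.fasta > geneB.fasta

# Count distinct sequences: drop headers, sort so duplicates become adjacent,
# then let uniq collapse them.
n_unique=$(grep -v ">" genes.fasta | sort | uniq | wc -l)

echo "sequences: $n_sequences"
echo "unique sequences: $n_unique"
cat gene_names.txt
cat geneB.fasta
```

Without the sort step, uniq would leave both copies of the repeated sequence in place because they are not on adjacent lines.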
 
