Searching a JSON for gene-pathway matches and ‘pretty printing’ the resulting dictionary


Extracting exonic and intronic regions from a bam file


Step 1: Making separate bed files for the intronic & exonic regions


  • Then a second page will appear, where you select “Introns plus” and leave the value at 0 to get just the introns
  • Select “get BED”
  • Then repeat, renaming the file to UCSC_exons_hg19.tsv and selecting “Exons plus” on the second page
  • You can use the UCSC genome browser to confirm that each file contains only introns (or only exons)
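As a quick command-line complement to checking in the browser, here is a minimal sketch that scans the name column of a BED file. It assumes the download labels each region's feature type in column 4 (for example, names containing `_intron_`); that naming pattern is an assumption here, so look at a few lines of your own file first. The two sample lines and the `NM_0001` identifier are made up for illustration.

```shell
# Hypothetical stand-in for a downloaded BED file; point "bed" at your own file instead.
bed=$(mktemp)
printf 'chr1\t12227\t12612\tNM_0001_intron_0_0_chr1_12228_f\t0\t+\n' >  "$bed"
printf 'chr1\t12721\t13220\tNM_0001_intron_1_0_chr1_12722_f\t0\t+\n' >> "$bed"

# Total number of regions in the file.
total=$(awk 'END {print NR}' "$bed")

# Number of regions whose name (column 4) does NOT carry the assumed "_intron_" tag.
not_intron=$(awk -F'\t' '$4 !~ /_intron_/ {n++} END {print n+0}' "$bed")

echo "regions: $total, not labelled intron: $not_intron"
```

If the second number is not zero, the file probably contains something other than introns. Swap the pattern to `_exon_` to check the exon file the same way.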

Step 2: Extracting the specific regions from your bam file

The samtools commands below output only the alignments that overlap the regions in the input BED file. Substitute the names you gave your BED files in Step 1.

The options used:

  • -b = write the output in BAM format (without it, samtools view writes SAM text, even if the output file is named .bam)
  • -h = include the header in the output
  • -L = output only alignments overlapping the input BED file

Only output alignments overlapping introns:

samtools view -b -h -L UCSC_Introns_hg19_.bed sample.bam > sample_introns.bam

Only output alignments overlapping exons:

samtools view -b -h -L UCSC_exons_hg19_new.bed sample.bam > sample_exons.bam
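One way to sanity-check the extraction is to count alignments before and after filtering; the filtered count should never exceed the total. The sketch below uses the `-c` option of `samtools view`, which prints a count instead of the alignments. The file names are taken from the commands above and are assumptions about your working directory, so the block skips quietly if samtools or the files are missing.

```shell
# Assumed file names; adjust to match your own data.
bam=sample.bam
introns_bed=UCSC_Introns_hg19_.bed

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ] && [ -f "$introns_bed" ]; then
    # Total number of alignments in the original BAM file.
    samtools view -c "$bam"
    # Number of alignments overlapping the regions in the BED file.
    samtools view -c -L "$introns_bed" "$bam"
    status="counted"
else
    status="skipped"
    echo "samtools or input files not found; nothing counted"
fi
```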

Linux Quick Tip

The du (disk usage) command is a Unix/Linux command that estimates file space usage.

The command:

du -hsc *

lists the size of each file and directory in the current working directory, followed by a grand total.

  • h = show sizes in human readable format (1K, 1M, 1G, …)
  • s = summarize the totals for each argument
  • c = display a grand total

For example, running du -hsc * in my current directory lists all the SAM files, their sizes, and the total size of the directory.


I’m running a HISAT2 alignment, so when I checked again a few minutes later, I could see that the SAM files had grown and were closer to being done.


This is also an easy way to check if a file is empty or how file sizes compare to each other.
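A minimal sketch of both checks, run in a throwaway directory so it does not touch real data. The file names are made up for illustration, and `sort -h` (human-readable sort) is assumed to be available, as it is in GNU coreutils.

```shell
# Create a temporary directory with three hypothetical output files.
demo_dir=$(mktemp -d)
: > "$demo_dir/empty.sam"                          # zero-byte file
printf 'read1\n' > "$demo_dir/small.sam"           # a few bytes
head -c 200000 /dev/zero > "$demo_dir/large.sam"   # about 200 KB

# Compare sizes: sort -h orders human-readable sizes from smallest to largest.
du -hs -- "$demo_dir"/* | sort -h

# List zero-byte files directly.
empty_files=$(find "$demo_dir" -maxdepth 1 -type f -empty)
echo "$empty_files"
```

Piping `du` into `sort -h` puts the largest files at the bottom of the list, which is handy when a directory holds many samples.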

First blog post

After reading this blog post about the benefits of science blogging, I figured I’d give it a shot. I know I can benefit from the opportunity to sort my ideas and learn by teaching. I plan to do some beginner bioinformatics tutorials and document my journey towards a PhD!