Step 1: Making separate bed files for the intronic & exonic regions
- Go to the UCSC Table Browser.
- Fill out the options as I did below, then select “get output”
- A second page will appear; select “Introns plus” and leave the number of extra bases at 0 to get just the introns
- Select “get BED”
- Then repeat, renaming the file to UCSC_exons_hg19.tsv and selecting “Exons plus” on the second page
- You can load the files into the UCSC Genome Browser to confirm they contain only introns (or only exons)
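Besides looking at the tracks in the genome browser, a quick command-line check can catch a truncated or malformed download. This is only a sketch: it assumes the file is a standard tab-separated BED with at least three columns (chrom, start, end), and the helper name check_bed and the tiny example file are made up for illustration.

```shell
#!/usr/bin/env bash
# check_bed: hypothetical helper that counts the intervals in a BED file and
# how many of them look malformed (fewer than 3 columns, non-numeric
# coordinates, or a start that is not smaller than the end).
check_bed() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }   # skip optional header lines
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 { bad++ }
        { total++ }
        END { printf "intervals=%d malformed=%d\n", total, bad }
    ' "$1"
}

# Tiny made-up BED file, just to show what the output looks like.
tmp_bed=$(mktemp)
printf 'chr1\t100\t200\tfeature_a\nchr1\t300\t250\tfeature_b\n' > "$tmp_bed"
check_bed "$tmp_bed"
rm -f "$tmp_bed"
```

To use it on real data, call check_bed on the introns file and then on the exons file. A non-zero malformed count suggests the download is worth a second look.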
Step 2: Extracting the specific regions from your bam file
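These samtools commands output only the alignments that overlap the regions in the input BED file. Replace the BED file names with whatever you saved in Step 1.
The -b option = write the output in BAM format (without it, samtools view writes SAM text, even if the output file is named .bam). The -h option = output file will include the header. The -L option = output only alignments overlapping the input BED file.
Only output alignments overlapping introns:
samtools view -b -h -L UCSC_Introns_hg19_.bed sample.bam > sample_introns.bam
Only output alignments overlapping exons:
samtools view -b -h -L UCSC_exons_hg19_new.bed sample.bam > sample_exons.bam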
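A simple way to sanity-check the filtering is to count the alignments in each file with samtools view -c. The sketch below assumes the output names used in the commands above; the helper name count_alignments is made up for illustration. It prints a note instead of a count if a file is missing or samtools is not installed.

```shell
#!/usr/bin/env bash
# count_alignments: hypothetical helper that prints "<file><TAB><count>",
# or a short note when the file cannot be counted.
count_alignments() {
    aln_file=$1
    if [ ! -s "$aln_file" ]; then
        printf '%s\tmissing or empty\n' "$aln_file"
    elif ! command -v samtools >/dev/null 2>&1; then
        printf '%s\tsamtools not found\n' "$aln_file"
    else
        # -c prints only the number of alignment records
        printf '%s\t%s\n' "$aln_file" "$(samtools view -c "$aln_file")"
    fi
}

# File names follow the commands above; adjust them to your own.
for f in sample.bam sample_introns.bam sample_exons.bam; do
    count_alignments "$f"
done
```

The intron and exon counts will not necessarily add up to the total. An alignment that overlaps both an exon and an intron appears in both files, and alignments outside any annotated gene appear in neither.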
The du (disk usage) command is a Unix/Linux command that estimates file space usage.
du -hsc *
lists the file and directory sizes of the current working directory.
- h = show sizes in human readable format (1K, 1M, 1G, …)
- s = summarize the totals for each argument
- c = display a grand total
For example, running du -hsc * in my current directory lists all the sam files, their sizes, and the total size of the directory.
I’m running a HISAT2 alignment, so when I checked again a few minutes later, I could see the sam files had grown and were closer to being done.
This is also an easy way to check if a file is empty or how file sizes compare to each other.
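The empty-file check can be sketched as follows. The demo directory and file names here are made up so the example is safe to run anywhere; in practice you would run the du and find lines in your own output directory.

```shell
#!/usr/bin/env bash
# Made-up demo directory with one small file and one empty file.
demo_dir=$(mktemp -d)
printf 'read1\tACGT\n' > "$demo_dir/sample1.sam"
: > "$demo_dir/sample2.sam"    # deliberately empty

# Size of each file plus a grand total (exact numbers depend on the filesystem).
( cd "$demo_dir" && du -hsc * )

# List only the files whose size is zero.
empty_files=$(find "$demo_dir" -type f -size 0 -exec basename {} \;)
echo "empty: $empty_files"

rm -rf "$demo_dir"
```

Here find is used alongside du because it reports empty files directly, which is easier than scanning a long du listing for zero-size entries.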
After reading this blog post about the benefits of science blogging, I figured I’d give it a shot. I know I can benefit from the opportunity to sort my ideas and learn by teaching. I plan to do some beginner bioinformatics tutorials and document my journey towards a PhD!