Combining RAD Tag libraries from different Illumina runs

Dec. 29, 2012

Comparing Trachemys Stacks results between the Nov and Dec 2012 Harvard Illumina runs

The first Illumina run of 93 Trachemys samples in Nov. shared a single HiSeq lane with many other samples (fish and flies) and resulted in very low coverage. Fortunately, Anna saved portions of the libraries at several stages of the RAD tag library construction. In Dec., we sent the portion saved after the Pippin size-selection step to Harvard to be run alone on a single HiSeq lane. Almost all of the Dec. samples had twice as many reads as in Nov. Despite the extra reads, with '-m 5' the Dec. run produced fewer total stacks than the Nov. run (Nov. 1,009,172 vs. Dec. 739,115). However, whenever the data are filtered by various criteria, the Dec. run always returns more stacks, and with deeper coverage (double-digit vs. single-digit depth).
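
One quick way to check the read-count comparison is to count the records in each sample's FASTQ file. This is a minimal sketch under stated assumptions: the files are uncompressed, each record takes exactly four lines, and each sample has the same file name in both runs' samples directories (the merge script below relies on the same thing). The function names `count_reads` and `compare_runs` are hypothetical helpers, not part of Stacks.

```shell
#!/bin/bash
# count_reads FILE: print the number of reads in an uncompressed FASTQ file.
# Assumes 4-line records (header, sequence, '+', qualities) with no wrapped
# sequence lines; awk's NR also counts a final line without a newline.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# compare_runs NOV_DIR DEC_DIR (hypothetical helper): for each *.fq_1 file in
# DEC_DIR that also exists in NOV_DIR, print a tab-separated line:
# sample, Nov. reads, Dec. reads, Dec./Nov. ratio.
compare_runs() {
    local nov_dir=$1 dec_dir=$2
    local f name nov dec
    for f in "$dec_dir"/*.fq_1; do
        [ -e "$f" ] || continue                # glob matched nothing
        name=$(basename "$f")
        if [ ! -f "$nov_dir/$name" ]; then
            echo "$name: no Nov. file" >&2
            continue
        fi
        nov=$(count_reads "$nov_dir/$name")
        dec=$(count_reads "$f")
        # awk does the floating-point division; "NA" avoids dividing by zero
        awk -v s="$name" -v a="$nov" -v b="$dec" 'BEGIN {
            ratio = (a > 0) ? sprintf("%.2f", b / a) : "NA"
            printf "%s\t%d\t%d\t%s\n", s, a, b, ratio
        }'
    done
}
```

For example, assuming Harvard_2012 holds the Nov. run, `compare_runs /data/simison/Analyses/Rad_tag/Harvard_2012/samples /data/simison/Analyses/Rad_tag/Harvard_Dec_2012/samples` would print one line per sample, and ratios near 2.00 would match the observation above.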

Dec. 30, 2012

Combining the Dec. & Nov. Harvard Illumina data

I created a new Stacks project directory, "Harvard_merged", and wrote a bash loop script that concatenates each sample's Illumina reads from the Nov. and Dec. runs into a single file:

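#!/bin/bash
# Concatenate the Dec. and Nov. forward reads (*.fq_1) of each sample
# into one file in the Harvard_merged project.
clear
echo "starting script"
topdir=/data/simison/Analyses/Rad_tag/Harvard_merged/samples   # merged output
dir1=/data/simison/Analyses/Rad_tag/Harvard_Dec_2012/samples   # Dec. 2012 run
dir2=/data/simison/Analyses/Rad_tag/Harvard_2012/samples       # Nov. 2012 run
for f in "$dir1"/*.fq_1
do
    outf="$topdir/$(basename "$f")"
    echo "$outf"
    # Dec. reads first, followed by the Nov. reads for the same sample
    cat "$f" "$dir2/$(basename "$f")" > "$outf"
done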
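
Because `cat` only prints an error when a Nov. file is missing and still writes the Dec. reads to the merged file, a quick sanity check is worthwhile: each merged file should have exactly as many lines as its Dec. and Nov. inputs combined. This is a minimal sketch assuming the directory layout above; `nlines` and `check_merge` are hypothetical names. Counting lines with awk also catches an input whose last line lacks a trailing newline, since `cat` would then fuse two records and the merged count would come up one short.

```shell
#!/bin/bash
# nlines FILE: number of lines, counting a final line without a newline.
nlines() {
    awk 'END { print NR }' "$1"
}

# check_merge MERGED_DIR DEC_DIR NOV_DIR SUFFIX (e.g. fq_1 or fq_2)
# Hypothetical helper: reports any merged file whose inputs are missing or
# whose line count is not the Dec. count plus the Nov. count.
# Returns 1 if any problem was found, 0 otherwise.
check_merge() {
    local merged_dir=$1 dec_dir=$2 nov_dir=$3 suffix=$4
    local f name m d n status=0
    for f in "$merged_dir"/*."$suffix"; do
        [ -e "$f" ] || continue                # glob matched nothing
        name=$(basename "$f")
        if [ ! -f "$dec_dir/$name" ] || [ ! -f "$nov_dir/$name" ]; then
            echo "MISSING input for $name" >&2
            status=1
            continue
        fi
        m=$(nlines "$f")
        d=$(nlines "$dec_dir/$name")
        n=$(nlines "$nov_dir/$name")
        if [ "$m" -ne $(( d + n )) ]; then
            echo "MISMATCH $name: merged=$m dec=$d nov=$n" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `check_merge "$topdir" "$dir1" "$dir2" fq_1` after the loop prints one line to stderr per problem file; the same call with `fq_2` covers the reverse reads.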

I made two .sh files, one each for the forward (*.fq_1) and reverse (*.fq_2) reads, and saved them to the /opt directory. I also made sure the files were executable:

$ chmod 755

To run a script, enter its full path and press Return:

$ /opt/
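
Keeping two nearly identical scripts works, but the same loop can also be written once and run over both suffixes. This is a minimal sketch assuming the directory layout in the script above; `merge_runs` is a hypothetical function name, and skipping samples that have no Nov. file is a design choice of this sketch (the loop above would write a Dec.-only file in that case).

```shell
#!/bin/bash
# merge_runs DEC_DIR NOV_DIR OUT_DIR (hypothetical helper)
# For each read direction (fq_1 = forward, fq_2 = reverse), concatenate the
# Dec. and Nov. files of each sample into OUT_DIR, Dec. reads first.
# Samples without a Nov. file are skipped with a warning.
merge_runs() {
    local dec_dir=$1 nov_dir=$2 out_dir=$3
    local suffix f name
    mkdir -p "$out_dir"
    for suffix in fq_1 fq_2; do
        for f in "$dec_dir"/*."$suffix"; do
            [ -e "$f" ] || continue            # no files with this suffix
            name=$(basename "$f")
            if [ ! -f "$nov_dir/$name" ]; then
                echo "skipping $name: no Nov. file" >&2
                continue
            fi
            echo "$out_dir/$name"
            cat "$f" "$nov_dir/$name" > "$out_dir/$name"
        done
    done
}

# Example call with the directories from the script above (not run here):
# merge_runs /data/simison/Analyses/Rad_tag/Harvard_Dec_2012/samples \
#            /data/simison/Analyses/Rad_tag/Harvard_2012/samples \
#            /data/simison/Analyses/Rad_tag/Harvard_merged/samples
```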

Dec. 31, 2012

Stacks run on merged data

With the additional reads, I was able to increase the filtering stringency during the Stacks run:

$ nice -n 19 -m 5 -M 3 -T 16 -B TrachDec92_radtags -b 1 -t -a 2012-29-12 -D "New samples from Harvard Dec. 2012" -o ./stacks -p ./samples/ven_M233245.fq_1 ...

Note that I increased the '-m' option from 3 to 5 ('-m' specifies the minimum number of identical raw reads required to create a stack).
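
To make the '-m' threshold concrete, the toy sketch below counts identical read sequences in one FASTQ file and keeps only those seen at least MIN_DEPTH times. It illustrates the idea only and is not how Stacks builds stacks: it ignores quality scores, the '-M' mismatch merging, and everything else the pipeline does. `toy_stacks` is a hypothetical name, and the input is assumed to be uncompressed four-line FASTQ. Raising the threshold from 3 to 5 discards any sequence seen only 3 or 4 times.

```shell
#!/bin/bash
# toy_stacks FASTQ MIN_DEPTH (hypothetical, illustration only)
# Groups exactly identical read sequences and keeps those seen at least
# MIN_DEPTH times. Prints "depth<TAB>sequence", deepest first.
# NOT the Stacks algorithm: no quality scores, no '-M' mismatch merging.
toy_stacks() {
    local fastq=$1 min_depth=$2
    awk 'NR % 4 == 2' "$fastq" |              # sequence line of each record
        sort | uniq -c |                      # count identical sequences
        awk -v m="$min_depth" '$1 >= m { print $1 "\t" $2 }' |
        sort -k1,1nr -k2,2                    # deepest "stacks" first
}
```

For example, `toy_stacks ./samples/ven_M233245.fq_1 5` would list only the sequences that appear at least five times in that file.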