The data for this project was taken from this paper: Wu et al.
Please download the data in the directory here: Yeast Data
What this data contains is 4 RNA-Seq files (reads from just chromosome VII), a tar file containing a bowtie index for the latest assembly of yeast, the genome sequence, and an annotation gtf file.
You will need to untar the tar file with the command:
tar xvf sce_R64_1_1.tar
Please read more information on the options of tophat, check here: Tophat Manual
tophat -o sce_1n_R1_VII_sce_R64_1_1_tophat --no-coverage-search -no-novel-juncs -G sce_R64_1_1_ENSEMBL.gtf sce_R64_1_1 sce_1n_R1_VII.fastq tophat -o sce_1n_R2_VII_sce_R64_1_1_tophat --no-coverage-search -no-novel-juncs -G sce_R64_1_1_ENSEMBL.gtf sce_R64_1_1 sce_1n_R2_VII.fastq tophat -o sce_4n_R1_VII_sce_R64_1_1_tophat --no-coverage-search -no-novel-juncs -G sce_R64_1_1_ENSEMBL.gtf sce_R64_1_1 sce_4n_R1_VII.fastq tophat -o sce_4n_R2_VII_sce_R64_1_1_tophat --no-coverage-search -no-novel-juncs -G sce_R64_1_1_ENSEMBL.gtf sce_R64_1_1 sce_4n_R2_VII.fastq
Please read more information on the options of cuffdiff, check here: Cuffdiff Manual
cuffdiff -L 4n,1n sce_R64_1_1_ENSEMBL.gtf sce_4n_R1_VII_sce_R64_1_1_tophat/accepted_hits.bam,sce_4n_R2_VII_sce_R64_1_1_tophat/accepted_hits.bam sce_1n_R1_VII_sce_R64_1_1_tophat/accepted_hits.bam,sce_1n_R2_VII_sce_R64_1_1_tophat/accepted_hits.bam
Do you find any differentially expressed genes? These will correspond to those significant at an FDR of 0.05, by default. There should be 4 genes. They make this easy to putting a "yes" on the last column for genes that are significant.
grep yes gene_exp.diff
GO Terms, or "Gene Ontology" terms are a controlled vocabulary of gene functions that are described by GO annotations. Now that you have a list of genes, you can head to GOEAST web tool here: GOEAST. Select yeast (S. cerevisiae) from the species pull-down menu. I think you also need to select the Gene Symbol check box. The Step Optional button opens up numerous options. Let's use Hochberg (Benjamini-Hochberg) with an FDR of 0.05.
Bonus: If you want to take it further, you could re-do this project with the full dataset here: GEO link, although you should expect a decent runtime to do the full set.