Getting Prepared for Analysis

Note: make sure that your username on Mac/Windows and your folder names do not contain spaces.
Spaces in file paths will cause errors. If necessary, create a new user account on your PC/Mac before you start.
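
A quick way to check for spaces from a terminal is sketched below. The helper name has_space is made up for illustration and is not part of any tool used in this walkthrough; it only tests whether a string contains a space.

```shell
# has_space is a hypothetical helper (not part of any tool in this walkthrough).
# It succeeds (exit status 0) when its argument contains a space character.
has_space() {
    case "$1" in
        *" "*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Check the home directory (which includes the username) and the current directory.
for dir in "$HOME" "$PWD"; do
    if has_space "$dir"; then
        echo "Contains a space: $dir"
    else
        echo "OK: $dir"
    fi
done
```

If either line reports a space, consider moving your work to a folder (or user account) without one before continuing.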


The first time you perform an RNA-Seq analysis you will need to download a number of programs, packages, and files. This short walkthrough will help you prepare your digital workspace in a streamlined manner. For example, all of the Bioconductor installation commands are collected here, so you do not have to search for each one. This walkthrough will take you through:


  1. Setting up Linux/Unix for Mac or Windows
  2. Downloading and installing Anaconda with Python
  3. Setting up Bioconda
  4. Downloading and installing RStudio
  5. Downloading, installing, and running FastQC
  6. Downloading, installing, and running TrimGalore
  7. Downloading, installing, and running kallisto
  8. Downloading, installing, and running tximport
  9. Downloading, installing, and running DESeq2
  10. Downloading all required Anaconda libraries
  11. Downloading all required R packages


Setting up your environment

Mac: macOS is already Unix-based, so you just need to open a terminal: press command+space, type Terminal, and press enter.

Windows: You will need to install WSL (Windows Subsystem for Linux) along with Ubuntu or your Linux distribution of choice (we recommend Ubuntu). Follow the instructions below or follow along with this YouTube video.

Recent versions of Windows 10 include support for WSL in PowerShell, which makes installing WSL really easy:

  1. Press the Windows key and search for “PowerShell”. Right-click on it.
  2. Choose “Run as Administrator”.
  3. Type
wsl --install
  4. Follow the instructions for installation.

Alternatively:

  1. Click on the Windows icon.
  2. Type features and click on Turn Windows features on or off.
  3. Scroll down to find Windows Subsystem for Linux and check the box. Click OK.
  4. Restart your computer.
  5. Click the Windows icon, search for Microsoft Store, and open it.
  6. Search for Ubuntu and download/install it.
  7. Click the Windows icon, navigate to U, and you should see Ubuntu.
  8. Click on Ubuntu to open the Ubuntu terminal, which will say installing...
  9. The installation could take up to 45 minutes or an hour.
  10. Enter a username and password. You are now in the Linux environment.
  11. At any time you can open the command line and enter bash to run Linux on WSL.
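
Once a terminal is open, one way to confirm which environment it is running in is sketched below. The helper name describe_platform is hypothetical, and the check relies on the assumption that WSL kernels include the word "microsoft" in the release string reported by uname -r.

```shell
# describe_platform is a hypothetical helper: given the output of `uname -s`
# and `uname -r`, it prints "macos", "wsl", "linux", or "other".
describe_platform() {
    kernel_name=$1
    # Lowercase the release string so the match below is case-insensitive.
    kernel_release=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$kernel_name" in
        Darwin) echo "macos" ;;
        Linux)
            # Assumption: WSL kernels include "microsoft" in their release string.
            case "$kernel_release" in
                *microsoft*) echo "wsl" ;;
                *)           echo "linux" ;;
            esac
            ;;
        *) echo "other" ;;
    esac
}

# Report the platform of the current terminal.
describe_platform "$(uname -s)" "$(uname -r)"
```

On Windows, seeing "wsl" here suggests that the commands in the rest of this walkthrough will run inside the Linux environment rather than in PowerShell.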


Download Anaconda

For Windows (WSL)/Linux users: follow the instructions here: (https://gist.github.com/kauffmanes/5e74916617f9993bc3479f401dfec7da). Make certain that you are installing the LINUX version, and install it from inside WSL.

Otherwise:

Download Anaconda here and install it.

This will install:
1) Anaconda
2) Python
3) Optionally, Jupyter Notebooks and the Spyder IDE, which are recommended.

You can check your version of Python in bash by typing python --version. Typing python on its own also prints the version, and then starts the Python interpreter on the command line; exit it by typing exit().


Set up Bioconda

On Windows, start WSL in a new command window by typing:

wsl

Set up Bioconda by typing in bash:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Then update all installed packages:

conda update --all

If you need more detail, the full instructions are on the Bioconda website.
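
To confirm that the three channels were added, you can inspect the output of conda config --show channels. The sketch below does this with a hypothetical helper, missing_channels, which assumes the usual output format of that command (one "  - name" line per channel) and prints any of the three channels it cannot find.

```shell
# missing_channels is a hypothetical helper. It reads text in the format printed
# by `conda config --show channels` on standard input and prints each of the
# three channels that does not appear in it.
missing_channels() {
    listing=$(cat)
    for channel in defaults bioconda conda-forge; do
        # -x requires the whole line to match, e.g. "  - bioconda".
        if ! printf '%s\n' "$listing" | grep -q -x -e "[[:space:]]*- $channel"; then
            echo "$channel"
        fi
    done
}

# Only query conda if it is installed; otherwise say so.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels | missing_channels
else
    echo "conda was not found on PATH"
fi
```

No output from the conda branch means all three channels are configured.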


Install RStudio

Download RStudio and install it.

If you have trouble installing packages into the correct library, run RStudio as administrator while installing packages. On Windows, right-click the RStudio icon and choose Run as administrator.


Install Java

Some programs, like FastQC, depend on Java. To check whether you have Java, type java -version in bash or on the command line. If it is missing, install Java here. Once Java is installed, check that it installed correctly by typing java -version again.


Install FastQC

Download FastQC here and install it by unzipping the downloaded file. On Windows, you can unzip a file by right-clicking it, opening it with Windows Explorer, and moving the application folder inside it to a new location.

Open the FastQC GUI by following the instructions from the website:

Windows: Simply double click on the run_fastqc bat file. If you want to make a pretty shortcut then we’ve included an icon file in the top level directory so you don’t have to use the generic bat file icon.

MacOSX: Double click on the FastQC application icon.


Create a new conda environment

Notice that the line to the left of your cursor begins with (base) or something like it. This tells you which environment you are in. It is good practice to make a new environment for each workflow, so that the packages you install are specific to that environment and do not contaminate the global environment (for example, when a package version is not compatible with your version of Python). To create a new conda environment, run:

conda create --name RNA


Activate the conda environment

Every time you open bash in a new session to perform this RNA-Seq analysis workflow, you will need to activate this environment in order to run the packages you install here.

conda activate RNA
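
Besides looking at the prompt, you can check the active environment from a script. The sketch below uses a hypothetical helper, in_conda_env, and relies on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment.

```shell
# in_conda_env is a hypothetical helper. conda sets the CONDA_DEFAULT_ENV
# variable to the name of the active environment; this compares it to $1.
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_conda_env RNA; then
    echo "RNA environment is active"
else
    echo "RNA environment is not active; run: conda activate RNA"
fi
```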


Install TrimGalore

TrimGalore depends on two packages which you’ll need to install first: cutadapt and FastQC. You installed the FastQC GUI above; installing FastQC again, this time in your conda environment, lets you optionally run it from the command line following TrimGalore.

conda install -c bioconda cutadapt
conda install -c bioconda fastqc

Sometimes FastQC does not install until you update Anaconda and all packages. If you cannot install FastQC, update everything first with conda update --all, then try again.

Check the versions to make sure they installed correctly:

cutadapt --version 
fastqc -v

Install TrimGalore:

Try:

conda install -c bioconda trim-galore

If this doesn’t work, you can download and extract the tarball yourself; the trim_galore script can then be run directly from the extracted folder.

curl -fsSL https://github.com/FelixKrueger/TrimGalore/archive/0.6.5.tar.gz -o trim_galore.tar.gz
tar xvzf trim_galore.tar.gz

You can extract tarballs in bash using administrator permissions (if needed) by running:

sudo tar -xvzf /mnt/c/PATH/TO/TAR-FILE/Desktop/FILE-NAME.tar.gz -C /mnt/c/PATH/TO/DESTINATION/FOLDER
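
If the tar flags are unfamiliar, the sketch below walks through them on a throwaway tarball. Every file and folder name in it (demo-1.0, demo.tar.gz, and so on) is made up for illustration and is created in a temporary directory.

```shell
# Work in a throwaway directory so nothing on the real system is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/source/demo-1.0" "$workdir/destination"
echo "hello" > "$workdir/source/demo-1.0/README"

# Create a small gzip-compressed tarball (c = create, z = gzip, f = file name).
tar -czf "$workdir/demo.tar.gz" -C "$workdir/source" demo-1.0

# List the contents without extracting (t = list).
tar -tzf "$workdir/demo.tar.gz"

# Extract into the destination folder (x = extract, -C = target directory).
tar -xzf "$workdir/demo.tar.gz" -C "$workdir/destination"
```

The v flag used in the commands above only makes tar print each file name as it is processed, so it can be left out.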


Install Kallisto

Make sure you are in the correct conda environment by running conda env list. The environment with the * next to it is the current environment. Change environments by typing conda activate environmentname.

conda install kallisto
kallisto


This concludes the prep in the bash/Linux/Python environments

You’ll be prompted to install a lot of additional libraries during this process. Make sure you install all of them. At the end, it is good practice to update all the packages using conda update --all.
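
As a final check, the sketch below reports which of the command-line tools from this walkthrough cannot be found. The helper name missing_tools is hypothetical, and the list of command names is an assumption based on the steps above; adjust it to match what you actually installed.

```shell
# missing_tools is a hypothetical helper: it prints every command name passed
# to it that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Command names assumed from this walkthrough; trim_galore is the name of the
# TrimGalore executable.
missing=$(missing_tools conda python java fastqc cutadapt trim_galore kallisto)
if [ -z "$missing" ]; then
    echo "All tools found"
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

Run it with the RNA environment activated, since tools installed with conda are only on PATH while their environment is active.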


RStudio and downloading R Packages

Open RStudio and update R. On Mac, download and install the latest version of R, then reopen RStudio, which will use it automatically. On Windows:

install.packages("installr")
library(installr)
updateR()

Next, install all of these packages:

## Install BiocManager, which is used to install Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.11")

## CRAN packages
install.packages(c("devtools", "tidyverse", "pacman"))

## or install the pacman source package from
## https://cran.r-project.org/web/packages/pacman/index.html
## and install in R with
## install.packages(path_to_file, repos = NULL, type="source")

## Bioconductor packages
BiocManager::install(c("tximport", "tximportData", "rhdf5",
                       "GenomicFeatures", "DESeq2"))


This should download and install all of the packages you will need. If a package is missing, please email me and I will include it here.

You should now be ready to run Part 1 and Part 2 of the RNA-Seq Walkthrough.