Bioinformatics-2014

Welcome to Bioinformatics

Contact:

Please send all correspondence relating to this course, as well as all completed computer labs, to: [course@johanneslab.org]

Course aim:

The aim is to provide an introduction to the following important topics in Bioinformatics:

Jump to Lecture [1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9]

What can you find on this page:

This page provides information for each of the class lectures. It includes pdf versions of the lecture slides as well as links to download datasets and instructions for the computer labs. I am also providing supplemental material with each lecture, which is meant to reinforce the course material. More general online educational resources are listed at the bottom of this page. These include tutorials, webinars and presentations.

Jump to [General online and educational material]

Symposium:

At the end of this course you will give a presentation to the other students. In this presentation you will summarize an original scientific study that was highlighted in the News section of the journals Nature or Science. Your task is to show how the concepts and/or tools employed in the study relate to one or more of the topics covered in our lectures. These symposium presentations are an important component of this course, which is reflected in their contribution to your final grade.

Jump to [Symposium]

Study Guide:

In this course you will be exposed to a lot of new information. To help you study for the exam we are putting together a study guide that will highlight the most important topics.

Jump to [Study Guide]

Grading scheme:

Your final grade is obtained as follows:

We will try to keep you up-to-date on your current grades and status of your assignments. See spreadsheet below.

Grades-and-Assignments-2014

Lecture 1: Course overview/Genome projects

       

Date:

8-5-2014

Lecture slides:     

[pdf]

Computer lab:      

n/a

Lecturer(s):       

Frank Johannes | Marnix Medema

Summary:

In this lecture you will learn the key components of 'Genome Projects'. We will highlight these components in the context of the Human Genome Project. Finally, we will discuss ongoing genome projects in microbes, as microbial systems are an important focus of GBB.

Lecture 1: Supplemental material

Genome Browsers

[Ensembl | NCBI | UCSC]

Human Genome Project

[Summary | Detailed Timeline]

Human Genome announcement at the White House

[Video]

Publications of the initial working draft human genome sequence

[IHGSC-Nature-2001 | Venter-Science-2001]  

Lecture 2: Genome sequencing I

       

Date:

9-5-2014

Lecture slides:     

[pdf-part1] [pdf-part2][NGS-supplemental-slides+videos]

Computer lab:      

n/a

Lecturer(s):       

Frank Johannes | Marianna Bevova

Summary:

DNA sequencing is the most important component of Genome Projects. In this lecture you will learn the principles of Sanger sequencing (First Generation Sequencing) as well as newer methods such as massively parallel sequencing (Next Generation Sequencing).

Lecture 2: Supplemental material         

Principles of Sanger Sequencing

[Video#1][Video#2]    

Review of Next Generation Sequencing (NGS) Methods

[Metzker-NRG-2010]

Key companies producing NGS technologies

[Roche 454 | LifeTechnologies (2-color encoding) | Illumina | Pacific Biosciences]

Whole Genome Shotgun Sequencing in action

[Craig Venter's Sorcerer II expedition]

Lecture 3: Genome sequencing II

       

Date:

12-5-2014

Lecture slides:     

[pdf-part1][pdf-part2]

Computer lab:      

[pdf][data link]

Lecturer(s):       

Victor Guryev

Summary:

We discussed Next Generation Sequencing (NGS) technologies in the previous lecture. NGS produces millions of short sequence reads. An important bioinformatic step is to quality-check these reads and to remove any reads or samples that show experimental artifacts, which could stem from DNA preparation or from the NGS machine itself. Following this quality control step, the short reads need to be aligned to a reference genome, which is a huge computational task. We will learn how this is done.
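As a rough illustration of the quality-check step, the sketch below decodes FASTQ quality strings into Phred scores and discards reads with a low mean score. It assumes the common Phred+33 encoding; the function names, the record layout and the threshold of 20 are illustrative choices, not the procedure used in the computer lab.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Average Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records, min_mean_quality=20.0):
    """Keep (name, sequence, quality) records whose mean Phred score
    reaches the (hypothetical) threshold."""
    return [r for r in records if mean_quality(r[2]) >= min_mean_quality]


# Toy input: 'I' encodes Phred 40, '#' encodes Phred 2 under Phred+33.
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "####"),
]
kept = filter_reads(reads)
```

Real pipelines usually also trim low-quality read ends and check for adapter contamination, which this sketch leaves out.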

Lecture 3: Supplemental material

[none]

Lecture 4: Detection of DNA sequence variation

       

Date:

13-5-2014

Lecture slides:     

[pdf-part1][pdf-part2][pdf-part3]

Computer lab:      

[pdf]

Lecturer(s):       

Frank Johannes | Victor Guryev

Summary:      

Having determined the DNA sequence of a single (reference) genome, it is of interest to identify and characterize DNA sequence differences between individuals or species. In this lecture you will hear about various types of DNA sequence variations, and how these can be detected using older array-based technologies as well as newer Next Generation Sequencing technologies. We will highlight several ongoing large-scale projects that focus on DNA sequence variation in populations, with particular emphasis on the Genome of the Netherlands (GO.NL).
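To give a feel for how a single-nucleotide variant can be detected from NGS data, here is a deliberately naive sketch: it looks at the bases that aligned reads show at one reference position and reports an alternative allele if it is seen often enough. The function name and both thresholds are hypothetical; real variant callers use statistical models that account for base and mapping quality.

```python
from collections import Counter


def call_snp(reference_base, observed_bases, min_depth=8, min_alt_fraction=0.2):
    """Return the most frequent non-reference base at one position, or None.

    observed_bases holds the bases that aligned reads show at this position.
    min_depth and min_alt_fraction are illustrative thresholds only.
    """
    depth = len(observed_bases)
    if depth < min_depth:
        return None  # too few reads to say anything
    alt_counts = Counter(b for b in observed_bases if b != reference_base)
    if not alt_counts:
        return None  # every read agrees with the reference
    alt_base, alt_count = alt_counts.most_common(1)[0]
    if alt_count / depth >= min_alt_fraction:
        return alt_base
    return None  # alternative allele too rare, likely a sequencing error
```

For example, `call_snp("A", "AAAAGGGGGG")` reports `"G"`, whereas a single `G` among ten reads is treated as noise.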

Lecture 4: Supplemental material    

Large-scale projects to characterize DNA sequence variation in various populations

 

Humans: [International HapMap Project | 1000 Genomes Project | Human Variome Project | GO.NL (Genome of the Netherlands) | FarGen Project (Proposal to sequence the genomes of a whole population)]

Plants: [1001 Genomes Project (Arabidopsis) | 1KP Plant Genomes Project (transcriptomes of 1000 plant species)]

Vertebrates: [10K Genomes Projects]

Microorganisms: [The 10k Microbial Genomes Project]

Micro-arrays for SNP detection

[LaFramboise-NAR-2009]

Lecture 5: Epigenomics I

       

Date:

14-5-2014

Lecture slides:     

[pdf-part1][pdf-part2][pdf-part3]

Computer lab:      

[pdf][data1][data2]

Lecturer(s):       

Frank Johannes | Maria Colomé-Tatché

Summary:

Biological motivation

All cells in your body have the same DNA sequence (save a few somatic mutations). Nonetheless, there is vast functional divergence between cells from different tissues or time-points during development. This is achieved by turning genes on or off in a tissue-dependent or time-dependent manner. The systematic (genome-wide) study of this "functional organization" of the genome has become known as "epigenomics". We will discuss large-scale projects that have been initiated to construct comprehensive epigenomic maps in humans and model organisms.

Measuring DNA methylation

The methylation of cytosines is one important layer of the epigenome. We will learn about whole genome bisulphite sequencing (WGBS-seq), which is an NGS-based method for measuring whole methylomes at basepair resolution.

Analyzing DNA methylation

We will discuss simple analysis strategies that are commonly used to determine the methylation state of individual cytosines from WGBS-seq measurements.
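One simple strategy of this kind can be sketched as follows: at each cytosine, count the reads that still show a C after bisulphite treatment (methylated) against all reads covering the site, and ask whether that count could be explained by incomplete bisulphite conversion alone. The sketch uses a one-sided binomial test; the error rate and significance level are assumed values, and the lecture may present a different approach.

```python
from math import comb


def methylation_level(methylated_reads, total_reads):
    """Fraction of reads supporting methylation at one cytosine."""
    return methylated_reads / total_reads if total_reads else 0.0


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_methylated(methylated_reads, total_reads, error_rate=0.01, alpha=0.01):
    """Call a cytosine methylated if the number of unconverted reads is
    unlikely under the non-conversion error rate alone.

    error_rate and alpha are hypothetical values chosen for illustration.
    """
    if total_reads == 0:
        return False
    p_value = binomial_tail(methylated_reads, total_reads, error_rate)
    return p_value < alpha
```

In a genome-wide analysis the resulting p-values would additionally need to be corrected for multiple testing.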

Lecture 5: Supplemental material         

The mysterious epigenome: What lies beyond DNA?

[Video]

Epigenomics: Scientific background

[NCBI]

Epigenomics: centralized data deposit

[NCBI]

ENCODE Project information and resources

[Home (UCSC) | NHGRI | Ensembl | Nature ENCODE explorer]

A User's guide to ENCODE

[The ENCODE Project Consortium-PLoSBiology-2011]     

modENCODE Project information and resources

[Home | NHGRI]

Roadmap Epigenomics Program

[Home]

News article about 'ENCODE' and 'Roadmap Epigenomics Program'

[The Scientist Magazine 2012]

Lecture 6: Epigenomics II

       

Date:

15-5-2014

Lecture slides:     

[pdf-part1][pdf-part2][pdf-part3]

Computer lab:      

[BS-seq-pdf][BS-seq-data]

[ChIP-seq-pdf][ChIP-seq-data (posted on Nestor)]

Lecturer(s):       

Frank Johannes | Maria Colomé-Tatché

Summary: 

Measuring histone modifications

In this lecture you will learn about methods for measuring the chemical states of histone proteins. The major technique is called Chromatin Immunoprecipitation followed by sequencing (ChIP-seq).

Analyzing histone modifications (single epigenome)

ChIP-seq data obtained from a single epigenome is a common experimental starting point. You will learn about two major ways of analyzing this data.

Analyzing histone modifications (multiple epigenomes)

Often researchers are interested in comparing two or more ChIP-seq datasets with each other. For instance, it may be of interest to compare the histone states of cancer cells with those of normal cells. Epigenetic differences between these two cell types may provide clues about the causes of cancer.

Lecture 6: Supplemental material 

Review of ChIP-seq analysis

[Park-NRG-2009]       

Review of DNA methylation analysis

[Laird-NRG-2009]

Lecture 7: Network analysis

       

Date:

16-5-2014

Lecture slides:     

[pdf]

Computer lab:      

[pdf]

Lecturer(s):       

Pariya Behrouzi

Summary:

The regulation of the genome is a complex process. We still know very little about it. However, we do know that many functional elements, such as genes, form interacting networks. That is, genes can be expressed together, or they are expressed in response to the expression of another gene, and so forth. Often it is important to infer (i.e. find) networks from a "sea of functional data". Ideally we would like analysis techniques that allow us to infer the direct and causal relationships between functional elements.  In this lecture we are going to study graphical models as a statistical tool to encode the relationships among various functional elements.  We will highlight the most common network analysis methods used in computational biology.
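The distinction between direct and indirect relationships can be illustrated with partial correlation, the quantity that underlies Gaussian graphical models. In the sketch below, two hypothetical genes are both driven by the same regulator: their raw correlation is high, but it largely disappears once the regulator is conditioned on. The data and variable names are made up for illustration, and this is only one of several possible network inference approaches.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Toy expression data: both genes follow the regulator plus small noise.
regulator = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
gene_a = [r + e for r, e in zip(regulator, noise_a)]
gene_b = [r + e for r, e in zip(regulator, noise_b)]

raw = pearson(gene_a, gene_b)                             # strong marginal association
direct = partial_correlation(gene_a, gene_b, regulator)   # close to zero
```

A network built from raw correlations would draw an edge between the two genes; one built from partial correlations would connect each gene to the regulator only.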

Lecture 7: Supplemental material         

[none]

Lecture 8: Protein alignment/Gene prediction/Databases

       

Date:

19-5-2014

Lecture slides:     

[part1][part2][part3]

Computer lab:      

[pdf][doc]

Lecturer(s):       

Peter Terpstra

Summary:

Knowledge of the DNA sequence allows us to infer the sequence of proteins. Alternatively, knowing the amino acid sequence of a protein, we can map it back to the genic regions that produced the protein (and/or its homologs). This helps us to predict coding elements in raw DNA sequences and starts the genome “annotation” process. Together with knowledge of, for example, start, stop and splice sites, it allows us to “predict” genes based on sequence information alone. Knowing where to find the databases that contain all this information is of course indispensable.

Protein alignment

We will discuss: methods of protein alignment; what protein substitution scoring matrices are; how to extract protein “profiles” from multiple alignments, leading to protein families; and the difference between “profiles” and “motifs”. We will also briefly discuss how one could predict a protein's secondary or 3D structure from its amino acid sequence alone.
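As a minimal sketch of how an alignment method combines a substitution score with gap penalties, the code below computes the optimal global alignment score by dynamic programming (the Needleman-Wunsch scheme). The toy scoring function (+1 match, -1 mismatch) and the gap penalty of -2 are assumptions for illustration; in practice a substitution matrix such as BLOSUM62 would take the place of `substitution_score`.

```python
def substitution_score(a, b, match=1, mismatch=-1):
    """Toy stand-in for a substitution matrix lookup."""
    return match if a == b else mismatch


def global_alignment_score(seq1, seq2, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + substitution_score(seq1[i - 1], seq2[j - 1])
            gap_in_seq2 = score[i - 1][j] + gap
            gap_in_seq1 = score[i][j - 1] + gap
            score[i][j] = max(diagonal, gap_in_seq2, gap_in_seq1)
    return score[n][m]
```

Recovering the alignment itself, rather than only its score, requires an additional traceback through the table, which is omitted here.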

Gene prediction

We will discuss: the layout of prokaryotic and eukaryotic genes; homology-based and “ab initio” methods of predicting the location of protein-coding genes in unannotated DNA sequences; what gene “signals” and “content” are; and the differences between predicting prokaryotic and eukaryotic genes.
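The simplest use of gene “signals” is an open reading frame (ORF) scan: look for a start codon followed, in the same reading frame, by a stop codon. The sketch below does this for the forward strand only and assumes the standard genetic code; the function name and minimum length are illustrative. It suits intron-free (prokaryote-like) sequences and is far simpler than real gene predictors.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(dna, min_codons=3):
    """Return (start, end) positions of forward-strand ORFs.

    Positions are 0-based, end is exclusive, and the stop codon is included.
    min_codons is a hypothetical length cutoff (counting start and stop).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF in this frame
    return orfs
```

A fuller scan would also search the reverse complement, and eukaryotic gene prediction would additionally need to model splice sites.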

Database searches

This will be an introduction to databases that contain DNA/RNA/protein sequence data as well as DNA variation, expression and metabolic data: where to find them and how to use them.

Lecture 8: Supplemental material

List of helpful links to online tutorials, software and databases

[pdf]

Lecture 9: Big data handling

       

Date:

20-5-2014

Lecture slides:     

[pdf]

Computer lab:      

[zip-file]

Lecturer(s):       

Pieter

Summary:

Modern measurement technologies such as the Next Generation Sequencing methods discussed in this course generate a tremendous amount of molecular data. The storage, manipulation and dissemination of this data to end-users is a major bottleneck in gaining meaningful biological insights. In this lecture you will learn about ways to handle this so-called Big data.

Lecture 9: Supplemental material         

For additional reading material about this topic see the links and publications listed in the following ppt slide:

[ppt]

Symposium: Day 1

       

Date:

21-5-2014

Instructions: 

[pdf]

Groups presenting:

see spreadsheet below

Symposium-Groups-Day1-2014

Symposium: Day 2

       

Date:

23-5-2014

Instructions: 

[pdf]

Groups presenting:

see spreadsheet below

Symposium-Groups-Day2-2014

General online and educational material

Online training courses at EBI-EMBL

[go there]

Overview of online software and databases at EBI-EMBL

[go there]

Video tutorial on how to use Ensembl tools

[go there]

Training and tutorials at NCBI

[go there]

NCBI educational resources

[go there]

Educational resources at the National Human Genome Research Institute

[go there]

Website of the US Public Broadcasting Service (PBS) with interesting NOVA documentaries. Note: the PBS website will not show them, but most of these videos can be viewed on YouTube.

[go there]

Study guide questions

If you really KNOW the answer to the following questions, you should be able to perform well on the final exam.

Lecture 1: Genome Projects

Lecture 2: DNA sequencing I

Lecture 3: DNA sequencing II

Lecture 4: Detection of DNA sequence variation

Lecture 5: Epigenomics I

Scientific background

Measuring and analyzing DNA methylation: MeDIP-chip

Measuring and analyzing DNA methylation: Bisulphite Sequencing

Lecture 6: Epigenomics II

Lecture 7: Network analysis

Lecture 8: Protein alignment, gene prediction, databases

Lecture 9: Big data handling
