COMP 4550: Bioinformatics: Biological Data Analysis

This course is an elective for the Data-centric Computing concentration.

The course is designed as an advanced interdisciplinary course in bioinformatics for both Computer Science and Biology students, and as a bridge between the two disciplines.

This advanced course gives students the basis to perform their own analyses of high-throughput data using R and Bioconductor. Students who succeed in this course should be comfortable programming in R and able to use available Bioconductor packages to analyse a variety of biological data, such as expression data, high-throughput cell-based assay data and mass spectrometry protein data. They should also be able to use a variety of approaches available within the R environment, such as clustering, graphs, classification approaches (for example, random forests and support vector machines) and enrichment analysis methods.

Lab

In addition to classes, this course has one structured laboratory session per week.

Prerequisites: Biology 3951 or COMP 3550, and Statistics 2500 or Statistics 2550, or permission of the course instructor.

Availability: This course is usually offered once per year, in Fall or Winter.

Course Objectives

This course provides students with the basis to analyse a variety of biological data within an integrated programming environment for data manipulation, calculation and graphical display. Students will learn to extract meaningful information from data generated by high-throughput experimentation. The course will introduce one such integrated programming environment and will explore the computational and statistical foundations of the most commonly used biological data analysis procedures.

In the introductory Bioinformatics course (Computer Science 3550), students will have:

Understood the basis of bioinformatics methods, for example, how multiple sequence aligners actually construct alignments, which steps are involved in the analysis of gene expression, and what multiple testing correction is and how it is done;
Achieved basic Perl programming skills; and
Used online databases and computational tools.

In this advanced course, by contrast, some of the same topics are covered, such as gene expression, enrichment analysis and proteomics, but students will learn to carry out the analysis on their own, that is, without relying on a user-friendly graphical program that performs the required analysis once the appropriate parameters are chosen and a few buttons are clicked.

Representative Workload

Assignments and Project 25%
Lab Work and Quizzes 20%
In-class Exams 30%
Final Exam 25%

Representative Course Outline

Introduction to R and Bioconductor
Exploratory data analysis and hypothesis testing
Gene Expression data analysis
Mass Spectrometry Protein data analysis
Clustering and visualization
Machine learning: concepts and packages

Feature selection
Cross-validation
Multiclass problems
Ensemble methods
Bayesian methods

Graphs and Networks

Protein interactions
Pathways
Co-expression graphs

Biological Annotation
Gene set enrichment analysis

Labs

Students will perform hands-on analysis of experimental biological data, mainly using R and Bioconductor. Additional software that may be used includes Cytoscape.

R programming exercises
Exploratory data analysis: graphics/plots generation
Processing expression data
Processing proteomics data
Clustering data and cluster visualization
Data classification using supervised machine learning
Using graphs for data visualization
Annotating data
Performing enrichment analysis
Introduction to Cytoscape

Notes

Students can receive credit for only one of Computer Science 4550 or Biology 4606.

Page last updated May 24th 2021
