Main Page

Bioinformatics Modules for Life Sciences Teaching, NUS

Course Modules available

  • LSM2104
  • LSM3241
  • MDG5101
  • TA Resources
  • Please see User's Guide for usage and configuration help.
  • Formatting guide
  • HTML reference http://www.december.com/works/hcu/quickref.html
  • Online Tutorials on Bioinformatics
    All tutorial suites include a recorded tutorial (requires Flash), downloadable slides, handouts, and exercises. Some providers also offer Quick Reference Cards: handy cards to keep near the computer, with reminders and tips about using the software.
    • UCSC Genome Browser Introduction; sponsored by University of California, Santa Cruz
    • UCSC Genome Browser Advanced Features; sponsored by University of California, Santa Cruz
    • Integrated Microbial Genomes (IMG); sponsored by the Joint Genome Institute
    • VISTA: Comparative Genomics tools; sponsored by University of California, Berkeley
    • SeattleSNPs; sponsored by University of Washington
    • Genome Variation Server; sponsored by University of Washington
    • OpenHelix

Mini Project

Purpose

  • Give students practice in preparing PowerPoint presentations on a specific topic
  • Give students practice in researching a topic and learning independently
  • Give students a chance to face an audience, give a talk, answer questions and enhance their presentation skills in the process.


Task

  • The topic given will be related to the lecture
  • Prepare a PowerPoint presentation, dividing the task among team members
  • Deliver the PowerPoint presentation within 15 minutes (10 minutes for the presentation and 5 minutes for Q&A) to the Practical Group you are in, e.g. A01 presents to Group A
  • Your team will engage in a 5-minute verbal question-and-answer (Q&A) session in which you will answer questions from fellow students, Teaching Assistants and others
  • All students in the class will rate your presentation on a scale of 1 to 5


Grading

Class Presentation (CA3): 10% of overall grade, comprising

  1. Presentation (5%) - marks will be peer reviewed
  2. Submitted material (5%)


Peer Review

See All On Peer Review


Topics 2006-07 Sem1

Week 1: 2006b_A01 2006b_B01 2006b_C01 2006b_D01

Title : Scope of Bioinformatics and the Relevance of Bioinformatics in Life Sciences

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A01 -
  • B01 - Xie Chao
  • C01 - Li Hu
  • D01 - Rahul Thadani


  • Outline all relevant topics and subject headings in bioinformatics
  • Provide a definition and a brief explanation of each topic
  • Compile and summarise a list of definitions of what bioinformatics is
  • Describe the scope of what bioinformatics covers.
  • For all relevant topics and subtopics of bioinformatics generally found in textbooks, write short descriptions.
  • Provide some hyperlinks to additional information sources and bibliographic citations
  • Short descriptions should not be copied wholesale from the research material used. Wherever material is copied, it should be properly cited with references and, wherever possible, a hyperlink should be included.
  • Short descriptions should also answer the questions:
    • What is the relevance of Bioinformatics in the Life Sciences?
    • And in particular, how does the subarea of bioinformatics relate to life sciences and biology?
  • Do not cover material from the subsequent topics, or else you will be doing the homework of the subsequent groups.

Week 2: A02 B02 C02 D02

Title : Survey of Bioinformatics Databases


TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A02 - Susan Moore
  • B02 - Susan Moore
  • C02 - Susan Moore
  • D02 - Rahul Thadani


  1. Summarise what kinds of biological information can be found and, in the process, outline the various “rudimentary” ways of categorising or classifying bioinformatics databases
  2. For one of the classifications, the best in your opinion, or your own classification, populate the classification categories with lists of key bioinformatics and biological databases (including those containing biological literature)
  3. For the main bioinformatics databases, give a brief description of what they contain, what they do, and how they aid life science research.
    1. Outline in point form, with a brief explanation, the uses and applications of such databases in the study of life sciences and research
  4. Give examples of how they are used and how you might use them for your LSM course modules for the rest of your undergraduate studies.
  5. EXAMPLES
    1. NAR Classification
      1. Nucl Acids Research Database Issue 2006
        1. Bateman, A. (2006) Nucleic Acids Res. 34(Database issue):D1.
        2. Galperin, Michael Y. The Molecular Biology Database Collection: 2006 update Nucl. Acids Res. 2006 34: D3-5
        3. http://www.oxfordjournals.org/nar/database/c/
        4. Additional Information http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1
        5. Categories List http://www.oxfordjournals.org/nar/database/cap/
        6. Alphabetic List http://www.oxfordjournals.org/nar/database/a/
    2. InfoBioGen DBCat Classification
      1. DBCat http://www.infobiogen.fr/services/dbcat/
      2. http://www.infobiogen.fr/services/bdd/getdb_group.php?group=-1&lg=en
    3. Other Classifications
      1. AcademicInfo
        1. http://www.academicinfo.net/biodata.html
      2. Amos Bairoch’s List
        1. http://au.expasy.org/links.html
      3. Pevsner’s List
        1. http://pevsnerlab.kennedykrieger.org/wiley/index.html
        2. http://pevsnerlab.kennedykrieger.org/wiley/chapt1.htm
        3. http://pevsnerlab.kennedykrieger.org/wiley/chapt2.htm
      4. Suresh Kumar’s list
        1. http://www.geocities.com/bioinformaticsweb/data.html
        2. http://www.geocities.com/bioinformaticsweb/datalink.html
        3. http://www.geocities.com/bioinformaticsweb/speciesspecificdatabases.htm
    4. Database Maps
      1. Japanese KEGG http://www.genome.ad.jp/dbget/dbget.links.html
      2. European SRS http://srs.ebi.ac.uk/
      3. USA Entrez http://www.ncbi.nlm.nih.gov/Database/index.html
  6. OPTIONAL WORK
    1. Advanced Work (only for those very interested):
      1. From the survey of classification systems of databases, identify their weaknesses and limitations, i.e. what you can and cannot do to find information.
        1. Strengths and Weaknesses of each classification system
        2. Features and Limitations of each classification system
      2. Integration and improving the search capability of biological databases
        1. Propose ideas for integration
        2. Propose ideas for searching a database of biological databases
      3. From the survey of databases, propose a better classification system
    2. Super Advanced work (only for those who intend to do graduate research):
      1. What is Ontology (non-philosophical definition)?
      2. How is the current situation of bioinformatics databases rudimentary in its organisation of information?
      3. How can we create an Ontology for bioinformatics databases? (e.g. Stanford’s Protégé software)
      4. How can we use an ontology to classify and group them?
      5. What advantages will such an ontology offer in dealing with the complexity of the life science information landscape?
      6. Propose how you can implement this ontology in the MySQL database on your APBioKnoppix2.

Week 3: A03 B03 C03 D03

Title : Survey of Bioinformatics Software and Web Services

(To be opened earlier, on 27 Jan 06, because of Chinese New Year)

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A03 - Ung Choong Yong
  • B03 - Ung Choong Yong
  • C03 - Li Hu
  • D03 - Kalyan


  1. Survey the many different types of software applications, and packages of software applications, available for bioinformatics and biology.
  2. Survey the many different types of web services that offer a web interface to software applications and databases for bioinformatics and biology.
  3. For one of the classifications, the best in your opinion, or your own classification, populate the classification categories with lists of key bioinformatics and biological software and web services
  4. For the main and popularly used ones, give a brief description of what they contain, what they do, how they aid life science research.
  5. Give examples of how they are used and how you might use them for your LSM course modules for the rest of your undergraduate studies.
  • Subclassification: Sequence Analysis (Optional)
    • For sequence analysis software, there are further subclassifications.
    • For example, EMBOSS itself is subclassified into groups of applications on a functional basis.
    • Do a survey of software and web services for the class of “Sequence Analysis” including EMBOSS and other applications
  • Advanced: Integration and Organisation (Optional)

Week 4: A04 B04 C04 D04

Title : Biological Sequence Comparison

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A04 - Ung Choong Yong
  • B04 - Justin Choo
  • C04 - Xie Chao
  • D04 - Muh Hon Cheng


  1. What is the purpose of Biological Sequence Comparison?
    1. List out the uses of sequence comparison for the biologist
    2. In explaining biological sequence comparison, include the following key concepts/topics of the subject.
  2. Global vs Local Alignment
    1. List out and explain the differences between Global vs Local Alignment
      1. In which cases do we want to do global alignment instead of local alignment, and vice versa?
    2. Tabulate examples of software applications that perform Global vs Local Alignment
    3. Give specific names and the packages they come from
    4. Give brief details of their distinguishing features
    5. Briefly distinguish between the Smith-Waterman and Needleman-Wunsch methods
  3. Visualisation of Sequence Comparison
    1. What are Dotplots? How are dotplots constructed for visualisation of sequence comparison?
    2. Give examples of dotplot software available from various packages or sources
    3. Give brief details of the similar or different features of these applications
    4. What do the different types of dotplots tell us about the biological inferences we can make about the sequences being compared?
    5. In which cases is it more advantageous to use Dynamic Programming than dotplots? How detailed are dotplots?
    6. What are the various parameters of Dotplot software, and what is their significance?
  4. Substitution Matrices
    1. Give a general definition of substitution matrix
    2. Give some examples of substitution matrices and their historical background
    3. Why do we need to use a substitution matrix as opposed to a unitary matrix?
    4. What is the biological significance of the substitution matrix?
    5. (More details on substitution matrices will be asked for next week, so do not cover them in greater detail than necessary and end up encroaching on next week's class presentation)
  5. Dynamic Programming
    1. What is its computing background and brief context in computer science?
    2. How does DP work when applied to biological sequence comparison?
    3. Explain the various stages of the DP algorithm as applied to sequence comparison.
    4. What is the scoring mechanism of DP applied to biological sequence comparison?
    5. Explain the parameters used in biological sequence comparison.
    6. Explain the core idea of Dynamic Programming by examining the difference between exact algorithmic methods and heuristic methods in sequence comparison. (Only brief coverage of heuristic methods is necessary, as they will be covered next week)
    7. (Do not cover BLAST details as this will be for the next week)
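
As a starting point for the Dynamic Programming items above, here is a minimal Python sketch of the matrix-fill stage. It is an illustration under simplifying assumptions, not a reference implementation: the function name `align_score`, the match/mismatch values and the single linear gap penalty are all hypothetical choices, a real tool would use a substitution matrix and separate gap opening and extension penalties, and the traceback stage is omitted. The `local` switch shows the two scoring differences usually cited between Needleman-Wunsch (global) and Smith-Waterman (local) alignment.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Fill a dynamic programming matrix and return the best alignment score.

    local=False: global scoring in the style of Needleman-Wunsch.
    local=True:  local scoring in the style of Smith-Waterman.
    The scoring values are illustrative assumptions, not recommended settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initialisation: a global alignment pays for leading gaps,
    # a local alignment may start anywhere at no cost.
    if not local:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    # Matrix fill: each cell depends only on its three already-computed neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in sequence b
            left = score[i][j - 1] + gap    # gap in sequence a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment never goes below zero
                best = max(best, cell)      # and may end at any cell
            score[i][j] = cell

    # A global alignment must end in the bottom-right corner.
    return best if local else score[-1][-1]


print(align_score("GATTACA", "GCATGCU"))
print(align_score("TTTACGTTT", "GGACGGG", local=True))
```

The run time and memory both grow with the product of the two sequence lengths, which is the constraint that motivates the heuristic methods covered the following week.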

Week 5: A05 B05 C05 D05

Title : Biological Sequence Search

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A05 - Ung Choong Yong
  • B05 - Justin Choo
  • C05 - Muh Hon Cheng
  • D05 - Kalyan
  1. Biological Sequence Search - concepts and topics
    1. What is the difference between a sequence search and keyword search?
    2. What is the search query in a sequence search?
    3. In the sequence search, what is the search query searched against?
    4. Typically what is the size of a sequence search query? Give examples e.g. a sequence from a single run in an automated DNA sequencer, small insert library sequence, a sequence melded from various contigs in a BAC, etc.
    5. Typically what is the size of a database of sequences? Give examples of different databases commonly used in sequence searches.
  2. Sequence Comparison in sequence searching
    1. Parameters used in sequence comparison. Explain the following:
      1. gap penalty
      2. gap extension penalty
      3. k-tuple. What other synonyms are also used?
      4. How are they used in sequence searches?
    2. Substitution Matrices in sequence comparison and sequence search
      1. Explain how substitution matrices bring biological significance into sequence alignments. In other words, why do we use substitution matrices?
      2. How are substitution matrices commonly used in sequence searches constructed?
      3. Give examples of such substitution matrices and their method of construction.
      4. What other substitution matrices can you find besides PAM and BLOSUM?
      5. When constructing substitution matrices, we should avoid gaps in alignments. Why?
    3. In a sequence search, what governs our choice of substitution matrices?
    4. If we guess that homologs are more divergent, which substitution matrix should we choose? If we guess that homologs are less divergent, which substitution matrix should we use? Tabulate your recommendations as a guide to your choice whenever you are using sequence searches or comparisons.
  3. Heuristic methods
    1. What is the key constraint with exact dynamic programming methods in sequence searches?
    2. Explain why so many bioinformatics programs, particularly sequence search techniques, use heuristic methods to speed up the computation, and how these methods reduce the search space.
    3. What are the main differences between the heuristic BLAST technique and the exact Dynamic Programming algorithm? Explain the main points that make BLAST a heuristic technique.
  4. BLAST and BLAST Flavours
    1. List all the different BLAST programs. Tabulate the type of query, the database queried, the type of sequence compared (nucleotide or amino acid sequence), and the technique used and how it differs. At a glance, one should be able to see, for example, which query sequence (nt or aa seq) is used against which kind of database (nt or aa seq db) using which kind of comparison (aa vs aa or nt vs nt).
    2. If you use tblastx to search a database, all the sequences in the database are translated into six frames. This is done for each of your search sequences. How long does it typically take if you are using the NCBI BLAST server, our National BLAST server or our NUS BLAST server? If you have 100,000 sequences to search, can you think of any refinement to the procedure?
  5. The BLAST output
    1. In the BLAST result output, what are the key aspects of information given?
    2. How are the hits ranked?
    3. Explain the colour scheme of the NCBI BLAST alignment visualisation.
    4. What happens if more than one local alignment comes from the same target sequence?
    5. What is the meaning of percentage identity and percentage similarity?
    6. The alignment shown for each blast hit, is it a global or a local alignment of the query sequence against the sequence in the database?
  6. Substitution Matrices and BLAST Scores
    1. How are BLAST Scores computed?
    2. How does the choice of Substitution matrix affect the score?
    3. What is the significance of the scores?
    4. How can differences in scoring system result in different alignments?
  7. Statistical Significance and E values
    1. How does one assess the statistical significance of a typical BLAST Search?
    2. Is the E-value in BLAST a probability, strictly speaking? Hint: You can find an answer at the NCBI website.
    3. How does the E-value help you in assessing the BLAST hits?
  8. Making Sound Biological inferences from BLAST
    1. If you want to find homologs to your query sequence, what are the most important items to observe in BLAST results?
    2. Before you make an inference that a BLAST hit is a homolog of your query sequence what sort of criteria should be fulfilled typically?
    3. In a BLAST hit alignment, before you can infer that the sub-region in your query sequence and the matching sub-region of a database sequence entry are homologs, what sort of criteria need to be fulfilled?
    4. How do low-complexity regions in a query sequence affect the BLAST search, and how can we avoid this problem?
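
For the items on statistical significance above, the following Python sketch may help when checking numbers by hand. It assumes the standard Karlin-Altschul relation between a bit score S' and the expected number of chance hits, E = m × n × 2^(−S'), and the Poisson relation P = 1 − e^(−E) for the probability of at least one chance hit with that score or better. The function names are hypothetical, and the sketch uses raw query and database lengths, whereas BLAST itself works with corrected "effective" lengths, so values will not match a real BLAST report exactly.

```python
import math


def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least `bit_score`.

    Assumes E = m * n * 2**(-S'), using raw lengths m and n (an approximation).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(evalue):
    """Probability of at least one chance hit, assuming hits are Poisson-distributed."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-evalue)


for e in (10.0, 1.0, 0.001):
    print(e, pvalue_from_evalue(e))
```

For small values, E and P are almost identical, but an E-value can exceed 1 while a probability cannot.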

Week 6: A06 B06 C06 D06

Title : Biological Patterns and Motifs

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A06 - Ung Choong Yong
  • B06 - Justin Choo
  • C06 - Muh Hon Cheng
  • D06 - Kalyan
  1. Basic Concepts in Biological Sequence Patterns and Motifs
    1. Consensus, Regular Expressions, Matrices and Profiles
    2. SeqLogo
    3. The EMBOSS program cons
    4. What are regular expressions?
    5. Give examples of how a Prosite Pattern is constructed
    6. Give examples of how Prosite Profiles are constructed
    7. Give examples of how matrices are constructed
  2. Databases of patterns and motifs and Software Tools in predicting patterns and motifs
    1. Examples of DNA sequence pattern databases
      1. Describe REBASE
      2. Describe the transcription factor database, TRANSFAC
    2. Examples of Protein sequence pattern databases
      1. What is the Prosite database?
        1. Pattern Scanning Software e.g. ScanProsite
      2. PRINTS and BLOCKS databases
        1. What are they used for, and what are the differences between them?
      3. Pfam and SMART databases
        1. Distinguish between the two. What are they used for?
  3. Sensitivity, Specificity, PPV
    1. Simple examples of calculations
    2. How do variations in the respective values of TP, TN, FP and FN affect these metrics?
  4. Making biological inferences from patterns and motifs
    1. Common Pitfalls in making inferences
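
For the "simple examples of calculations" asked for above, a minimal Python sketch using the standard definitions is given below. The function name and the counts in the example are made up for illustration (imagine a motif scanned against 1,000 sequences, 100 of which truly contain it); they are not taken from any real database.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, PPV) from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. there is at least one true
    positive case, one true negative case and one positive prediction.
    """
    sensitivity = tp / (tp + fn)   # fraction of true members that were found
    specificity = tn / (tn + fp)   # fraction of non-members correctly rejected
    ppv = tp / (tp + fp)           # fraction of reported hits that are real
    return sensitivity, specificity, ppv


# Hypothetical counts: 90 true hits, 10 missed, 30 false hits, 870 correct rejections.
sens, spec, ppv = diagnostic_metrics(tp=90, tn=870, fp=30, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} PPV={ppv:.3f}")
```

Changing one count at a time shows how the metrics move independently: more false positives lower the specificity and the PPV but leave the sensitivity unchanged.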

Week 7: A07 B07 C07 D07

Title : Multiple Sequence Alignments and Molecular Phylogenetics

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A07 - Ung Choong Yong
  • B07 - Ung Choong Yong
  • C07 - Justin Choo
  • D07 - Justin Choo


  1. Multiple sequence alignment - extension of pairwise sequence comparison
  2. Software tools for MSA
  3. Evolutionary Theory and Molecular Phylogenetics
  4. Tree of Life and molecular clocks
  5. Software tools for phylogenetic inference
  6. Dendrograms and visualisation tools
  7. Key Concepts in molecular phylogenetics such as Neighbour Joining, Maximum Parsimony and Maximum Likelihood, Bootstrapping, etc.
  8. Making biological inferences from MSA and phylogenetics
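
As one concrete handle on the bootstrapping concept listed above, the Python sketch below builds a single bootstrap pseudo-replicate by resampling the columns of a multiple sequence alignment with replacement. The toy alignment and the function name are hypothetical. The tree-building step that would be run on each replicate (for example Neighbour Joining) is not shown.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement to make one pseudo-replicate.

    `alignment` is a list of equal-length aligned sequences (rows).
    `rng` is a random.Random instance, passed in so that runs can be reproduced.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = len(alignment[0])
    # Draw as many column indices as the alignment has columns, with replacement.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same picks to every row so that each column stays intact.
    return ["".join(seq[c] for c in picks) for seq in alignment]


toy_alignment = ["ACGT-A", "ACGTTA", "AC-TTA"]  # made-up example data
for row in bootstrap_replicate(toy_alignment, random.Random(0)):
    print(row)
```

In a full analysis, many such replicates are generated, a tree is inferred from each, and the support for a branch is the fraction of replicate trees that contain it.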

Week 8: A08 B08 C08 D08

Title : Folding problem and Fundamentals of Structural Biology and Visualisation

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A08 - Susan Moore
  • B08 - Susan Moore
  • C08 - Li Hu
  • D08 - Li Hu
  1. The importance of protein structures
  2. The different levels of protein structure (primary, secondary, tertiary, etc.)
  3. The 20 naturally occurring amino acids: how they can be grouped (hydrophobic and hydrophilic; small and large; and special ones like glycine and proline)
  4. Ramachandran plot
  5. Secondary and super secondary structures
  6. Factors involved in protein folding
  7. Levinthal's paradox about protein folding
  8. Old and new views of protein folding
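
The arithmetic behind Levinthal's paradox, listed above, can be reproduced with a few lines of Python. The figures used here (3 conformations per residue, a 100-residue chain, 10^13 conformations sampled per second) are commonly quoted illustrative assumptions rather than measured values, and the function name is hypothetical.

```python
def levinthal_search_time(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to try them all).

    All parameter defaults are illustrative assumptions. The division converts
    to floating point, so this works for chains of up to a few hundred residues.
    """
    conformations = states_per_residue ** n_residues   # exact integer
    seconds = conformations / samples_per_second
    seconds_per_year = 365.25 * 24 * 3600
    return conformations, seconds / seconds_per_year


conformations, years = levinthal_search_time(100)
print(f"{conformations:.2e} conformations, about {years:.1e} years to sample them all")
```

The result exceeds the age of the universe by many orders of magnitude, whereas real proteins fold in seconds or less, so folding cannot be an exhaustive random search.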

Week 9: A09 B09 C09 D09

Title : Protein structure: databases, visualization, and classification

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A09 - Kalyan
  • B09 - Kalyan
  • C09 - Xie Chao
  • D09 - Xie Chao
  1. Protein Data Bank (PDB): what is in the database, statistics of the database, rate of growth
  2. Structure visualization: free software packages (Rasmol, Swiss PDB viewer (DeepView), Chime, YASARA, MolMol, etc).
  3. The hierarchy of the SCOP database and how its levels are defined
    1. Class
    2. Fold
    3. Superfamily
    4. Family
  4. The hierarchy of the CATH database and how its levels are defined: class, architecture, topology, homologous superfamily, and sequence family

Week 10: A10 B10 C10 D10

Title : Protein Structural Modelling and Prediction

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

  • A10 - Muh Hon Cheng
  • B10 - Rahul Thadani
  • C10 - Muh Hon Cheng
  • D10 - Rahul Thadani
  1. Relationship between sequence identity and structural similarity; safe zone and twilight zone
  2. Homology modeling of proteins
  3. De novo modeling
  4. CASP
  5. Current status of protein structure prediction
  6. Structural genomics and the Protein Structure Initiative (PSI)