LSM2104 Main Page

LSM2104 Mini Project

LSM2104/AY0708Sem1-MiniProject

Purpose

Give students practice in making PowerPoint presentations on a specific topic
Give students practice in researching a topic and in self-learning
Give students a chance to face an audience, give a talk, answer questions and enhance their presentation skills in the process.

Task

The topic given will be related to the lecture itself
Prepare a PowerPoint presentation by dividing the task among team members
Deliver the PowerPoint presentation within 15 minutes (10 minutes for presentation and 5 minutes for Q&A) to the Practical Group you are in, e.g. A01 presents to Group A
Your team will engage in a 5-minute verbal question and answer (Q&A) session where you will answer questions from fellow students, Teaching Assistants, etc.
All students in the class will rate your presentation on a scale of 1 to 5

Grading

Class Presentation (CA3) - 10% of overall grade, comprising:

Presentation (5%) - marks will be peer reviewed
Submitted material (5%)

Peer Review

See All On Peer Review

LSM2104 Presentation Topics 2006-07 Sem1

Previous Year LSM2104Present06sem2

Week 1: 2006a_A01 2006a_B01 2006a_C01 2006a_D01

Topic Title : Scope of Bioinformatics and What is the relevance of Bioinformatics in Life Sciences?

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A01 -
B01 - Xie Chao
C01 - Li Hu
D01 - Rahul Thadani

Outline of all relevant topics & subject headings in bioinformatics
Provide definitions and a brief explanation of each topic
Compile and summarise a list of definitions of what bioinformatics is
Provide a scope of what bioinformatics covers.
For all relevant topics and subtopics of bioinformatics generally found in textbooks, write short descriptions.
Provide some hyperlinks to additional information sources and bibliographic citations
Short descriptions should not be copied wholesale from the research material used. Wherever material is copied, it should be properly cited with references and, wherever possible, a hyperlink included.
Short descriptions should also answer the questions:
- What is the relevance of Bioinformatics in the Life Sciences?
- And in particular, how does the subarea of bioinformatics relate to life sciences and biology?
Do not cover material in the subsequent topics, or else, you will be doing the homework of the subsequent groups.

Week 2: A02 B02 C02 D02

Title : Survey of Bioinformatics Databases

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A02 - Susan Moore
B02 - Susan Moore
C02 - Susan Moore
D02 - Rahul Thadani

Summarise what kinds of biological information can be found and, in the process, outline the various “rudimentary” ways of categorising or classifying bioinformatics databases
For one of the classifications, the best in your opinion, or your own classification, populate the classification categories with lists of key bioinformatics and biological databases (including those containing biological literature)
For the main bioinformatics databases, give a brief description of what they contain, what they do and how they aid life science research.
1. Outline in point form, with a brief explanation, the uses and applications of such databases in the study of life sciences and research
Give examples of how they are used and how you might use them for your LSM course modules for the rest of your undergraduate studies.
EXAMPLES
1. NAR Classification
  1. Nucl Acids Research Database Issue 2006
    1. Bateman, A. (2006) Nucleic Acids Res. 34(Database issue):D1.
    2. Galperin, Michael Y. The Molecular Biology Database Collection: 2006 update Nucl. Acids Res. 2006 34: D3-5
    3. http://www.oxfordjournals.org/nar/database/c/
    4. Additional Information http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1
    5. Categories List http://www.oxfordjournals.org/nar/database/cap/
    6. Alphabetic List http://www.oxfordjournals.org/nar/database/a/
2. InfoBioGen DBCat Classification
  1. DBCat http://www.infobiogen.fr/services/dbcat/
  2. http://www.infobiogen.fr/services/bdd/getdb_group.php?group=-1&lg=en
3. Other Classifications
  1. AcademicInfo
    1. http://www.academicinfo.net/biodata.html
  2. Amos Bairoch’s List
    1. http://au.expasy.org/links.html
  3. Pevsner’s List
  4. Suresh Kumar’s list
4. Database Maps
  1. Japanese KEGG http://www.genome.ad.jp/dbget/dbget.links.html
  2. European SRS http://srs.ebi.ac.uk/
  3. USA Entrez http://www.ncbi.nlm.nih.gov/Database/index.html
OPTIONAL WORK
1. Advanced Work (only for those very interested):
  1. From the survey of classification systems of databases, identify their weaknesses and limitations, i.e. what you can and cannot do to find information.
    1. Strengths and Weaknesses of each classification system
    2. Features and Limitations of each classification system
  2. Integration and improving the search capability of biological databases
    1. Propose ideas for integration
    2. Propose ideas for searching a database of biological databases
  3. From the survey of databases, propose a better classification system
2. Super Advanced work (only for those who intend to do graduate research):
  1. What is Ontology (non-philosophical definition)?
  2. How is the current situation of bioinformatics databases rudimentary in its organisation of information?
  3. How can we create an Ontology for bioinformatics databases? (e.g. Stanford’s Protégé software)
  4. How can we use an ontology to classify and group them?
  5. What advantages will such an ontology have on the complexity of the life science information landscape?
  6. Propose how you can implement this ontology in the MySQL database on your APBioKnoppix2.

Week 3: A03 B03 C03 D03

Title : Survey of Bioinformatics Software and Web Services

(To be Opened earlier on 27 Jan 06 because of CNY)

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A03 - Ung Choong Yong
B03 - Ung Choong Yong
C03 - Li Hu
D03 - Kalyan

There are many different types of software applications, and packages of such applications, for bioinformatics and biology.
There are many different types of web services offering web interfaces to software applications and databases for bioinformatics and biology.
For one of the classifications, the best in your opinion, or your own classification, populate the classification categories with lists of key bioinformatics and biological software and webservices
For the main and popularly used ones, give a brief description of what they contain, what they do, how they aid life science research.
Give examples of how they are used and how you might use them for your LSM course modules for the rest of your undergraduate studies.

BioSoftware
- EBI DBCat classification
  - http://corba.ebi.ac.uk/Biocatalog/
  - http://www.ebi.ac.uk/Tools/
  - 2000 – dated.
- IUBio Archive
  - http://iubio.bio.indiana.edu/software/
- Other eclectic classifications
  - From the book Computational Methods in Molecular Biology
    http://www.cs.jhu.edu/~salzberg/appendixa.html
  - Netsci
    http://www.netsci.org/Resources/Software/Bioinform/
  - Bioinformatics.org
    http://bioinformatics.org/softwaremap/?form_cat=2
  - Suresh’s links
    http://www.geocities.com/bioinformaticsweb/toollink.html

BioWebservices
- Many types of web links available, including databases and tutorials
- Select web services for computation only.
- NAR Web Server Issue
  - 2005 July Issue
    http://nar.oxfordjournals.org/content/vol33/suppl_2/index.dtl#EDITORIAL
  - Joanne A. Fox, Stefanie L. Butland, Scott McMillan, Graeme Campbell, and B. F. Francis Ouellette (2005) The Bioinformatics Links Directory: a Compilation of Molecular Biology Web Servers. Nucl. Acids Res. 2005 33: W3-W24
    doi:10.1093/nar/gki594
  - Links Directory
    http://bioinformatics.ubc.ca/resources/links_directory/
  - Categories
    http://www.bioinformatics.ubc.ca/resources/links_directory/narweb2005/categorized.php

Others

Subclassification: Sequence Analysis (Optional)
- For sequence analysis software, there are further subclassifications.
- For example, EMBOSS itself is subclassified into groups of applications on a functional basis.
- Do a survey of software and web services for the class of “Sequence Analysis” including EMBOSS and other applications

Advanced: Integration and Organisation (Optional)
- With the last group's presentation of databases, and this week's presentation of software and web services, there is an overwhelming range of resources
- Propose ways in which integration and organisation can take place.
- The Resourceome (general)
  - Cannata N, Merelli E, Altman RB (2005) Time to Organize the Bioinformatics Resourceome. PLoS Comput Biol 1(7): e76
  - http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0010076
- TAMBIS - Tao (by ontology)
  - “An ontology for bioinformatics applications” by Patricia Baker et al., Bioinformatics, Volume 15, number 6, pages 510–520, 1999
  - http://www.ontologos.org/OML/..%5COntology%5CTAMBIS.htm (use Internet explorer)

Week 4: A04 B04 C04 D04

Title : Biological Sequence Comparison

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A04 - Ung Choong Yong
B04 - Justin Choo
C04 - Xie Chao
D04 - Muh Hon Cheng

What is the purpose of Biological Sequence Comparison?
1. List out the uses of sequence comparison for the biologist
2. In explaining biological sequence comparison, include the following key concepts/topics of the subject.
Global vs Local Alignment
1. List out and explain the differences between Global vs Local Alignment
  1. In which cases do we want to do global alignment instead of local alignment and vice versa
2. Tabulate examples of software applications that perform Global vs Local Alignment
3. Give specific names and the packages they come from
4. Give brief details of their distinguishing features
5. Briefly distinguish between the Smith-Waterman and Needleman-Wunsch methods
Visualisation of Sequence Comparison
1. What are Dotplots? How are dotplots constructed for visualisation of sequence comparison?
2. Give examples of dotplot software available from various packages or sources
3. Give brief details of the similar or different features of these applications
4. What do the different types of dotplots tell us, and what kinds of biological inferences can we make about the sequences being compared?
5. In which cases is it more advantageous to use Dynamic Programming than dotplots? - how detailed are dotplots?
6. What are the various parameters of Dotplot software, and what is their significance?
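The construction of a dotplot asked about above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple sliding window and an identity-count threshold (two parameters that dotplot programs commonly expose). The function names `dotplot` and `render` and the sample sequence are hypothetical and do not reproduce the method of any particular package.

```python
def dotplot(seq_a, seq_b, window=3, threshold=3):
    """Return the set of (i, j) positions where the window of seq_a starting
    at i and the window of seq_b starting at j agree in at least
    `threshold` positions."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the dotplot as text: seq_a runs down the side, seq_b across the top."""
    lines = ["  " + seq_b]
    for i, residue in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a = "GATTACA"
    print(render(a, a, dotplot(a, a, window=3, threshold=3)))
```

Comparing a sequence with itself gives the main diagonal. Lowering `threshold` or `window` adds more dots (more sensitivity, more noise), which is one way to explore the significance of these parameters.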
Substitution Matrices
1. Give a general definition of substitution matrix
2. Give some examples of substitution matrices and their historical background
3. Why do we need to use a substitution matrix as opposed to a unitary matrix?
4. What is the biological significance of the substitution matrix?
5. (More details on substitution matrices will be asked for next week, so do not cover them in greater detail than necessary and end up encroaching on next week's class presentation)
Dynamic Programming
1. What is its computing background and brief context in computer science?
2. How does DP work when applied to biological sequence comparison?
3. Explain the various stages of the DP algorithm as applied to sequence comparison.
4. What is the scoring mechanism of DP applied to biological sequence comparison?
5. Explain the parameters used in biological sequence comparison.
6. Explain the core idea of Dynamic Programming when examining the difference between exact algorithmic methods and heuristic methods in sequence comparison. (Only brief coverage of heuristic methods is necessary, as they will be covered next week)
7. (Do not cover BLAST details as this will be for the next week)
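As a starting point for the Dynamic Programming questions above, the following Python sketch fills the scoring table for both a global (Needleman-Wunsch style) and a local (Smith-Waterman style) comparison. It is a simplified illustration under stated assumptions: a flat match/mismatch score instead of a substitution matrix, a linear gap penalty, and no traceback. The function name `alignment_score` and the default parameter values are hypothetical.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: global alignment score (end-to-end, Needleman-Wunsch style).
    local=True:  best local alignment score (Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initialisation: global alignment pays for leading gaps,
    # local alignment starts every cell of the first row/column at zero.
    if not local:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                cell = max(cell, 0)          # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("TTTACGTTT", "GGGACGGGG"))              # global
    print(alignment_score("TTTACGTTT", "GGGACGGGG", local=True))  # local
```

The only differences between the two modes are the initialisation, the floor at zero, and where the final score is read from, which is a convenient way to frame the Needleman-Wunsch versus Smith-Waterman comparison.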

Week 5: A05 B05 C05 D05

Title : Biological Sequence Search

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A05 - Ung Choong Yong
B05 - Justin Choo
C05 - Muh Hon Cheng
D05 - Kalyan

Biological Sequence Search - concepts and topics
1. What is the difference between a sequence search and keyword search?
2. What is the search query in a sequence search?
3. In the sequence search, what is the search query searched against?
4. Typically what is the size of a sequence search query? Give examples e.g. a sequence from a single run in an automated DNA sequencer, small insert library sequence, a sequence melded from various contigs in a BAC, etc.
5. Typically what is the size of a database of sequences? Give examples of different databases commonly used in sequence searches.
Sequence Comparison in sequence searching
1. Parameters used in sequence comparison. Explain the following:
  1. gap penalty
  2. gap extension penalty
  3. k-tuple. What other synonyms are also used?
  4. How are they used in sequence searches?
2. Substitution Matrices in sequence comparison and sequence search
  1. Explain how substitution matrices bring biological significance into sequence alignments. In other words, why do we use substitution matrices?
  2. How are substitution matrices commonly used in sequence searches constructed?
  3. Give examples of such substitution matrices and their method of construction.
  4. What other substitution matrices can you find besides PAM and BLOSUM?
  5. When constructing substitution matrices, we should avoid gaps in alignments. Why?
3. In a sequence search, what governs our choice of substitution matrices?
4. If we guess that homologs are more divergent, which substitution matrix should we choose? If we guess that homologs are less divergent, which substitution matrix should we use? Tabulate your recommendations as a guide to your choice whenever you are using sequence searches or comparisons.
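The k-tuple parameter listed above can be illustrated with a short Python sketch that breaks a sequence into overlapping words of length k and finds the words a query shares with a database sequence. This is a sketch only: the function names are hypothetical, exact word matches are assumed, and real search programs add scoring and extension steps on top of this idea.

```python
from collections import defaultdict


def ktuples(sequence, k):
    """Return all overlapping words of length k, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def index_ktuples(sequence, k):
    """Map each word of length k to the list of positions where it starts."""
    index = defaultdict(list)
    for pos, word in enumerate(ktuples(sequence, k)):
        index[word].append(pos)
    return index


def shared_word_hits(query, subject, k):
    """Return (query_pos, subject_pos) pairs where the same word occurs in both."""
    subject_index = index_ktuples(subject, k)
    hits = []
    for q_pos, word in enumerate(ktuples(query, k)):
        for s_pos in subject_index.get(word, []):
            hits.append((q_pos, s_pos))
    return hits


if __name__ == "__main__":
    print(ktuples("GATTACA", 3))
    print(shared_word_hits("TTAC", "GATTACA", 3))
```

A larger k gives fewer, more specific hits and a faster search, while a smaller k is more sensitive but slower.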
Heuristic methods
1. What is the key constraint with exact dynamic programming methods in sequence searches?
2. Explain why so many bioinformatics programs, particularly sequence search techniques, use heuristic methods to speed up the computation, and how they reduce the search space.
3. What are the main differences between the heuristic BLAST technique and the exact Dynamic Programming algorithm? Explain the main points that make BLAST a heuristic technique.
BLAST and BLAST Flavours
1. List all the different BLAST programs. Tabulate the type of query, the queried database, the type of sequence compared (nucleotide or amino acid sequence), the technique used and how it differs. At a glance, one should be able to see, for example, what kind of query sequence (nt or aa seq) is used against what kind of database (nt or aa seq db) using what kind of comparison (aa vs aa or nt vs nt).
2. If you use tblastx to search a database, all the sequences in the database are translated into six frames. This is done for each of your search sequences. How long does it typically take if you are using the NCBI BLAST server, our National BLAST server or our NUS BLAST Server? If you have 100,000 sequences to search, can you think of any refinement to the procedure?
The BLAST output
1. In the BLAST result output, what are the key aspects of information given?
2. How are the hits ranked?
3. Explain the colour scheme of the NCBI BLAST alignment visualisation.
4. What happens if more than one local alignment comes from the same target sequence?
5. What is the meaning of percentage identity and percentage similarity?
6. The alignment shown for each blast hit, is it a global or a local alignment of the query sequence against the sequence in the database?
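The difference between percentage identity and percentage similarity asked about above can be made concrete with a small Python sketch. It assumes two already-aligned strings of equal length with `-` for gaps, and it uses the full alignment length (gaps included) as the denominator. The set of "similar" residue pairs is a hypothetical stand-in for the positive-scoring pairs of a real substitution matrix.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


def percent_similarity(aligned_a, aligned_b, similar_pairs):
    """Percentage of columns that are identical or a listed similar pair."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    positives = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # a gap column is neither identical nor similar
        if x == y or frozenset((x, y)) in similar_pairs:
            positives += 1
    return 100.0 * positives / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical conservative substitutions, for illustration only.
    similar = {frozenset("KR"), frozenset("IL")}
    print(percent_identity("MKV-LIT", "MRVALLT"))
    print(percent_similarity("MKV-LIT", "MRVALLT", similar))
```

In this example 4 of the 7 columns are identical and 6 of the 7 are identical or similar, so similarity is always at least as high as identity.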
Substitution Matrices and BLAST Scores
1. How are BLAST Scores computed?
2. How does the choice of Substitution matrix affect the score?
3. What is the significance of the scores?
4. How can differences in scoring system result in different alignments?
Statistical Significance and E values
1. How does one assess the statistical significance of a typical BLAST Search?
2. Is the E-value in BLAST a probability, strictly speaking? Hint: You can find an answer at the NCBI website.
3. How does the E-value help you in assessing the BLAST hits?
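For the E-value questions above, the Karlin-Altschul relations can be explored numerically with the Python sketch below. It assumes the standard forms E = K·m·n·e^(−λS) for the expected number of chance hits, S' = (λS − ln K) / ln 2 for the bit score, and P = 1 − e^(−E) for the probability of at least one chance hit. The values of `lam` and `k` in the example are illustrative placeholders only, and actual BLAST runs use effective (corrected) lengths rather than the raw lengths used here.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of seeing at least one chance alignment, given E."""
    return -math.expm1(-e)  # equals 1 - exp(-e), computed accurately for small e


if __name__ == "__main__":
    lam, k = 0.267, 0.041          # illustrative parameters, not looked up
    m, n = 250, 50_000_000         # hypothetical query and database lengths
    for s in (40, 80, 120):
        e = e_value(s, m, n, lam, k)
        print(s, round(bit_score(s, lam, k), 1), e, p_value(e))
```

Because E is an expected count it can exceed 1, whereas P cannot. For small values the two are nearly equal, which is why E is often read informally as a probability.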
Making Sound Biological inferences from BLAST
1. If you want to find homologs to your query sequence, what are the most important items to observe in BLAST results?
2. Before you make an inference that a BLAST hit is a homolog of your query sequence what sort of criteria should be fulfilled typically?
3. In a BLAST hit alignment, before you can infer that the sub-region of your query sequence and the matching sub-region of a database sequence entry are homologs, what sort of criteria need to be fulfilled?
4. How do low-complexity regions in a query sequence affect the BLAST search, and how can we avoid this problem?

Week 6: A06 B06 C06 D06

Title : Biological Patterns and Motifs

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A06 - Ung Choong Yong
B06 - Justin Choo
C06 - Muh Hon Cheng
D06 - Kalyan

Basic Concepts in Biological Sequence Patterns and Motifs
1. Consensus, Regular Expressions, Matrices and Profiles
2. SeqLogo
3. EMBOSS software cons
4. What are regular expressions?
5. Give examples of how a Prosite Pattern is constructed
6. Give examples of how Prosite Profiles are constructed
7. Give examples of how matrices are constructed
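To connect regular expressions with Prosite patterns as asked above, the Python sketch below translates the common elements of Prosite pattern syntax into a regular expression: `x` for any residue, `[ ]` for allowed residues, `{ }` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` or `>` for the sequence ends. The function name `prosite_to_regex` is hypothetical and rarer syntax (such as `>` inside square brackets) is deliberately not handled. The example pattern is the N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple Prosite pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]

        count = ""
        if "(" in element:                   # repeat such as x(3) or x(2,4)
            element, _, rest = element.partition("(")
            count = "{" + rest.rstrip(")") + "}"

        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{"):        # any residue except those listed
            core = "[^" + element[1:-1] + "]"
        else:                                # single residue or [allowed set]
            core = element
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    for match in re.finditer(regex, "MKNGSAPNPTQ"):
        print(match.start(), match.group())
```

Note that `re.finditer` reports non-overlapping matches only, so a dedicated scanner may report more hits than this sketch on sequences with overlapping sites.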
Databases of patterns and motifs and Software Tools in predicting patterns and motifs
1. Examples of DNA sequence pattern databases
  1. Describe REBASE
  2. Describe the transcription factor database, TRANSFAC
2. Examples of Protein sequence pattern databases
  1. What is the Prosite database?
    1. Pattern Scanning Software e.g. ScanProsite
  2. PRINTS and BLOCKS databases
    1. What are they used for and what are the differences?
  3. PFam and SMART database
    1. Distinguish between the two. What are they used for?
Sensitivity, Specificity, PPV
1. Simple examples of calculations
2. How do variations in the respective values of TP, TN, FP and FN affect these metrics
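A simple calculation of the kind asked for above could look like the following Python sketch. It uses the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and PPV = TP / (TP + FP). The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity and positive predictive value (PPV)."""

    def ratio(numerator, denominator):
        # An undefined ratio (empty denominator) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true sites found
        "specificity": ratio(tn, tn + fp),  # fraction of non-sites rejected
        "ppv": ratio(tp, tp + fp),          # fraction of predictions that are right
    }


if __name__ == "__main__":
    # Hypothetical motif scan: 100 true sites and 1000 non-sites.
    print(classification_metrics(tp=90, tn=950, fp=50, fn=10))
    # Same sensitivity and specificity, but ten times more non-sites.
    print(classification_metrics(tp=90, tn=9500, fp=500, fn=10))
```

In the second call sensitivity and specificity are unchanged while PPV drops sharply, which shows how the balance of true sites to non-sites affects the usefulness of a pattern.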
Making biological inferences from patterns and motifs
1. Common Pitfalls in making inferences

Week 7: A07 B07 C07 D07

Title : Multiple Sequence Alignments and Molecular Phylogenetics

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A07 - Ung Choong Yong
B07 - Ung Choong Yong
C07 - Justin Choo
D07 - Justin Choo

Multiple sequence alignment - extension of pairwise sequence comparison
Software tools for MSA
Evolutionary Theory and Molecular Phylogenetics
Tree of Life and molecular clocks
Software tools for phylogenetic inference
Dendrograms and visualisation tools
Key Concepts in molecular phylogenetics such as Neighbour Joining, Maximum Parsimony and Maximum Likelihood, Bootstrapping, etc.
Making biological inferences from MSA and phylogenetics

Week 8: A08 B08 C08 D08

Title : Folding problem and Fundamentals of Structural Biology and Visualisation

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A08 - Susan Moore
B08 - Susan Moore
C08 - Li Hu
D08 - Li Hu

The importance of protein structures
The different levels of protein structure (primary, secondary, tertiary, etc.)
The 20 naturally occurring amino acids: how they can be grouped (hydrophobic and hydrophilic; small and large; and special ones like glycine and proline)
Ramachandran plot
Secondary and super secondary structures
Factors involved in protein folding
Levinthal's paradox about protein folding
Old and new views of protein folding
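The arithmetic behind Levinthal's paradox can be reproduced with a short Python sketch. The numbers are illustrative assumptions commonly used for this back-of-envelope estimate (100 residues, 3 conformations per residue, 10^13 conformations sampled per second) and are not measured values. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(residues=100, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to sample all of them)."""
    conformations = states_per_residue ** residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    conformations, years = levinthal_estimate()
    print(f"{conformations:.2e} conformations, about {years:.1e} years")
```

Under these assumptions an exhaustive random search would take vastly longer than the age of the universe, whereas real proteins fold in milliseconds to seconds. That gap is the paradox, and it motivates the old and new views of folding listed above.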

Week 9: A09 B09 C09 D09

Title : Protein structure: databases, visualization, and classification

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A09 - Kalyan
B09 - Kalyan
C09 - Xie Chao
D09 - Xie Chao

Protein Data Bank (PDB): what is in the database, statistics of the database, rate of growth
Structure visualization: free software packages (Rasmol, Swiss PDB viewer (DeepView), Chime, YASARA, MolMol, etc).
The hierarchy of the SCOP database and how its levels are defined
1. Class
2. Fold
3. Superfamily
4. Family
The hierarchy of the CATH database and how its levels are defined: class, architecture, topology, homologous superfamilies, and sequence families

Week 10: A10 B10 C10 D10

Title : Protein Structural Modelling and Prediction

TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.

A10 - Muh Hon Cheng
B10 - Rahul Thadani
C10 - Muh Hon Cheng
D10 - Rahul Thadani

Relationship between sequence identity and structure similarities; safe zone and twilight zone
Homology protein modeling
De novo modeling
CASP
Current status of protein structure prediction
Structural genomics and the Protein Structure Initiative (PSI)
