LSM2104 Mini Project
Purpose
- Give students practice in making PowerPoint presentations on a specific topic
- Give students practice in researching a topic and in self-learning
- Give students a chance to face an audience, give a talk, answer questions and enhance their presentation skills in the process.
Task
- Topic given will be related to the lecture itself
- Prepare a powerpoint presentation by dividing the task among team members
- Deliver the powerpoint presentation within 15 minutes (10 minutes for presentation and 5 minutes for Q&A) to the Practical Group you are in. e.g. A01 presents to Group A
- Your team will engage in a 5 min verbal question and answer (Q&A) session where you will answer questions from fellow students and from Teaching Assistants etc
- All students in the class will rate your presentation on a scale of 1 to 5
Grading
Class Presentation (CA3) - 10% of overall grade, comprising:
- Presentation (5%) - marks will be peer reviewed
- Submitted material (5%)
Peer Review
LSM2104 Presentation Topics 2006-07 Sem1
- Previous Year LSM2104Present06sem2
Week 1: 2006a_A01 2006a_B01 2006a_C01 2006a_D01
Topic Title : Scope of Bioinformatics and What is the relevance of Bioinformatics in Life Sciences?
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A01 -
- B01 - Xie Chao
- C01 - Li Hu
- D01 - Rahul Thadani
- Outline of all relevant topics & subject headings in bioinformatics
- Provide their definitions and a brief explanation of each topic
- Compile and summarise a list of definitions of what bioinformatics is
- Provide a scope of what bioinformatics covers.
- For all relevant topics and subtopics of bioinformatics generally found in textbooks, write short descriptions.
- Provide some hyperlinks to additional information sources and bibliographic citations
- Short descriptions should not be copied wholesale from the research material used. Wherever material is copied, it should be properly cited with references and, wherever possible, a hyperlink included.
- Short descriptions should also answer the questions:
- What is the relevance of Bioinformatics in the Life Sciences?
- And in particular, how does the subarea of bioinformatics relate to life sciences and biology?
- Do not cover material in the subsequent topics, or else, you will be doing the homework of the subsequent groups.
Week 2: A02 B02 C02 D02
Title : Survey of Bioinformatics Databases
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A02 - Susan Moore
- B02 - Susan Moore
- C02 - Susan Moore
- D02 - Rahul Thadani
- Summarise what kinds of biological information can be found and, in the process, outline the various “rudimentary” ways of categorising or classifying Bioinformatics Databases
- For one of the classifications, the best in your opinion, or your own classification, populate the classification categories with lists of key bioinformatics and biological databases (including those containing biological literature)
- For the main bioinformatics databases, give a brief description of what they contain, what they do, how they aid and of what use they are to Life Science Research.
- Outline in point form, with a brief explanation, the uses and applications of such databases in the study of life sciences and research
- Give examples of how they are used and how you might use them for your LSM course modules for the rest of your undergraduate studies.
- EXAMPLES
- NAR Classification
- Nucl Acids Research Database Issue 2006
- Bateman, A. (2006) Nucleic Acids Res. 34(Database issue):D1.
- Galperin, Michael Y. The Molecular Biology Database Collection: 2006 update Nucl. Acids Res. 2006 34: D3-5
- http://www.oxfordjournals.org/nar/database/c/
- Additional Information http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1
- Categories List http://www.oxfordjournals.org/nar/database/cap/
- Alphabetic List http://www.oxfordjournals.org/nar/database/a/
- InfoBioGen DBCat Classification
- Other Classifications
- AcademicInfo
- Amos Bairoch’s List
- Pevsner’s List
- Suresh Kumar’s list
- Database Maps
- Japanese KEGG http://www.genome.ad.jp/dbget/dbget.links.html
- European SRS http://srs.ebi.ac.uk/
- USA Entrez http://www.ncbi.nlm.nih.gov/Database/index.html
- OPTIONAL WORK
- Advanced Work (only for those very interested):
- From the survey of classification systems of databases, identify their weaknesses and limitations, i.e. what you can and cannot do to find information.
- Strengths and Weaknesses of each classification system
- Features and Limitations of each classification system
- Integration and improving the search capability of biological databases
- Propose ideas for integration
- Propose ideas for searching a database of biological databases
- From the survey of databases, propose a better classification system
- Super Advanced work (only for those who intend to do graduate research):
- What is Ontology (non-philosophical definition)?
- How is the current situation of bioinformatics databases rudimentary in its organisation of information?
- How can we create an Ontology for bioinformatics databases? (e.g. Stanford’s Protégé software)
- How can we use an ontology to classify and group them
- What advantages will such an ontology have on the complexity of the life science information landscape?
- Propose how you can implement this ontology in the MySQL database on your APBioKnoppix2.
Week 3: A03 B03 C03 D03
Title : Survey of Bioinformatics Software and Web Services
(To be Opened earlier on 27 Jan 06 because of CNY)
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A03 - Ung Choong Yong
- B03 - Ung Choong Yong
- C03 - Li Hu
- D03 - Kalyan
- There are many different types of software applications, and packages bundling many such applications, for bioinformatics and biology.
- There are many different types of web services offering a web interface to software applications and databases for bioinformatics and biology.
- For one of the classifications, the best in your opinion, or your own classification, populate the classification categories with lists of key bioinformatics and biological software and webservices
- For the main and popularly used ones, give a brief description of what they contain, what they do, how they aid life science research.
- Give examples of how they are used and how you might use them for your LSM course modules for the rest of your undergraduate studies.
- BioSoftware
- EBI DBCat classification
- IUBio Archive
- Other eclectic classifications
- From the book Computational Methods in Molecular Biology http://www.cs.jhu.edu/~salzberg/appendixa.html
- Netsci http://www.netsci.org/Resources/Software/Bioinform/
- Bioinformatics.org http://bioinformatics.org/softwaremap/?form_cat=2
- Suresh’s links http://www.geocities.com/bioinformaticsweb/toollink.html
- BioWebservices
- Many types of web links available, including databases and tutorials
- Select web services for computation only.
- NAR Web Server Issue
- 2005 July Issue http://nar.oxfordjournals.org/content/vol33/suppl_2/index.dtl#EDITORIAL
- Joanne A. Fox, Stefanie L. Butland, Scott McMillan, Graeme Campbell, and B. F. Francis Ouellette (2005) The Bioinformatics Links Directory: a Compilation of Molecular Biology Web Servers. Nucl. Acids Res. 2005 33: W3-W24 doi:10.1093/nar/gki594
- Links Directory http://bioinformatics.ubc.ca/resources/links_directory/
- Categories http://www.bioinformatics.ubc.ca/resources/links_directory/narweb2005/categorized.php
- Subclassification: Sequence Analysis (Optional)
- For sequence analysis software, there are further subclassifications.
- For example, EMBOSS itself is subclassified into groups of applications on a functional basis.
- Do a survey of software and web services for the class of “Sequence Analysis” including EMBOSS and other applications
- Advanced: Integration and Organisation (Optional)
- With the last group's presentation of databases and this week's presentation of software and web services, there is an overwhelming range of resources
- Propose ways in which these resources could be integrated and organised.
- The Resourceome (general)
- Cannata N, Merelli E, Altman RB (2005) Time to Organize the Bioinformatics Resourceome. PLoS Comput Biol 1(7): e76
- http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0010076
- TAMBIS - Tao (by ontology)
- “An Ontology for bioinformatics application” by Patricia Baker, et al, that appeared in Bioinformatics, Volume 15, number 6, pages 510–520, 1999
- http://www.ontologos.org/OML/..%5COntology%5CTAMBIS.htm (use Internet explorer)
Week 4: A04 B04 C04 D04
Title : Biological Sequence Comparison
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A04 - Ung Choong Yong
- B04 - Justin Choo
- C04 - Xie Chao
- D04 - Muh Hon Cheng
- What is the purpose of Biological Sequence Comparison?
- List out the uses of sequence comparison for the biologist
- In explaining biological sequence comparison, include the following key concepts/topics of the subject.
- Global vs Local Alignment
- List out and explain the differences between Global vs Local Alignment
- In which cases do we want to do global alignment instead of local alignment and vice versa
- Tabulate examples of software applications that perform Global vs Local Alignment
- Give specific names and the packages they come from
- Give brief details of their distinguishing features
- Briefly distinguish between the Smith-Waterman and Needleman-Wunsch methods
- Visualisation of Sequence Comparison
- What are Dotplots? How are dotplots constructed for visualisation of sequence comparison?
- Give examples of dotplot software available from various packages or sources
- Give brief details of the similar or different features of these applications
- What do the different types of dotplots tell us about the kinds of biological inferences we can make about the sequences being compared?
- In which cases is it more advantageous to use Dynamic Programming than dotplots? - how detailed are dotplots?
- What are the various parameters of Dotplot software, and what is their significance?
- Substitution Matrices
- Give a general definition of substitution matrix
- Give some examples of substitution matrices and their historical background
- Why do we need to use a substitution matrix as opposed to a unitary matrix?
- What is the biological significance of the substitution matrix
- (More details on substitution matrices will be asked for next week, so do not cover them in greater detail than necessary, or you will encroach on next week's class presentation)
- Dynamic Programming
- What is its computing background and brief context in computer science?
- How does DP work when applied to biological sequence comparison?
- Explain the various stages of the DP algorithm as applied to sequence comparison.
- What is the scoring mechanism of DP applied to biological sequence comparison?
- Explain the parameters used in biological sequence comparison.
- Explain the core idea of Dynamic Programming, and contrast exact algorithmic methods with heuristic methods in sequence comparison. (Only brief coverage of heuristic methods is necessary, as they will be covered next week)
- (Do not cover BLAST details as this will be for the next week)
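To make the Global vs Local Alignment and Dynamic Programming questions above more concrete, here is a minimal Python sketch of the DP scoring stage. It is an illustration only, not the implementation used by any particular package. The function name `align_score` and the simple match/mismatch/linear-gap scores are assumptions chosen for brevity; real tools use a substitution matrix and separate gap-opening and gap-extension penalties.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: gaps at the start of either sequence are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local alignment: a negative running score restarts at zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global score is in the bottom-right cell; local score is the matrix maximum.
    return best if local else score[-1][-1]


x = "AAAGATTACATTT"
y = "CCCGATTACAGGG"
print(align_score(x, y))              # global score: 1
print(align_score(x, y, local=True))  # local score: 7
```

The two sequences share only the core GATTACA. The local score counts just that region, while the global score is dragged down by the mismatched flanks. This is one way to see when each kind of alignment is appropriate.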
Week 5: A05 B05 C05 D05
Title : Biological Sequence Search
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A05 - Ung Choong Yong
- B05 - Justin Choo
- C05 - Muh Hon Cheng
- D05 - Kalyan
- Biological Sequence Search - concepts and topics
- What is the difference between a sequence search and keyword search?
- What is the search query in a sequence search?
- In the sequence search, what is the search query searched against?
- Typically what is the size of a sequence search query? Give examples e.g. a sequence from a single run in an automated DNA sequencer, small insert library sequence, a sequence melded from various contigs in a BAC, etc.
- Typically what is the size of a database of sequences? Give examples of different databases commonly used in sequence searches.
- Sequence Comparison in sequence searching
- Parameters used in sequence comparison. Explain the following:
- gap penalty
- gap extension penalty
- k-tuple. What other synonyms are also used?
- How are they used in sequence searches?
- Substitution Matrices in sequence comparison and sequence search
- Explain how substitution matrices bring biological significance into sequence alignments. In other words, why do we use substitution matrices?
- How are substitution matrices commonly used in sequence searches constructed?
- Give examples of such substitution matrices and their method of construction.
- What other substitution matrices can you find besides PAM and BLOSUM?
- When constructing substitution matrices, we should avoid gaps in alignments. Why?
- In a sequence search, what governs our choice of substitution matrices?
- If we guess that homologs are more divergent, which substitution matrix should we choose? If we guess that homologs are less divergent, which substitution matrix should we use? Tabulate your recommendations as a guide to your choice whenever you are using sequence searches or comparisons.
- Heuristic methods
- What is the key constraint with exact dynamic programming methods in sequence searches?
- Explain, why so many bioinformatics programs, particularly sequence search techniques, use heuristic methods to speed up the computation and how they reduce the search space.
- What are the main differences between the heuristic BLAST technique and the exact Dynamic Programming algorithm? Explain the main points that make BLAST a heuristic technique.
- BLAST and BLAST Flavours
- List all the different BLAST programs. Tabulate the type of query, the queried database, the type of sequence compared (nucleotide or amino acid sequence), the technique used and how it differs. At a glance, one should be able to see, for example, what query sequence (nt or aa seq) is used against what kind of database (nt or aa seq db) using what kind of comparison (aa vs aa or nt vs nt).
- If you use tblastx to search a database, all the sequences in the database are translated into six frames. This is done for each of your search sequences. How long does it typically take if you are using the NCBI BLAST server, our National BLAST server or our NUS BLAST server? If you have 100,000 sequences to search, can you think of any refinement to the procedure?
- The BLAST output
- In the BLAST result output, what are the key aspects of information given?
- How are the hits ranked?
- Explain the colour scheme of the NCBI BLAST alignment visualisation.
- What happens if more than one local alignment comes from the same target sequence?
- What is the meaning of percentage identity and percentage similarity?
- The alignment shown for each blast hit, is it a global or a local alignment of the query sequence against the sequence in the database?
- Substitution Matrices and BLAST Scores
- How are BLAST Scores computed?
- How does the choice of Substitution matrix affect the score?
- What is the significance of the scores?
- How can differences in scoring system result in different alignments?
- Statistical Significance and E values
- How does one assess the statistical significance of a typical BLAST Search?
- Is the E-value in BLAST a probability, strictly speaking? Hint: You can find an answer at the NCBI website.
- How does the E-value help you in assessing the BLAST hits?
- Making Sound Biological inferences from BLAST
- If you want to find homologs to your query sequence, what are the most important items to observe in BLAST results?
- Before you make an inference that a BLAST hit is a homolog of your query sequence what sort of criteria should be fulfilled typically?
- In a BLAST hit alignment, before you can infer that the sub-region in your query sequence and the matching sub-region of a database sequence entry are homologs, what sort of criteria need to be fulfilled?
- How do low complexity regions in a query sequence affect the BLAST Search, and how can we avoid this problem?
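As a starting point for the questions above on BLAST scores and E-values, the following Python sketch applies the Karlin-Altschul relations E = K·m·n·e^(−λS), bit score S' = (λS − ln K)/ln 2 and P = 1 − e^(−E). It is a simplified illustration: the function names are made up for this example, the λ and K values in the usage lines are assumed for illustration only, and a real BLAST run derives them from the scoring system and uses effective (edge-corrected) lengths rather than raw ones.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score S to a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of at least one chance hit, given expectation e."""
    return -math.expm1(-e)  # numerically stable form of 1 - exp(-e)


# Illustrative values only (assumptions, not taken from a real search).
lam, k = 0.267, 0.041
query_len, db_len = 300, 1_000_000_000

for raw in (40, 60, 100):
    e = e_value(raw, query_len, db_len, lam, k)
    print(raw, round(bit_score(raw, lam, k), 1), e, p_value(e))
```

Note that E is an expected count and can exceed 1, whereas P always lies between 0 and 1. For small values the two are nearly equal, which is why they are often used interchangeably.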
Week 6: A06 B06 C06 D06
Title : Biological Patterns and Motifs
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A06 - Ung Choong Yong
- B06 - Justin Choo
- C06 - Muh Hon Cheng
- D06 - Kalyan
- Basic Concepts in Biological Sequence Patterns and Motifs
- Consensus, Regular Expressions, Matrices and Profiles
- SeqLogo
- EMBOSS software cons
- What are regular expressions?
- Give examples of how a Prosite Pattern is constructed
- Give examples of how Prosite Profiles are constructed
- Give examples of how matrices are constructed
- Databases of patterns and motifs and Software Tools in predicting patterns and motifs
- Examples of DNA sequence pattern databases
- Describe the REBASE
- Describe the transcription factor database, TRANSFAC
- Examples of Protein sequence pattern databases
- What is the Prosite database?
- Pattern Scanning Software e.g. ScanProsite
- PRINTS and BLOCKS databases
- What are they used for and what are the differences
- PFam and SMART database
- Distinguish between the two. What are they used for?
- Sensitivity, Specificity, PPV
- Simple examples of calculations
- How do variations in the respective values of TP, TN, FP and FN affect these metrics
- Making biological inferences from patterns and motifs
- Common Pitfalls in making inferences
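The points above on regular expressions, Prosite patterns and Sensitivity, Specificity and PPV can be tied together in one small Python sketch. It translates a Prosite-style pattern into a regular expression, scans a handful of sequences, and computes the three metrics from the resulting TP, TN, FP and FN counts. The helper names and the six labelled sequences are hypothetical, invented for this example. The converter handles only the common syntax elements (`x`, `[..]`, `{..}`, `(n)` or `(n,m)`, `<`, `>`). The pattern N-{P}-[ST]-{P} is the Prosite N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regular expression.

    Simplified: assumes well-formed input and ignores rarer syntax.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


def confusion_counts(regex, labelled):
    """Count TP, TN, FP, FN for (sequence, is_true_member) pairs."""
    tp = tn = fp = fn = 0
    for seq, is_true in labelled:
        hit = re.search(regex, seq) is not None
        if hit and is_true:
            tp += 1
        elif hit and not is_true:
            fp += 1
        elif not hit and is_true:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and PPV (assumes no denominator is zero)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


regex = prosite_to_regex("N-{P}-[ST]-{P}")

# Hypothetical labelled test set: True = known to carry a real site.
labelled = [
    ("MKNGSALV", True), ("AANQTWLL", True), ("GGNPSAKK", True),
    ("LLNASPEE", False), ("KKNLTDRR", False), ("AAAAAAAA", False),
]
counts = confusion_counts(regex, labelled)
print(regex, counts, metrics(*counts))
```

Changing the labels or the pattern and re-running shows directly how shifts in TP, TN, FP and FN move each metric. Short patterns such as this one match by chance very often, which is one of the common pitfalls when making inferences.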
Week 7: A07 B07 C07 D07
Title : Multiple Sequence Alignments and Molecular Phylogenetics
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A07 - Ung Choong Yong
- B07 - Ung Choong Yong
- C07 - Justin Choo
- D07 - Justin Choo
- Multiple sequence alignment - extension of pairwise sequence comparison
- Software tools for MSA
- Evolutionary Theory and Molecular Phylogenetics
- Tree of Life and molecular clocks
- Software tools for phylogenetic inference
- Dendrograms and visualisation tools
- Key Concepts in molecular phylogenetics such as Neighbour Joining, Maximum Parsimony and Maximum Likelihood, Bootstrapping, etc.
- Making biological inferences from MSA and phylogenetics
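Distance-based tree methods such as Neighbour Joining start from a matrix of pairwise distances computed from a multiple sequence alignment. The Python sketch below shows one common way to build such a matrix: the proportion of differing sites (p-distance), followed by the Jukes-Cantor correction d = −(3/4)·ln(1 − 4p/3). The choice of Jukes-Cantor, the function names and the toy alignment are assumptions made for illustration. The tree-building step itself is not shown.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a nucleotide p-distance."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """Pairwise corrected distances for a dict of name -> aligned sequence."""
    names = list(alignment)
    return {
        (m, n): jukes_cantor(p_distance(alignment[m], alignment[n]))
        for m in names for n in names
    }


# Toy alignment (hypothetical sequences).
msa = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGAAC",
    "seqC": "ACGAAC-TTC",
}
for pair, d in distance_matrix(msa).items():
    print(pair, round(d, 3))
```

The corrected distance is always at least as large as the raw p-distance, because it allows for multiple substitutions at the same site.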
Week 8: A08 B08 C08 D08
Title : Folding problem and Fundamentals of Structural Biology and Visualisation
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A08 - Susan Moore
- B08 - Susan Moore
- C08 - Li Hu
- D08 - Li Hu
- The importance of protein structures
- The different levels of protein structure (primary, secondary, tertiary, etc.)
- The 20 naturally occurring amino acids: how they can be grouped (hydrophobic and hydrophilic; small and large; and special ones like glycine and proline)
- Ramachandran plot
- Secondary and super secondary structures
- Factors involved in protein folding
- Levinthal's paradox about protein folding
- Old and new views of protein folding
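The arithmetic behind Levinthal's paradox is easy to reproduce. The Python sketch below estimates how long an exhaustive search of conformations would take. The specific figures (100 residues, 3 conformations per residue, 10^13 conformations sampled per second) are the kind of round numbers commonly used to illustrate the argument and are assumptions here, not measured values. The function name is also made up for this example.

```python
def exhaustive_search_seconds(residues, states_per_residue, samples_per_second):
    """Time to visit every conformation once, under the stated assumptions."""
    total_conformations = states_per_residue ** residues
    return total_conformations / samples_per_second


# Illustrative assumptions.
residues = 100
states_per_residue = 3
samples_per_second = 1e13

seconds = exhaustive_search_seconds(residues, states_per_residue, samples_per_second)
age_of_universe_seconds = 13.8e9 * 365.25 * 24 * 3600  # roughly 13.8 billion years

print(f"conformations: {states_per_residue ** residues:.2e}")
print(f"search time:   {seconds:.2e} s")
print(f"ratio to age of universe: {seconds / age_of_universe_seconds:.2e}")
```

Since real proteins fold in milliseconds to seconds, the calculation suggests that folding cannot proceed by random exhaustive search. That is the starting point for the old and new views of protein folding.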
Week 9: A09 B09 C09 D09
Title : Protein structure: databases, visualization, and classification
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A09 - Kalyan
- B09 - Kalyan
- C09 - Xie Chao
- D09 - Xie Chao
- Protein Data Bank (PDB): what is in the database, statistics of the database, rate of growth
- Structure visualization: free software packages (Rasmol, Swiss PDB viewer (DeepView), Chime, YASARA, MolMol, etc).
- The hierarchy of SCOP database and how they are defined
- Class
- Fold
- Superfamily
- Family
- The hierarchy of CATH database and how they are defined: class, architecture, topology, homologous superfamilies, and sequence families
Week 10: A10 B10 C10 D10
Title : Protein Structural Modelling and Prediction
TAs-in-charge of supervising groups to screen for plagiarism, amend errors of fact etc.
- A10 - Muh Hon Cheng
- B10 - Rahul Thadani
- C10 - Muh Hon Cheng
- D10 - Rahul Thadani
- Relationship between sequence identity and structure similarities; safe zone and twilight zone
- Protein homology modeling
- De novo modeling
- CASP
- Current status of protein structure prediction
- Structural genomics and Protein Structure Initiative (PSI)