Diagnostic Characters
The Diagnostic Character analysis examines nucleotide or amino acid polymorphism between sets of sequences that are grouped by taxonomic or geographic labels. More specifically, the tool identifies the consensus bases of each group, compares them to the bases found in the sequences of all other groups, then characterizes how unique each consensus base is. Consensus bases are categorized by their diagnostic potential as follows:
Characterizations in the Diagnostic Characters tool (* a base is either a nucleotide or an amino acid residue)
Abbreviation | Name | Meaning |
D | Diagnostic | At this position in the multiple sequence alignment (MSA), the base* is found in only one group. |
DP | Diagnostic or Partial | Due to the presence of ambiguous base(s) in other groups, this base* may be classified as P if the same character also appears in some but not all sequences in other groups, or as D if it does not appear at all. |
P | Partial Character | At this position in the MSA, this base* is found in all sequences in this one group, but it is also found in some but not all sequences in other groups. |
PU | Partial or Uninformative Character | Due to the presence of both ambiguous bases and this base of interest in all sequences in at least one other group, this base* can be either partial or uninformative, depending on how many ambiguous bases in the other group are truly the same as the base in question. |
I | Invalid Character | Only ambiguous bases are present in all sequences in other groups. Since D, P and U are all possible, nothing can be said about this base*, hence it is declared invalid. |
U | Uninformative Character | More than one group shares this consensus base*. This base holds no diagnostic ability and cannot be used in any subsequent diagnoses. |
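The categories in the table can be read as a decision rule applied to one consensus base at one alignment position. The following Python sketch is one possible reading of the table, not the tool's actual implementation. The function name `classify`, the order in which the rules are checked, and the use of the IUPAC nucleotide ambiguity codes as the set of "ambiguous bases" are all assumptions made for illustration.

```python
# Hypothetical sketch: one reading of the category table, for nucleotide data.
# IUPAC nucleotide ambiguity codes (assumed definition of "ambiguous base").
AMBIGUOUS = set("RYSWKMBDHVN")

def classify(base, other_groups):
    """Classify a consensus base against the characters that the other
    groups show at the same alignment position.

    other_groups maps a group label to the characters (one per sequence)
    found at this position in that group.
    """
    stats = []
    for chars in other_groups.values():
        match = sum(c == base for c in chars)       # same as the consensus base
        ambig = sum(c in AMBIGUOUS for c in chars)  # ambiguous characters
        stats.append((match, ambig, len(chars)))

    total_match = sum(m for m, _, _ in stats)
    total_ambig = sum(a for _, a, _ in stats)
    total = sum(n for _, _, n in stats)

    # U: another group shows this base in all of its sequences.
    if any(n > 0 and m == n for m, _, n in stats):
        return "U"
    # I: every sequence in the other groups is ambiguous here.
    if total > 0 and total_ambig == total:
        return "I"
    # PU: some other group contains only this base plus ambiguous bases.
    if any(m > 0 and a > 0 and m + a == n for m, a, n in stats):
        return "PU"
    # P: the base occurs in some, but not all, sequences of other groups.
    if total_match > 0:
        return "P"
    # DP: the base is never seen, but ambiguous bases could hide it.
    if total_ambig > 0:
        return "DP"
    # D: the base is absent from every other group.
    return "D"

print(classify("A", {"group2": "CCC", "group3": "GG"}))  # D
print(classify("A", {"group2": "ANA"}))                  # PU
```

Mixed cases that the table does not spell out (for example, one fully ambiguous group alongside a group with different bases) fall through to DP in this sketch; a real implementation may treat them differently.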
Parameters
Because the tool analyzes only the sequences selected by the user, the result depends heavily on the initial data and the analysis parameters. Even a small change in the initial sequences, the filtering options, or the analysis parameters can change the consensus sequence of each group, and hence the diagnostic potential reported. Each analysis must therefore be interpreted in light of all these factors together. In general, more sequences per group give a more reliable diagnosis of each group, because they reduce the effect of small sample size.
Algorithm
- A sequence alignment of all the sequences serves as the starting point of the analysis. The alignment algorithm is one of the options the user can specify.
- Sequences are separated into sets based on the grouping attribute.
- A consensus sequence is computed for each sequence set.
- For each group, the consensus bases are examined one by one and compared to the bases found in all the remaining sequences. The diagnostic potential of each base for the current group is determined from the number and percentage of occurrences of the consensus base in the other groups (see the table above for definitions).
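The grouping, consensus and comparison steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's own code: the sequences are taken to be already aligned and of equal length, the consensus is assumed to be strict (a position has a consensus base only when every sequence in the group agrees), and the function names `group_sequences`, `consensus` and `occurrence_in_others` are hypothetical.

```python
from collections import defaultdict

def group_sequences(records):
    """Split (label, aligned_sequence) pairs into sets by grouping label."""
    groups = defaultdict(list)
    for label, seq in records:
        groups[label].append(seq)
    return dict(groups)

def consensus(seqs):
    """Strict consensus (assumption): the shared character at each position,
    or None where the sequences of the group disagree."""
    return [col[0] if len(set(col)) == 1 else None for col in zip(*seqs)]

def occurrence_in_others(groups):
    """For each group and each consensus base, count how often that base
    occurs at the same position in all remaining sequences."""
    results = {}
    for label, seqs in groups.items():
        others = [s for g, ss in groups.items() if g != label for s in ss]
        rows = []
        for pos, base in enumerate(consensus(seqs)):
            if base is None:
                continue  # no consensus base at this position
            count = sum(s[pos] == base for s in others)
            percent = 100.0 * count / len(others) if others else 0.0
            rows.append((pos, base, count, percent))
        results[label] = rows
    return results

# Made-up example data: two groups of two aligned sequences each.
records = [("sp1", "ACGT"), ("sp1", "ACGA"), ("sp2", "ACCT"), ("sp2", "TCCT")]
for label, rows in occurrence_in_others(group_sequences(records)).items():
    print(label, rows)
```

In this example the G at the third position of `sp1` never occurs in the other group (0 occurrences), which is the pattern the table describes as diagnostic, while the C at the second position occurs in every remaining sequence and carries no diagnostic value.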
Diagnostic Characters examples
Diagnostic Characters results page