
The following information might help you to make better trees with the programs in the Phylip package.  The script for this program was written by José M. González, University of La Laguna, Spain, with some input from William B. Whitman, University of Georgia.  If you have a comment or need further explanations, please feel free to send me a message (jmglezh@ull.es). If you'd like to cite this tool, please use this article.

 

Cophenetic correlation coefficient

 

This tool helps you calculate the cophenetic correlation coefficient (CCC).  The CCC is the correlation between the pairwise distances implied by the tree (the summed branch lengths between each pair of taxa) and the observed pairwise distances used to build it.  It therefore measures how faithfully a dendrogram preserves the original pairwise distances.
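As a minimal sketch of the calculation, assuming the CCC is computed as a plain Pearson correlation over all unordered pairs of taxa, the following Python function shows the idea.  The function name and the dictionary layout are hypothetical and are not taken from the Excel macro.

```python
import math
from itertools import combinations


def cophenetic_correlation(observed, tree_dist, taxa):
    """Pearson correlation between observed and tree-derived distances.

    Both `observed` and `tree_dist` are dicts keyed by frozenset({a, b}),
    an illustrative layout chosen for this sketch only.
    """
    pairs = [frozenset(p) for p in combinations(taxa, 2)]
    x = [observed[p] for p in pairs]
    y = [tree_dist[p] for p in pairs]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up distances for three taxa, for illustration only.
taxa = ["A", "B", "C"]
observed = {frozenset("AB"): 0.10, frozenset("AC"): 0.30, frozenset("BC"): 0.32}
tree_dist = {frozenset("AB"): 0.10, frozenset("AC"): 0.31, frozenset("BC"): 0.31}
ccc = cophenetic_correlation(observed, tree_dist, taxa)
print(ccc)  # a value close to 1 means the tree preserves the distances well
```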

To calculate it, this Excel sheet uses the input and output files of the programs Neighbor or Fitch, both of which are distance methods included in the Phylip package.  You might want to download the latest version of Phylip, since output files obtained with an older version might cause problems.  When using Neighbor, I suggest you use the UPGMA option to generate the tree, since the other option, neighbor-joining, "corrects" for the artefacts that the CCC tests for.  I also suggest using Jukes-Cantor evolutionary distances.
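For reference, the Jukes-Cantor distance between two aligned sequences is d = -(3/4) ln(1 - 4p/3), where p is the proportion of sites that differ.  The sketch below illustrates the formula only; it assumes that columns containing gaps or ambiguity codes are skipped, which is not necessarily how the Phylip distance programs treat them, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption of this sketch: only columns with A, C, G or T in both
    # sequences are compared; gaps and ambiguity codes are skipped.
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# One difference in ten sites (p = 0.1) gives a distance slightly above 0.1.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
```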

To run the program, open the macros window and run CCC (pull down Tools and select "macros").  Alternatively, hit Control + "c".  It will ask for the input files: first "infile", which is the input file that Neighbor or Fitch uses to generate a tree, and then "outtree" (called "treefile" in the previous version of Phylip), which is generated by either of these two Phylip programs.  Make sure the sequence labels contain letters or numbers only, not spaces or any other characters.  Once the right files are selected, the program copies the data to the Excel sheet, deleting whatever was there before.  A graph shows the correlation between the branch lengths and distances.
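The reading of "infile" and the label check can be sketched in Python as follows.  This is not the macro's code: it assumes a square distance matrix whose first entry is the number of taxa, and labels shorter than ten characters so that each one is separated from its distances by blanks.  The function name is hypothetical.

```python
import re


def read_distance_matrix(text):
    """Read a square Phylip-style distance matrix from a string.

    Returns (names, rows). Tokens are split on whitespace, so rows that
    wrap over several lines are handled as well.
    """
    tokens = text.split()
    n = int(tokens[0])
    names, rows = [], []
    pos = 1
    for _ in range(n):
        name = tokens[pos]
        # Labels must contain letters or numbers only.
        if not re.fullmatch(r"[A-Za-z0-9]+", name):
            raise ValueError(f"label {name!r} contains other characters")
        values = [float(t) for t in tokens[pos + 1:pos + 1 + n]]
        if len(values) != n:
            raise ValueError(f"row for {name!r} is incomplete")
        names.append(name)
        rows.append(values)
        pos += 1 + n
    return names, rows


# Made-up three-taxon matrix, for illustration only.
example = """3
A  0.00 0.10 0.30
B  0.10 0.00 0.32
C  0.30 0.32 0.00
"""
names, rows = read_distance_matrix(example)
```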

The program doesn't change any values in the input files; it only adds up the branch lengths in the file "outtree" for each pair of taxa.  The graph and the CCC could be useful to improve your tree.  A wrong alignment or sequencing mistakes in one or more sequences could lower the CCC.  A low CCC could also be due to horizontal gene transfer, as has been shown to be the case in the genera Aeromonas, Acinetobacter and Pseudomonas.  A low correlation between DNA sequence similarities and DNA hybridization data suggests horizontal gene transfer as well.  If interested, please check Keswani and Whitman (2001) below.  You can find their calculations on this Excel sheet, which you can edit to analyze your own data.
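Adding up the branch lengths for each pair of taxa can be sketched in Python as below.  The parser is a simplified stand-in for the macro, not a copy of it: it assumes a Newick tree in which every branch has a length and internal nodes carry no labels.  All function names are hypothetical.

```python
from itertools import combinations


def parse_newick(text):
    """Return (parent, length) dicts; leaves are str, internal nodes are int."""
    text = "".join(text.split()).rstrip(";")
    parent, length = {}, {}
    stack, counter, current, i = [], 0, None, 0
    while i < len(text):
        ch = text[i]
        if ch == "(":                      # open a new internal node
            if stack:
                parent[counter] = stack[-1]
            stack.append(counter)
            counter += 1
            i += 1
        elif ch == ",":
            i += 1
        elif ch == ")":                    # close the innermost internal node
            current = stack.pop()
            i += 1
        elif ch == ":":                    # branch length of the last node read
            j = i + 1
            while j < len(text) and text[j] not in ",()":
                j += 1
            length[current] = float(text[i + 1:j])
            i = j
        else:                              # leaf label
            j = i
            while j < len(text) and text[j] not in ":,()":
                j += 1
            current = text[i:j]
            parent[current] = stack[-1]
            i = j
    return parent, length


def distances_to_ancestors(node, parent, length):
    """Summed branch length from node up to each of its ancestors."""
    dist, total = {}, 0.0
    while True:
        dist[node] = total
        if node not in parent:
            return dist
        total += length.get(node, 0.0)
        node = parent[node]


def pairwise_path_lengths(newick):
    """Path-length distance for every pair of leaves, keyed by frozenset."""
    parent, length = parse_newick(newick)
    leaves = [n for n in parent if isinstance(n, str)]
    up = {leaf: distances_to_ancestors(leaf, parent, length) for leaf in leaves}
    result = {}
    for a, b in combinations(leaves, 2):
        # The first ancestor of a that is also an ancestor of b joins the two paths.
        meet = next(n for n in up[a] if n in up[b])
        result[frozenset((a, b))] = up[a][meet] + up[b][meet]
    return result


# Made-up tree: A-B = 0.1 + 0.2, A-C = 0.1 + 0.05 + 0.3, B-C = 0.2 + 0.05 + 0.3.
paths = pairwise_path_lengths("((A:0.1,B:0.2):0.05,C:0.3);")
```

The resulting dictionary holds one path-length distance per pair of taxa, which is the quantity that is compared with the observed distances when the CCC is calculated.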

Another way to validate your tree is by calculating the distortion coefficient E.  You may not need it, but I used it to make sure this program was running correctly.  E can be calculated with the output files of Neighbor or Fitch, and the program also calculates it.

In the definition of E, T is the number of taxa, ωij is a weighting function, dij is the observed pairwise distance between taxa i and j, and pij is the path-length distance between taxa i and j derived from the tree.  If α = 1, E is a sum of weighted absolute differences; if α = 2, it is a weighted least-squares distance.
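Written out from these definitions, and assuming that the sum runs once over each unordered pair of taxa (which is consistent with E being half of the sum of squares reported by Fitch), the coefficient would take the form:

```latex
E = \sum_{i=1}^{T-1} \sum_{j=i+1}^{T} \omega_{ij} \, \left| d_{ij} - p_{ij} \right|^{\alpha}
```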

If ωij = 1, then all distances are expected to have the same error; if ωij = 1/dij, then the expected error is proportional to the square root of the observed distance; if ωij = (1/dij)², then the expected error is proportional to the observed distance.  The outfile created by the program Fitch (also in the PHYLIP package) reports the sum of squares, so you can check that E, with α = 2 and ωij = (1/dij)², equals the sum of squares / 2.
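The check against the sum of squares can be sketched in Python as follows.  The sketch assumes that the weight is written as 1/dij raised to a chosen power, and that the sum of squares reported by Fitch counts every pair of taxa in both directions, which is what the factor of 2 implies.  The function names are hypothetical.

```python
from itertools import combinations


def distortion_coefficient(observed, tree_dist, alpha=2, power=2):
    """E over unordered pairs, with weights 1 / d**power.

    power = 0, 1 and 2 correspond to weights 1, 1/d and (1/d)**2.
    """
    total = 0.0
    for i, j in combinations(range(len(observed)), 2):
        d, p = observed[i][j], tree_dist[i][j]
        total += abs(d - p) ** alpha / d ** power
    return total


def sum_of_squares_both_directions(observed, tree_dist, power=2):
    """Weighted sum of squares over all ordered pairs i != j.

    Assumption of this sketch: each pair is counted twice, as (i, j) and (j, i).
    """
    n = len(observed)
    return sum((observed[i][j] - tree_dist[i][j]) ** 2 / observed[i][j] ** power
               for i in range(n) for j in range(n) if i != j)


# Made-up symmetric matrices for three taxa, for illustration only.
observed = [[0.0, 0.10, 0.30], [0.10, 0.0, 0.32], [0.30, 0.32, 0.0]]
tree_dist = [[0.0, 0.10, 0.31], [0.10, 0.0, 0.31], [0.31, 0.31, 0.0]]
e = distortion_coefficient(observed, tree_dist, alpha=2, power=2)
ss = sum_of_squares_both_directions(observed, tree_dist, power=2)
```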

In the case of a neighbor-joining tree, you can validate it using bootstrap values. 

Bibliography

Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. Phylogenetic analysis: models and estimation procedures. American Journal of Human Genetics 19:233-257.

Farris, J. S. 1969. On the cophenetic correlation coefficient. Systematic Zoology 18:279-285.

Felsenstein, J. 1993. Phylogeny Inference Package (Phylip). Version 3.5. University of Washington, Seattle.

Fitch, W. M. and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155:279-284.

Keswani, J., and W. B. Whitman. 2001. Relationship of 16S rRNA sequence similarity to DNA hybridization in prokaryotes. Int. J. Syst. Evol. Microbiol. 51:667-678.

Sokal, R. R., and F. J. Rohlf. 1962. The comparison of dendrograms by objective methods. Taxon 11:33-40.
