The genomic sequence diversity of a bacterial species is largely determined by the frequency distribution of single nucleotide polymorphisms (SNPs). Here we report a binary algorithm, based on a SNP matrix, that determines intra- or interclonal genomic diversity from the number of shared sequential SNPs, the so-called SNP synteny or haplotype. All SNP positions and the frequency and length distribution of haplotypes are determined from pairwise alignments of completely sequenced genomes. This metric is independent of the reference genome chosen. The analysis provides information about the size of haplotypes, genomic gradients of recombination frequency, the relatedness of strains, and the population composition of a taxon or of clonal populations. The approach is illustrated with whole-genome data sets of Staphylococcus aureus and Pseudomonas aeruginosa strains.
- Losada, P. M.; Tümmler, B.
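The idea of counting shared sequential SNPs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the encoding or the run definition, so the sketch assumes a binary SNP matrix in which each strain is a row of 0/1 allele calls over the same ordered SNP positions, and it treats a haplotype as a maximal run of consecutive SNP positions at which two strains carry the same allele. The names `shared_run_lengths`, `run_length_distribution`, and the toy `snp_matrix` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def shared_run_lengths(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return the lengths of maximal runs of consecutive SNP positions
    at which strains a and b carry the same allele (assumed 0/1 coding)."""
    if len(a) != len(b):
        raise ValueError("both strains must cover the same SNP positions")
    runs: List[int] = []
    current = 0
    for allele_a, allele_b in zip(a, b):
        if allele_a == allele_b:
            current += 1          # the shared run continues
        else:
            if current:
                runs.append(current)  # a mismatch closes the current run
            current = 0
    if current:
        runs.append(current)      # close a run that reaches the last SNP
    return runs


def run_length_distribution(a: Sequence[int], b: Sequence[int]) -> Counter:
    """Frequency distribution of shared run lengths for one strain pair."""
    return Counter(shared_run_lengths(a, b))


# Toy data (invented for illustration): rows are strains, columns are
# ordered SNP positions.
snp_matrix = {
    "strain_A": [0, 1, 1, 0, 0, 1, 0, 1],
    "strain_B": [0, 1, 1, 1, 0, 1, 0, 0],
}

runs = shared_run_lengths(snp_matrix["strain_A"], snp_matrix["strain_B"])
distribution = run_length_distribution(
    snp_matrix["strain_A"], snp_matrix["strain_B"]
)
print(runs)
print(dict(distribution))
```

Because the sketch only tests whether two strains agree at a position, swapping which allele is coded 0 and which is coded 1 at any SNP leaves the run lengths unchanged. This property of the toy code is in the spirit of the reference independence stated in the abstract, though the published method may achieve it differently.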
Keywords
- Bacterial recombination
- Pseudomonas aeruginosa
- Staphylococcus aureus
- core genome
- haplotype
- single nucleotide polymorphism