sequence_utils.hpp

Functions for analyzing generic sequence types.

A set of functions for analyzing sequences, including distance metrics (Hamming and Edit/Levenshtein) and alignment.

Note

Status: BETA

Functions

template<typename T = size_t>
vector<T> ToSequence(std::string sequence_str)

Generate a sequence from a string of comma-separated entries, in the format “entry1,entry2,entry3”. Each entry is either a single value (e.g., “72”) or a range in start[:step]:stop format (e.g., “0:100” or “3:5:33”).
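
The format described above can be illustrated with a small standalone parser. This is only a sketch, not the library's ToSequence: the name to_sequence_sketch is hypothetical, the element type is fixed to std::size_t, and the stop value of a range is assumed to be exclusive, which the description above does not specify.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not the library's ToSequence.
// Assumption: the stop value of a range is exclusive.
std::vector<std::size_t> to_sequence_sketch(const std::string & sequence_str) {
  std::vector<std::size_t> out;
  std::istringstream entries(sequence_str);
  std::string entry;
  while (std::getline(entries, entry, ',')) {
    // Split the entry on ':' into one, two, or three numeric fields.
    std::vector<std::size_t> fields;
    std::istringstream parts(entry);
    std::string part;
    while (std::getline(parts, part, ':')) fields.push_back(std::stoul(part));

    if (fields.size() == 1) {
      out.push_back(fields[0]);                       // single value, e.g. "72"
    } else if (fields.size() == 2 || fields.size() == 3) {
      const std::size_t start = fields.front();
      const std::size_t step = (fields.size() == 3) ? fields[1] : 1;
      const std::size_t stop = fields.back();
      if (step == 0) throw std::invalid_argument("step must be non-zero");
      for (std::size_t v = start; v < stop; v += step) out.push_back(v);
    } else {
      throw std::invalid_argument("malformed entry: " + entry);
    }
  }
  return out;
}
```

Under these assumptions, "1,4:6,10:5:21" expands to 1, 4, 5, 10, 15, 20.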

template<typename TYPE>
size_t calc_hamming_distance(const TYPE &in1, const TYPE &in2, int offset = 0)

Hamming distance is a count of the positions at which two sequences differ, i.e., the number of substitutions needed to convert one array to another.

Parameters:
  • in1 – The first sequence to compare.

  • in2 – The second sequence to compare.

  • offset – (optional) Position in the first sequence at which the second sequence starts.
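
To make the role of offset concrete, here is a minimal sketch of an offset-aware Hamming count. It is not the library's implementation: the name hamming_sketch is hypothetical, and two behaviors are assumptions rather than documented facts, namely that positions with no counterpart in the other sequence each count as one difference, and that a negative offset swaps the roles of the two sequences.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not the library's calc_hamming_distance.
// Assumptions: unpaired positions each count as one difference, and a negative
// offset means the first sequence starts inside the second one.
template <typename SEQ>
std::size_t hamming_sketch(const SEQ & in1, const SEQ & in2, int offset = 0) {
  if (offset < 0) return hamming_sketch(in2, in1, -offset);

  const std::size_t size1 = in1.size();
  const std::size_t size2 = in2.size();
  // Clamp the shift so it never runs past the end of the first sequence.
  const std::size_t shift = std::min(static_cast<std::size_t>(offset), size1);
  const std::size_t overlap = std::min(size1 - shift, size2);

  // Start with every position that has no partner in the other sequence.
  std::size_t diffs = size1 + size2 - 2 * overlap;
  // Add one for each mismatch inside the overlapping region.
  for (std::size_t i = 0; i < overlap; ++i) {
    if (in1[shift + i] != in2[i]) ++diffs;
  }
  return diffs;
}
```

With std::string inputs, "karolin" and "kathrin" differ at three positions, so the sketch returns 3 for them at offset 0.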

template<typename TYPE>
size_t calc_edit_distance(const TYPE &in1, const TYPE &in2)

Edit distance is the minimum number of insertions, deletions and substitutions needed to convert one array to another.
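
The definition above corresponds to the standard dynamic-programming formulation of Levenshtein distance. The following is a sketch of that textbook technique using two rolling rows; the name edit_distance_sketch is hypothetical, and nothing here is meant to describe how calc_edit_distance is actually implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper for illustration; not the library's calc_edit_distance.
// Textbook Levenshtein distance with unit cost for every operation.
template <typename SEQ>
std::size_t edit_distance_sketch(const SEQ & in1, const SEQ & in2) {
  const std::size_t n = in1.size();
  const std::size_t m = in2.size();

  // prev[j] is the distance between the first i-1 items of in1 and the first
  // j items of in2; cur[j] is the same for the first i items of in1.
  std::vector<std::size_t> prev(m + 1), cur(m + 1);
  for (std::size_t j = 0; j <= m; ++j) prev[j] = j;  // insert j items

  for (std::size_t i = 1; i <= n; ++i) {
    cur[0] = i;                                      // delete i items
    for (std::size_t j = 1; j <= m; ++j) {
      const std::size_t subst = prev[j - 1] + (in1[i - 1] != in2[j - 1] ? 1 : 0);
      const std::size_t del = prev[j] + 1;
      const std::size_t ins = cur[j - 1] + 1;
      cur[j] = std::min({subst, del, ins});
    }
    std::swap(prev, cur);
  }
  return prev[m];
}
```

For example, converting "kitten" to "sitting" takes two substitutions and one insertion, so the sketch returns 3.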

template<typename TYPE, typename GAP_TYPE>
size_t align(TYPE &in1, TYPE &in2, GAP_TYPE gap)

Use edit distance to find the minimum number of insertions, deletions and substitutions to convert one array to another, and then insert gaps into the arrays appropriately.
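
One way to realize the behavior described above is to fill a full edit-distance table and then trace back through it, emitting a gap wherever an insertion or deletion was chosen. The sketch below follows that approach under stated assumptions: the name align_sketch is hypothetical, the sequence type is assumed to support size(), operator[], and push_back, the returned value is assumed to be the edit distance, and ties between equally cheap alignments are broken in favor of matches or substitutions, which may differ from the library's choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not the library's align.
// Assumptions: returns the edit distance; prefers match/substitution on ties.
template <typename SEQ, typename GAP>
std::size_t align_sketch(SEQ & in1, SEQ & in2, GAP gap) {
  const std::size_t n = in1.size();
  const std::size_t m = in2.size();

  // cost[i][j] = edit distance between the first i items of in1
  // and the first j items of in2.
  std::vector<std::vector<std::size_t>> cost(n + 1, std::vector<std::size_t>(m + 1));
  for (std::size_t i = 0; i <= n; ++i) cost[i][0] = i;
  for (std::size_t j = 0; j <= m; ++j) cost[0][j] = j;
  for (std::size_t i = 1; i <= n; ++i) {
    for (std::size_t j = 1; j <= m; ++j) {
      const std::size_t subst = cost[i - 1][j - 1] + (in1[i - 1] != in2[j - 1] ? 1 : 0);
      cost[i][j] = std::min({subst, cost[i - 1][j] + 1, cost[i][j - 1] + 1});
    }
  }

  // Trace back from the bottom-right corner, building both outputs in reverse.
  SEQ out1, out2;
  std::size_t i = n, j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 &&
        cost[i][j] == cost[i - 1][j - 1] + (in1[i - 1] != in2[j - 1] ? 1 : 0)) {
      out1.push_back(in1[i - 1]); out2.push_back(in2[j - 1]); --i; --j;
    } else if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
      out1.push_back(in1[i - 1]); out2.push_back(gap); --i;  // gap in in2
    } else {
      out1.push_back(gap); out2.push_back(in2[j - 1]); --j;  // gap in in1
    }
  }
  std::reverse(out1.begin(), out1.end());
  std::reverse(out2.begin(), out2.end());

  const std::size_t distance = cost[n][m];
  in1 = out1;
  in2 = out2;
  return distance;
}
```

Given std::string inputs "abc" and "ac" with '-' as the gap, the sketch leaves the first sequence as "abc", rewrites the second to "a-c", and returns 1.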