Systematics.hpp

Track genotypes, species, clades, or lineages of organisms in a world.

Note that this file powers the Python library phylotrackpy. The main consequence is that we should generally prefer emp_optional_throw to emp_assert in any circumstance where user input could trigger an error. emp_assert will trigger a segfault in Python (killing the whole interpreter), whereas emp_optional_throw will raise a Python exception.

Todo:

We should provide an option to back up systematics data to a file so that it doesn’t all need to be kept in memory, especially if we’re only doing post-analysis.

This inheritance system makes adding new systematics-related data tracking kind of a pain. Over time, this will probably become a maintainability problem. We could make the inheritance go away and just use signals, but then the World could not maintain systematics managers.

template<typename ORG_INFO, typename DATA_STRUCT = datastruct::no_data>
class Taxon
#include <Systematics.hpp>

A Taxon represents a type of organism in a phylogeny.

Genotypes are the most commonly used Taxon; in general taxa can be anything from a shared genome sequence, a phenotypic trait, or a even a position in the world (if you want to track an evolutionary pathway)

Param ORG_INFO:

The information type associated with an organism, used to categorize it.

Public Types

using data_t = DATA_STRUCT

Public Functions

inline Taxon(size_t _id, const info_t &_info, Ptr<this_t> _parent = nullptr)
Taxon(const Taxon&) = default
Taxon(Taxon&&) = default
Taxon &operator=(const Taxon&) = delete
Taxon &operator=(Taxon&&) = default
inline bool operator<(const Taxon &other) const
inline size_t GetID() const

Get a unique ID for this taxon; IDs are assigned sequentially, so newer taxa have higher IDs.

inline const info_t &GetInfo() const

Retrieve the tracked info associated with this Taxon.

inline Ptr<this_t> GetParent() const

Retrieve a pointer to the parent Taxon.

inline void NullifyParent()
inline size_t GetNumOrgs() const

Get the number of living organisms currently associated with this Taxon.

inline void SetNumOrgs(size_t n)

Set the number of living organisms currently associated with this Taxon.

inline size_t GetTotOrgs() const

Get the total number of organisms that have ever lived associated with this Taxon.

inline void SetTotOrgs(size_t n)

Set the total number of organisms that have ever lived associated with this Taxon.

inline size_t GetNumOff() const

Get the number of taxa that were produced by organisms from this Taxon.

inline size_t GetDepth() const

Get the number of taxanomic steps since the ancestral organism was injected into the World.

inline data_t &GetData()

Get data struct associated with this taxon.

inline const data_t &GetData() const

Get data struct associated with this taxon.

inline std::set<Ptr<this_t>> GetOffspring()

Get pointers to this taxon’s offspring.

inline void SetData(data_t d)

Set this taxon’s data struct to the given value.

inline double GetOriginationTime() const
Returns:

this taxon’s origination time

inline void SetOriginationTime(double time)

Set this taxon’s origination time.

inline double GetDestructionTime() const
Returns:

this taxon’s destruction time

inline void SetDestructionTime(double time)

Sets this taxon’s destruction time.

inline void AddOrg()

Add a new organism to this Taxon.

inline void AddOffspring(Ptr<this_t> offspring_tax)

Add a new offspring Taxon to this one.

inline void AddTotalOffspring()

Recursively increment total offspring count for this and all ancestors.

inline int GetTotalOffspring()

Get total number of offspring directly or indirectly descending from this taxon.

inline bool RemoveOrg()

Remove an organism from this Taxon (after it dies). Removals must return true if the taxon needs to continue; false if it should deactivate.

inline void RemoveFromOffspring(Ptr<this_t> offspring_tax)

Remove specified taxon from this taxon’s offspring list.

inline bool RemoveOffspring(Ptr<this_t> offspring_tax)

Remove and offspring taxa after its entire sub-tree has died out (pruning)

inline void RemoveTotalOffspring()

Reduce the total count of extant offspring and recursively do so for all ancestors (gets called on a taxon’s parent when that taxon goes extinct)

Protected Types

using this_t = Taxon<ORG_INFO, DATA_STRUCT>
using info_t = ORG_INFO

Protected Attributes

size_t id

ID for this Taxon (Unique within this Systematics)

const info_t info

Details for the organisms associated within this taxanomic group.

Ptr<this_t> parent

Pointer to parent group (nullptr if injected)

std::set<Ptr<this_t>> offspring

Pointers to all immediate offspring taxa.

int num_orgs

How many organisms currently exist of this group?

int tot_orgs

How many organisms have ever existed of this group?

int num_offspring

How many direct offspring groups exist from this one.

int total_offspring

How many total extant offspring taxa exist from this one (i.e. including indirect)

size_t depth

How deep in tree is this node? (Root is 0)

double origination_time

When did this taxon first appear in the population?

double destruction_time

When did this taxon leave the population?

DATA_STRUCT data

A struct for storing additional information about this taxon.

Friends

friend class Systematics
template<typename ORG>
class SystematicsBase
#include <Systematics.hpp>

A base class for Systematics, maintaining information common to all systematics managers and providing virtual functions. You probably don’t want to instantiate this. It just exists so that you can make containers of Systematics managers of different types.

Subclassed by Systematics< ORG, ORG_INFO, DATA_STRUCT >

Public Types

using data_node_t = DataNode<double, data::Current, data::Info, data::Range, data::Stats, data::Pull>
using data_ptr_t = Ptr<data_node_t>

Public Functions

inline SystematicsBase(bool _active = true, bool _anc = true, bool _all = false, bool _pos = true)
inline virtual ~SystematicsBase()
inline bool GetTrackSynchronous() const

Are we tracking a synchronous population?

inline bool GetStoreActive() const

Are we storing all taxa that are still alive in the population?

inline bool GetStoreAncestors() const

Are we storing all taxa that are the ancestors of living organisms in the population?

inline bool GetStoreOutside() const

Are we storing all taxa that have died out, as have all of their descendants.

inline bool GetArchive() const

Are we storing any taxa types that have died out?

inline bool GetStorePosition() const

Are we storing the positions of taxa?

inline size_t GetTotalOrgs() const

How many living organisms are currently being tracked?

inline size_t GetNumRoots() const

How many independent trees are being tracked?

inline size_t GetNextID() const

What ID will the next taxon have?

inline double GetAveDepth() const

What is the average phylogenetic depth of organisms in the population?

inline size_t GetUpdate() const
Returns:

current update/time step

inline void SetTrackSynchronous(bool new_val)

Are we tracking organisms evolving in synchronous generations?

inline void SetStoreActive(bool new_val)

Are we storing all taxa that are still alive in the population?

inline void SetStoreAncestors(bool new_val)

Are we storing all taxa that are the ancestors of living organisms in the population?

inline void SetStoreOutside(bool new_val)

Are we storing all taxa that have died out, as have all of their descendants.

inline void SetArchive(bool new_val)

Are we storing any taxa types that have died out?

inline void SetStorePosition(bool new_val)

Are we storing the location of taxa?

inline void SetUpdate(size_t ud)

Sets the current update/time step.

inline data_ptr_t AddDataNode(const std::string &name)

Add a data node to this systematics manager

Parameters:

name – the name of the data node (so it can be found later)

inline data_ptr_t AddDataNode(std::function<vector<double>()> pull_set_fun, const std::string &name)

Add a data node to this systematics manager

Parameters:
  • name – the name of the data node (so it can be found later)

  • pull_set_fun – a function to run when the data node is requested to pull data (returns vector of values)

inline data_ptr_t AddDataNode(std::function<double()> pull_fun, const std::string &name)

Add a data node to this systematics manager

Parameters:
  • name – the name of the data node (so it can be found later)

  • pull_fun – a function to run when the data node is requested to pull data (returns single value)

inline data_ptr_t GetDataNode(const std::string &name)
Returns:

a pointer to the data node with the specified name

virtual data_ptr_t AddEvolutionaryDistinctivenessDataNode(const std::string &name = "evolutionary_distinctiveness") = 0
virtual data_ptr_t AddPairwiseDistanceDataNode(const std::string &name = "pairwise_distance") = 0
virtual data_ptr_t AddPhylogeneticDiversityDataNode(const std::string &name = "phylogenetic_diversity") = 0
virtual data_ptr_t AddDeleteriousStepDataNode(const std::string &name = "deleterious_steps") = 0
virtual data_ptr_t AddVolatilityDataNode(const std::string &name = "volatility") = 0
virtual data_ptr_t AddUniqueTaxaDataNode(const std::string &name = "unique_taxa") = 0
virtual data_ptr_t AddMutationCountDataNode(const std::string &name = "mutation_count", const std::string &mutation = "substitution") = 0
virtual size_t GetNumActive() const = 0
virtual size_t GetNumAncestors() const = 0
virtual size_t GetNumOutside() const = 0
virtual size_t GetTreeSize() const = 0
virtual size_t GetNumTaxa() const = 0
virtual int GetMaxDepth() const = 0
virtual int GetPhylogeneticDiversity() const = 0
virtual double GetMeanPairwiseDistance(bool branch_only) const = 0
virtual double GetSumDistance() const = 0
virtual double GetSumPairwiseDistance(bool branch_only) const = 0
virtual double GetVariancePairwiseDistance(bool branch_only) const = 0
virtual vector<double> GetPairwiseDistances(bool branch_only) const = 0
virtual int SackinIndex() const = 0
virtual double CollessLikeIndex() const = 0
virtual std::unordered_map<int, int> GetOutDegreeDistribution() const = 0
virtual double GetAverageOriginTime(bool) const = 0
virtual int GetMRCADepth() const = 0
virtual void AddOrg(ORG &&org, WorldPosition pos) = 0
virtual void AddOrg(ORG &org, WorldPosition pos) = 0
virtual void AddOrg(ORG &&org, WorldPosition pos, WorldPosition parent) = 0
virtual void AddOrg(ORG &org, WorldPosition pos, WorldPosition parent) = 0
virtual bool RemoveOrg(WorldPosition pos) = 0
virtual void RemoveOrgAfterRepro(WorldPosition pos) = 0
virtual void PrintStatus(std::ostream &os) const = 0
virtual double CalcDiversity() const = 0
virtual void Update() = 0
virtual void SetNextParent(WorldPosition pos) = 0
virtual void SwapPositions(WorldPosition p1, WorldPosition p2) = 0

Protected Attributes

bool store_active

Store all of the currently active taxa?

bool store_ancestors

Store all of the direct ancestors from living taxa?

bool store_outside

Store taxa that are extinct with no living descendants?

bool archive

Set to true if we are supposed to do any archiving of extinct taxa.

bool store_position

Keep a vector mapping positions to pointers.

bool track_synchronous

Does this systematics manager need to keep track of current and next positions?

size_t org_count

How many organisms are currently active?

size_t total_depth

Sum of taxa depths for calculating average.

size_t num_roots

How many distinct injected ancestors are currently in population?

mutable int max_depth

Depth of deepest taxon. -1 means needs to be recalculated.

size_t next_id

What ID value should the next new taxon have?

size_t curr_update
DataManager<double, data::Current, data::Info, data::Range, data::Stats, data::Pull> data_nodes
template<typename ORG, typename ORG_INFO, typename DATA_STRUCT = datastruct::no_data>
class Systematics : public SystematicsBase<ORG>
#include <Systematics.hpp>

A tool to track phylogenetic relationships among organisms. The systematics class tracks the relationships among all organisms based on the INFO_TYPE provided. If an offspring has the same value for INFO_TYPE as its parent, it is grouped into the same taxon. Otherwise a new Taxon is created and the old one is used as its parent in the phylogeny. If the provided INFO_TYPE is the organism’s genome, a traditional phylogeny is formed, with genotypes. If the organism’s behavior/task set is used, then organisms are grouped by phenotypes. If the organism’s position is used, the evolutionary path through space is tracked. Any other aspect of organisms can be tracked this way as well.

Unnamed Group

virtual void AddOrg(ORG &&org, WorldPosition pos)

Add information about a new organism, including its stored info and parent’s taxon; If you would like the systematics manager to track taxon age, you can also supply the update at which the taxon is being added.

Parameters:
  • org – a reference to the organism being added

  • pos – the position of the organism being added

  • parent – a pointer to the org’s parent

Returns:

a pointer for the associated taxon.

virtual void AddOrg(ORG &&org, WorldPosition pos, WorldPosition parent)
Ptr<taxon_t> AddOrg(ORG &&org, WorldPosition pos, Ptr<taxon_t> parent)
Ptr<taxon_t> AddOrg(ORG &&org, Ptr<taxon_t> parent = nullptr)
virtual void AddOrg(ORG &org, WorldPosition pos)
virtual void AddOrg(ORG &org, WorldPosition pos, WorldPosition parent)
Ptr<taxon_t> AddOrg(ORG &org, WorldPosition pos, Ptr<taxon_t> parent)
Ptr<taxon_t> AddOrg(ORG &org, Ptr<taxon_t> parent = nullptr)

Unnamed Group

virtual bool RemoveOrg(WorldPosition pos)

Remove an instance of an organism; track when it’s gone.

Parameters:
  • pos – the world position of the individual being removed

  • taxon – a pointer to the taxon of the individual being removed

bool RemoveOrg(Ptr<taxon_t> taxon)

Unnamed Group

virtual void RemoveOrgAfterRepro(WorldPosition pos)

Mark an instance of a taxon to be removed; track when it’s gone. This is a work-around to deal with steady state/non-synchronous populations in which an organism might die as its offspring is born (e.g. in a spatial world where the offspring replaces the parent). If the bookkeeping is not handled correctly, we could accidentally mark the taxon as extinct when it is actually continuing. By using this method, the taxon won’t be removed until after the next org is added or the next time an org is marked for removal.

Parameters:
  • pos – the world position of the individual being removed

  • taxon – a pointer to the taxon of the individual being removed

void RemoveOrgAfterRepro(Ptr<taxon_t> taxon)

Unnamed Group

inline virtual void SetNextParent(WorldPosition pos)

Tell systematics manager that the parent of the next taxon added will be the one specified by this function (either at the specified position or the one pointed to by the given pointer) Works with version of AddOrg that only takes org, position, and update. Will be set to null after being assigned as the parent of a taxon

inline void SetNextParent(Ptr<taxon_t> p)

Public Types

using taxon_t = Taxon<ORG_INFO, DATA_STRUCT>
using info_t = ORG_INFO
using data_ptr_t = Ptr<data_node_t>

Public Functions

void Prune(Ptr<taxon_t> taxon)

Called whenever a taxon has no organisms AND no descendants.

void RemoveOffspring(Ptr<taxon_t> offspring, Ptr<taxon_t> taxon)

Called when an offspring taxa has been deleted.

void MarkExtinct(Ptr<taxon_t> taxon)

Called when there are no more living members of a taxon. There may be descendants.

inline Systematics(fun_calc_info_t calc_taxon, bool store_active = true, bool store_ancestors = true, bool store_all = false, bool store_pos = true)

Contructor for Systematics; controls what information should be stored.

Parameters:
  • calc_taxon – A function that takes an organism and calculates what taxon it belongs to

  • store_active – Should living organisms’ taxa be tracked? (typically yes!)

  • store_ancestors – Should ancestral organisms’ taxa be maintained? (yes for lineages!)

  • store_all – Should all dead taxa be maintained? (typically no; it gets BIG!)

  • store_pos – Should the systematics tracker keep track of organism positions? (probably yes - only turn this off if you know what you’re doing)

Systematics(const Systematics&) = delete
Systematics(Systematics&&) = default
inline ~Systematics()
virtual void Update()

Switch to next update/time step Useful for keeping track of taxon survival times and population positions in synchronous generation worlds.

inline void SetCalcInfoFun(fun_calc_info_t f)

Set function used to calculate taxons from organisms.

void RemoveBefore(int ud)

Remove all taxa that 1) went extinct before the specified update/time step, and 2) only have ancestors that went extinct before the specified update/time step. Warning: this function invalidates most measurements you could make about tree topology. It is useful in select situations where you need to store ancestors for some period of time, but cannot computationally afford to store all ancestors for your entire run.

inline void ApplyToActiveTaxa(const std::function<void(const Ptr<taxon_t> tax)> &fun) const

Run the given function on every active taxon (const version)

Parameters:

fun – the function to run on each taxon

inline void ApplyToActiveTaxa(const std::function<void(Ptr<taxon_t> tax)> &fun)

Run the given function on every active taxon

Parameters:

fun – the function to run on each taxon

inline void ApplyToAncestorTaxa(const std::function<void(const Ptr<taxon_t> tax)> &fun) const

Run the given function on every ancestor taxon (const version)

Parameters:

fun – the function to run on each taxon

inline void ApplyToAncestorTaxa(const std::function<void(Ptr<taxon_t> tax)> &fun)

Run the given function on every ancestor taxon

Parameters:

fun – the function to run on each taxon

inline void ApplyToOutsideTaxa(const std::function<void(const Ptr<taxon_t> tax)> &fun) const

Run the given function on every outside taxon (const version)

Parameters:

fun – the function to run on each taxon

inline void ApplyToOutsideTaxa(const std::function<void(Ptr<taxon_t> tax)> &fun)

Run the given function on every outside taxon

Parameters:

fun – the function to run on each taxon

inline void ApplyToAllTaxa(const std::function<void(const Ptr<taxon_t> tax)> &fun) const

Run given function on all taxa (const version)

Parameters:

fun – the function to run on each taxon

inline void ApplyToAllTaxa(const std::function<void(Ptr<taxon_t> tax)> &fun)

Run given function on all taxa

Parameters:

fun – the function to run on each taxon

template<typename T>
inline vector<T> ApplyToAllTaxa(const std::function<T(const Ptr<taxon_t> tax)> &fun) const

Run given function on all taxa and return result (const version)

Parameters:

fun – the function to run on each taxon

Returns:

a vector containing the results of running the function on each taxon

template<typename T>
inline vector<T> ApplyToAllTaxa(const std::function<T(Ptr<taxon_t> tax)> &fun)

Run given function on all taxa and return result (const version)

Parameters:

fun – the function to run on each taxon

Returns:

a vector containing the results of running the function on each taxon

inline std::unordered_set<Ptr<taxon_t>, hash_t> *GetActivePtr()
inline const std::unordered_set<Ptr<taxon_t>, hash_t> &GetActive() const
Returns:

set of active (extant/living) taxa0

inline const std::unordered_set<Ptr<taxon_t>, hash_t> &GetAncestors() const
Returns:

set of ancestor taxa (extinct, but have active descendants)

inline const std::unordered_set<Ptr<taxon_t>, hash_t> &GetOutside() const
Returns:

set of outside taxa (extinct, with no active descendants)

inline virtual size_t GetNumActive() const
Returns:

the number of taxa that are still active in the population

inline virtual size_t GetNumAncestors() const
Returns:

the number of taxa that are ancestors of living organisms (but have died out themselves)

inline virtual size_t GetNumOutside() const
Returns:

the number of taxa that are stored that have died out, as have their descendents

inline virtual size_t GetTreeSize() const
Returns:

the number of taxa that are in the current phylogeny

inline virtual size_t GetNumTaxa() const
Returns:

the number of taxa that are stored in total

virtual int GetMaxDepth() const
Returns:

the phylogenetic depth (lineage length) of the taxon with the longest lineage out of all active taxa

inline Ptr<taxon_t> GetNextParent() const
Returns:

the taxon that will be used as the parent of the next taxon created via the version of AddOrg that does not accept a parent

inline Ptr<taxon_t> GetMostRecent() const
Returns:

the most recently created taxon

Ptr<taxon_t> Parent(Ptr<taxon_t> taxon) const
Returns:

a pointer to the parent of a given taxon

inline bool IsTaxonAt(WorldPosition id) const
Returns:

true if there is a taxon at specified location

inline Ptr<taxon_t> GetTaxonAt(WorldPosition id) const
Returns:

pointer to taxon at specified location

inline SignalKey OnNew(std::function<void(Ptr<taxon_t> t, ORG &org)> &fun)

Provide a function for Systematics to call each time a new taxon is created. Trigger: New taxon is made Argument: Pointer to taxon, reference to org taxon was created from

inline SignalKey OnExtinct(std::function<void(Ptr<taxon_t> t)> &fun)

Provide a function for Systematics to call each time a taxon goes extinct. Trigger: Taxon is going extinct Argument: Pointer to taxon

inline SignalKey OnPrune(std::function<void(Ptr<taxon_t>)> &fun)

Provide a function for Systematics to call each time a taxon is about to be pruned (removed from ancestors). Trigger: Taxon is about to be killed Argument: Pointer to taxon

inline virtual data_ptr_t AddEvolutionaryDistinctivenessDataNode(const std::string &name = "evolutionary_distinctiveness")

Add data node that records evolutionary distinctiveness when requested to pull. Used by AddPhylodiversityFile in World_output.hpp

inline virtual data_ptr_t AddPairwiseDistanceDataNode(const std::string &name = "pairwise_distance")

Add data node that records pairwise distance when requested to pull. Used by AddPhylodiversityFile in World_output.hpp

inline virtual data_ptr_t AddPhylogeneticDiversityDataNode(const std::string &name = "phylogenetic_diversity")

Add data node that records phylogenetic distinctiveness when requested to pull. Used by AddPhylodiversityFile in World_output.hpp

inline virtual data_ptr_t AddDeleteriousStepDataNode(const std::string &name = "deleterious_steps")

Add data node that records counts of deleterious steps along lineages in this systematics manager when requested to pull. Used by AddLineageMutationFile in World_output.hpp

inline virtual data_ptr_t AddVolatilityDataNode(const std::string &name = "volatility")

Add data node that phenotypic volatility (changes in phenotype) along lineages in this systematics manager when requested to pull. Used by AddLineageMutationFile in World_output.hpp

inline virtual data_ptr_t AddUniqueTaxaDataNode(const std::string &name = "unique_taxa")

Add data node that records counts of unique taxa along lineages in this systematics manager when requested to pull. Used by AddLineageMutationFile in World_output.hpp

inline virtual data_ptr_t AddMutationCountDataNode(const std::string &name = "mutation_count", const std::string &mutation = "substitution")

Add data node that records counts of mutations of the specified type along lineages in this systematics manager when requested to pull. Used by AddLineageMutationFile in World_output.hpp

inline virtual int GetPhylogeneticDiversity() const

From (Faith 1992, reviewed in Winters et al., 2013), phylogenetic diversity is the sum of edges in the minimal spanning tree connected the taxa you’re calculating diversity of.

This calculates phylogenetic diversity for all extant taxa in the tree, assuming all edges from parent to child have a length of one. Possible extensions to this function that might be useful in the future include:

  • Pass it a set of taxon_t pointers and have it calculate PD for just those taxa

  • Enable calculation of branch lengths by amount of time that elapsed between origination of parent and origination of offspring

  • Enable a paleontology compatibility mode where only branching points are calculated

int GetPhylogeneticDiversityNormalize(int generation = 0, std::string filename = "") const
Returns:

phylogenetic diversity if used without any arguments . If you want to receive normalized data, you need to include the number of generations your tree has you also need to specify a file with which to normalize your data. If value is outside of the values in the file, 100th percentile will be returned NOTE: This is experimental and in early development/research phases!

inline double GetTaxonDistinctiveness(Ptr<taxon_t> tax) const

This is a metric of how distinct tax is from the rest of the population.

(From Vane-Wright et al., 1991; reviewed in Winter et al., 2013)

double GetEvolutionaryDistinctiveness(Ptr<taxon_t> tax, double time) const

This metric (from Isaac, 2007; reviewed in Winter et al., 2013) measures how distinct tax is from the rest of the population, weighted for the amount of unique evolutionary history that it represents.

To quantify length of evolutionary history, this method needs time: the current time, in whatever units time is being measured in when taxa are added to the systematics manager. Note that passing a time in the past will produce inaccurate results (since we don’t know what the state of the tree was at that time).

Assumes the tree is all connected. Will return -1 if this assumption isn’t met.

inline vector<double> GetAllEvolutionaryDistinctivenesses(double time) const
Parameters:

time – The time step at which the calculation is being done

Returns:

A vector of evolutionary distinctiveness of all active taxa

inline double GetMeanEvolutionaryDistinctiveness(double time) const
Parameters:

time – The time step at which the calculation is being done

Returns:

Mean evolutionary distinctiveness of all active taxa

inline double GetSumEvolutionaryDistinctiveness(double time) const
Parameters:

time – The time step at which the calculation is being done

Returns:

Sum of evolutionary distinctiveness of all active taxa

inline double GetVarianceEvolutionaryDistinctiveness(double time) const
Parameters:

time – The time step at which the calculation is being done

Returns:

Variance of evolutionary distinctiveness of all active taxa

inline virtual double GetMeanPairwiseDistance(bool branch_only = false) const

Calculates mean pairwise distance between extant taxa (Webb and Losos, 2000). This measurement is also called Average Taxonomic Diversity (Warwick and Clark, 1998) (for demonstration of equivalence see Tucker et al, 2016). This measurement tells you about the amount of distinctness in the community as a whole.

This measurement assumes that the tree is fully connected. Will return -1 if this is not the case.

Parameters:

branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred).

inline virtual double GetSumDistance() const

Calculates summed branch lengths of tree. Tucker et al 2017 points out that this is a measure of phylogenetic richness.

inline virtual double GetSumPairwiseDistance(bool branch_only = false) const

Calculates summed pairwise distance between extant taxa. Tucker et al 2017 points out that this is a measure of phylogenetic richness.

This measurement assumes that the tree is fully connected. Will return -1 if this is not the case.

Parameters:

branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred)

inline virtual double GetVariancePairwiseDistance(bool branch_only = false) const

Calculates variance of pairwise distance between extant taxa. Tucker et al 2017 points out that this is a measure of phylogenetic regularity.

This measurement assumes that the tree is fully connected. Will return -1 if this is not the case.

Parameters:

branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred).

virtual vector<double> GetPairwiseDistances(bool branch_only = false) const

Calculates a vector of all pairwise distances between extant taxa.

This method assumes that the tree is fully connected. Will return -1 if this is not the case.

Parameters:

branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred). *

double GetPairwiseDistance(Ptr<taxon_t> t1, Ptr<taxon_t> t2, bool branch_only = false) const
std::set<Ptr<taxon_t>> GetCanopyExtantRoots(int time_point = 0) const

Returns a vector containing all taxa that were extant at time_point and were at that time the most recent ancestors of taxa that are now extant Example: Say the only current extant taxon is C, its lineage goes A -> B -> C, and B and C were both alive at the specified time_point. This function would only return B. If, however, there were another currently extant taxon that were descended directly from A, then this function would return both A and B.

int GetDistanceToRoot(Ptr<taxon_t> tax) const
Parameters:

tax – the taxon who’s distance to root you want to calculate

Returns:

the total number of ancestors between the given taxon and MRCA, if there is one. If there is no common ancestor, distance to the root of this tree is calculated instead.

int GetBranchesToRoot(Ptr<taxon_t> tax) const

Calculates the number of branching points leading to multiple extant taxa between the given taxon and the most-recent common ancestor (or the root of its subtree, if no MRCA exists). This is useful because a lot of stats for phylogenies are designed for phylogenies reconstructed from extant taxa. These phylogenies generally only contain branching points, rather than every ancestor along the way to the current taxon.

Parameters:

tax – taxon to calculate branches from

Returns:

Number of branching points between tax and root

inline virtual int SackinIndex() const
Returns:

Sackin Index of this tree (Sackin, 1972; reviewed in Shao, 1990). Measures tree balance

inline virtual std::unordered_map<int, int> GetOutDegreeDistribution() const

Returns dictionary containing a histogram of node out degrees e.g. {1:4, 2:10, 3:4} means the tree has 4 unifurcations, 10 bifurcations, and 4 trifurcations

inline virtual double GetAverageOriginTime(bool normalize = false) const

Get average origin time for whole phylogeny. If

Parameters:

normalize – is set to true, will apply normalization to make result comparable to what you would expect from a strictly bifurcating tree (as most reconstruction methods will produce). This normalization is achieved by multiplying each taxon’s values by the number of offspring taxa it has minus one.

inline virtual double CollessLikeIndex() const

Calculate Colless Index of this tree (Colless, 1982; reviewed in Shao, 1990). Measures tree balance. The standard Colless index only works for bifurcating trees, so this will be a Colless-like Index, as suggested in “Sound Colless-like balance indices for multifurcating trees” (Mir, 2018, PLoS One)

Ptr<taxon_t> GetMRCA() const
Returns:

a pointer to the Most-Recent Common Ancestor for the population.

virtual int GetMRCADepth() const
Returns:

the depth of the Most-Recent Common Ancestor; return -1 for none.

Ptr<taxon_t> GetSharedAncestor(Ptr<taxon_t> t1, Ptr<taxon_t> t2) const
Returns:

a pointer to the Most-Recent Ancestor shared by two taxa.

virtual double CalcDiversity() const
Returns:

the genetic diversity of the population.

inline vector<Ptr<taxon_t>> GetLineage(Ptr<taxon_t> tax) const
Returns:

vector containing the lineages of the specified taxon

inline vector<Ptr<taxon_t>> GetLineageToMRCA(Ptr<taxon_t> tax) const
Returns:

vector containing the lineages of the specified taxon up to and including the MRCA, but not past the MRCA

virtual void PrintStatus(std::ostream &os = std::cout) const

Print details about the Systematics manager. First prints setting, followed by all active, ancestor, and outside taxa being stored. Format for taxa is [ id | number of orgs in this taxon, number of offspring taxa of this taxon | parent taxon]

Parameters:

os – output stream to print to

void PrintLineage(Ptr<taxon_t> taxon, std::ostream &os = std::cout) const

Print a whole lineage. Format: “Lineage:”, followed by each taxon in the lineage, each on new line

Parameters:
  • taxon – a pointer to the taxon to print the lineage of

  • os – output stream to print to

inline void AddSnapshotFun(const std::function<std::string(const taxon_t&)> &fun, const std::string &key, const std::string &desc = "")

Add a new snapshot function. When a snapshot of the systematics is taken, in addition to the default set of functions, all user-added snapshot functions are run. Functions take a reference to a taxon as input and return the string to be dumped in the file at the given key.

void Snapshot(const std::string &file_path) const

Take a snapshot of current state of taxon phylogeny. WARNING: Current, this function assumes one parent taxon per-taxon.

Parameters:

file_path – the file to store the snapshot data in

inline virtual void SwapPositions(WorldPosition p1, WorldPosition p2)
void LoadFromFile(const std::string &file_path, const std::string &info_col = "info", bool assume_leaves_extant = true, bool adjust_total_offspring = true)

Load data from a file into the systematics manager. Intended to be used on an empty systematics object Expects a file in ALife phylogeny standard format (see: https://alife-data-standards.github.io/alife-data-standards/phylogeny)

Parameters:
  • file_path – the file to load data from

  • info_col – the name of the column in the file that contains the taxon info (i.e. the information distinguishing taxa; should match return type of calc_taxon_info_fun)

  • assume_leaves_extant – if true, assumes that all leaf nodes correspond to extant taxa

  • adjust_total_offspring – if true, adjusts the total offspring count of each taxon based on the number of offspring it actually has (time consuming but necessary for some stats)

size_t GetNumActive() const = 0
size_t GetNumAncestors() const = 0
size_t GetNumOutside() const = 0
size_t GetTreeSize() const = 0
size_t GetNumTaxa() const = 0
int GetPhylogeneticDiversity() const = 0
double GetMeanPairwiseDistance(bool branch_only) const = 0
double GetSumDistance() const = 0
double GetSumPairwiseDistance(bool branch_only) const = 0
double GetVariancePairwiseDistance(bool branch_only) const = 0
vector<double> GetPairwiseDistances(bool branch_only) const = 0
std::unordered_map<int, int> GetOutDegreeDistribution() const = 0
double GetAverageOriginTime(bool) const = 0
int GetMRCADepth() const = 0
void AddOrg(ORG &&org, WorldPosition pos) = 0
void AddOrg(ORG &org, WorldPosition pos) = 0
void AddOrg(ORG &&org, WorldPosition pos, WorldPosition parent) = 0
void AddOrg(ORG &org, WorldPosition pos, WorldPosition parent) = 0
bool RemoveOrg(WorldPosition pos) = 0
void RemoveOrgAfterRepro(WorldPosition pos) = 0
void PrintStatus(std::ostream &os) const = 0
double CalcDiversity() const = 0
virtual void Update() = 0
inline size_t GetUpdate() const
Returns:

current update/time step

inline void SetUpdate(size_t ud)

Sets the current update/time step.

void SetNextParent(WorldPosition pos) = 0
inline data_ptr_t GetDataNode(const std::string &name)
Returns:

a pointer to the data node with the specified name

inline data_ptr_t AddDataNode(const std::string &name)

Add a data node to this systematics manager

Parameters:

name – the name of the data node (so it can be found later)

inline data_ptr_t AddDataNode(std::function<vector<double>()> pull_set_fun, const std::string &name)

Add a data node to this systematics manager

Parameters:
  • name – the name of the data node (so it can be found later)

  • pull_set_fun – a function to run when the data node is requested to pull data (returns vector of values)

inline data_ptr_t AddDataNode(std::function<double()> pull_fun, const std::string &name)

Add a data node to this systematics manager

Parameters:
  • name – the name of the data node (so it can be found later)

  • pull_fun – a function to run when the data node is requested to pull data (returns single value)

data_ptr_t AddEvolutionaryDistinctivenessDataNode(const std::string &name = "evolutionary_distinctiveness") = 0
data_ptr_t AddPairwiseDistanceDataNode(const std::string &name = "pairwise_distance") = 0
data_ptr_t AddPhylogeneticDiversityDataNode(const std::string &name = "phylogenetic_diversity") = 0
data_ptr_t AddDeleteriousStepDataNode(const std::string &name = "deleterious_steps") = 0
data_ptr_t AddVolatilityDataNode(const std::string &name = "volatility") = 0
data_ptr_t AddUniqueTaxaDataNode(const std::string &name = "unique_taxa") = 0
data_ptr_t AddMutationCountDataNode(const std::string &name = "mutation_count", const std::string &mutation = "substitution") = 0
int GetMaxDepth() const = 0

Public Members

vector<SnapshotInfo> user_snapshot_funs

Collection of all desired snapshot file columns.

std::unordered_set<Ptr<taxon_t>, hash_t> active_taxa

A set of all living taxa.

std::unordered_set<Ptr<taxon_t>, hash_t> ancestor_taxa

A set of all dead, ancestral taxa.

std::unordered_set<Ptr<taxon_t>, hash_t> outside_taxa

A set of all dead taxa w/o descendants.

Ptr<taxon_t> to_be_removed = nullptr

Taxon to remove org from after next call to AddOrg.

WorldPosition removal_pos = {0, 0}

Position of taxon to next be removed.

vector<vector<Ptr<taxon_t>>> taxon_locations

Positions in this vector indicate taxon positions in world.

Signal<void(Ptr<taxon_t>, ORG &org)> on_new_sig

Trigger when a new taxon is created.

Signal<void(Ptr<taxon_t>)> on_extinct_sig

Trigger when a taxon goes extinct.

Signal<void(Ptr<taxon_t>)> on_prune_sig

Trigger when any organism is pruned from tree.

mutable Ptr<taxon_t> mrca

Most recent common ancestor in the population.

Private Types

using parent_t = SystematicsBase<ORG>
using hash_t = typename Ptr<taxon_t>::hash_t
using fun_calc_info_t = std::function<ORG_INFO(ORG&)>

Private Members

fun_calc_info_t calc_info_fun

Function that takes an organism and returns the unit being tracked by systematics.

Ptr<taxon_t> next_parent

The taxon that has been marked as parent for next new org.

Ptr<taxon_t> most_recent

The most-recently added taxon.

bool num_orgs_defaulted = false

Keep track of whether we have loaded from a file that didn’t

bool total_offspring_defaulted = false

provide num_orgs

Keep track of whether we have loaded from a file without

bool store_active

recalculating total offspring

bool store_ancestors

Store all of the direct ancestors from living taxa?

bool store_outside

Store taxa that are extinct with no living descendants?

bool archive

Set to true if we are supposed to do any archiving of extinct taxa.

bool store_position

Keep a vector mapping positions to pointers.

bool track_synchronous

Does this systematics manager need to keep track of current and next positions?

size_t org_count

How many organisms are currently active?

size_t total_depth

Sum of taxa depths for calculating average.

size_t num_roots

How many distinct injected ancestors are currently in population?

size_t next_id

What ID value should the next new taxon have?

size_t curr_update
mutable int max_depth

Depth of deepest taxon. -1 means needs to be recalculated.

struct SnapshotInfo
#include <Systematics.hpp>

Struct for keeping track of what information to print out in snapshot files.

Public Types

using snapshot_fun_t = std::function<std::string(const taxon_t&)>

Public Functions

inline SnapshotInfo(const snapshot_fun_t &_fun, const std::string &_key, const std::string &_desc = "")

Public Members

snapshot_fun_t fun

Function for converting taxon to string containing desired data.

std::string key

Column name for data calculated with this function.

std::string desc

Description of data in this function.

namespace datastruct

The systematics manager allows an optional second template type that can store additional data about each taxon in the phylogeny. Here are some structs containing common pieces of additional data to track. Note: You are responsible for filling these in! Adding the template just gives you a place to store your data.

struct no_data
#include <Systematics.hpp>

Public Types

using has_fitness_t = std::false_type
using has_mutations_t = std::false_type
using has_phen_t = std::false_type
struct fitness
#include <Systematics.hpp>

The default - an empty struct.

Public Types

using has_fitness_t = std::true_type
using has_mutations_t = std::false_type
using has_phen_t = std::false_type

Public Functions

inline void RecordFitness(double fit)

This taxon’s fitness (for assessing deleterious mutational steps)

inline double GetFitness() const

Public Members

DataNode<double, data::Current, data::Range> fitness
template<typename PHEN_TYPE>
struct mut_landscape_info
#include <Systematics.hpp>

Track information related to the mutational landscape Maps a string representing a type of mutation to a count representing the number of that type of mutation that occurred to bring about this taxon.

Public Types

using phen_t = PHEN_TYPE
using has_phen_t = std::true_type
using has_mutations_t = std::true_type
using has_fitness_t = std::true_type

Public Functions

inline const PHEN_TYPE &GetPhenotype() const

This taxon’s phenotype (for assessing phenotypic change)

Returns:

this taxon’s phenotype

inline double GetFitness() const
Returns:

this taxon’s fitness

inline void RecordMutation(const std::unordered_map<std::string, int> &muts)

Adds mutations to the list of mutations that occurred to make this taxon

Parameters:

muts – can contain as many strings (types of mutation) as desired, each accompanied by a number indicating how many of that mutation occurred Example: {“point_mutation”:2, “insertion”:1}

inline void RecordFitness(double fit)

Record the fitness of this taxon

Parameters:

fit – the fitness

inline void RecordPhenotype(PHEN_TYPE phen)

Record the phenotype of this taxon

Parameters:

phen – the phenotype

Public Members

std::unordered_map<std::string, int> mut_counts = {}
DataNode<double, data::Current, data::Range> fitness

The number of mutations of each type that occurred to make this taxon.

PHEN_TYPE phenotype

This taxon’s fitness (for assessing deleterious mutational steps)