Documentation for Systematics

Systematics

Systematics is the classification of organisms based on their evolutionary (phylogenetic) relationships.

Systematics.h

This file is part of Empirical and is located in Empirical/source/Evolve/Systematics.h

The systematics manager is used to track genotypes, species, clades, or lineages of organisms in a world.

Systematics allows a user to generate data to form phylogenetic trees.

The manager can be run at different levels of abstraction: taxa can be defined by position, by phenotype, or even by genotype if you have a lot of RAM.

Note: You are responsible for filling in templates! Adding the template just gives you a place to store your data.

Taxon Specifics

  • Taxon - a group of species with similar characteristics

  • Genotypes are the most commonly used type of taxon

A user can see the type and number of mutations that occurred to bring about a taxon.

Some information that can be accessed is:

  • taxon ID number: GetID()

  • details of the organisms in the taxon: GetInfo()

  • pointer to the parent taxon (a null pointer if the taxon was injected): GetParent()

  • how many organisms currently exist in the taxon, and how many have ever existed in it: GetNumOrgs() and GetTotOrgs(), respectively

  • how many direct offspring taxa this taxon has, and how many extant offspring descend from it in total: GetTotalOffspring()

  • how deep in the tree this taxon is: GetDepth()

  • when this taxon first appeared in the population: GetOriginationTime()

  • when this taxon left the population: GetDestructionTime()

New organisms are added to the taxon using AddOrg(). New offspring are added to the taxon with AddOffspring().

Organisms are removed with RemoveOrg(). Offspring are removed with RemoveOffspring().

If a taxon has no remaining organisms or offspring, it will deactivate.
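The bookkeeping described above can be pictured with a small stand-alone sketch. This is not Empirical's taxon class: TaxonSketch and its members are hypothetical names, and the activity rule is simply the one stated above (a taxon stays active while it has organisms or offspring).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a taxon's counters; NOT Empirical's actual class.
struct TaxonSketch {
  std::size_t num_orgs = 0;       // organisms currently alive in this taxon
  std::size_t tot_orgs = 0;       // organisms that have ever belonged to it
  std::size_t num_offspring = 0;  // offspring still attached to this taxon

  void AddOrg() { ++num_orgs; ++tot_orgs; }
  void AddOffspring() { ++num_offspring; }

  // Assumed rule (from the text above): active while anything remains.
  bool IsActive() const { return num_orgs > 0 || num_offspring > 0; }

  // Each removal returns true if the taxon is still active afterwards.
  bool RemoveOrg() {
    assert(num_orgs > 0);
    --num_orgs;
    return IsActive();
  }
  bool RemoveOffspring() {
    assert(num_offspring > 0);
    --num_offspring;
    return IsActive();
  }
};
```

Note that the total count never decreases: removing an organism only lowers the current count.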

General Systematics Data

Things that systematics can tell you about a phylogeny and how to access them:

  • Are we tracking a synchronous population? GetTrackSynchronous() SetTrackSynchronous()

  • Are we storing all taxa that are still alive in the population? GetStoreActive() SetStoreActive()

  • Are we storing all taxa that are ancestors of the living organisms in the population? GetStoreAncestors() SetStoreAncestors()

  • Are we storing all taxa that have died out, as have all of their descendants? GetStoreOutside() SetStoreOutside()

  • Are we storing any taxa types that have died out? GetArchive() SetArchive()

  • Are we storing the positions of taxa? GetStorePosition() SetStorePosition()

  • How many living organisms are currently being tracked? GetTotalOrgs()

  • How many independent trees are being tracked? GetNumRoots()

  • What ID will the next taxon have? GetNextID()

  • What is the average phylogenetic depth of organisms in the population? GetAveDepth()

  • What is the most recent common ancestor (MRCA)? GetMRCA(), or GetMRCADepth() to find the distance to the MRCA
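To make the idea of a most recent common ancestor concrete, here is a minimal sketch that finds the MRCA of two taxa by walking parent links. It does not use Empirical: the parent-index vector and the functions Depth and FindMRCA are assumptions made for this illustration, and it handles only a pair of taxa rather than a whole population.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// parent[i] is the index of taxon i's parent, or -1 if taxon i is a root.
// Hypothetical representation, chosen only for this sketch.
int Depth(const std::vector<int> & parent, int id) {
  assert(id >= 0 && static_cast<std::size_t>(id) < parent.size());
  int depth = 0;
  while (parent[id] != -1) { id = parent[id]; ++depth; }
  return depth;
}

// Returns the index of the most recent common ancestor of taxa a and b,
// or -1 if they belong to different trees.
int FindMRCA(const std::vector<int> & parent, int a, int b) {
  int depth_a = Depth(parent, a);
  int depth_b = Depth(parent, b);
  // Lift the deeper taxon until both are at the same depth.
  while (depth_a > depth_b) { a = parent[a]; --depth_a; }
  while (depth_b > depth_a) { b = parent[b]; --depth_b; }
  // Climb in lockstep until the two paths meet (or both run past a root).
  while (a != b) { a = parent[a]; b = parent[b]; }
  return a;
}
```

The depth of the returned taxon, Depth(parent, mrca), is the distance from the root of the tree to the MRCA.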

The systematics class tracks the relationships among all organisms based on the INFO_TYPE provided. If an offspring has the same value for INFO_TYPE as its parent, it is grouped into the same taxon. Otherwise, a new taxon is created and the old one is used as its parent in the phylogeny. If the provided INFO_TYPE is the organism’s genome, a traditional phylogeny is formed, with genotypes. If the organism’s behavior/task set is used, then organisms are grouped by phenotypes. If the organism’s position is used, the evolutionary path through space is tracked. Any other aspect of organisms can be tracked this way as well.
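The grouping rule can be sketched in a few lines. The class below is a simplified, hypothetical tracker written for this page; it borrows the name AddOrg but its signature and internals are assumptions, not Empirical's API. It shows only the decision: same INFO_TYPE value as the parent means same taxon, anything else creates a new taxon whose parent is the old one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified tracker; it illustrates the grouping rule only.
template <typename INFO_TYPE>
class TrackerSketch {
public:
  static constexpr int NO_PARENT = -1;

  struct Taxon {
    INFO_TYPE info;        // the value organisms in this taxon share
    int parent;            // index of the parent taxon, or NO_PARENT
    std::size_t num_orgs;  // organisms placed in this taxon so far
  };

  // Registers an organism and returns the index of the taxon it joined.
  int AddOrg(const INFO_TYPE & info, int parent_taxon = NO_PARENT) {
    assert(parent_taxon == NO_PARENT ||
           static_cast<std::size_t>(parent_taxon) < taxa.size());
    // Same info as the parent: the offspring joins the parent's taxon.
    if (parent_taxon != NO_PARENT && taxa[parent_taxon].info == info) {
      ++taxa[parent_taxon].num_orgs;
      return parent_taxon;
    }
    // Different info (or no parent): start a new taxon under the old one.
    taxa.push_back(Taxon{info, parent_taxon, 1});
    return static_cast<int>(taxa.size()) - 1;
  }

  std::size_t GetNumTaxa() const { return taxa.size(); }
  const Taxon & GetTaxon(int id) const { return taxa[id]; }

private:
  std::vector<Taxon> taxa;
};
```

With TrackerSketch<std::string>, adding "AAA" as a root, then "AAA" again with that root as parent, then "AAB" with the same parent yields two taxa: the first holds two organisms, and the second has the first as its parent. Swapping std::string for a phenotype or position type changes what is grouped, not how.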

Generally, all living organisms’ taxa should be tracked, and ancestral organisms’ taxa should be maintained for lineage. However, not all dead taxa should be maintained, because the stored phylogeny gets too big.

Diversity and Distinction

Systematics.h can also be used to find phylogenetic diversity for all extant taxa in the tree, assuming all edges from parent to child have a length of one.

When all branch lengths are equal, the phylogenetic diversity is the number of internal nodes plus the number of extant taxa minus 1.
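As a worked illustration of that count, the sketch below computes the same quantity on a toy tree. It assumes a single-rooted tree stored as a parent-index vector plus a flag per taxon saying whether it is extant; both the representation and the function name PhylogeneticDiversity are hypothetical, not Empirical's implementation. Counting every taxon that is extant or an ancestor of an extant taxon, then subtracting 1, gives the number of unit-length edges connecting the extant taxa to the root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// parent[i]: index of taxon i's parent, or -1 for the root.
// extant[i]: true if taxon i still has living organisms.
// Assumes a single root and unit branch lengths (as stated in the text).
int PhylogeneticDiversity(const std::vector<int> & parent,
                          const std::vector<bool> & extant) {
  assert(parent.size() == extant.size());
  std::vector<bool> counted(parent.size(), false);
  int num_nodes = 0;
  for (std::size_t i = 0; i < parent.size(); ++i) {
    if (!extant[i]) continue;
    // Walk toward the root; stop at the first taxon already counted.
    for (int id = static_cast<int>(i); id != -1 && !counted[id]; id = parent[id]) {
      counted[id] = true;
      ++num_nodes;
    }
  }
  // Nodes on the extant part of the tree, minus 1, equals its edge count.
  return num_nodes > 0 ? num_nodes - 1 : 0;
}
```

For a root with two children, where one child has two extant offspring and everything else is extinct, there are 2 internal nodes and 2 extant taxa, so the diversity is 2 + 2 - 1 = 3.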

You can also find how distinct a specific taxon is from the rest of the population, based on the amount of unique evolutionary history that it represents.

Synchronous Populations

A synchronous population is a population in which each generation is a discrete time point and a completely new set of individual organisms is created for each generation. This means that an organism and its parent can never exist at the same time.

An asynchronous population is the opposite, where generations overlap and organisms reproduce when they are ready.

In the systematics manager, synchronicity is controlled with two functions: GetTrackSynchronous() returns true or false, and SetTrackSynchronous(), which takes true or false, selects whether a synchronous or an asynchronous population is tracked.

Using the Systematics Manager

The Systematics.h file alone will not give you any useful information. You must use a test file in conjunction with the systematics manager in order to see output.

To retrieve some results, we will use the file Systematics.cc, which is located in Empirical/tests/Evolve/Systematics.cc.

To compile the code, use this command in the tests directory:

make test-Systematics

Output

Terminal Output:

AddOrg 25 (id1, no parent)

AddOrg -10 (id2; parent id1)

AddOrg 26 (id3, parent id1)

AddOrg 27 (id4, parent id2)

The first line of output shows the first organism in the examined phylogeny. This organism is added with AddOrg and is assigned an ID of id1. As the text in parentheses shows, it has no parent, so id1 is the root of the phylogeny and the later organisms descend from it.

In the parentheses on the second line, we see the second organism, with an ID of id2. It is a direct descendant of the id1 organism.

Lastly, if we look at id4, we see that its parent is id2, meaning that we have created another node in the tree as the organisms move through generations, producing new offspring.

The terminal output should also include this section:

Active count:   11 [18|1,0|17] [17|1,2|11] [15|1,0|null] [12|1,1|11] [16|1,0|11] [11|1,3|null] [6|1,0|5] [19|1,0|17] [5|1,1|null] [4|1,0|null] [3|1,0|null]

The 11 at the front is the number of active taxa in the phylogeny, which matches the 11 bracketed entries that follow it.

If we look at the first entry: [18|1,0|17]

The first number in the brackets, 18 in this case, is the ID of the taxon where a mutation occurred. The next number, 1, is the number of mutations that led to this branch. 0 is the number of offspring from this taxon. Lastly, 17 is the ID of the parent taxon. An entry that ends in null, such as [11|1,3|null], has no parent.

As for the second entry, [17|1,2|11]: this is taxon 17, one mutation occurred, taxon 17 had 2 offspring, and its parent is taxon 11.
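If you want to post-process these entries, a small parser is enough. The sketch below is an assumption-laden helper, not part of Empirical: StatusEntry and ParseStatusEntry are hypothetical names, the field names simply follow the interpretation given above, and the format is inferred from the sample output only.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical record; field names follow the reading of the output above.
struct StatusEntry {
  int taxon_id = 0;
  int num_mutations = 0;   // second field, as interpreted in the text
  int num_offspring = 0;   // third field
  bool has_parent = false; // false when the last field is "null"
  int parent_id = -1;      // only meaningful when has_parent is true
};

// Parses one entry such as "[18|1,0|17]" or "[11|1,3|null]".
// Returns false (leaving `out` untouched) if the text does not match.
bool ParseStatusEntry(const std::string & text, StatusEntry & out) {
  StatusEntry e;
  int end = 0;  // set by %n only if the closing bracket was reached
  if (std::sscanf(text.c_str(), "[%d|%d,%d|%d]%n", &e.taxon_id,
                  &e.num_mutations, &e.num_offspring, &e.parent_id, &end) == 4
      && end > 0) {
    e.has_parent = true;
  } else {
    end = 0;
    if (std::sscanf(text.c_str(), "[%d|%d,%d|null]%n", &e.taxon_id,
                    &e.num_mutations, &e.num_offspring, &end) != 3
        || end == 0) {
      return false;
    }
    e.has_parent = false;
    e.parent_id = -1;
  }
  out = e;
  return true;
}
```

Parsing "[17|1,2|11]" this way gives taxon 17 with 2 offspring and parent 11, while "[11|1,3|null]" gives taxon 11 with has_parent set to false.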

The last portion of the output has several lines of 3 numbers.

It should look like this:

1 : 0 : -1
2 : 0 : -1
3 : 0 : 0
4 : 0 : 0
5 : 0 : 0
6 : 0 : 0
7 : 0 : 0
8 : 0 : 987
9 : 0 : 986
10 : 0 : 987
11 : 0 : 988
12 : 0 : 987
13 : 0 : 988

The first number is the organism number. The second number is the position of the organism. The third number is the fitness of the organism at position 0.