Compiler¶
Deterministic Finite Automata¶
A Deterministic Finite Automata simulator.
- Note
This file is part of Empirical, https://github.com/devosoft/Empirical
- Copyright
Copyright (C) Michigan State University, MIT Software license; see doc/LICENSE.md
- Date
2016-2021.
- Note
Status: BETA
-
namespace
emp
If we are in emscripten, make sure to include the header.
Typedefs
-
template<int
NUM_SYMBOLS
= 128, typenameSTOP_TYPE
= uint8_t>
classtDFA
Public Types
-
using
stop_t
= STOP_TYPE
Public Functions
-
tDFA
(size_t num_states = 0)
-
tDFA
(const tDFA<NUM_SYMBOLS, STOP_TYPE>&) = default
-
~tDFA
()
-
tDFA<NUM_SYMBOLS, STOP_TYPE> &
operator=
(const tDFA<NUM_SYMBOLS, STOP_TYPE>&) = default
-
size_t
GetSize
() const How many states is this DFA using?
-
void
Resize
(size_t new_size) Add Additional empty states.
-
const emp::array<int, NUM_SYMBOLS> &
GetTransitions
(size_t from) const Return an array of all transitions associated with a specified state.
-
void
SetTransition
(size_t from, size_t to, size_t sym) Add a specific transition associated with an input symbol.
-
void
SetStop
(size_t state, stop_t stop_val = 1) Set the stop value (no matter what it currently is)
-
void
AddStop
(size_t state, stop_t stop_val = 1) Set the stop value only if it’s higher than the current stop value.
-
stop_t
GetStop
(int state) const Get the stop value associated with a state.
-
bool
IsActive
(int state) const Test if a state is still valid.
-
bool
IsStop
(int state) const Test if a state has a stop.
-
stop_t
GetStop
(size_t state) const
-
bool
IsActive
(size_t) const
-
bool
IsStop
(size_t state) const
-
int
Next
(int state, size_t sym) const Return the new state after a symbol occurs.
-
int
Next
(int state, std::string sym_set) const Return the new state after a series of symbols.
-
using
-
template<int
Lexer Utilities¶
A set of utilities to convert between NFAs and DFAs.
- Note
This file is part of Empirical, https://github.com/devosoft/Empirical
- Copyright
Copyright (C) Michigan State University, MIT Software license; see doc/LICENSE.md
- Date
2016-2017
- Note
Status: BETA
-
namespace
emp
If we are in emscripten, make sure to include the header.
Functions
-
template<typename
T1
>
NFAMergeNFA
(T1 &&in) Merge multiple automata into one NFA (base case, single converstion)
-
template<typename
T1
, typenameT2
, typename ...Ts
>
NFAMergeNFA
(T1 &&in1, T2 &&in2, Ts&&... others) Merge multiple automata (DFA, NFA, RegEx) into one NFA.
-
struct
DFAStatus
- #include <lexer_utils.hpp>
Structure to track the current status of a DFA.
Public Functions
-
DFAStatus
(size_t _state, const std::string &_seq)
-
-
template<typename
Lexer¶
A general-purpose, fast lexer.
- Note
This file is part of Empirical, https://github.com/devosoft/Empirical
- Copyright
Copyright (C) Michigan State University, MIT Software license; see doc/LICENSE.md
- Date
2016-2019.
- Note
Status: ALPHA
-
namespace
emp
If we are in emscripten, make sure to include the header.
-
class
Lexer
- #include <Lexer.hpp>
A lexer with a set of token types (and associated regular expressions)
Public Functions
-
Lexer
()
-
~Lexer
()
-
size_t
GetNumTokens
() const How many types of tokens can be identified in this Lexer?
-
bool
TokenOK
(int id) const
-
int
AddToken
(const std::string &name, const std::string ®ex, bool save_lexeme = true, bool save_token = true, const std::string &desc = "") Add a new token, specified by a name and the regex used to identify it. Note that token ids count down with highest IDs having priority.
-
int
IgnoreToken
(const std::string &name, const std::string ®ex, const std::string &desc = "") Add a token to ignore, specified by a name and the regex used to identify it.
-
int
GetTokenID
(const std::string &name) const Get the ID associated with a token type (you provide the token name)
-
const TokenInfo &
GetTokenInfo
(int id) const Get the full information about a token (you provide the id)
-
std::string
GetTokenName
(int id) const Get the name associated with a token type (you provide the ID)
-
bool
GetSaveToken
(int id) const Identify if a token should be saved.
-
void
Generate
() const Create the NFA that will identify the current set of tokens in a sequence.
-
Token
Process
(std::istream &is) Get the next token found in an input stream. Do so by examining one character at a time. Keep going as long as there is a chance of a valid lexeme (since we want to choose the longest one we can find.) Every time we do hit a valid lexeme, store it as the current “best” and keep going. Once we hit a point where no other valid lexemes are possible, stop and return the best we’ve found so far.
-
emp::vector<Token>
Tokenize
(const emp::vector<std::string> &str_v) Turn a vector of strings into a vector of tokens.
-
const std::string &
GetLexeme
() Get the lexeme associated with the last token identified.
-
void
Print
(std::ostream &os = std::cout) const Print the full information about this lexer (for debugging)
-
void
DebugString
(std::string test_string) Try out the lexer on a string and demonstrate how it’s tokenized.
-
void
DebugToken
(int token_id) const
Private Members
-
int
cur_token_id
= MAX_ID Which ID should the next new token get?
-
bool
generate_lexer
= false Do we need to regenerate the lexer?
-
DFA
lexer_dfa
Table driven lexer implementation.
-
std::string
lexeme
Current state of lexeme being generated.
Private Static Attributes
-
constexpr int
MAX_ID
= 255 IDs count down so that first ones have priority.
-
constexpr int
ERROR_ID
= -1 Code for unknown token ID.
-
-
struct
Token
- #include <Lexer.hpp>
Information about a token instance from an input stream.
Public Functions
-
Token
(int id, const std::string &str = "", size_t _line = 0)
-
Token
(const Token&) = default
-
Token
(Token&&) = default
-
operator int
() const Token will automatically convert to its ID if used as an int.
-
operator const std::string&
() const Token will automatically convert to its matched sequence (lexeme) is used as a string.
Public Members
-
int
token_id
Which type of token is this?
-
std::string
lexeme
Sequence matched by this token (or empty if not saved)
-
size_t
line_id
Which line did this token start on?
-
-
struct
TokenInfo
- #include <Lexer.hpp>
Information about an individual TYPE of token to be processed within a Lexer.
Public Functions
-
TokenInfo
(const std::string &_name, const std::string &_regex, int _id, bool _save_l = true, bool _save_t = true, const std::string &_desc = "")
-
TokenInfo
(const TokenInfo&) = default
-
TokenInfo
(TokenInfo&&) = default
Public Members
-
std::string
name
Name of this token type.
-
std::string
desc
More detailed description of this token type.
-
RegEx
regex
Pattern to describe token type.
-
int
id
Unique id for token.
-
bool
save_lexeme
Preserve the lexeme for this token?
-
bool
save_token
Keep token at all? (Whitespace and comments are often discarded).
-
-
class
NonDeterministic Finite Automata¶
A Non-deterministic Finite Automata simulator.
To build a standard NFA, use
emp::NFA. If you want to have more symbols or more stop states, use emp::tNFA<S,T> where S is the number of symbols and T is the type used for stop. (defaults are 128 for ASCII-128 and uint8_t respectively.)- Note
This file is part of Empirical, https://github.com/devosoft/Empirical
- Copyright
Copyright (C) Michigan State University, MIT Software license; see doc/LICENSE.md
- Date
2016-2017
- Note
Status: BETA
The constructor can take as parameters the number of states and the id of the start state (both default to 0)
- Note
DFA’s use SetTransition(), but NFA’s use AddTransition. This distinction is intentional since in a DFA a second SetTransition with the same start state and symbol will override first, while in an NFA a second AddTransition will always add a new option.
-
namespace
emp
If we are in emscripten, make sure to include the header.
Typedefs
-
using
NFA_State
= tNFA_State<128, uint8_t> NFA_State is the most standard tNFA_State setup.
-
template<size_t
S
= 128, typenameSTOP_TYPE
= uint8_t>
classtNFA
- #include <NFA.hpp>
A dynamic NFA class, for easily building non-determanistic finite automata.
Public Functions
-
tNFA
(size_t num_states = 1, size_t start_state = 0)
-
~tNFA
()
-
size_t
GetSize
() const Return the current number of states.
-
const std::set<size_t> &
GetStart
() const Return start state and all others reachable through empty transitions.
-
std::set<size_t>
GetNext
(size_t sym, size_t from_id = 0) const Return the states reachable from the current state given the provided symbol.
-
std::set<size_t>
GetNext
(size_t sym, const std::set<size_t> from_set) const return the states reachable from the current set of states given the provided symbol.
-
bool
HasFreeTransitions
(size_t id) const Does the provided state have free transitions?
-
bool
HasSymTransitions
(size_t id) const Does the provided state have symbol-transitions?
-
opts_t
GetSymbolOptions
(const std::set<size_t> &test_set) const Return an emp::BitSet indicating the symbols available from the provided set of states.
-
void
Resize
(size_t new_size) Change the number of available states.
-
size_t
AddNewState
() Add a new state into the NFA and return its id.
-
void
AddTransitionSymbol
(size_t from, size_t to, size_t sym) Add a transition between states ‘from’ and ‘to’ that can be taken with the provided symbol.
-
void
AddTransition
(size_t from, size_t to, const char sym) Add a transition between states ‘from’ and ‘to’ that can be taken with a char symbol.
-
void
AddTransition
(size_t from, size_t to, const std::string &sym_set) Add a transition between states ‘from’ and ‘to’ that can be taken with the provided symbols.
-
void
AddTransition
(size_t from, size_t to, const char *sym_set) Add a transition between states ‘from’ and ‘to’ that can be taken with the provided symbols.
-
void
AddTransition
(size_t from, size_t to, const BitSet<NUM_SYMBOLS> &sym_set) Add a transition between states ‘from’ and ‘to’ that can be taken with the provided symbols.
-
void
AddFreeTransition
(size_t from, size_t to) Create a free transition between ‘from’ and ‘to’.
-
template<typename
T
= stop_t>
voidSetStop
(size_t state, T stop_val = 1) Set the specified state to be a stop state (with an optional stop value.)
-
stop_t
GetStop
(size_t state) const Get any stop value associated with the provided state.
-
bool
IsStart
(size_t state) const Test if NFA begins at provided state (may have free transitions to other states)
-
bool
IsStop
(size_t state) const Test if this state is a legal endpoint for the NFA.
-
bool
IsEmpty
(size_t state) const Test if this state has only empty transitions from it, and not stop state.
-
void
Merge
(const tNFA<NUM_SYMBOLS, STOP_TYPE> &nfa2) Merge another NFA into this one.
-
void
Print
() const Print information about this NFA (for debugging)
-
void
PrintFreeMoves
() Identify free moves in NFA (for debugging)
Public Static Attributes
-
constexpr const size_t
NUM_SYMBOLS
= S
Private Members
-
size_t
start
Main start state (others might be reached for free.)
-
struct
State
Public Functions
-
State
()
Public Members
-
std::map<size_t, Transition>
trans
What symbol transitions are available?
-
std::set<size_t>
free_to
What other states can you move to for free?
-
std::set<size_t>
free_from
What other states can move here for free?
-
-
struct
Transition
Public Functions
-
Transition
()
Public Members
-
opts_t
symbols
-
-
-
template<size_t
NUM_SYMBOLS
= 128, typenameSTOP_TYPE
= uint8_t>
classtNFA_State
- #include <NFA.hpp>
Information about the current full state (i.e., set of legal states) of an NFA.
Public Functions
-
tNFA_State
(const tNFA<NUM_SYMBOLS, STOP_TYPE> &_nfa)
-
~tNFA_State
()
-
const tNFA<NUM_SYMBOLS, STOP_TYPE> &
GetNFA
() const Get the NFA associated with this state.
-
const std::set<size_t> &
GetStateSet
() const Get a set of states that are currently active.
-
bool
IsActive
() const Are there currently any legal NFA states?
-
bool
IsStop
() const Can we legally stop in any of the current states?
-
bool
HasState
(size_t id) Is a particular NFA state currently included?
-
size_t
GetSize
() How many states are currently included?
-
void
SetStateSet
(const std::set<size_t> &in) Set the current states.
-
void
Reset
() Change current states to start + free transitions from start.
-
void
Next
(size_t sym) Update states given a new input symbol.
-
void
Next
(const std::string &sym_set) Update states given a new series of input symbols (as a string)
-
void
Print
() Print out current information about this NFA State (for debugging)
Private Members
-
const tNFA<NUM_SYMBOLS, STOP_TYPE> &
nfa
Which NFA is this state set associated with?
-
std::set<size_t>
state_set
Which states are currently legal?
-
-
using
Regular Expressions¶
Basic regular expression handler.
A fully (well, mostly) functional regular expression processor.
- Note
This file is part of Empirical, https://github.com/devosoft/Empirical
- Copyright
Copyright (C) Michigan State University, MIT Software license; see doc/LICENSE.md
- Date
2016-2019.
- Note
Status: BETA
Special chars: ‘|’ - or ‘*’ - zero or more of previous ‘+’ - one or more of previous ‘?’ - previous is optional ‘.’ - Match any character except
Plus the following group contents (and change may translation rules) ‘(’ and ‘)’ - group contents ‘”’ - Ignore special characters in contents (quotes still need to be escaped) ‘[’ and ‘]’ - character set
choose ONE character ‘^’ as first char negates contents ‘-’ indicates range UNLESS first or last.Additional overloads for functions in lexer_utils.h:
static NFA to_NFA(const RegEx & regex, int stop_id=1); static DFA to_DFA(const RegEx & regex);
-
namespace
emp
If we are in emscripten, make sure to include the header.
Functions
-
class
RegEx
- #include <RegEx.hpp>
A basic regular expression handler.
Public Functions
-
RegEx
() = delete
-
RegEx
(const std::string &r)
-
RegEx
(const RegEx &r)
-
~RegEx
()
-
std::string
AsString
() const Convert the RegEx to an standard string, readable from outsite this class.
-
void
AddToNFA
(NFA &nfa, size_t start, size_t stop) const Add this regex to an NFA being built.
-
void
Generate
() const Assume the RegEx is ready and setup processing for it.
-
bool
Test
(const std::string &str) const Test if a string statisfies this regex.
-
void
PrintInternal
() const For debugging: print the internal representation of the regex.
-
void
PrintNotes
() const For debugging: print any internal notes generated about this regex.
-
void
PrintDebug
() const Print general debuging information about this regex.
Private Types
-
using
opts_t
= BitSet<NUM_SYMBOLS>
Private Functions
-
template<typename ...
T
>
voidError
(T&&... args)
-
bool
EnsureNext
(char x) Make sure that there is another element in the RegEx (e.g., after an ‘|’) or else trigger and error to report the problem.
-
Ptr<re_charset>
ConstructSet
() Construct a character range.
Private Members
-
bool
valid
= true Set to false if regex cannot be processed.
-
size_t
pos
= 0 Position being read in regex.
-
bool
dfa_ready
= false Is the dfa ready? (or does it need to be generated?)
-
re_block
head
Private Static Attributes
-
constexpr size_t
NUM_SYMBOLS
= 128 Maximum number of symbol the RegEx can handle.
-
struct
re_base
Internal base representation of a portion of a regex.
-
struct
re_charset
: public emp::RegEx::re_base Representation of a character set e.g., [abc].
Public Functions
-
re_charset
()
-
re_charset
(char x, bool neg = false)
-
re_charset
(const std::string &s, bool neg = false)
-
void
Print
(std::ostream &os) const override
-
Ptr<re_charset>
AsCharSet
() override
-
size_t
GetSize
() const override
-
char
First
() const
-
void
AddToNFA
(NFA &nfa, size_t start, size_t stop) const override
Public Members
-
opts_t
char_set
-
-
struct
re_parent
: public emp::RegEx::re_base Intermediate base class for RegEx components that have children (such as “and” and “or”)
Public Functions
-
re_parent
()
-
~re_parent
()
-
void
Clear
()
-
size_t
GetSize
() const override
-
bool
Simplify
() override
-
-
struct
re_plus
: public emp::RegEx::re_parent Representations of one-or-more instances of a component. e.g., a+.
-
struct
re_qm
: public emp::RegEx::re_parent Representations of zero-or-one instances of a component. e.g., a?
-
struct
re_star
: public emp::RegEx::re_parent Representations of zero-or-more instances of a component. e.g., a*.
-
struct
re_string
: public emp::RegEx::re_base Representation of strings stored in a RegEx.
Public Functions
-
re_string
()
-
re_string
(char c)
-
re_string
(const std::string &s)
-
void
Print
(std::ostream &os) const override
-
size_t
GetSize
() const override
-
void
AddToNFA
(NFA &nfa, size_t start, size_t stop) const override
Public Members
-
std::string
str
-
-
-
class