fast package#
Submodules#
fast.cbfs module#
- class CBFS(num_queues: int, pop_idx: int = 0)[source]#
Bases:
object
Implements the CBFS algorithm <https://www.researchgate.net/publication/315650019_Cyclic_best_first_search_Using_contours_to_guide_branch-and-bound_algorithms_Cyclic_Best_First_Search>.
Constructor.
- Parameters:
num_queues (int) – A strictly positive
int
specifying the number of queues managed by thisCBFS
instance.pop_idx (int) – The index of the active queue. It must verify
0 <= pop_idx < self.num_queues
.
fast.density module#
- ast_density(ast: RegexpAst, map_len_proba: dict, char_proba: float, map_pa_infix_re: dict = None) float [source]#
Compute the density related to a given regular expression abstract syntax tree.
- Parameters:
ast (RegexpAst) – A
RegexpAst
instance.map_len_proba (dict) –
char_proba (float) – The probability to pick a given character in the alphabet. Should probably be set to
1 / len(dfa.alphabet())
?map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}
that maps eachPatternAutomaton
to its corresponding regular expression.
- Returns:
The density of
ast
.
- dfa_density(dfa: Automaton, length: int, char_proba: float)[source]#
Computes the ratio of the number of words of length
length
accepted by an automatondfa
, over the number of words of lengthlength
. This corresponds to the density ofa
if we restrict to the words of lengthlength
.- Parameters:
dfa (Automaton) – A
pybgl.Automaton
instance.length (int) – The considered length. Must be a positive integer.
char_proba (float) – The probability to pick a given character in the alphabet. Should probably be set to
1 / len(dfa.alphabet())
?
- Returns:
The density of
a
if we restrict to the words of lengthlength
.
fast.fast module#
- fast(examples: list, obj_id: int = 0, stop_condition: callable = <function <lambda>>, mutators: list = [<class 'fast.regexp_mutators.DisjunctionMutator'>, <class 'fast.regexp_mutators.BotMutator'>, <class 'fast.regexp_mutators.ActivateMutator'>, <class 'fast.regexp_mutators.UpDotMutator'>, <class 'fast.regexp_mutators.DownDotMutator'>, <class 'fast.regexp_mutators.BouncePlusMutator'>, <class 'fast.regexp_mutators.BounceQuestionMutator'>], *cls, **kwargs) list [source]#
Runs fAST algorithm to infer regular expression from a set of positive examples.
- Parameters:
examples (list) – A
set(str)
gathering the positive examples.obj_id (int) – A value among
{OBJ_ADD, OBJ_MULT, OBJ_TUPLE}
.stop (callable) – Stopping
callable(final_results: list, time_elapsed: float) -> bool
callback.mutators (list) – Defaults to
MUTATORS
.cls – see
objective_func.make_*_objective_funcfor_str
functions.kwargs – see
objective_func.make_*_objective_funcfor_str
functions.
- Returns:
The found regular expressions.
- fast_benchmark(map_relen_numre: dict = None, alphabet: list = None, num_examples: int = 10, *cls, **kwargs) float [source]#
Tests fAST on a custom benchmark by crafting a set of test regular expression. The goal is to test whether fAST finds the original (or a better) regular expression.
- Parameters:
map_relen_numre (dict) – A dictionary which maps for each length the number of regular expressions to be randomly crafted.
alphabet (list) – The list of symbols.
num_examples (int) – The number of positive samples to be generated for this evaluation.
cls – See the
fast()
function.kwargs – See the
fast()
function.
- fast_from_re(regexp: str, num_examples: int = 10, p_stop_final: float = 0.5, repeat: bool = True, max_sampling: int = 1000, *cls, **kwargs) list [source]#
Runs fAST algorithm to infer regular expression from a set of positive examples.
- Parameters:
regexp (str) – A regular expression.
num_examples (int) – Number of positive examples.
p_stop_final (float) – A
float
between0.0
and1.0
corresponding the probability to stop when reaching a final state of the automaton induced byregexp
. The higherp_stop_final
, the longer the examples.repeat (bool) – Pass True to rerun sampling in case of reject.
max_sampling (int) – Maximum number of sampling. Pass
None
to continue sampling until finding a non-rejected value. Note that if sample always returnsNone
, this results to an infinite loop.cls – see the py:func:fast function .
kwargs – see the py:func:fast function .
- Returns:
The found regular expressions.
- fast_from_strings(examples: list, objective_function: callable = None, stop_condition: callable = None, mutators: list = None, visitor: FindAstFromStringsDefaultVisitor = None) list [source]#
Regular expression inference algorithm.
- Parameters:
examples (list) – The list of input
PatternAutomaton
instances corresponding to each positive examples.objective_function (callable) – The objective function to be minimized.
stop_condition (callable) –
A
callable(candidate_solutions, elapsed_time) -> bool
function where:candidate_solutions (list): The list of current candidates solutions.
time_elapsed (float): The current execution time, in seconds.
the returned value is
True
iff the execution must be stopped.
mutators (list) – The list of Mutator instances used to update the ASTs.
- Returns:
A list of final solutions, once the queue is empty (dream on), or once the stop condition is called (more reasonable). A final solution is a tuple (h, prefix_regexp) where pref_reg: list is a prefix regular expression of the solution, and h is its objective function value.
- shortness_factor(examples: iter) float [source]#
Computes the normalization factor used to ensure the shortness is always in [0, 1].
- Parameters:
examples (set) – The set of positive samples
- Returns:
The normalization factor.
Notes:
The default behavior of find_ast is to choose n = 1. However, using n = 1 makes the shortness equal to the size of the inferred AST. As a sequel, the shortness is too prepondarant w.r.t. to the accuracy which is in [0.0, 1.0] and fAST waste a lot of energy to explore irrelevant ASTs. That is why we must normalize the shortness.
To do so, we should use n = sum(len(example) for example in examples). However, this scaling factor is too aggressive if len(examples) increases.
To circumvent the problem, we use n = max(len(example) for example in examples). Intuitively, this guarantees that 0 <= shortness(PTA(longest_example)) <= 1 which means that the ASTs we are searching have most of time a scaled shortness below 1. This scaling factor makes the shortness insensitive to the number of examples (only to the length of the longest example) and comparable with the accuracy.
fast.find_ast_visitor module#
- class FindAstFromStringsAggregateVisitor(visitors: list)[source]#
Bases:
FindAstFromStringsDefaultVisitor
Constructor.
- Parameters:
visitors (list) – A list of
FindAstFromStringsDefaultVisitor
instances.
- visit_final_solution(obj_func_value, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
- visit_pop_item(pq_item)[source]#
Triggered when processing a mutant.
- Parameters:
pq_item – The popped mutant.
- visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#
Triggered when building a new mutant.
- Parameters:
mutator (Mutator) – The mutator that has produced the mutant.
new_depth (int) – The progression of the item (total number of processed characters).
new_pq_item (tuple) – The pushed mutant.
- class FindAstFromStringsDefaultVisitor[source]#
Bases:
object
Constructor.
- visit_final_solution(obj_func_value: float, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples: list)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
- visit_pop_item(pq_item: tuple)[source]#
Triggered when processing a mutant.
- Parameters:
pq_item – The popped mutant.
- visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#
Triggered when building a new mutant.
- Parameters:
mutator (Mutator) – The mutator that has produced the mutant.
new_depth (int) – The progression of the item (total number of processed characters).
new_pq_item (tuple) – The pushed mutant.
- class FindAstFromStringsEvaluateVisitor[source]#
Bases:
FindAstFromStringsDefaultVisitor
Maintains metrics when running experiments
Constructor.
- visit_final_solution(obj_func_value, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
- visit_pop_item(pq_item)[source]#
Triggered when processing a mutant.
- Parameters:
pq_item – The popped mutant.
- visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#
Triggered when building a new mutant.
- Parameters:
mutator (Mutator) – The mutator that has produced the mutant.
new_depth (int) – The progression of the item (total number of processed characters).
new_pq_item (tuple) – The pushed mutant.
- class FindAstFromStringsVerboseVisitor(verbose_level=1)[source]#
Bases:
FindAstFromStringsDefaultVisitor
Insert useful debug message.
Constructor.
- visit_final_solution(obj_func_value, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
fast.multi_grep module#
- class MultiGrepFonctor[source]#
Bases:
object
Base class used to customize the behavior of the
multi_grep()
function.
- class MultiGrepFonctorAll[source]#
Bases:
MultiGrepFonctor
The
MultiGrepFonctorAll
class catches (for each pattern Pi and for each index j) each substring w[j:k] matching Pi.
- class MultiGrepFonctorGreedy[source]#
Bases:
MultiGrepFonctorLargest
The
MultiGrepFonctorGreedy
class catches (for each pattern Pi and for each index j) the largest w[j′:k] matching Pi and s.t. j′ < j.
- class MultiGrepFonctorLargest[source]#
Bases:
MultiGrepFonctor
The
MultiGrepFonctorLargest
class catches (for each pattern Pi and for each index j) the largest w[j:k] matching Pi.
- multi_grep(w: str, map_name_dfa: list, callback: callable = <function <lambda>>)[source]#
Searches sub-strings of a string matched by multiple patterns.
- Parameters:
w (str) – A
str
containing the word to process.map_name_dfa (dict) – A
dict{Name: Automaton}
mapping each pattern name with its corresponding DFA.callback (callable) – A
callable(Name, int, int, str)
called wheneverw[j:k]
is matched by patternname
.
- multi_grep_with_delimiters(w: str, map_name_dfa: dict, callback: callable = <function <lambda>>, is_pattern_separator: callable = <function <lambda>>, is_pattern_left_separated: callable = <function <lambda>>, is_pattern_right_separated: callable = <function <lambda>>)[source]#
Searches sub-strings of a string matched by multiple patterns, possibly delimited by a predefined collection of separator patterns.
- Parameters:
w (str) – A
str
containing the word to process.map_name_dfa (dict) – A
dict{Name: Automaton}
mapping each pattern name with its corresponding DFA.callback (callable) – A
callable(Name, int, int, str)
called wheneverw[j:k]
is matched by patternname
.is_pattern_separator (callable) – A
callable(Name) -> bool
returnTrue
if the patternname
is a separator.is_pattern_left_separated (callable) – A
callable(Name) -> bool
returnTrue
if the patternname
must be preceeded by a separator pattern or located at the beginning ofw
.is_pattern_right_separated (callable) – A ̀``callable(Name) -> bool`` return
True
if the patternname
must be followed by a separator pattern or located at the end ofw
.
fast.objective_func module#
- make_additive_objective_func_for_str(examples: list, alphabet: list, map_pa_infix_re: dict = None, size_factor: float = 0.5, density_factor: float = 0.5) callable [source]#
Makes an additive objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomaton
instances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}
that maps eachPatternAutomaton
to its corresponding regular expression.size_factor (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_factor, the more important the size of the inferred regular expression.
density_factor (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_factor, the more important the size of the inferred regular expression. Should be set to 1 - size_factor.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
- make_multiplicative_objective_func_for_str(examples, alphabet, map_pa_infix_re: dict = None, size_exponent=1, density_exponent=1)[source]#
Makes a multiplicative objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomaton
instances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}
that maps eachPatternAutomaton
to its corresponding regular expression.size_exponent (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_exponent, the more important the size of the inferred regular expression.
density_exponent (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_exponent, the more important the size of the inferred regular expression. Should be set to size_exponent - 1.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
- make_normalized_additive_objective_func_for_str(examples: list, alphabet: list, map_pa_infix_re: dict = None, size_factor: float = 0.5, density_factor: float = 0.5)[source]#
Makes a normalized additive objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomaton
instances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}
that maps eachPatternAutomaton
to its corresponding regular expression.size_factor (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_factor, the more important the size of the inferred regular expression.
density_factor (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_factor, the more important the size of the inferred regular expression. Should be set to 1 - size_factor.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
- make_tuple_based_objective_func_for_str(examples: list, alphabet: set, map_pa_infix_re: dict = None)[source]#
Makes a lexicographic objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomaton
instances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}
that maps eachPatternAutomaton
to its corresponding regular expression.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
fast.pattern_automaton module#
- class PatternAutomaton(word: str, map_name_dfa: dict, make_mg: callable = None, filtered_patterns: set = None)[source]#
Bases:
Automaton
A
PatternAutomaton
instance represents a string at the pattern level using a automaton-like structure, whose vertices corresponds to a subset of string indices, and arcs corresponds to infixes and their related pattern.Constructs the PatternAutomaton related to an input word according to a collection of patterns and according to a
multi_grep()
stategy.- Parameters:
word (str) – The input word.
map_name_dfa (dict) – A
dict{str: Automaton}
mapping each relevant type with its corresponding Automaton instance.filtered_patterns (set) – A subset (possibly empty) of
map_name_dfa.keys()
of type that must be catched my multi_grep, but not reflected as arcs in thisPatternAutomaton
instance. It can be used e.g. to drop spaces an get a smallerPatternAutomaton
, but the position of spaces in the original lines will be lost.
- get_infix(e: EdgeDescriptor) str [source]#
Retrieves the infix (substring) related to an edge.
- Parameters:
e (EdgeDescriptor) – The considered edge.
- Returns:
The infix related to an arbitrary edge of this
PatternAutomaton
instance.
- get_slice(e) tuple [source]#
Retrieves the slice (pair of uint indices) related to an edge.
- Parameters:
e (EdgeDescriptor) – The considered edge.
- Returns:
The slice related to an arbitrary edge of this
PatternAutomaton
instance.
- pattern_automaton_edge_weight(e: EdgeDescriptor, g: PatternAutomaton, map_name_density: dict = None) float [source]#
Gets the weight of an edge.
- Parameters:
e (EdgeDescriptor) – The considered edge.
g (PatternAutomaton) – A
PatternAutomaton
instance.
- pattern_automaton_to_path(g: PatternAutomaton, *cls, **kwargs) list [source]#
Extracts from a
PatternAutomaton
instance the most relevant path (highlighting the most relevant pattern-based decomposition).- Parameters:
g (PatternAutomaton) – The considered
PatternAutomaton
instance.cls – See the
pybgl.dijkstra_shortest_path()
function.kwargs – See the
pybgl.dijkstra_shortest_path()
function.
fast.random module#
- random() x in the interval [0, 1). #
- random_ast(re_len: int, alphabet: list = None, map_operators: dict = {'*': Op(cardinality=1, precedence=4, associativity=1), '+': Op(cardinality=1, precedence=4, associativity=1), '.': Op(cardinality=2, precedence=2, associativity=1), '?': Op(cardinality=1, precedence=3, associativity=1), '|': Op(cardinality=2, precedence=1, associativity=1)}) tuple [source]#
Returns a random AST and its root node.
- Parameters:
re_len (int) – The number of nodes of the AST of the output regular expression.
alphabet (list) – A
list
gathering the character of the alphabet of the regular expressions. Default tolist('a...z')
.map_operators (dict) – A
dict{str: pybgl.shunting_yard_postfix.Op}
mapping each meta-character with its corresponding attributes. Defaults topybgl.shunting_yard_postfix.MAP_OPERATORS_RE
.
- Returns:
A
tuple(pybgl.shunting_yard_postfix.Ast, int)
gathering the output AST and its root node.
- random_word_from_automaton(g: Automaton, p: float = 0.5, repeat: bool = True, max_sampling: int = 1000) str [source]#
Perform a uniform random walk over an Automaton to generate a word accepted by this Automaton.
- Parameters:
g (Automaton) – A DFA.
p (float) – A value in [0, 1] corresponding to the probability to stop once a final state is reached
repeat (bool) – Pass
True
to try again while the sample is rejected.max_sampling (int) – Max number of reject samplings. If you pass
None
, the algorithm continues until finding a sample, but depending on p and g, this may result to an infinite loop.
- Returns:
A string representing the word discovered during the walk, or
None
if the walk led to a non-final state without successor.
- reject_sampling(sample, repeat: bool = True, max_sampling: int = 1000)[source]#
Reject sampling procedure.
- Parameters:
sample (callable) – A Callback() -> Result callback which draws a random sample. If it returns None, the sample is rejected.
repeat (bool) – Pass True to rerun sampling in case of reject.
max_sampling (int) – Maximum number of sampling. Pass None to continue sampling until finding a non-rejected value. Note that if sample always returns None, this results to an infinite loop.
- Returns:
The sampled instances,
None
otherwise.
fast.regexp module#
Functions used to craft automata from regular expressions.
- get_pattern_names() list [source]#
Retrieves the list of patterns involved in the default pattern collection.
- Returns:
A list of string, where each string correspond to a pattern name involved in the default pattern collection (
MAP_NAME_RE
).
- make_dfa_any(alphabet: iter = None, separator_alphabet: iter = None) Automaton [source]#
Builds the DFA corresponding to the any non-separator character.
- Parameters:
alphabet (iter) – The characters involved in the alphabet. Default to
string.printable
).separator (iter) – The characters corresponding to separators. Defaults to
{" ", "\t", "\n"}
.
- Returns:
The corresponding
Automaton
.
- make_dfa_empty() Automaton [source]#
Builds the
Automaton
corresponding to the empty language.- Returns:
The corresponding
Automaton
.
- make_map_name_dfa(names: iter = None) dict [source]#
Builds a dictionary that maps a list of pattern name with the corresponding
Automaton
instance built according to regular expressions.- Parameters:
names (list) – A list of string, where each string identifies a pattern names (by default, every keys of
map_name_re
is considered).- Returns:
The dictionary mapping each pattern name with its corresponding DFA.
- make_re_hex_digit(lower_case: bool = True, upper_case: bool = True) str [source]#
Builds the regular expression catching hexadecimal values.
- Parameters:
lower_case (bool) – Pass
False
to discard lower case values.upper_case (bool) – Pass
False
to discard upper case values.
- Returns:
The string storing the regular expression.
- make_re_ipv6(lower_case: bool = True, upper_case: bool = True) str [source]#
Builds the regular expression catching IPv6 addresses.
Note this is not an exact match contrary to
make_re_ipv6_strict
, but the resulting automaton is significantly faster (and should be accurate enough for most of practical use cases).- Parameters:
lower_case (bool) – Pass
False
to discard lower case values.upper_case (bool) – Pass
False
to discard upper case values.
- Returns:
The string storing the regular expression.
- make_re_ipv6_strict(*args, **kwargs) str [source]#
Builds the regular expression catching IPv6 addresses.
- Parameters:
args – See
make_re_hex_digit()
.kwargs – See
make_re_hex_digit()
.
- Returns:
The string storing the regular expression.
fast.regexp_ast module#
- class RegexpAst[source]#
Bases:
object
n-ary representations of a regular expression AST. Do not use binary AST otherwise the priority queue will explode! Note that calling simplify transforms an arbitrary AST to a n-ary AST.
Constructor.
- ROOT = 'root'#
- add_node(label: str) int [source]#
Add a new node to this AST.
- Parameters:
label (str) – The label of the added node.
- Returns:
The newly added vertex descriptor.
- append_child(u: int, v: int, set_parent: bool = True)[source]#
Appends a new child to a node
u
.- Parameters:
u (int) – The vertex descriptor of the parent node.
v (int) – The vertex descriptor of the child node.
set_parent (bool) – Pass
True
to update the parent ofv
. Defaults toTrue
.
- children(u: int) iter [source]#
Retrieves the children of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The parent of
u
if any orNone
.
- edges() iter [source]#
Retrieves an iterator over the edges of this py:class:RegexpAst instance.
- Returns:
An iterator over this
RegexpAst
instance edges.
- epsilon_reachables(u: int, v: int = None) set [source]#
Retrieves the edges that are epsilon-reachable from a
(u, v)
arc. See alsoRegexpAst.epsilon_successors()
.- Parameters:
u (int) – The source of the arc.
v (int) – The target of the arc. If
u
is a leaf you may passNone
.
- Returns:
The set of out-edges of
(u', v')
arcs that are epsilon-reachable from the(u, v)
arc
- epsilon_successors(u: int, v: int) set [source]#
Retrieves the out-edges of
v
that are epsilon-reachable from a(u, v)
arc. See alsoRegexpAst.epsilon_reachables()
.- Parameters:
u (int) – The source of the arc.
v (int) – The target of the arc.
- Returns:
The set of out-edges of
(u', v')
arcs that are epsilon-reachable from the(u, v)
arc
- first_child(u: int) int [source]#
Retrieves the first child of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The vertex descriptor of the first child of
u
.- Raises:
IndexError –
- get_arc_index(u: int, v: int) int [source]#
If (u,v) is downwards, and v is the i^th child of u, returns i. If (u,v) is upwards, and u is the i^th child of v, returns i.
- is_ancestor_of(u: int, v: int) bool [source]#
Tests if
v
is an ancestor ofu
.- Parameters:
u (int) – The vertex descriptor of a node.
v (int) – The vertex descriptor of the candidate ancestor.
- Returns:
True
iffv
is an ancestor ofu
,False
otherwise.
- is_downwards_arc(u: int, v: int) bool [source]#
Checks whether an
(u, v)
arc is downward (i.e., ifv
is a child ofu
).- Parameters:
u (int) – The vertex descriptor of the source of the arc.
v (int) – The vertex descriptor of the target of the arc.
- Returns:
True
iff(u, v)
is downward.
- is_last_child(u: int, v: int) bool [source]#
Checks whether
u
is the last child ofv
.- Parameters:
u (int) – The vertex descriptor of the considered children.
v (int) – The vertex descriptor of the parent node of
u
.
- Returns:
True
if and only ifu
is the last child ofv
.
- is_leaf(u: int) bool [source]#
Checks whether a node is a leaf. It concerns nodes labeled by a symbol of the alphabet.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
True
iffu
is a leaf,False
otherwise.
- is_n_ary(u: int) bool [source]#
Checks whether a node is
n
-ary,n > 1
. It concerns nodes labeled by ann
-ary regular expression operator (i.e., “.”, “|”) and the root of this :py:class:RegexpAst
instance.- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
True
iffu
isn
-ary,False
otherwise.
- is_unary(u: int) bool [source]#
Checks whether a node is unary. It concerns nodes labeled by an unary regular expression operator (i.e., “+”, “*”, “?”) and the root of this :py:class:
RegexpAst
instance.- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
True
iffu
is unary,False
otherwise.
- is_upwards_arc(u: int, v: int) bool [source]#
Checks whether an
(u, v)
arc is upward (i.e., ifu
is a child ofv
).- Parameters:
u (int) – The vertex descriptor of the source of the arc.
v (int) – The vertex descriptor of the target of the arc.
- Returns:
True
iff(u, v)
is upward,False
otherwise.
- ith_child(u: int, i: int) int [source]#
Retrieves the
i
-th child of a given vertex.- Parameters:
u (int) – The vertex descriptor of the considered node.
i (int) – The child index.
- Returns:
The vertex descriptor of the
i
-th child ofu
.- Raises:
IndexError –
- label(u) str [source]#
Retrieves the label of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The label of
u
.
- num_children(u: int) int [source]#
Retrieves the number of children of
u
.- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The number of children of
u
.
- out_edges(u: int) iter [source]#
Retrieves an iterator over the out-edges of this py:class:RegexpAst instance.
- Returns:
An iterator over this
RegexpAst
instance vertices.
- parent(u: int) int [source]#
Retrieves the parent of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The parent of
u
if any orNone
.
- recognizes_pa(pa: PatternAutomaton, verbose: bool = False) bool [source]#
- recognizes_pa_prefix(pa: PatternAutomaton, target_pa_node: int, target_ast_leaf: int, u: int = None) bool [source]#
Tests whether there exists in this py:class:RegexpAst instance a path from its root to
target_ast_leaf
that corresponds to a path from the source of a givenPatternAutomaton
instance totarget_ast_leaf
.- Parameters:
pa (PatternAutomaton) – A
PatternAutomaton
instance (modeling a positive example).target_pa_node (int) – The vertex descriptor of the target node in
pa
.target_ast_leaf (int) – The vertex descriptor target leaf in
self
.u (int) – Pass
None
.
- Returns:
True
iff this py:class:RegexpAst instance matchespa
,False
otherwise.
- recognizes_prefix(prefix: list, leaf_to_end_on: int)[source]#
Tests whether there exists a valid walk from the root to a given leaf in this
RegexpAst
instance.- Parameters:
prefix (list) – The prefix (list of symbols).
leaf_to_end_on (int) – The vertex descriptor target leaf.
- Returns:
True
if there exists a walk from the root toleaf_to_end_on
that matchesprefix
,False
otherwise.
- recognizes_word(w: list) bool [source]#
Tests whether a word is matched by this
RegexpAst
instance.- Parameters:
w (list) – A list of symbols, where each symbol is a string (e.g. “a” or “$date”).
- Returns:
True
iff this py:class:RegexpAst instance matchesw
,False
otherwise.
- remove_node(u: int)[source]#
Removes a node from this
RegexpAst
instance.- Parameters:
u (int) – The vertex descriptor of the node to be removed.
- set_child(u: int, v: int, set_parent: bool = True)[source]#
Sets
v
as the only child ofu
.- Parameters:
u (int) – The vertex descriptor of the parent node.
v (int) – The vertex descriptor of the child node.
set_parent (bool) – Pass
True
to update the parent ofv
. Defaults toTrue
.
- set_children(u: int, vs: iter, set_parents: bool = True)[source]#
Sets the children of
u
.- Parameters:
u (int) – The vertex descriptor of the parent node.
vs (iter) – The vertex descriptors of the child nodes.
set_parent (bool) – Pass
True
to update the parent ofv
. Defaults toTrue
.
- set_ith_child(u: int, v: int, i: int, set_parent: bool = True)[source]#
Sets the
i
-th children ofu
tov
.- Parameters:
u (int) – The vertex descriptor of the parent node.
v (int) – The vertex descriptor of the child node.
i (int) – The child index. It must verify
0 < i <= self.num_children(u)
.set_parent (bool) – Pass
True
to update the parent ofv
. Defaults toTrue
.
- Raises:
IndexError –
- set_label(u: int, label: str)[source]#
Sets the label of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
label (str) – Set the label of a given node.
- simplify(active_leaf: int = None)[source]#
Applies the following simplifications on this
RegexpAst
instance.- Parameters:
active_leaf (int) – Pass None
- source(e: EdgeDescriptor) int [source]#
Retrieves the source of an arc.
- Parameters:
e (EdgeDescriptor) – The
EdgeDescriptor
of the arc.- Returns:
The vertex descriptor of the source of
e
.
- target(e: EdgeDescriptor) int [source]#
Retrieves the target of an arc.
- Parameters:
e (EdgeDescriptor) – The
EdgeDescriptor
of the arc.- Returns:
The vertex descriptor of the target of
e
.
- to_dot(**kwargs) str [source]#
Exports this
RegexpAst
to Graphviz format.- Returns:
The corresponding Graphviz string.
- to_infix_regexp_str(u=None) str [source]#
Builds the infix regular expression of this
RegexpAst
instance. See alsoRegexpAst.to_prefix_regexp_list()
.- Parameters:
u (int) – Pass
None
.- Returns:
A string storing the prefix regular expression modeling this
RegexpAst
instance.
- to_prefix_regexp_list(u: int = None) list [source]#
Builds the prefix regular expression of this
RegexpAst
instance. See alsoRegexpAst.to_prefix_regexp_str()
.- Parameters:
u (int) – Pass
None
.- Returns:
A string storing the prefix regular expression modeling this
RegexpAst
instance.
- to_prefix_regexp_str(u: int = None) str [source]#
Builds the prefix regular expression of this
RegexpAst
instance. See alsoRegexpAst.to_prefix_regexp_list()
.- Parameters:
u (int) – Pass
None
.- Returns:
A string storing the prefix regular expression modeling this
RegexpAst
instance.
- vertices() iter [source]#
Retrieves an iterator over the vertices of this py:class:RegexpAst instance.
- Returns:
An iterator over this
RegexpAst
instance vertices.
- walk_one_char(u: int, a: str, verbose: bool = False) set [source]#
Performs a single character walk on this
RegexpAst
instance. Starts the walk on the node u (u must be a leaf or the root), and returns the set of leaves labeled bya
that are epsilon-reachable fromu
.- Parameters:
a (str) – A symbol, e.g. “a” or “$date”.
- walk_word(w: list, u: int = None) set [source]#
Walk on this
RegexpAst
instances by consuming the symbols involved in an input word.- Parameters:
u (int) – The vertex descriptor of the starting node. Pass
None
to start from the root of thisRegexpAst
instance. Defaults toNone
.w (list) – The input word, which is a list of symbols, where each symbol is a string (e.g. “a” or “$date”).
- Returns:
The set of vertex descriptor that are reachable from
u
by consumingw
fast.regexp_mutators module#
- class ActivateMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
‘Identity’ mutator. Does not modify the ast, only activates an existing leaf. Condition: v is a c-labeled leaf.
- class BotMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
‘Bot’ mutator. Is used to add a leaf to an empty tree. Condition: the ast is empty.
- class BouncePlusMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
‘Bouncer’ mutator. Starts by applying a first mutation on e (insertion of +). Then generates from this mutation a new set of Epsilon reachable arc on which the non bouncing mutators can be applied. TODO: Find out if/how the bouncers can call each other Condition: u, v is an upwards arc, and v, u is not epsilon-reachable from the current leaf.
- class BounceQuestionMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
‘Bouncer’ mutator. Starts by applying a first mutation on e (insertion of ?). Then generates from this mutation a new set of Epsilon reachable arc on which the non bouncing mutators can be applied. TODO: Find out if/how the bouncers can call each other Condition: u, v is an downwards arc, and v, u is not epsilon-reachable from the current leaf.
Constructor.
- class DisjunctionMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
Downwards mutator, inserting a new ‘or’ node and a new leaf labeled c. Condition: the arc u, v must go downwards.
- class DownDotMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
Upwards dot mutator. Starts by trying to inject in the ast a dot node as child of u, a new leaf as left child of it, and v as its right child. If this newly obtained ast does not recognize the prefix word being processed now, or does not recognize one of the previous examples processed, then we also insert a ? node between the dot and v. Condition: the arc u, v must go downwards.
- class UpDotMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator
Upwards dot mutator. Starts by trying to inject in the ast a dot node and a leaf as right child. If this newly obtained ast does not recognize the prefix word being processed now, or does not recognize one of the previous examples processed, then we also insert a ? node between the dot and the leaf Condition: the arc u, v must go upwards.
Module contents#
Top-level package.