fast package#
Submodules#
fast.cbfs module#
- class CBFS(num_queues: int, pop_idx: int = 0)[source]#
Bases:
objectImplements the CBFS algorithm <https://www.researchgate.net/publication/315650019_Cyclic_best_first_search_Using_contours_to_guide_branch-and-bound_algorithms_Cyclic_Best_First_Search>.
Constructor.
- Parameters:
num_queues (int) – A strictly positive
intspecifying the number of queues managed by thisCBFSinstance.pop_idx (int) – The index of the active queue. It must verify
0 <= pop_idx < self.num_queues.
fast.density module#
- ast_density(ast: RegexpAst, map_len_proba: dict, char_proba: float, map_pa_infix_re: dict = None) float[source]#
Compute the density related to a given regular expression abstract syntax tree.
- Parameters:
ast (RegexpAst) – A
RegexpAstinstance.map_len_proba (dict) –
char_proba (float) – The probability to pick a given character in the alphabet. Should probably be set to
1 / len(dfa.alphabet())?map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}that maps eachPatternAutomatonto its corresponding regular expression.
- Returns:
The density of
ast.
- dfa_density(dfa: Automaton, length: int, char_proba: float)[source]#
Computes the ratio of the number of words of length
lengthaccepted by an automatondfa, over the number of words of lengthlength. This corresponds to the density ofaif we restrict to the words of lengthlength.- Parameters:
dfa (Automaton) – A
pybgl.Automatoninstance.length (int) – The considered length. Must be a positive integer.
char_proba (float) – The probability to pick a given character in the alphabet. Should probably be set to
1 / len(dfa.alphabet())?
- Returns:
The density of
aif we restrict to the words of lengthlength.
fast.fast module#
- fast(examples: list, obj_id: int = 0, stop_condition: callable = <function <lambda>>, mutators: list = [<class 'fast.regexp_mutators.DisjunctionMutator'>, <class 'fast.regexp_mutators.BotMutator'>, <class 'fast.regexp_mutators.ActivateMutator'>, <class 'fast.regexp_mutators.UpDotMutator'>, <class 'fast.regexp_mutators.DownDotMutator'>, <class 'fast.regexp_mutators.BouncePlusMutator'>, <class 'fast.regexp_mutators.BounceQuestionMutator'>], *cls, **kwargs) list[source]#
Runs fAST algorithm to infer regular expression from a set of positive examples.
- Parameters:
examples (list) – A
set(str)gathering the positive examples.obj_id (int) – A value among
{OBJ_ADD, OBJ_MULT, OBJ_TUPLE}.stop (callable) – Stopping
callable(final_results: list, time_elapsed: float) -> boolcallback.mutators (list) – Defaults to
MUTATORS.cls – see
objective_func.make_*_objective_funcfor_strfunctions.kwargs – see
objective_func.make_*_objective_funcfor_strfunctions.
- Returns:
The found regular expressions.
- fast_benchmark(map_relen_numre: dict = None, alphabet: list = None, num_examples: int = 10, *cls, **kwargs) float[source]#
Tests fAST on a custom benchmark by crafting a set of test regular expression. The goal is to test whether fAST finds the original (or a better) regular expression.
- Parameters:
map_relen_numre (dict) – A dictionary which maps for each length the number of regular expressions to be randomly crafted.
alphabet (list) – The list of symbols.
num_examples (int) – The number of positive samples to be generated for this evaluation.
cls – See the
fast()function.kwargs – See the
fast()function.
- fast_from_re(regexp: str, num_examples: int = 10, p_stop_final: float = 0.5, repeat: bool = True, max_sampling: int = 1000, *cls, **kwargs) list[source]#
Runs fAST algorithm to infer regular expression from a set of positive examples.
- Parameters:
regexp (str) – A regular expression.
num_examples (int) – Number of positive examples.
p_stop_final (float) – A
floatbetween0.0and1.0corresponding the probability to stop when reaching a final state of the automaton induced byregexp. The higherp_stop_final, the longer the examples.repeat (bool) – Pass True to rerun sampling in case of reject.
max_sampling (int) – Maximum number of sampling. Pass
Noneto continue sampling until finding a non-rejected value. Note that if sample always returnsNone, this results to an infinite loop.cls – see the py:func:fast function .
kwargs – see the py:func:fast function .
- Returns:
The found regular expressions.
- fast_from_strings(examples: list, objective_function: callable = None, stop_condition: callable = None, mutators: list = None, visitor: FindAstFromStringsDefaultVisitor = None) list[source]#
Regular expression inference algorithm.
- Parameters:
examples (list) – The list of input
PatternAutomatoninstances corresponding to each positive examples.objective_function (callable) – The objective function to be minimized.
stop_condition (callable) –
A
callable(candidate_solutions, elapsed_time) -> boolfunction where:candidate_solutions (list): The list of current candidates solutions.
time_elapsed (float): The current execution time, in seconds.
the returned value is
Trueiff the execution must be stopped.
mutators (list) – The list of Mutator instances used to update the ASTs.
- Returns:
A list of final solutions, once the queue is empty (dream on), or once the stop condition is called (more reasonable). A final solution is a tuple (h, prefix_regexp) where pref_reg: list is a prefix regular expression of the solution, and h is its objective function value.
- shortness_factor(examples: iter) float[source]#
Computes the normalization factor used to ensure the shortness is always in [0, 1].
- Parameters:
examples (set) – The set of positive samples
- Returns:
The normalization factor.
Notes:
The default behavior of find_ast is to choose n = 1. However, using n = 1 makes the shortness equal to the size of the inferred AST. As a sequel, the shortness is too prepondarant w.r.t. to the accuracy which is in [0.0, 1.0] and fAST waste a lot of energy to explore irrelevant ASTs. That is why we must normalize the shortness.
To do so, we should use n = sum(len(example) for example in examples). However, this scaling factor is too aggressive if len(examples) increases.
To circumvent the problem, we use n = max(len(example) for example in examples). Intuitively, this guarantees that 0 <= shortness(PTA(longest_example)) <= 1 which means that the ASTs we are searching have most of time a scaled shortness below 1. This scaling factor makes the shortness insensitive to the number of examples (only to the length of the longest example) and comparable with the accuracy.
fast.find_ast_visitor module#
- class FindAstFromStringsAggregateVisitor(visitors: list)[source]#
Bases:
FindAstFromStringsDefaultVisitorConstructor.
- Parameters:
visitors (list) – A list of
FindAstFromStringsDefaultVisitorinstances.
- visit_final_solution(obj_func_value, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
- visit_pop_item(pq_item)[source]#
Triggered when processing a mutant.
- Parameters:
pq_item – The popped mutant.
- visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#
Triggered when building a new mutant.
- Parameters:
mutator (Mutator) – The mutator that has produced the mutant.
new_depth (int) – The progression of the item (total number of processed characters).
new_pq_item (tuple) – The pushed mutant.
- class FindAstFromStringsDefaultVisitor[source]#
Bases:
objectConstructor.
- visit_final_solution(obj_func_value: float, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples: list)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
- visit_pop_item(pq_item: tuple)[source]#
Triggered when processing a mutant.
- Parameters:
pq_item – The popped mutant.
- visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#
Triggered when building a new mutant.
- Parameters:
mutator (Mutator) – The mutator that has produced the mutant.
new_depth (int) – The progression of the item (total number of processed characters).
new_pq_item (tuple) – The pushed mutant.
- class FindAstFromStringsEvaluateVisitor[source]#
Bases:
FindAstFromStringsDefaultVisitorMaintains metrics when running experiments
Constructor.
- visit_final_solution(obj_func_value, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
- visit_pop_item(pq_item)[source]#
Triggered when processing a mutant.
- Parameters:
pq_item – The popped mutant.
- visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#
Triggered when building a new mutant.
- Parameters:
mutator (Mutator) – The mutator that has produced the mutant.
new_depth (int) – The progression of the item (total number of processed characters).
new_pq_item (tuple) – The pushed mutant.
- class FindAstFromStringsVerboseVisitor(verbose_level=1)[source]#
Bases:
FindAstFromStringsDefaultVisitorInsert useful debug message.
Constructor.
- visit_final_solution(obj_func_value, ast: RegexpAst)[source]#
Triggered when a candidate solution is found.
- Parameters:
obj_func_value (float) – The objective function of the candidate solution.
ast (RegexpAst) – The candidate solution.
- visit_init_sample(examples)[source]#
Triggered when starting the fAST inference.
- Parameters:
examples (list) – The positive examples.
fast.multi_grep module#
- class MultiGrepFonctor[source]#
Bases:
objectBase class used to customize the behavior of the
multi_grep()function.
- class MultiGrepFonctorAll[source]#
Bases:
MultiGrepFonctorThe
MultiGrepFonctorAllclass catches (for each pattern Pi and for each index j) each substring w[j:k] matching Pi.
- class MultiGrepFonctorGreedy[source]#
Bases:
MultiGrepFonctorLargestThe
MultiGrepFonctorGreedyclass catches (for each pattern Pi and for each index j) the largest w[j′:k] matching Pi and s.t. j′ < j.
- class MultiGrepFonctorLargest[source]#
Bases:
MultiGrepFonctorThe
MultiGrepFonctorLargestclass catches (for each pattern Pi and for each index j) the largest w[j:k] matching Pi.
- multi_grep(w: str, map_name_dfa: list, callback: callable = <function <lambda>>)[source]#
Searches sub-strings of a string matched by multiple patterns.
- Parameters:
w (str) – A
strcontaining the word to process.map_name_dfa (dict) – A
dict{Name: Automaton}mapping each pattern name with its corresponding DFA.callback (callable) – A
callable(Name, int, int, str)called wheneverw[j:k]is matched by patternname.
- multi_grep_with_delimiters(w: str, map_name_dfa: dict, callback: callable = <function <lambda>>, is_pattern_separator: callable = <function <lambda>>, is_pattern_left_separated: callable = <function <lambda>>, is_pattern_right_separated: callable = <function <lambda>>)[source]#
Searches sub-strings of a string matched by multiple patterns, possibly delimited by a predefined collection of separator patterns.
- Parameters:
w (str) – A
strcontaining the word to process.map_name_dfa (dict) – A
dict{Name: Automaton}mapping each pattern name with its corresponding DFA.callback (callable) – A
callable(Name, int, int, str)called wheneverw[j:k]is matched by patternname.is_pattern_separator (callable) – A
callable(Name) -> boolreturnTrueif the patternnameis a separator.is_pattern_left_separated (callable) – A
callable(Name) -> boolreturnTrueif the patternnamemust be preceeded by a separator pattern or located at the beginning ofw.is_pattern_right_separated (callable) – A ̀``callable(Name) -> bool`` return
Trueif the patternnamemust be followed by a separator pattern or located at the end ofw.
fast.objective_func module#
- make_additive_objective_func_for_str(examples: list, alphabet: list, map_pa_infix_re: dict = None, size_factor: float = 0.5, density_factor: float = 0.5) callable[source]#
Makes an additive objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomatoninstances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}that maps eachPatternAutomatonto its corresponding regular expression.size_factor (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_factor, the more important the size of the inferred regular expression.
density_factor (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_factor, the more important the size of the inferred regular expression. Should be set to 1 - size_factor.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
- make_multiplicative_objective_func_for_str(examples, alphabet, map_pa_infix_re: dict = None, size_exponent=1, density_exponent=1)[source]#
Makes a multiplicative objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomatoninstances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}that maps eachPatternAutomatonto its corresponding regular expression.size_exponent (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_exponent, the more important the size of the inferred regular expression.
density_exponent (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_exponent, the more important the size of the inferred regular expression. Should be set to size_exponent - 1.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
- make_normalized_additive_objective_func_for_str(examples: list, alphabet: list, map_pa_infix_re: dict = None, size_factor: float = 0.5, density_factor: float = 0.5)[source]#
Makes a normalized additive objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomatoninstances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}that maps eachPatternAutomatonto its corresponding regular expression.size_factor (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_factor, the more important the size of the inferred regular expression.
density_factor (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_factor, the more important the size of the inferred regular expression. Should be set to 1 - size_factor.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
- make_tuple_based_objective_func_for_str(examples: list, alphabet: set, map_pa_infix_re: dict = None)[source]#
Makes a lexicographic objective function, finding a tradeoff between accuracy and shortness.
- Parameters:
examples (list) – A list of
PatternAutomatoninstances.alphabet (list) – The list of str, where each str is a symbol alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).
map_pa_infix_re (dict) – A
dict{PatternAutomaton : str}that maps eachPatternAutomatonto its corresponding regular expression.
- Returns:
The corresponding callable(ast, examples) -> float `objective function where `ast is a candidate solution; examples is the set of positive examples; the returned value is the objective function value given ast.
fast.pattern_automaton module#
- class PatternAutomaton(word: str, map_name_dfa: dict, make_mg: callable = None, filtered_patterns: set = None)[source]#
Bases:
AutomatonA
PatternAutomatoninstance represents a string at the pattern level using a automaton-like structure, whose vertices corresponds to a subset of string indices, and arcs corresponds to infixes and their related pattern.Constructs the PatternAutomaton related to an input word according to a collection of patterns and according to a
multi_grep()stategy.- Parameters:
word (str) – The input word.
map_name_dfa (dict) – A
dict{str: Automaton}mapping each relevant type with its corresponding Automaton instance.filtered_patterns (set) – A subset (possibly empty) of
map_name_dfa.keys()of type that must be catched my multi_grep, but not reflected as arcs in thisPatternAutomatoninstance. It can be used e.g. to drop spaces an get a smallerPatternAutomaton, but the position of spaces in the original lines will be lost.
- get_infix(e: EdgeDescriptor) str[source]#
Retrieves the infix (substring) related to an edge.
- Parameters:
e (EdgeDescriptor) – The considered edge.
- Returns:
The infix related to an arbitrary edge of this
PatternAutomatoninstance.
- get_slice(e) tuple[source]#
Retrieves the slice (pair of uint indices) related to an edge.
- Parameters:
e (EdgeDescriptor) – The considered edge.
- Returns:
The slice related to an arbitrary edge of this
PatternAutomatoninstance.
- pattern_automaton_edge_weight(e: EdgeDescriptor, g: PatternAutomaton, map_name_density: dict = None) float[source]#
Gets the weight of an edge.
- Parameters:
e (EdgeDescriptor) – The considered edge.
g (PatternAutomaton) – A
PatternAutomatoninstance.
- pattern_automaton_to_path(g: PatternAutomaton, *cls, **kwargs) list[source]#
Extracts from a
PatternAutomatoninstance the most relevant path (highlighting the most relevant pattern-based decomposition).- Parameters:
g (PatternAutomaton) – The considered
PatternAutomatoninstance.cls – See the
pybgl.dijkstra_shortest_path()function.kwargs – See the
pybgl.dijkstra_shortest_path()function.
fast.random module#
- random() x in the interval [0, 1).#
- random_ast(re_len: int, alphabet: list = None, map_operators: dict = {'*': Op(cardinality=1, precedence=4, associativity=1), '+': Op(cardinality=1, precedence=4, associativity=1), '.': Op(cardinality=2, precedence=2, associativity=1), '?': Op(cardinality=1, precedence=3, associativity=1), '|': Op(cardinality=2, precedence=1, associativity=1)}) tuple[source]#
Returns a random AST and its root node.
- Parameters:
re_len (int) – The number of nodes of the AST of the output regular expression.
alphabet (list) – A
listgathering the character of the alphabet of the regular expressions. Default tolist('a...z').map_operators (dict) – A
dict{str: pybgl.shunting_yard_postfix.Op}mapping each meta-character with its corresponding attributes. Defaults topybgl.shunting_yard_postfix.MAP_OPERATORS_RE.
- Returns:
A
tuple(pybgl.shunting_yard_postfix.Ast, int)gathering the output AST and its root node.
- random_word_from_automaton(g: Automaton, p: float = 0.5, repeat: bool = True, max_sampling: int = 1000) str[source]#
Perform a uniform random walk over an Automaton to generate a word accepted by this Automaton.
- Parameters:
g (Automaton) – A DFA.
p (float) – A value in [0, 1] corresponding to the probability to stop once a final state is reached
repeat (bool) – Pass
Trueto try again while the sample is rejected.max_sampling (int) – Max number of reject samplings. If you pass
None, the algorithm continues until finding a sample, but depending on p and g, this may result to an infinite loop.
- Returns:
A string representing the word discovered during the walk, or
Noneif the walk led to a non-final state without successor.
- reject_sampling(sample, repeat: bool = True, max_sampling: int = 1000)[source]#
Reject sampling procedure.
- Parameters:
sample (callable) – A Callback() -> Result callback which draws a random sample. If it returns None, the sample is rejected.
repeat (bool) – Pass True to rerun sampling in case of reject.
max_sampling (int) – Maximum number of sampling. Pass None to continue sampling until finding a non-rejected value. Note that if sample always returns None, this results to an infinite loop.
- Returns:
The sampled instances,
Noneotherwise.
fast.regexp module#
Functions used to craft automata from regular expressions.
- get_pattern_names() list[source]#
Retrieves the list of patterns involved in the default pattern collection.
- Returns:
A list of string, where each string correspond to a pattern name involved in the default pattern collection (
MAP_NAME_RE).
- make_dfa_any(alphabet: iter = None, separator_alphabet: iter = None) Automaton[source]#
Builds the DFA corresponding to the any non-separator character.
- Parameters:
alphabet (iter) – The characters involved in the alphabet. Default to
string.printable).separator (iter) – The characters corresponding to separators. Defaults to
{" ", "\t", "\n"}.
- Returns:
The corresponding
Automaton.
- make_dfa_empty() Automaton[source]#
Builds the
Automatoncorresponding to the empty language.- Returns:
The corresponding
Automaton.
- make_map_name_dfa(names: iter = None) dict[source]#
Builds a dictionary that maps a list of pattern name with the corresponding
Automatoninstance built according to regular expressions.- Parameters:
names (list) – A list of string, where each string identifies a pattern names (by default, every keys of
map_name_reis considered).- Returns:
The dictionary mapping each pattern name with its corresponding DFA.
- make_re_hex_digit(lower_case: bool = True, upper_case: bool = True) str[source]#
Builds the regular expression catching hexadecimal values.
- Parameters:
lower_case (bool) – Pass
Falseto discard lower case values.upper_case (bool) – Pass
Falseto discard upper case values.
- Returns:
The string storing the regular expression.
- make_re_ipv6(lower_case: bool = True, upper_case: bool = True) str[source]#
Builds the regular expression catching IPv6 addresses.
Note this is not an exact match contrary to
make_re_ipv6_strict, but the resulting automaton is significantly faster (and should be accurate enough for most of practical use cases).- Parameters:
lower_case (bool) – Pass
Falseto discard lower case values.upper_case (bool) – Pass
Falseto discard upper case values.
- Returns:
The string storing the regular expression.
- make_re_ipv6_strict(*args, **kwargs) str[source]#
Builds the regular expression catching IPv6 addresses.
- Parameters:
args – See
make_re_hex_digit().kwargs – See
make_re_hex_digit().
- Returns:
The string storing the regular expression.
fast.regexp_ast module#
- class RegexpAst[source]#
Bases:
objectn-ary representations of a regular expression AST. Do not use binary AST otherwise the priority queue will explode! Note that calling simplify transforms an arbitrary AST to a n-ary AST.
Constructor.
- ROOT = 'root'#
- add_node(label: str) int[source]#
Add a new node to this AST.
- Parameters:
label (str) – The label of the added node.
- Returns:
The newly added vertex descriptor.
- append_child(u: int, v: int, set_parent: bool = True)[source]#
Appends a new child to a node
u.- Parameters:
u (int) – The vertex descriptor of the parent node.
v (int) – The vertex descriptor of the child node.
set_parent (bool) – Pass
Trueto update the parent ofv. Defaults toTrue.
- children(u: int) iter[source]#
Retrieves the children of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The parent of
uif any orNone.
- edges() iter[source]#
Retrieves an iterator over the edges of this py:class:RegexpAst instance.
- Returns:
An iterator over this
RegexpAstinstance edges.
- epsilon_reachables(u: int, v: int = None) set[source]#
Retrieves the edges that are epsilon-reachable from a
(u, v)arc. See alsoRegexpAst.epsilon_successors().- Parameters:
u (int) – The source of the arc.
v (int) – The target of the arc. If
uis a leaf you may passNone.
- Returns:
The set of out-edges of
(u', v')arcs that are epsilon-reachable from the(u, v)arc
- epsilon_successors(u: int, v: int) set[source]#
Retrieves the out-edges of
vthat are epsilon-reachable from a(u, v)arc. See alsoRegexpAst.epsilon_reachables().- Parameters:
u (int) – The source of the arc.
v (int) – The target of the arc.
- Returns:
The set of out-edges of
(u', v')arcs that are epsilon-reachable from the(u, v)arc
- first_child(u: int) int[source]#
Retrieves the first child of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The vertex descriptor of the first child of
u.- Raises:
IndexError –
- get_arc_index(u: int, v: int) int[source]#
If (u,v) is downwards, and v is the i^th child of u, returns i. If (u,v) is upwards, and u is the i^th child of v, returns i.
- is_ancestor_of(u: int, v: int) bool[source]#
Tests if
vis an ancestor ofu.- Parameters:
u (int) – The vertex descriptor of a node.
v (int) – The vertex descriptor of the candidate ancestor.
- Returns:
Trueiffvis an ancestor ofu,Falseotherwise.
- is_downwards_arc(u: int, v: int) bool[source]#
Checks whether an
(u, v)arc is downward (i.e., ifvis a child ofu).- Parameters:
u (int) – The vertex descriptor of the source of the arc.
v (int) – The vertex descriptor of the target of the arc.
- Returns:
Trueiff(u, v)is downward.
- is_last_child(u: int, v: int) bool[source]#
Checks whether
uis the last child ofv.- Parameters:
u (int) – The vertex descriptor of the considered children.
v (int) – The vertex descriptor of the parent node of
u.
- Returns:
Trueif and only ifuis the last child ofv.
- is_leaf(u: int) bool[source]#
Checks whether a node is a leaf. It concerns nodes labeled by a symbol of the alphabet.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
Trueiffuis a leaf,Falseotherwise.
- is_n_ary(u: int) bool[source]#
Checks whether a node is
n-ary,n > 1. It concerns nodes labeled by ann-ary regular expression operator (i.e., “.”, “|”) and the root of this :py:class:RegexpAstinstance.- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
Trueiffuisn-ary,Falseotherwise.
- is_unary(u: int) bool[source]#
Checks whether a node is unary. It concerns nodes labeled by an unary regular expression operator (i.e., “+”, “*”, “?”) and the root of this :py:class:
RegexpAstinstance.- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
Trueiffuis unary,Falseotherwise.
- is_upwards_arc(u: int, v: int) bool[source]#
Checks whether an
(u, v)arc is upward (i.e., ifuis a child ofv).- Parameters:
u (int) – The vertex descriptor of the source of the arc.
v (int) – The vertex descriptor of the target of the arc.
- Returns:
Trueiff(u, v)is upward,Falseotherwise.
- ith_child(u: int, i: int) int[source]#
Retrieves the
i-th child of a given vertex.- Parameters:
u (int) – The vertex descriptor of the considered node.
i (int) – The child index.
- Returns:
The vertex descriptor of the
i-th child ofu.- Raises:
IndexError –
- label(u) str[source]#
Retrieves the label of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The label of
u.
- num_children(u: int) int[source]#
Retrieves the number of children of
u.- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The number of children of
u.
- out_edges(u: int) iter[source]#
Retrieves an iterator over the out-edges of this py:class:RegexpAst instance.
- Returns:
An iterator over this
RegexpAstinstance vertices.
- parent(u: int) int[source]#
Retrieves the parent of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
- Returns:
The parent of
uif any orNone.
- recognizes_pa(pa: PatternAutomaton, verbose: bool = False) bool[source]#
- recognizes_pa_prefix(pa: PatternAutomaton, target_pa_node: int, target_ast_leaf: int, u: int = None) bool[source]#
Tests whether there exists in this py:class:RegexpAst instance a path from its root to
target_ast_leafthat corresponds to a path from the source of a givenPatternAutomatoninstance totarget_ast_leaf.- Parameters:
pa (PatternAutomaton) – A
PatternAutomatoninstance (modeling a positive example).target_pa_node (int) – The vertex descriptor of the target node in
pa.target_ast_leaf (int) – The vertex descriptor target leaf in
self.u (int) – Pass
None.
- Returns:
Trueiff this py:class:RegexpAst instance matchespa,Falseotherwise.
- recognizes_prefix(prefix: list, leaf_to_end_on: int)[source]#
Tests whether there exists a valid walk from the root to a given leaf in this
RegexpAstinstance.- Parameters:
prefix (list) – The prefix (list of symbols).
leaf_to_end_on (int) – The vertex descriptor target leaf.
- Returns:
Trueif there exists a walk from the root toleaf_to_end_onthat matchesprefix,Falseotherwise.
- recognizes_word(w: list) bool[source]#
Tests whether a word is matched by this
RegexpAstinstance.- Parameters:
w (list) – A list of symbols, where each symbol is a string (e.g. “a” or “$date”).
- Returns:
Trueiff this py:class:RegexpAst instance matchesw,Falseotherwise.
- remove_node(u: int)[source]#
Removes a node from this
RegexpAstinstance.- Parameters:
u (int) – The vertex descriptor of the node to be removed.
- set_child(u: int, v: int, set_parent: bool = True)[source]#
Sets
vas the only child ofu.- Parameters:
u (int) – The vertex descriptor of the parent node.
v (int) – The vertex descriptor of the child node.
set_parent (bool) – Pass
Trueto update the parent ofv. Defaults toTrue.
- set_children(u: int, vs: iter, set_parents: bool = True)[source]#
Sets the children of
u.- Parameters:
u (int) – The vertex descriptor of the parent node.
vs (iter) – The vertex descriptors of the child nodes.
set_parent (bool) – Pass
Trueto update the parent ofv. Defaults toTrue.
- set_ith_child(u: int, v: int, i: int, set_parent: bool = True)[source]#
Sets the
i-th children ofutov.- Parameters:
u (int) – The vertex descriptor of the parent node.
v (int) – The vertex descriptor of the child node.
i (int) – The child index. It must verify
0 < i <= self.num_children(u).set_parent (bool) – Pass
Trueto update the parent ofv. Defaults toTrue.
- Raises:
IndexError –
- set_label(u: int, label: str)[source]#
Sets the label of a given vertex.
- Parameters:
u (int) – The vertex descriptor of the considered node.
label (str) – Set the label of a given node.
- simplify(active_leaf: int = None)[source]#
Applies the following simplifications on this
RegexpAstinstance.- Parameters:
active_leaf (int) – Pass None
- source(e: EdgeDescriptor) int[source]#
Retrieves the source of an arc.
- Parameters:
e (EdgeDescriptor) – The
EdgeDescriptorof the arc.- Returns:
The vertex descriptor of the source of
e.
- target(e: EdgeDescriptor) int[source]#
Retrieves the target of an arc.
- Parameters:
e (EdgeDescriptor) – The
EdgeDescriptorof the arc.- Returns:
The vertex descriptor of the target of
e.
- to_dot(**kwargs) str[source]#
Exports this
RegexpAstto Graphviz format.- Returns:
The corresponding Graphviz string.
- to_infix_regexp_str(u=None) str[source]#
Builds the infix regular expression of this
RegexpAstinstance. See alsoRegexpAst.to_prefix_regexp_list().- Parameters:
u (int) – Pass
None.- Returns:
A string storing the prefix regular expression modeling this
RegexpAstinstance.
- to_prefix_regexp_list(u: int = None) list[source]#
Builds the prefix regular expression of this
RegexpAstinstance. See alsoRegexpAst.to_prefix_regexp_str().- Parameters:
u (int) – Pass
None.- Returns:
A string storing the prefix regular expression modeling this
RegexpAstinstance.
- to_prefix_regexp_str(u: int = None) str[source]#
Builds the prefix regular expression of this
RegexpAstinstance. See alsoRegexpAst.to_prefix_regexp_list().- Parameters:
u (int) – Pass
None.- Returns:
A string storing the prefix regular expression modeling this
RegexpAstinstance.
- vertices() iter[source]#
Retrieves an iterator over the vertices of this py:class:RegexpAst instance.
- Returns:
An iterator over this
RegexpAstinstance vertices.
- walk_one_char(u: int, a: str, verbose: bool = False) set[source]#
Performs a single character walk on this
RegexpAstinstance. Starts the walk on the node u (u must be a leaf or the root), and returns the set of leaves labeled byathat are epsilon-reachable fromu.- Parameters:
a (str) – A symbol, e.g. “a” or “$date”.
- walk_word(w: list, u: int = None) set[source]#
Walk on this
RegexpAstinstances by consuming the symbols involved in an input word.- Parameters:
u (int) – The vertex descriptor of the starting node. Pass
Noneto start from the root of thisRegexpAstinstance. Defaults toNone.w (list) – The input word, which is a list of symbols, where each symbol is a string (e.g. “a” or “$date”).
- Returns:
The set of vertex descriptor that are reachable from
uby consumingw
fast.regexp_mutators module#
- class ActivateMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator‘Identity’ mutator. Does not modify the ast, only activates an existing leaf. Condition: v is a c-labeled leaf.
- class BotMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator‘Bot’ mutator. Is used to add a leaf to an empty tree. Condition: the ast is empty.
- class BouncePlusMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator‘Bouncer’ mutator. Starts by applying a first mutation on e (insertion of +). Then generates from this mutation a new set of Epsilon reachable arc on which the non bouncing mutators can be applied. TODO: Find out if/how the bouncers can call each other Condition: u, v is an upwards arc, and v, u is not epsilon-reachable from the current leaf.
- class BounceQuestionMutator(mutators_to_bounce_on)[source]#
Bases:
Mutator‘Bouncer’ mutator. Starts by applying a first mutation on e (insertion of ?). Then generates from this mutation a new set of Epsilon reachable arc on which the non bouncing mutators can be applied. TODO: Find out if/how the bouncers can call each other Condition: u, v is an downwards arc, and v, u is not epsilon-reachable from the current leaf.
Constructor.
- class DisjunctionMutator(mutators_to_bounce_on)[source]#
Bases:
MutatorDownwards mutator, inserting a new ‘or’ node and a new leaf labeled c. Condition: the arc u, v must go downwards.
- class DownDotMutator(mutators_to_bounce_on)[source]#
Bases:
MutatorUpwards dot mutator. Starts by trying to inject in the ast a dot node as child of u, a new leaf as left child of it, and v as its right child. If this newly obtained ast does not recognize the prefix word being processed now, or does not recognize one of the previous examples processed, then we also insert a ? node between the dot and v. Condition: the arc u, v must go downwards.
- class UpDotMutator(mutators_to_bounce_on)[source]#
Bases:
MutatorUpwards dot mutator. Starts by trying to inject in the ast a dot node and a leaf as right child. If this newly obtained ast does not recognize the prefix word being processed now, or does not recognize one of the previous examples processed, then we also insert a ? node between the dot and the leaf Condition: the arc u, v must go upwards.
Module contents#
Top-level package.