fast package#

Submodules#

fast.cbfs module#

class CBFS(num_queues: int, pop_idx: int = 0)[source]#

Bases: object

Implements the CBFS (Cyclic Best-First Search) algorithm; see https://www.researchgate.net/publication/315650019_Cyclic_best_first_search_Using_contours_to_guide_branch-and-bound_algorithms_Cyclic_Best_First_Search.

Constructor.

Parameters:
  • num_queues (int) – A strictly positive int specifying the number of queues managed by this CBFS instance.

  • pop_idx (int) – The index of the active queue. It must satisfy 0 <= pop_idx < self.num_queues.

is_empty() bool[source]#

Checks whether this CBFS instance is empty (i.e., contains no item).

pop() object[source]#

Pops an item from this CBFS instance.

Returns:

The popped item.

push(item: object, push_idx: int)[source]#

Pushes a new item to this CBFS instance.

Parameters:
  • item (object) – The item to be pushed

  • push_idx (int) – The queue index where the item must be pushed. It must satisfy 0 <= push_idx < self.num_queues.
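
A minimal usage sketch (illustrative; assumes is_empty() returns True once no item remains):

    from fast.cbfs import CBFS

    # Hypothetical usage: two queues cycled by the CBFS strategy.
    cbfs = CBFS(num_queues=2)
    cbfs.push("item-a", push_idx=0)
    cbfs.push("item-b", push_idx=1)
    while not cbfs.is_empty():   # assumes True means "no item left"
        item = cbfs.pop()        # pops from the active queue
        print(item)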

fast.density module#

ast_density(ast: RegexpAst, map_len_proba: dict, char_proba: float, map_pa_infix_re: dict = None) float[source]#

Computes the density related to a given regular expression abstract syntax tree.

Parameters:
  • ast (RegexpAst) – A RegexpAst instance.

  • map_len_proba (dict) – A dict mapping each considered word length to its probability.

  • char_proba (float) – The probability of picking a given character in the alphabet; typically 1 / len(dfa.alphabet()).

  • map_pa_infix_re (dict) – A dict{PatternAutomaton : str} that maps each PatternAutomaton to its corresponding regular expression.

Returns:

The density of ast.

dfa_density(dfa: Automaton, length: int, char_proba: float)[source]#

Computes the ratio of the number of words of length length accepted by an automaton dfa over the total number of words of length length. This corresponds to the density of dfa if we restrict to words of length length.

Parameters:
  • dfa (Automaton) – A pybgl.Automaton instance.

  • length (int) – The considered length. Must be a positive integer.

  • char_proba (float) – The probability of picking a given character in the alphabet; typically 1 / len(dfa.alphabet()).

Returns:

The density of dfa if we restrict to words of length length.
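
For intuition only, a self-contained sketch of this length-restricted density on a hand-rolled DFA (plain dicts rather than pybgl; illustrative, not the library's implementation):

    from itertools import product

    def toy_density(delta, q0, finals, alphabet, length):
        # Fraction of words of the given length accepted by the DFA.
        accepted = sum(
            1 for word in product(alphabet, repeat=length)
            if accepts(delta, q0, finals, word)
        )
        return accepted / len(alphabet) ** length

    def accepts(delta, q, finals, word):
        for a in word:
            q = delta.get((q, a))
            if q is None:
                return False
        return q in finals

    # DFA for a b* over {a, b}: at length 3, only "abb" is accepted.
    delta = {(0, "a"): 1, (1, "b"): 1}
    print(toy_density(delta, 0, {1}, ["a", "b"], 3))   # 0.125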

fast.fast module#

fast(examples: list, obj_id: int = 0, stop_condition: callable = <function <lambda>>, mutators: list = [<class 'fast.regexp_mutators.DisjunctionMutator'>, <class 'fast.regexp_mutators.BotMutator'>, <class 'fast.regexp_mutators.ActivateMutator'>, <class 'fast.regexp_mutators.UpDotMutator'>, <class 'fast.regexp_mutators.DownDotMutator'>, <class 'fast.regexp_mutators.BouncePlusMutator'>, <class 'fast.regexp_mutators.BounceQuestionMutator'>], *cls, **kwargs) list[source]#

Runs the fAST algorithm to infer regular expressions from a set of positive examples.

Parameters:
  • examples (list) – A set(str) gathering the positive examples.

  • obj_id (int) – A value among {OBJ_ADD, OBJ_MULT, OBJ_TUPLE}.

  • stop_condition (callable) – A callable(final_results: list, time_elapsed: float) -> bool stopping callback; the execution stops once it returns True.

  • mutators (list) – Defaults to MUTATORS.

  • cls – See the objective_func.make_*_objective_func_for_str functions.

  • kwargs – See the objective_func.make_*_objective_func_for_str functions.

Returns:

The found regular expressions.
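
A minimal, hypothetical call relying on the default objective, stop condition, and mutators:

    from fast.fast import fast

    # Hypothetical run on a few positive examples (strings).
    regexps = fast(["ab", "aab", "aaab"])
    print(regexps)   # the inferred regular expression(s)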

fast_benchmark(map_relen_numre: dict = None, alphabet: list = None, num_examples: int = 10, *cls, **kwargs) float[source]#

Tests fAST on a custom benchmark by crafting a set of test regular expressions. The goal is to test whether fAST finds the original (or a better) regular expression.

Parameters:
  • map_relen_numre (dict) – A dictionary mapping each regular expression length to the number of regular expressions of that length to be randomly crafted.

  • alphabet (list) – The list of symbols.

  • num_examples (int) – The number of positive samples to be generated for this evaluation.

  • cls – See the fast() function.

  • kwargs – See the fast() function.

fast_from_re(regexp: str, num_examples: int = 10, p_stop_final: float = 0.5, repeat: bool = True, max_sampling: int = 1000, *cls, **kwargs) list[source]#

Runs the fAST algorithm on a set of positive examples sampled from a given regular expression.

Parameters:
  • regexp (str) – A regular expression.

  • num_examples (int) – Number of positive examples.

  • p_stop_final (float) – A float between 0.0 and 1.0 corresponding to the probability of stopping when reaching a final state of the automaton induced by regexp. The higher p_stop_final, the shorter the examples.

  • repeat (bool) – Pass True to rerun sampling in case of reject.

  • max_sampling (int) – Maximum number of sampling attempts. Pass None to keep sampling until a non-rejected value is found. Note that if sampling always returns None, this results in an infinite loop.

  • cls – See the fast() function.

  • kwargs – See the fast() function.

Returns:

The found regular expressions.
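
A hypothetical call, sampling 20 positive examples from a known regular expression and inferring regular expressions back from them:

    from fast.fast import fast_from_re

    # Illustrative regexp; a higher p_stop_final yields shorter sampled examples.
    regexps = fast_from_re("a(b|c)*", num_examples=20, p_stop_final=0.5)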

fast_from_strings(examples: list, objective_function: callable = None, stop_condition: callable = None, mutators: list = None, visitor: FindAstFromStringsDefaultVisitor = None) list[source]#

Regular expression inference algorithm.

Parameters:
  • examples (list) – The list of input PatternAutomaton instances corresponding to the positive examples.

  • objective_function (callable) – The objective function to be minimized.

  • stop_condition (callable) –

    A callable(candidate_solutions, elapsed_time) -> bool function where:

    • candidate_solutions (list): The list of current candidate solutions.

    • time_elapsed (float): The current execution time, in seconds.

    • the returned value is True iff the execution must be stopped.

  • mutators (list) – The list of Mutator instances used to update the ASTs.

Returns:

A list of final solutions, obtained once the queue is empty (dream on) or once the stop condition is met (more reasonable). A final solution is a tuple (h, prefix_regexp) where prefix_regexp: list is a prefix regular expression of the solution and h is its objective function value.
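
A stop_condition compatible with this function could look as follows (illustrative thresholds):

    def stop_condition(candidate_solutions, elapsed_time):
        # Stop once 5 candidates have been found or after 30 seconds.
        return len(candidate_solutions) >= 5 or elapsed_time > 30.0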

shortness_factor(examples: iter) float[source]#

Computes the normalization factor used to ensure the shortness is always in [0, 1].

Parameters:

examples (set) – The set of positive samples

Returns:

The normalization factor.

Notes:

  • The default behavior of find_ast is to choose n = 1. However, using n = 1 makes the shortness equal to the size of the inferred AST. As a result, the shortness is too preponderant w.r.t. the accuracy, which lies in [0.0, 1.0], and fAST wastes a lot of energy exploring irrelevant ASTs. That is why we must normalize the shortness.

  • To do so, we should use n = sum(len(example) for example in examples). However, this scaling factor becomes too aggressive as len(examples) increases.

  • To circumvent the problem, we use n = max(len(example) for example in examples). Intuitively, this guarantees that 0 <= shortness(PTA(longest_example)) <= 1, which means that the ASTs we are searching for have most of the time a scaled shortness below 1. This scaling factor makes the shortness insensitive to the number of examples (it only depends on the length of the longest example) and comparable with the accuracy.
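
A sketch of the scaling quantity n described in these notes (hypothetical helper name; whether shortness_factor returns n or its inverse is not restated here):

    def max_example_length(examples):
        # n = length of the longest positive example.
        return max(len(example) for example in examples)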

fast.find_ast_visitor module#

class FindAstFromStringsAggregateVisitor(visitors: list)[source]#

Bases: FindAstFromStringsDefaultVisitor

Constructor.

Parameters:

visitors (list) – A list of FindAstFromStringsDefaultVisitor instances.

visit_end_sample()[source]#

Triggered when a positive example is totally processed.

visit_final_solution(obj_func_value, ast: RegexpAst)[source]#

Triggered when a candidate solution is found.

Parameters:
  • obj_func_value (float) – The objective function of the candidate solution.

  • ast (RegexpAst) – The candidate solution.

visit_init_sample(examples)[source]#

Triggered when starting the fAST inference.

Parameters:

examples (list) – The positive examples.

visit_pop_item(pq_item)[source]#

Triggered when processing a mutant.

Parameters:

pq_item – The popped mutant.

visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#

Triggered when building a new mutant.

Parameters:
  • mutator (Mutator) – The mutator that has produced the mutant.

  • new_depth (int) – The progression of the item (total number of processed characters).

  • new_pq_item (tuple) – The pushed mutant.

class FindAstFromStringsDefaultVisitor[source]#

Bases: object

Constructor.

visit_end_sample()[source]#

Triggered when a positive example is totally processed.

visit_final_solution(obj_func_value: float, ast: RegexpAst)[source]#

Triggered when a candidate solution is found.

Parameters:
  • obj_func_value (float) – The objective function of the candidate solution.

  • ast (RegexpAst) – The candidate solution.

visit_init_sample(examples: list)[source]#

Triggered when starting the fAST inference.

Parameters:

examples (list) – The positive examples.

visit_pop_item(pq_item: tuple)[source]#

Triggered when processing a mutant.

Parameters:

pq_item – The popped mutant.

visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#

Triggered when building a new mutant.

Parameters:
  • mutator (Mutator) – The mutator that has produced the mutant.

  • new_depth (int) – The progression of the item (total number of processed characters).

  • new_pq_item (tuple) – The pushed mutant.
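
A hypothetical visitor that records every candidate solution by overriding only the hook of interest:

    from fast.find_ast_visitor import FindAstFromStringsDefaultVisitor

    class RecordingVisitor(FindAstFromStringsDefaultVisitor):
        """Collects (objective value, infix regexp) pairs for each candidate found."""
        def __init__(self):
            super().__init__()
            self.solutions = []

        def visit_final_solution(self, obj_func_value, ast):
            self.solutions.append((obj_func_value, ast.to_infix_regexp_str()))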

class FindAstFromStringsEvaluateVisitor[source]#

Bases: FindAstFromStringsDefaultVisitor

Maintains metrics when running experiments.

Constructor.

visit_end_sample()[source]#

Triggered when a positive example is totally processed.

visit_final_solution(obj_func_value, ast: RegexpAst)[source]#

Triggered when a candidate solution is found.

Parameters:
  • obj_func_value (float) – The objective function of the candidate solution.

  • ast (RegexpAst) – The candidate solution.

visit_init_sample(examples)[source]#

Triggered when starting the fAST inference.

Parameters:

examples (list) – The positive examples.

visit_pop_item(pq_item)[source]#

Triggered when processing a mutant.

Parameters:

pq_item – The popped mutant.

visit_push_item(mutator: Mutator, new_depth: int, new_pq_item: tuple)[source]#

Triggered when building a new mutant.

Parameters:
  • mutator (Mutator) – The mutator that has produced the mutant.

  • new_depth (int) – The progression of the item (total number of processed characters).

  • new_pq_item (tuple) – The pushed mutant.

class FindAstFromStringsVerboseVisitor(verbose_level=1)[source]#

Bases: FindAstFromStringsDefaultVisitor

Inserts useful debug messages.

Constructor.

visit_final_solution(obj_func_value, ast: RegexpAst)[source]#

Triggered when a candidate solution is found.

Parameters:
  • obj_func_value (float) – The objective function of the candidate solution.

  • ast (RegexpAst) – The candidate solution.

visit_init_sample(examples)[source]#

Triggered when starting the fAST inference.

Parameters:

examples (list) – The positive examples.

visit_pop_item(pq_item: tuple)[source]#

Triggered when processing a mutant.

Parameters:

pq_item – The popped mutant.

visit_push_item(mutator: Mutator, new_depth: int, new_pq_item)[source]#

Triggered when building a new mutant.

fast.multi_grep module#

class MultiGrepFonctor[source]#

Bases: object

Base class used to customize the behavior of the multi_grep() function.

indices() dict[source]#
Returns:

A dict{i: [(j, k)]} where i is an int corresponding to a DFA identifier, j is an int corresponding to the beginning of a substring caught by the DFA i, and k is an int corresponding to the end of that substring.

class MultiGrepFonctorAll[source]#

Bases: MultiGrepFonctor

The MultiGrepFonctorAll class catches (for each pattern Pi and for each index j) each substring w[j:k] matching Pi.

indices() dict[source]#
Returns:

A dict{i: [(j, k)]} where i is an int corresponding to a DFA identifier, j is an int corresponding to the beginning of a substring caught by the DFA i, and k is an int corresponding to the end of that substring.

class MultiGrepFonctorGreedy[source]#

Bases: MultiGrepFonctorLargest

The MultiGrepFonctorGreedy class catches (for each pattern Pi and for each index j) the largest w[j′:k] matching Pi and s.t. j′ < j.

class MultiGrepFonctorLargest[source]#

Bases: MultiGrepFonctor

The MultiGrepFonctorLargest class catches (for each pattern Pi and for each index j) the largest w[j:k] matching Pi.

indices() dict[source]#
Returns:

A dict{i: [(j, k)]} where i is an int corresponding to a DFA identifier, j is an int corresponding to the beginning of a substring caught by the DFA i, and k is an int corresponding to the end of that substring.

multi_grep(w: str, map_name_dfa: list, callback: callable = <function <lambda>>)[source]#

Searches sub-strings of a string matched by multiple patterns.

Parameters:
  • w (str) – A str containing the word to process.

  • map_name_dfa (dict) – A dict{Name:  Automaton} mapping each pattern name with its corresponding DFA.

  • callback (callable) – A callable(Name, int, int, str) called whenever w[j:k] is matched by pattern name.
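
A hypothetical usage collecting every reported match; the callback argument order shown here is an assumption based on the callable(Name, int, int, str) signature above:

    from fast.regexp import make_map_name_dfa
    from fast.multi_grep import multi_grep

    w = "host 10.0.0.1 rebooted"        # illustrative input line
    map_name_dfa = make_map_name_dfa()  # default pattern collection (MAP_NAME_RE)

    matches = []
    def on_match(name, j, k, w):
        matches.append((name, w[j:k]))  # pattern name and the matched infix

    multi_grep(w, map_name_dfa, callback=on_match)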

multi_grep_with_delimiters(w: str, map_name_dfa: dict, callback: callable = <function <lambda>>, is_pattern_separator: callable = <function <lambda>>, is_pattern_left_separated: callable = <function <lambda>>, is_pattern_right_separated: callable = <function <lambda>>)[source]#

Searches sub-strings of a string matched by multiple patterns, possibly delimited by a predefined collection of separator patterns.

Parameters:
  • w (str) – A str containing the word to process.

  • map_name_dfa (dict) – A dict{Name:  Automaton} mapping each pattern name with its corresponding DFA.

  • callback (callable) – A callable(Name, int, int, str) called whenever w[j:k] is matched by pattern name.

  • is_pattern_separator (callable) – A callable(Name) -> bool returning True if the pattern name is a separator.

  • is_pattern_left_separated (callable) – A callable(Name) -> bool returning True if the pattern name must be preceded by a separator pattern or located at the beginning of w.

  • is_pattern_right_separated (callable) – A callable(Name) -> bool returning True if the pattern name must be followed by a separator pattern or located at the end of w.

fast.objective_func module#

make_additive_objective_func_for_str(examples: list, alphabet: list, map_pa_infix_re: dict = None, size_factor: float = 0.5, density_factor: float = 0.5) callable[source]#

Makes an additive objective function, finding a tradeoff between accuracy and shortness.

Parameters:
  • examples (list) – A list of PatternAutomaton instances.

  • alphabet (list) – The list of str, where each str is a symbol of the alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).

  • map_pa_infix_re (dict) – A dict{PatternAutomaton : str} that maps each PatternAutomaton to its corresponding regular expression.

  • size_factor (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_factor, the more important the size of the inferred regular expression.

  • density_factor (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_factor, the more important the accuracy of the inferred regular expression. Should be set to 1 - size_factor.

Returns:

The corresponding callable(ast, examples) -> float objective function, where ast is a candidate solution, examples is the set of positive examples, and the returned value is the objective function value for ast.
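
For intuition, the general shape of such an additive tradeoff is sketched below; this is not the library's implementation, and shortness and density are assumed to be user-provided callables returning values in [0, 1]:

    def make_additive_objective(shortness, density, size_factor=0.5, density_factor=0.5):
        # Lower is better: fAST minimizes the objective function.
        def objective(ast, examples):
            return size_factor * shortness(ast) + density_factor * density(ast, examples)
        return objective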

make_multiplicative_objective_func_for_str(examples, alphabet, map_pa_infix_re: dict = None, size_exponent=1, density_exponent=1)[source]#

Makes a multiplicative objective function, finding a tradeoff between accuracy and shortness.

Parameters:
  • examples (list) – A list of PatternAutomaton instances.

  • alphabet (list) – The list of str, where each str is a symbol of the alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).

  • map_pa_infix_re (dict) – A dict{PatternAutomaton : str} that maps each PatternAutomaton to its corresponding regular expression.

  • size_exponent (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_exponent, the more important the size of the inferred regular expression.

  • density_exponent (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_exponent, the more important the accuracy of the inferred regular expression. Should be set to size_exponent - 1.

Returns:

The corresponding callable(ast, examples) -> float objective function, where ast is a candidate solution, examples is the set of positive examples, and the returned value is the objective function value for ast.

make_normalized_additive_objective_func_for_str(examples: list, alphabet: list, map_pa_infix_re: dict = None, size_factor: float = 0.5, density_factor: float = 0.5)[source]#

Makes a normalized additive objective function, finding a tradeoff between accuracy and shortness.

Parameters:
  • examples (list) – A list of PatternAutomaton instances.

  • alphabet (list) – The list of str, where each str is a symbol of the alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).

  • map_pa_infix_re (dict) – A dict{PatternAutomaton : str} that maps each PatternAutomaton to its corresponding regular expression.

  • size_factor (float) – The importance of the shortness, between 0.0 and 1.0. The higher size_factor, the more important the size of the inferred regular expression.

  • density_factor (float) – The importance of the accuracy, between 0.0 and 1.0. The higher density_factor, the more important the accuracy of the inferred regular expression. Should be set to 1 - size_factor.

Returns:

The corresponding callable(ast, examples) -> float objective function, where ast is a candidate solution, examples is the set of positive examples, and the returned value is the objective function value for ast.

make_tuple_based_objective_func_for_str(examples: list, alphabet: set, map_pa_infix_re: dict = None)[source]#

Makes a lexicographic objective function, finding a tradeoff between accuracy and shortness.

Parameters:
  • examples (list) – A list of PatternAutomaton instances.

  • alphabet (list) – The list of str, where each str is a symbol of the alphabet (possibly a metacharacter identifying a PatternAutomaton, like “$date”).

  • map_pa_infix_re (dict) – A dict{PatternAutomaton : str} that maps each PatternAutomaton to its corresponding regular expression.

Returns:

The corresponding callable(ast, examples) -> float objective function, where ast is a candidate solution, examples is the set of positive examples, and the returned value is the objective function value for ast.

fast.pattern_automaton module#

class PatternAutomaton(word: str, map_name_dfa: dict, make_mg: callable = None, filtered_patterns: set = None)[source]#

Bases: Automaton

A PatternAutomaton instance represents a string at the pattern level using an automaton-like structure, whose vertices correspond to a subset of string indices and whose arcs correspond to infixes and their related patterns.

Constructs the PatternAutomaton related to an input word according to a collection of patterns and to a multi_grep() strategy (see the construction sketch after the parameter list).

Parameters:
  • word (str) – The input word.

  • map_name_dfa (dict) – A dict{str: Automaton} mapping each relevant type with its corresponding Automaton instance.

  • filtered_patterns (set) – A subset (possibly empty) of map_name_dfa.keys() listing the pattern types that must be caught by multi_grep but not reflected as arcs in this PatternAutomaton instance. It can be used e.g. to drop spaces and get a smaller PatternAutomaton, at the cost of losing the positions of spaces in the original lines.
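
A minimal construction sketch (illustrative input line; the available pattern names depend on MAP_NAME_RE):

    from fast.pattern_automaton import PatternAutomaton
    from fast.regexp import make_map_name_dfa

    map_name_dfa = make_map_name_dfa()   # default pattern collection
    pa = PatternAutomaton("host 10.0.0.1 rebooted", map_name_dfa)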

get_infix(e: EdgeDescriptor) str[source]#

Retrieves the infix (substring) related to an edge.

Parameters:

e (EdgeDescriptor) – The considered edge.

Returns:

The infix related to the given edge e of this PatternAutomaton instance.

get_slice(e) tuple[source]#

Retrieves the slice (pair of uint indices) related to an edge.

Parameters:

e (EdgeDescriptor) – The considered edge.

Returns:

The slice related to the given edge e of this PatternAutomaton instance.

pattern_automaton_edge_weight(e: EdgeDescriptor, g: PatternAutomaton, map_name_density: dict = None) float[source]#

Gets the weight of an edge.

Parameters:
  • e (EdgeDescriptor) – The considered edge.

  • g (PatternAutomaton) – The considered PatternAutomaton instance.

  • map_name_density (dict) – A dict mapping each pattern name to its corresponding density.

Returns:

The weight of e.

pattern_automaton_to_path(g: PatternAutomaton, *cls, **kwargs) list[source]#

Extracts from a PatternAutomaton instance the most relevant path (highlighting the most relevant pattern-based decomposition).

Parameters:
  • g (PatternAutomaton) – The considered PatternAutomaton instance.

  • cls – See the pybgl.dijkstra_shortest_path() function.

  • kwargs – See the pybgl.dijkstra_shortest_path() function.

fast.random module#

random() x in the interval [0, 1).#
random_ast(re_len: int, alphabet: list = None, map_operators: dict = {'*': Op(cardinality=1, precedence=4, associativity=1), '+': Op(cardinality=1, precedence=4, associativity=1), '.': Op(cardinality=2, precedence=2, associativity=1), '?': Op(cardinality=1, precedence=3, associativity=1), '|': Op(cardinality=2, precedence=1, associativity=1)}) tuple[source]#

Returns a random AST and its root node.

Parameters:
  • re_len (int) – The number of nodes of the AST of the output regular expression.

  • alphabet (list) – A list gathering the characters of the alphabet of the regular expressions. Defaults to list('a...z').

  • map_operators (dict) – A dict{str: pybgl.shunting_yard_postfix.Op} mapping each meta-character with its corresponding attributes. Defaults to pybgl.shunting_yard_postfix.MAP_OPERATORS_RE.

Returns:

A tuple(pybgl.shunting_yard_postfix.Ast, int) gathering the output AST and its root node.
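
A hypothetical draw of a random AST with 7 nodes over the alphabet {a, b}:

    from fast.random import random_ast

    ast, root = random_ast(7, alphabet=list("ab"))   # returns (AST, root node)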

random_word_from_automaton(g: Automaton, p: float = 0.5, repeat: bool = True, max_sampling: int = 1000) str[source]#

Performs a uniform random walk over an Automaton to generate a word accepted by this Automaton.

Parameters:
  • g (Automaton) – A DFA.

  • p (float) – A value in [0, 1] corresponding to the probability of stopping once a final state is reached.

  • repeat (bool) – Pass True to try again while the sample is rejected.

  • max_sampling (int) – Maximum number of rejected samplings. If you pass None, the algorithm continues until it finds a sample, but depending on p and g, this may result in an infinite loop.

Returns:

A string representing the word discovered during the walk, or None if the walk led to a non-final state without successor.

reject_sampling(sample, repeat: bool = True, max_sampling: int = 1000)[source]#

Rejection sampling procedure.

Parameters:
  • sample (callable) – A callable() -> Result callback that draws a random sample. If it returns None, the sample is rejected.

  • repeat (bool) – Pass True to rerun sampling in case of reject.

  • max_sampling (int) – Maximum number of sampling attempts. Pass None to keep sampling until a non-rejected value is found. Note that if sample always returns None, this results in an infinite loop.

Returns:

The sampled value if a non-rejected sample was found, None otherwise.
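
A hypothetical use, where the sample callback returns None to reject odd draws:

    import random
    from fast.random import reject_sampling

    def sample():
        x = random.randint(0, 10)
        return x if x % 2 == 0 else None   # None triggers a reject

    value = reject_sampling(sample, repeat=True, max_sampling=100)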

fast.regexp module#

Functions used to craft automata from regular expressions.

get_pattern_names() list[source]#

Retrieves the list of patterns involved in the default pattern collection.

Returns:

A list of strings, where each string corresponds to a pattern name involved in the default pattern collection (MAP_NAME_RE).

make_dfa_any(alphabet: iter = None, separator_alphabet: iter = None) Automaton[source]#

Builds the DFA matching any non-separator character.

Parameters:
  • alphabet (iter) – The characters involved in the alphabet. Defaults to string.printable.

  • separator_alphabet (iter) – The characters corresponding to separators. Defaults to {" ", "\t", "\n"}.

Returns:

The corresponding Automaton.

make_dfa_empty() Automaton[source]#

Builds the Automaton corresponding to the empty language.

Returns:

The corresponding Automaton.

make_map_name_dfa(names: iter = None) dict[source]#

Builds a dictionary that maps each pattern name to its corresponding Automaton instance, built from its regular expression.

Parameters:

names (list) – A list of strings, where each string identifies a pattern name (by default, every key of MAP_NAME_RE is considered).

Returns:

The dictionary mapping each pattern name with its corresponding DFA.

make_re_hex_digit(lower_case: bool = True, upper_case: bool = True) str[source]#

Builds the regular expression catching hexadecimal values.

Parameters:
  • lower_case (bool) – Pass False to discard lower case values.

  • upper_case (bool) – Pass False to discard upper case values.

Returns:

The string storing the regular expression.

make_re_ipv6(lower_case: bool = True, upper_case: bool = True) str[source]#

Builds the regular expression catching IPv6 addresses.

Note that this is not an exact match, unlike make_re_ipv6_strict, but the resulting automaton is significantly faster (and should be accurate enough for most practical use cases).

Parameters:
  • lower_case (bool) – Pass False to discard lower case values.

  • upper_case (bool) – Pass False to discard upper case values.

Returns:

The string storing the regular expression.

make_re_ipv6_strict(*args, **kwargs) str[source]#

Builds the regular expression catching IPv6 addresses.

Parameters:
Returns:

The string storing the regular expression.

fast.regexp_ast module#

class RegexpAst[source]#

Bases: object

An n-ary representation of a regular expression AST. Do not use a binary AST, otherwise the priority queue will explode! Note that calling simplify() transforms an arbitrary AST into an n-ary AST.

Constructor.

ROOT = 'root'#
add_node(label: str) int[source]#

Add a new node to this AST.

Parameters:

label (str) – The label of the added node.

Returns:

The newly added vertex descriptor.

append_child(u: int, v: int, set_parent: bool = True)[source]#

Appends a new child to a node u.

Parameters:
  • u (int) – The vertex descriptor of the parent node.

  • v (int) – The vertex descriptor of the child node.

  • set_parent (bool) – Pass True to update the parent of v. Defaults to True.

children(u: int) iter[source]#

Retrieves the children of a given vertex.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

An iterator over the children of u.

copy()[source]#

Copy this RegexpAst instance.

Returns:

A copy this RegexpAst instance.

edges() iter[source]#

Retrieves an iterator over the edges of this RegexpAst instance.

Returns:

An iterator over this RegexpAst instance edges.

epsilon_reachables(u: int, v: int = None) set[source]#

Retrieves the edges that are epsilon-reachable from a (u, v) arc. See also RegexpAst.epsilon_successors().

Parameters:
  • u (int) – The source of the arc.

  • v (int) – The target of the arc. If u is a leaf you may pass None.

Returns:

The set of out-edges of (u', v') arcs that are epsilon-reachable from the (u, v) arc

epsilon_successors(u: int, v: int) set[source]#

Retrieves the out-edges of v that are epsilon-reachable from a (u, v) arc. See also RegexpAst.epsilon_reachables().

Parameters:
  • u (int) – The source of the arc.

  • v (int) – The target of the arc.

Returns:

The set of out-edges of (u', v') arcs that are epsilon-reachable from the (u, v) arc

find_unary_n_aries(u=None)[source]#
first_child(u: int) int[source]#

Retrieves the first child of a given vertex.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

The vertex descriptor of the first child of u.

Raises:

IndexError

get_arc_index(u: int, v: int) int[source]#

If (u,v) is downwards, and v is the i^th child of u, returns i. If (u,v) is upwards, and u is the i^th child of v, returns i.

is_ancestor_of(u: int, v: int) bool[source]#

Tests if v is an ancestor of u.

Parameters:
  • u (int) – The vertex descriptor of a node.

  • v (int) – The vertex descriptor of the candidate ancestor.

Returns:

True iff v is an ancestor of u, False otherwise.

is_downwards_arc(u: int, v: int) bool[source]#

Checks whether an (u, v) arc is downward (i.e., if v is a child of u).

Parameters:
  • u (int) – The vertex descriptor of the source of the arc.

  • v (int) – The vertex descriptor of the target of the arc.

Returns:

True iff (u, v) is downward.

is_last_child(u: int, v: int) bool[source]#

Checks whether u is the last child of v.

Parameters:
  • u (int) – The vertex descriptor of the considered children.

  • v (int) – The vertex descriptor of the parent node of u.

Returns:

True if and only if u is the last child of v.

is_leaf(u: int) bool[source]#

Checks whether a node is a leaf. It concerns nodes labeled by a symbol of the alphabet.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

True iff u is a leaf, False otherwise.

is_n_ary(u: int) bool[source]#

Checks whether a node is n-ary, n > 1. It concerns nodes labeled by an n-ary regular expression operator (i.e., “.”, “|”) and the root of this RegexpAst instance.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

True iff u is n-ary, False otherwise.

is_unary(u: int) bool[source]#

Checks whether a node is unary. It concerns nodes labeled by a unary regular expression operator (i.e., “+”, “*”, “?”) and the root of this RegexpAst instance.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

True iff u is unary, False otherwise.

is_upwards_arc(u: int, v: int) bool[source]#

Checks whether an (u, v) arc is upward (i.e., if u is a child of v).

Parameters:
  • u (int) – The vertex descriptor of the source of the arc.

  • v (int) – The vertex descriptor of the target of the arc.

Returns:

True iff (u, v) is upward, False otherwise.

ith_child(u: int, i: int) int[source]#

Retrieves the i-th child of a given vertex.

Parameters:
  • u (int) – The vertex descriptor of the considered node.

  • i (int) – The child index.

Returns:

The vertex descriptor of the i-th child of u.

Raises:

IndexError

label(u) str[source]#

Retrieves the label of a given vertex.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

The label of u.

merge_n_ary_nodes(u, v, remove_v=True)[source]#
num_children(u: int) int[source]#

Retrieves the number of children of u.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

The number of children of u.

out_edges(u: int) iter[source]#

Retrieves an iterator over the out-edges of a given vertex of this RegexpAst instance.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

An iterator over the out-edges of u.

parent(u: int) int[source]#

Retrieves the parent of a given vertex.

Parameters:

u (int) – The vertex descriptor of the considered node.

Returns:

The parent of u if any or None.

recognizes(x) bool[source]#
recognizes_pa(pa: PatternAutomaton, verbose: bool = False) bool[source]#
recognizes_pa_prefix(pa: PatternAutomaton, target_pa_node: int, target_ast_leaf: int, u: int = None) bool[source]#

Tests whether there exists in this RegexpAst instance a path from its root to target_ast_leaf that corresponds to a path from the source of a given PatternAutomaton instance to target_pa_node.

Parameters:
  • pa (PatternAutomaton) – A PatternAutomaton instance (modeling a positive example).

  • target_pa_node (int) – The vertex descriptor of the target node in pa.

  • target_ast_leaf (int) – The vertex descriptor of the target leaf in self.

  • u (int) – Pass None.

Returns:

True iff this RegexpAst instance matches pa, False otherwise.

recognizes_prefix(prefix: list, leaf_to_end_on: int)[source]#

Tests whether there exists a valid walk from the root to a given leaf in this RegexpAst instance.

Parameters:
  • prefix (list) – The prefix (list of symbols).

  • leaf_to_end_on (int) – The vertex descriptor of the target leaf.

Returns:

True if there exists a walk from the root to leaf_to_end_on that matches prefix, False otherwise.

recognizes_word(w: list) bool[source]#

Tests whether a word is matched by this RegexpAst instance.

Parameters:

w (list) – A list of symbols, where each symbol is a string (e.g. “a” or “$date”).

Returns:

True iff this RegexpAst instance matches w, False otherwise.

remove_node(u: int)[source]#

Removes a node from this RegexpAst instance.

Parameters:

u (int) – The vertex descriptor of the node to be removed.

remove_unary_n_aries(u=None)[source]#
reorder_or_nodes(u=None)[source]#
set_child(u: int, v: int, set_parent: bool = True)[source]#

Sets v as the only child of u.

Parameters:
  • u (int) – The vertex descriptor of the parent node.

  • v (int) – The vertex descriptor of the child node.

  • set_parent (bool) – Pass True to update the parent of v. Defaults to True.

set_children(u: int, vs: iter, set_parents: bool = True)[source]#

Sets the children of u.

Parameters:
  • u (int) – The vertex descriptor of the parent node.

  • vs (iter) – The vertex descriptors of the child nodes.

  • set_parents (bool) – Pass True to update the parent of each child in vs. Defaults to True.

set_ith_child(u: int, v: int, i: int, set_parent: bool = True)[source]#

Sets the i-th children of u to v.

Parameters:
  • u (int) – The vertex descriptor of the parent node.

  • v (int) – The vertex descriptor of the child node.

  • i (int) – The child index. It must satisfy 0 < i <= self.num_children(u).

  • set_parent (bool) – Pass True to update the parent of v. Defaults to True.

Raises:

IndexError

set_label(u: int, label: str)[source]#

Sets the label of a given vertex.

Parameters:
  • u (int) – The vertex descriptor of the considered node.

  • label (str) – The new label of the considered node.

simplify(active_leaf: int = None)[source]#

Applies simplification passes to this RegexpAst instance (see the simplify_*_nodes methods below).

Parameters:

active_leaf (int) – Pass None

simplify_n_ary_nodes(u=None)[source]#
simplify_or_nodes(active_leaf=None, u=None)[source]#
simplify_unary_nodes(u=None)[source]#
source(e: EdgeDescriptor) int[source]#

Retrieves the source of an arc.

Parameters:

e (EdgeDescriptor) – The EdgeDescriptor of the arc.

Returns:

The vertex descriptor of the source of e.

target(e: EdgeDescriptor) int[source]#

Retrieves the target of an arc.

Parameters:

e (EdgeDescriptor) – The EdgeDescriptor of the arc.

Returns:

The vertex descriptor of the target of e.

to_dot(**kwargs) str[source]#

Exports this RegexpAst to Graphviz format.

Returns:

The corresponding Graphviz string.

to_infix_regexp_str(u=None) str[source]#

Builds the infix regular expression of this RegexpAst instance. See also RegexpAst.to_prefix_regexp_list().

Parameters:

u (int) – Pass None.

Returns:

A string storing the infix regular expression modeling this RegexpAst instance.

to_prefix_regexp_list(u: int = None) list[source]#

Builds the prefix regular expression of this RegexpAst instance. See also RegexpAst.to_prefix_regexp_str().

Parameters:

u (int) – Pass None.

Returns:

A list storing the prefix regular expression modeling this RegexpAst instance.

to_prefix_regexp_str(u: int = None) str[source]#

Builds the prefix regular expression of this RegexpAst instance. See also RegexpAst.to_prefix_regexp_list().

Parameters:

u (int) – Pass None.

Returns:

A string storing the prefix regular expression modeling this RegexpAst instance.

vertices() iter[source]#

Retrieves an iterator over the vertices of this RegexpAst instance.

Returns:

An iterator over this RegexpAst instance vertices.

walk_one_char(u: int, a: str, verbose: bool = False) set[source]#

Performs a single character walk on this RegexpAst instance. Starts the walk on the node u (u must be a leaf or the root), and returns the set of leaves labeled by a that are epsilon-reachable from u.

Parameters:
  • u (int) – The vertex descriptor of the starting node (must be a leaf or the root).

  • a (str) – A symbol, e.g. “a” or “$date”.

Returns:

The set of leaves labeled by a that are epsilon-reachable from u.

walk_word(w: list, u: int = None) set[source]#

Walks on this RegexpAst instance by consuming the symbols involved in an input word.

Parameters:
  • u (int) – The vertex descriptor of the starting node. Pass None to start from the root of this RegexpAst instance. Defaults to None.

  • w (list) – The input word, which is a list of symbols, where each symbol is a string (e.g. “a” or “$date”).

Returns:

The set of vertex descriptors that are reachable from u by consuming w.

prefix_regexp_to_ast(prefix_regexp: list) RegexpAst[source]#

Builds a RegexpAst instance from a prefix regular expression.

Parameters:

prefix_regexp (list) – A list modeling a binary prefix regular expression e.g. [“.”, “a”, “$date”].

Returns:

The corresponding RegexpAst instance.
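
A hypothetical round trip from a prefix regular expression to a RegexpAst and back (the exact infix rendering is implementation-dependent):

    from fast.regexp_ast import prefix_regexp_to_ast

    ast = prefix_regexp_to_ast(["|", "a", "b"])   # prefix form of a|b
    print(ast.to_infix_regexp_str())              # e.g. "a|b"
    print(ast.recognizes_word(["a"]))             # expected True
    print(ast.recognizes_word(["c"]))             # expected False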

fast.regexp_mutators module#

class ActivateMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

‘Identity’ mutator. Does not modify the ast, only activates an existing leaf. Condition: v is a c-labeled leaf.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#
class BotMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

‘Bot’ mutator, used to add a leaf to an empty tree. Condition: the ast is empty.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#
class BouncePlusMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

‘Bouncer’ mutator. Starts by applying a first mutation on e (insertion of +). Then generates from this mutation a new set of epsilon-reachable arcs on which the non-bouncing mutators can be applied. TODO: Find out if/how the bouncers can call each other. Condition: (u, v) is an upwards arc, and (v, u) is not epsilon-reachable from the current leaf.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#
class BounceQuestionMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

‘Bouncer’ mutator. Starts by applying a first mutation on e (insertion of ?). Then generates from this mutation a new set of epsilon-reachable arcs on which the non-bouncing mutators can be applied. TODO: Find out if/how the bouncers can call each other. Condition: (u, v) is a downwards arc, and (v, u) is not epsilon-reachable from the current leaf.

Constructor.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#

Mutation related to the ‘?’ operator.

class DisjunctionMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

Downwards mutator, inserting a new ‘or’ node and a new leaf labeled c. Condition: the arc u, v must go downwards.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#
class DownDotMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

Downwards dot mutator. Starts by trying to inject in the ast a dot node as a child of u, a new leaf as its left child, and v as its right child. If this newly obtained ast does not recognize the prefix word being processed now, or does not recognize one of the previously processed examples, then we also insert a ? node between the dot and v. Condition: the arc (u, v) must go downwards.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#
class Mutator(mutators_to_bounce_on)[source]#

Bases: object

mutate(ast: RegexpAst, sigma, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#
class UpDotMutator(mutators_to_bounce_on)[source]#

Bases: Mutator

Upwards dot mutator. Starts by trying to inject in the ast a dot node and a leaf as its right child. If this newly obtained ast does not recognize the prefix word being processed now, or does not recognize one of the previously processed examples, then we also insert a ? node between the dot and the leaf. Condition: the arc (u, v) must go upwards.

mutate(ast: RegexpAst, c, u, v, prefix_word: list, previous_words: List[list], epsilon_reachables: List[tuple], current_pa) List[Tuple[RegexpAst, int]][source]#

Module contents#

Top-level package.