augly.text package

Submodules

augly.text.composition module

class augly.text.composition.BaseComposition(transforms, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(transforms, p=1.0)
Parameters
  • transforms (List[BaseTransform]) – a list of transforms

  • p (float) – the probability of the transform being applied; default value is 1.0

class augly.text.composition.Compose(transforms, p=1.0)

Bases: augly.text.composition.BaseComposition

__call__(texts, seed=None, metadata=None)

Applies the list of transforms in order to the text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
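
The ordering described above can be sketched with plain Python callables. This is a minimal illustration, not AugLy code: `compose_sketch` and the two toy transforms are hypothetical stand-ins, and the probability, seed, and metadata handling are left out.

```python
from typing import Callable, List

Transform = Callable[[List[str]], List[str]]


def compose_sketch(transforms: List[Transform], texts: List[str]) -> List[str]:
    """Hypothetical stand-in for Compose: apply each transform in list order."""
    for transform in transforms:
        texts = transform(texts)  # the output of one step feeds the next
    return texts


# Two toy transforms operating on a list of documents.
def upper(docs: List[str]) -> List[str]:
    return [d.upper() for d in docs]


def exclaim(docs: List[str]) -> List[str]:
    return [d + "!" for d in docs]


result = compose_sketch([upper, exclaim], ["hello world"])
print(result)  # ['HELLO WORLD!']
```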

class augly.text.composition.OneOf(transforms, p=1.0)

Bases: augly.text.composition.BaseComposition

__call__(texts, force=False, seed=None, metadata=None)
Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set

  • seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

__init__(transforms, p=1.0)
Parameters
  • transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the text

  • p (float) – the probability of the transform being applied; default value is 1.0
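
As a rough illustration of how `p` and `force` might interact, the sketch below picks one transform at random and applies it with probability `p`, unless `force` is set. The helper `one_of_sketch` and its internals are assumptions made for this example; AugLy's actual selection logic may differ.

```python
import random
from typing import Callable, List, Optional

Transform = Callable[[List[str]], List[str]]


def one_of_sketch(
    transforms: List[Transform],
    texts: List[str],
    p: float = 1.0,
    force: bool = False,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical stand-in for OneOf: choose one transform, apply with probability p."""
    rng = random.Random(seed)
    # With force=True the probability check is skipped entirely.
    if not force and rng.random() > p:
        return texts
    chosen = rng.choice(transforms)
    return chosen(texts)


def upper(docs: List[str]) -> List[str]:
    return [d.upper() for d in docs]


def exclaim(docs: List[str]) -> List[str]:
    return [d + "!" for d in docs]


print(one_of_sketch([upper, exclaim], ["hi"], seed=1))
```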

augly.text.functional module

augly.text.functional.apply_lambda(texts, aug_function=<function <lambda>>, metadata=None, **kwargs)

Apply a user-defined lambda on a list of text documents

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)

  • **kwargs

    the input attributes to be passed into the augmentation function to be applied

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
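
For illustration, a function of the shape `aug_function` is described as expecting could look like the sketch below: it takes a list of documents plus keyword arguments and returns a list. The name `append_suffix` and its `suffix` parameter are hypothetical; the commented-out call shows how it would presumably be passed in, assuming AugLy is installed.

```python
from typing import List


def append_suffix(texts: List[str], suffix: str = "!") -> List[str]:
    """Hypothetical aug_function: list of documents in, list of documents out."""
    return [text + suffix for text in texts]


# Extra keyword arguments (here `suffix`) are what **kwargs would carry through.
augmented = append_suffix(["hello", "world"], suffix="?")
print(augmented)  # ['hello?', 'world?']

# With AugLy installed, the equivalent call would presumably be:
# import augly.text as txtaugs
# txtaugs.apply_lambda(["hello", "world"], aug_function=append_suffix, suffix="?")
```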

augly.text.functional.change_case(texts, granularity='word', cadence=1.0, case='random', seed=10, metadata=None)

Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)

  • cadence (float) – how frequently to change the case, i.e. once every this many characters/words. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’

  • case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case the case will randomly be changed to one of the previous three)

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
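
One plausible reading of an 'average' cadence is sketched below: positions are taken at multiples of the cadence and rounded down, so a cadence of 1.5 alternates between gaps of 1 and 2. This is only an assumption for illustration; the helpers `cadence_positions` and `upper_every` are hypothetical, and AugLy may select positions differently.

```python
import math
from typing import List


def cadence_positions(n_items: int, cadence: float) -> List[int]:
    """Indices hit when stepping through n_items with an average gap of `cadence`."""
    if cadence < 1.0:
        raise ValueError("cadence must be at least 1.0")
    positions = []
    i = 0
    while math.floor(i * cadence) < n_items:
        positions.append(math.floor(i * cadence))
        i += 1
    return positions


def upper_every(text: str, cadence: float) -> str:
    """Upper-case the words that fall on a cadence position (word granularity)."""
    words = text.split()
    hits = set(cadence_positions(len(words), cadence))
    return " ".join(w.upper() if i in hits else w for i, w in enumerate(words))


print(cadence_positions(6, 1.5))       # [0, 1, 3, 4]
print(upper_every("a b c d e", 2.0))   # A b C d E
```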

augly.text.functional.contractions(texts, aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, metadata=None)

Replaces pairs (or longer strings) of words with contractions given a mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
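
The role of `max_contraction_length` can be sketched as a sliding window over the words: at each position, windows from the maximum length down to two words are looked up in the mapping. The code below is a simplified illustration under stated assumptions, not AugLy's implementation; `contract_sketch` and the flat `toy_mapping` format are hypothetical.

```python
import random
from typing import Dict, List


def contract_sketch(
    text: str,
    mapping: Dict[str, str],
    aug_p: float = 1.0,
    max_contraction_length: int = 2,
    seed: int = 10,
) -> str:
    rng = random.Random(seed)
    words = text.split()
    out: List[str] = []
    i = 0
    while i < len(words):
        replaced = False
        # Try the longest window first, down to pairs of words.
        for n in range(max_contraction_length, 1, -1):
            window = words[i : i + n]
            phrase = " ".join(window).lower()
            if len(window) == n and phrase in mapping and rng.random() < aug_p:
                out.append(mapping[phrase])
                i += n
                replaced = True
                break
        if not replaced:
            out.append(words[i])
            i += 1
    return " ".join(out)


# Hypothetical mapping format: lower-cased phrase -> contraction.
toy_mapping = {"i will": "I'll", "do not": "don't"}
print(contract_sketch("I will go but they do not", toy_mapping))
# I'll go but they don't
```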

augly.text.functional.get_baseline(texts, metadata=None)

Generates a baseline by tokenizing and detokenizing the text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
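
A tokenize-then-detokenize round trip is not always the identity, which is why a baseline is useful for comparison. The sketch below uses a simple regex tokenizer (an assumption; AugLy's tokenizer is not specified here) and also shows the `metadata` pattern of appending to a caller-supplied list. The dictionary keys and the use of character counts are illustrative guesses, not AugLy's actual metadata schema.

```python
import re
from typing import Any, Dict, List, Optional


def baseline_sketch(text: str, metadata: Optional[List[Dict[str, Any]]] = None) -> str:
    # Tokenize: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Detokenize: join with spaces, then reattach punctuation to the previous token.
    out = re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))
    if metadata is not None:
        # Hypothetical keys; the caller's list is mutated in place.
        metadata.append(
            {"name": "baseline_sketch", "src_length": len(text), "dest_length": len(out)}
        )
    return out


meta: List[Dict[str, Any]] = []
print(baseline_sketch("Hello ,   world !", metadata=meta))  # Hello, world!
print(meta[0]["dest_length"])  # 13
```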

augly.text.functional.insert_punctuation_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)

Inserts punctuation characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a punctuation character, i.e. once every this many characters. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts
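
The difference between the two granularities can be sketched as follows: with 'all', one character is chosen and the cadence runs across the whole text; with 'word', a character is chosen and the count restarts for each word. This is a simplified sketch assuming an integer cadence and a small hypothetical `PUNCTUATION` set; it does not model `vary_chars`.

```python
import random
from typing import List

PUNCTUATION: List[str] = [".", ",", "-", "!"]  # hypothetical character set


def insert_chars(text: str, char: str, cadence: int) -> str:
    """Insert `char` between consecutive chunks of `cadence` characters."""
    chunks = [text[i : i + cadence] for i in range(0, len(text), cadence)]
    return char.join(chunks)


def insert_punctuation_sketch(
    text: str, granularity: str = "all", cadence: int = 1, seed: int = 0
) -> str:
    rng = random.Random(seed)
    if granularity == "all":
        # One character for the whole text; spaces count towards the cadence.
        return insert_chars(text, rng.choice(PUNCTUATION), cadence)
    # 'word': pick a new character and restart the cadence for each word.
    return " ".join(
        insert_chars(word, rng.choice(PUNCTUATION), cadence) for word in text.split()
    )


print(insert_chars("hello", ".", 2))  # he.ll.o
print(insert_punctuation_sketch("ab cd", granularity="word"))
```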

augly.text.functional.insert_whitespace_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)

Inserts whitespace characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a whitespace character, i.e. once every this many characters. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts

augly.text.functional.insert_zero_width_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)

Inserts zero-width characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a zero-width character, i.e. once every this many characters. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts
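
To see why this augmentation matters for string comparison, the short sketch below inserts the Unicode ZERO WIDTH SPACE (U+200B) between every character. The choice of this particular character is an assumption for illustration; the set AugLy draws from is not listed here. The result typically looks identical when rendered, yet differs in length and equality.

```python
ZERO_WIDTH_SPACE = "\u200b"  # one of several Unicode zero-width characters

original = "hello"
augmented = ZERO_WIDTH_SPACE.join(original)  # cadence of 1: between every character

print(len(original), len(augmented))  # 5 9
print(original == augmented)  # False
print(augmented.replace(ZERO_WIDTH_SPACE, "") == original)  # True
```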

augly.text.functional.merge_words(texts, aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)

Merges words in the text together

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word to be merged

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.functional.replace_bidirectional(texts, granularity='all', split_word=False, metadata=None)

Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – the level at which the reversal is applied; this must be either ‘word’ or ‘all’

  • split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts
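
The idea can be sketched with the Unicode RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C) characters: each word is stored reversed, and the marks ask a bidi-aware renderer to display it right to left, so it reads in the original order. Which marks AugLy actually emits is an assumption here, and both helper functions are hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def bidi_word_sketch(word: str) -> str:
    """Store the word reversed, wrapped in marks that flip its display direction."""
    return RLO + word[::-1] + PDF


def bidi_text_sketch(text: str) -> str:
    # Reversing word by word keeps the word order intact, even across line wraps.
    return " ".join(bidi_word_sketch(word) for word in text.split(" "))


augmented = bidi_text_sketch("hello world")
stored = augmented.replace(RLO, "").replace(PDF, "")
print(stored)  # olleh dlrow
```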

augly.text.functional.replace_fun_fonts(texts, aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, metadata=None)

Replaces words or characters depending on the granularity with fun fonts applied

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the font is applied; this must be either word, char, or all

  • vary_fonts (bool) – whether or not to switch font in each replacement

  • fonts_path (str) – iopath uri where the fonts are stored

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.functional.replace_similar_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, metadata=None)

Replaces letters in each text with similar characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (Optional[str]) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
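
How `min_char` and `aug_char_max` constrain a single word can be sketched as below. The `SIMILAR` table is a small hypothetical mapping invented for this example (AugLy's default mapping is not shown here), and `replace_similar_sketch` ignores the probability and word-level parameters.

```python
import random
from typing import Dict, List

# Hypothetical look-alike table, for illustration only.
SIMILAR: Dict[str, List[str]] = {"a": ["@", "4"], "e": ["3"], "o": ["0"], "s": ["$", "5"]}


def replace_similar_sketch(
    word: str, aug_char_max: int = 1, min_char: int = 2, seed: int = 0
) -> str:
    rng = random.Random(seed)
    if len(word) < min_char:
        return word  # too short to be a valid augmentation target
    candidates = [i for i, ch in enumerate(word) if ch.lower() in SIMILAR]
    rng.shuffle(candidates)
    chars = list(word)
    # Replace at most aug_char_max letters.
    for i in candidates[:aug_char_max]:
        chars[i] = rng.choice(SIMILAR[chars[i].lower()])
    return "".join(chars)


print(replace_similar_sketch("see", aug_char_max=1))
print(replace_similar_sketch("see", aug_char_max=3))
```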

augly.text.functional.replace_similar_unicode_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, metadata=None)

Replaces letters in each text with similar unicodes

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (str) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.functional.replace_upside_down(texts, aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, metadata=None)

Flips words in the text upside down depending on the granularity

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the flip is applied; this must be either word, char, or all

  • n (int) – number of augmentations to be performed for each text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.functional.replace_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, metadata=None)

Replaces words in each text based on a given mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
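
The interaction between `mapping` and `ignore_words` can be sketched as a dictionary lookup with an exclusion set. This is a simplified illustration that replaces every eligible word (as if `aug_word_p` were 1.0) and matches case-insensitively, which is an assumption; `replace_words_sketch` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def replace_words_sketch(
    text: str, mapping: Dict[str, str], ignore_words: Optional[List[str]] = None
) -> str:
    ignored = {w.lower() for w in (ignore_words or [])}
    out = []
    for word in text.split():
        key = word.lower()
        if key in mapping and key not in ignored:
            out.append(mapping[key])  # word is in the mapping and not excluded
        else:
            out.append(word)
    return " ".join(out)


toy_mapping = {"quick": "fast", "fox": "dog"}
print(replace_words_sketch("the quick fox", toy_mapping))  # the fast dog
print(replace_words_sketch("the quick fox", toy_mapping, ignore_words=["fox"]))  # the fast fox
```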

augly.text.functional.simulate_typos(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, metadata=None)

Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of the above

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping

  • aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”

  • misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None

  • max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
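
The "keyboard" typo type can be pictured as substituting a character with one of its neighbours on a QWERTY layout. The sketch below uses a small, partial adjacency table made up for this example and replaces exactly one character per word; it is not AugLy's implementation, and `keyboard_typo_sketch` is a hypothetical helper.

```python
import random
from typing import Dict

# Hypothetical, partial QWERTY adjacency table, for illustration only.
NEIGHBORS: Dict[str, str] = {"a": "qwsz", "e": "wsdr", "o": "ipkl", "t": "rfgy"}


def keyboard_typo_sketch(word: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    positions = [i for i, ch in enumerate(word) if ch in NEIGHBORS]
    if not positions:
        return word  # no character with a known neighbour
    i = rng.choice(positions)
    # Substitute the chosen character with a nearby key.
    return word[:i] + rng.choice(NEIGHBORS[word[i]]) + word[i + 1 :]


print(keyboard_typo_sketch("tea"))
```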

augly.text.functional.split_words(texts, aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)

Splits words in the text into subwords

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word for a split

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
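
The effect of `min_char` can be sketched as follows: words shorter than the threshold are left alone, and longer words are cut at a random interior position. The sketch splits every eligible word (as if `aug_word_p` were 1.0) and uses a hypothetical helper name; AugLy's choice of split point may differ.

```python
import random


def split_words_sketch(text: str, min_char: int = 4, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if len(word) < min_char:
            out.append(word)  # too short to split
            continue
        cut = rng.randint(1, len(word) - 1)  # interior split point
        out.append(word[:cut] + " " + word[cut:])
    return " ".join(out)


print(split_words_sketch("to be augmented"))
```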

augly.text.functional.swap_gendered_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, metadata=None)

Replaces words in each text based on a provided mapping, which can be either an already-constructed dict mapping words from one gender to another or a file path to such a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.intensity module

augly.text.intensity.apply_lambda_intensity(aug_function, **kwargs)
Return type

float

augly.text.intensity.change_case_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.intensity.char_insertion_intensity_helper(granularity, cadence)
Return type

float

augly.text.intensity.contractions_intensity(aug_p, **kwargs)
Return type

float

augly.text.intensity.get_baseline_intensity(**kwargs)
Return type

float

augly.text.intensity.insert_punctuation_chars_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.intensity.insert_whitespace_chars_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.intensity.insert_zero_width_chars_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.intensity.merge_words_intensity(aug_word_p, aug_word_max, **kwargs)
Return type

float

augly.text.intensity.replace_bidirectional_intensity(**kwargs)
Return type

float

augly.text.intensity.replace_fun_fonts_intensity(aug_p, aug_max, granularity, **kwargs)
Return type

float

augly.text.intensity.replace_intensity_helper(aug_p, aug_max)
Return type

float

augly.text.intensity.replace_similar_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
Return type

float

augly.text.intensity.replace_similar_unicode_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
Return type

float

augly.text.intensity.replace_upside_down_intensity(aug_p, aug_max, granularity, **kwargs)
Return type

float

augly.text.intensity.replace_words_intensity(aug_word_p, aug_word_max, mapping, **kwargs)
Return type

float

augly.text.intensity.simulate_typos_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
Return type

float

augly.text.intensity.split_words_intensity(aug_word_p, aug_word_max, **kwargs)
Return type

float

augly.text.intensity.swap_gendered_words_intensity(aug_word_p, aug_word_max, **kwargs)
Return type

float

augly.text.transforms module

class augly.text.transforms.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)

Bases: augly.text.transforms.BaseTransform

__init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)
Parameters
  • aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)

  • p (float) – the probability of the transform being applied; default value is 1.0

  • **kwargs

    the input attributes to be passed into the augmentation function to be applied

apply_transform(texts, metadata=None, **aug_kwargs)

Apply a user-defined lambda on a list of text documents

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.BaseTransform(p=1.0)

Bases: object

__call__(texts, force=False, metadata=None, **kwargs)
Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

__init__(p=1.0)
Parameters

p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

This function is to be implemented in the child classes. From this function, call the augmentation function, passing in ‘texts’, ‘metadata’, & the given ‘aug_kwargs’

Return type

Union[str, List[str]]

get_aug_kwargs(**kwargs)
Parameters

kwargs – any kwargs that were passed into __call__() intended to override the instance variables set in __init__() when calling the augmentation function in apply_transform()

Return type

Dict[str, Any]

Returns

the kwargs that should be passed into the augmentation function apply_transform() – this will be the instance variables set in __init__(), potentially overridden by anything passed in as kwargs
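
The override pattern described for `__call__()`, `get_aug_kwargs()`, and `apply_transform()` can be sketched with a small stand-in class hierarchy. Everything below is hypothetical and simplified (no metadata handling); in particular, excluding `p` from the forwarded kwargs is an assumption, not a confirmed detail of AugLy.

```python
import random
from typing import Any, Dict, List


class BaseTransformSketch:
    """Hypothetical stand-in mirroring the described pattern; not AugLy's class."""

    def __init__(self, p: float = 1.0):
        self.p = p

    def __call__(self, texts: List[str], force: bool = False, **kwargs: Any) -> List[str]:
        if not force and random.random() > self.p:
            return texts  # transform skipped
        return self.apply_transform(texts, **self.get_aug_kwargs(**kwargs))

    def get_aug_kwargs(self, **kwargs: Any) -> Dict[str, Any]:
        # Start from the instance variables (minus `p`), then let call-time kwargs win.
        aug_kwargs = {k: v for k, v in vars(self).items() if k != "p"}
        aug_kwargs.update(kwargs)
        return aug_kwargs

    def apply_transform(self, texts: List[str], **aug_kwargs: Any) -> List[str]:
        raise NotImplementedError  # to be implemented in child classes


class RepeatSketch(BaseTransformSketch):
    def __init__(self, times: int = 2, p: float = 1.0):
        super().__init__(p)
        self.times = times

    def apply_transform(self, texts: List[str], **aug_kwargs: Any) -> List[str]:
        return [t * aug_kwargs["times"] for t in texts]


repeat = RepeatSketch(times=2)
print(repeat(["ab"]))  # ['abab']
print(repeat(["ab"], times=3))  # ['ababab']
```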

class augly.text.transforms.ChangeCase(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)
Parameters
  • granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)

  • cadence (float) – how frequently to change the case, i.e. once every this many characters/words. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’

  • case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case every word will be randomly changed to one of the 3 cases)

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.Contractions(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)
Parameters
  • aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces pairs (or longer strings) of words with contractions given a mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.GetBaseline(p=1.0)

Bases: augly.text.transforms.BaseTransform

apply_transform(texts, metadata=None, **aug_kwargs)

Generates a baseline by tokenizing and detokenizing the text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.InsertPunctuationChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Parameters
  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a punctuation character (i.e. one insertion per this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Inserts punctuation characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
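
One way to read the cadence and granularity parameters is sketched below. This is an assumption-laden illustration rather than AugLy's code: the helper names `insert_every` and `insert_chars` are hypothetical, the inserted character is fixed (as if vary_chars were False), and a non-integer cadence is handled by accumulating a fractional insertion position.

```python
def insert_every(text, char, cadence=1.0):
    """Insert `char` after roughly every `cadence` characters of `text`."""
    if cadence < 1.0:
        raise ValueError("cadence must be at least 1.0")
    out = []
    next_insert = cadence  # count of source characters after which to insert
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        # Never insert after the final character.
        if i >= next_insert and i < len(text):
            out.append(char)
            next_insert += cadence
    return "".join(out)


def insert_chars(text, char, cadence=1.0, granularity="all"):
    """With granularity 'word', the cadence restarts for each word."""
    if granularity == "word":
        return " ".join(insert_every(w, char, cadence) for w in text.split(" "))
    return insert_every(text, char, cadence)
```

Under these assumptions, `insert_chars("abcdef", "-", 2.0)` gives `ab-cd-ef`, while a cadence of 1.5 alternates gaps of two and one characters so that the average spacing is 1.5.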

class augly.text.transforms.InsertWhitespaceChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Parameters
  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a whitespace character (i.e. one insertion per this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Inserts whitespace characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.InsertZeroWidthChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Parameters
  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a zero-width character (i.e. one insertion per this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Inserts zero-width characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
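
The effect of zero-width insertion is easy to see in a small sketch: the string gets longer and no longer compares equal to the original, yet it typically renders the same. The example below assumes U+200B (ZERO WIDTH SPACE) as the inserted character and a cadence of 1.0; which zero-width characters AugLy actually draws from is not stated here, and `insert_zero_width` is a hypothetical name.

```python
ZERO_WIDTH_SPACE = "\u200b"  # assumed character, for illustration only


def insert_zero_width(text, char=ZERO_WIDTH_SPACE):
    """Place `char` between every pair of adjacent characters (cadence 1.0)."""
    return char.join(text)


augmented = insert_zero_width("hello")
restored = augmented.replace(ZERO_WIDTH_SPACE, "")
```

Stripping the inserted character, as in `restored`, recovers the original text, which is one simple way to normalize such input before comparison.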

class augly.text.transforms.MergeWords(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word to be merged

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Merges words in the text together

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
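
A minimal sketch of word merging follows, assuming whitespace tokenization and that both neighbours must have at least min_char characters to be merged. The function name `merge_words_sketch` is hypothetical, and the aug_word_min, aug_word_max, n and priority_words parameters are left out for brevity.

```python
import random


def merge_words_sketch(text, aug_word_p=0.3, min_char=2, seed=0):
    """Join adjacent words (dropping the space) with probability aug_word_p."""
    rng = random.Random(seed)
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        can_merge = (
            i + 1 < len(words)
            and len(words[i]) >= min_char
            and len(words[i + 1]) >= min_char
        )
        if can_merge and rng.random() < aug_word_p:
            out.append(words[i] + words[i + 1])
            i += 2  # both words were consumed by the merge
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

Setting `aug_word_p=1.0` merges every eligible pair, which shows how min_char filters out short words.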

class augly.text.transforms.ReplaceBidirectional(granularity='all', split_word=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', split_word=False, p=1.0)
Parameters
  • granularity (str) – the level at which the reversal is applied; this must be either ‘word’ or ‘all’

  • split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
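
The idea of reversing text and relying on bidirectional control characters to display it in the original order can be sketched as follows. The specific marks used here, U+202E (RIGHT-TO-LEFT OVERRIDE) and U+202C (POP DIRECTIONAL FORMATTING), are an assumption, as are the helper names; the marks AugLy emits may differ.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE (assumed mark)
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (assumed mark)


def reverse_with_bidi(word, split_word=False):
    """Store the characters reversed, wrapped so a renderer shows them in order."""
    if split_word:
        half = len(word) // 2
        # Only the second half is reversed and wrapped.
        return word[:half] + RLO + word[half:][::-1] + PDF
    return RLO + word[::-1] + PDF


def replace_bidirectional_sketch(text, granularity="all", split_word=False):
    if granularity == "word":
        # Each word is wrapped separately, so word order survives line wraps.
        return " ".join(reverse_with_bidi(w, split_word) for w in text.split(" "))
    return reverse_with_bidi(text)
```

The stored string differs from the input at the code point level even though a bidi-aware renderer would typically display the same visible text.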

class augly.text.transforms.ReplaceFunFonts(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)
Parameters
  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the font is applied; this must be either word, char, or all

  • vary_fonts (bool) – whether or not to switch font in each replacement

  • fonts_path (str) – iopath uri where the fonts are stored

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces words or characters depending on the granularity with fun fonts applied

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.ReplaceSimilarChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)
Parameters
  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (Optional[str]) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces letters in each text with similar characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.transforms.ReplaceSimilarUnicodeChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)
Parameters
  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (str) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces letters in each text with similar Unicode characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
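
As a rough sketch of lookalike substitution, the example below swaps Latin letters for visually similar Cyrillic ones. The three-entry `LOOKALIKES` table and the function name are hypothetical stand-ins for the mapping file, and the word-level parameters (aug_word_p, min_char and so on) are omitted.

```python
import random

# Hypothetical mapping: Latin letter -> visually similar Cyrillic letters.
LOOKALIKES = {"a": ["\u0430"], "e": ["\u0435"], "o": ["\u043e"]}


def replace_similar_sketch(text, mapping, aug_char_p=0.3, seed=0):
    """Replace each mapped character with a lookalike with probability aug_char_p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        options = mapping.get(ch)
        if options and rng.random() < aug_char_p:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)


swapped = replace_similar_sketch("aeo", LOOKALIKES, aug_char_p=1.0)
```

The result has the same length as the input and looks nearly identical, but it no longer compares equal to the original string.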

class augly.text.transforms.ReplaceUpsideDown(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)
Parameters
  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the flip is applied; this must be either word, char, or all

  • n (int) – number of augmentations to be performed for each text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Flips words in the text upside down depending on the granularity

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
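
Flipping text upside down is commonly done by mapping each character to a rotated lookalike and reversing the order. The sketch below assumes that approach; the small `FLIPPED` table and the name `flip_sketch` are hypothetical, and this section does not say whether AugLy reverses the character order.

```python
# Hypothetical, partial table of rotated lookalikes.
FLIPPED = {
    "a": "\u0250", "b": "q", "d": "p", "e": "\u01dd",
    "h": "\u0265", "n": "u", "p": "d", "q": "b", "u": "n",
}


def flip_sketch(text):
    """Map characters to rotated lookalikes and reverse their order."""
    return "".join(FLIPPED.get(ch, ch) for ch in reversed(text))
```

Characters missing from the table are passed through unchanged, so a fuller table would be needed for realistic use.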

class augly.text.transforms.ReplaceWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces words in each text based on a given mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
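
The interaction between the mapping and ignore_words can be illustrated with a short sketch. It assumes whitespace tokenization and case-insensitive lookup, both of which are simplifications; `replace_words_sketch` is a hypothetical name, and aug_word_min, aug_word_max, n and priority_words are not modelled.

```python
import random


def replace_words_sketch(text, mapping, aug_word_p=0.3, ignore_words=None, seed=0):
    """Replace mapped words with probability aug_word_p, skipping ignored words."""
    rng = random.Random(seed)
    ignored = {w.lower() for w in (ignore_words or [])}
    out = []
    for word in text.split():
        key = word.lower()
        if key in mapping and key not in ignored and rng.random() < aug_word_p:
            out.append(mapping[key])
        else:
            out.append(word)
    return " ".join(out)


demo = {"hello": "hi", "world": "earth"}  # hypothetical mapping
replaced = replace_words_sketch("hello big world", demo, aug_word_p=1.0)
partly = replace_words_sketch("hello big world", demo, aug_word_p=1.0, ignore_words=["world"])
```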

class augly.text.transforms.SimulateTypos(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)
Parameters
  • aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping

  • aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”

  • misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None

  • max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of all three

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
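
The keyboard typo type can be approximated with a neighbour table, as in the sketch below. The partial `QWERTY_NEIGHBORS` table and the function name are hypothetical, and only the min_char and aug_char_max parameters are modelled; it is meant to show the idea of keyboard distance, not AugLy's implementation.

```python
import random

# Hypothetical, partial table of keys adjacent on a QWERTY layout.
QWERTY_NEIGHBORS = {"a": "qwsz", "s": "awedxz", "e": "wsdr", "o": "iklp"}


def keyboard_typo_sketch(word, aug_char_max=1, min_char=2, seed=0):
    """Replace up to aug_char_max characters with a neighbouring key."""
    if len(word) < min_char:
        return word  # too short for a valid augmentation
    rng = random.Random(seed)
    candidates = [i for i, ch in enumerate(word) if ch in QWERTY_NEIGHBORS]
    chosen = rng.sample(candidates, min(aug_char_max, len(candidates)))
    chars = list(word)
    for i in chosen:
        chars[i] = rng.choice(QWERTY_NEIGHBORS[chars[i]])
    return "".join(chars)


typo = keyboard_typo_sketch("seat")
```

With the default `aug_char_max=1`, exactly one character of an eligible word changes, which mirrors the default in the signature above.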

class augly.text.transforms.SplitWords(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word for a split

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Splits words in the text into subwords

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
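
Word splitting can be sketched as inserting a space at a random interior position of each eligible word. The sketch assumes whitespace tokenization and a uniformly random cut point; `split_words_sketch` is a hypothetical name, and aug_word_min, aug_word_max, n and priority_words are omitted.

```python
import random


def split_words_sketch(text, aug_word_p=0.3, min_char=4, seed=0):
    """Split words of at least min_char characters with probability aug_word_p."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        # A word needs at least two characters to have an interior cut point.
        eligible = len(word) >= max(min_char, 2)
        if eligible and rng.random() < aug_word_p:
            cut = rng.randint(1, len(word) - 1)
            out.append(word[:cut] + " " + word[cut:])
        else:
            out.append(word)
    return " ".join(out)


pieces = split_words_sketch("the quick fox", aug_word_p=1.0)
```

Here only "quick" meets the default min_char of 4, so the result contains four tokens and removing the spaces recovers the original letters.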

class augly.text.transforms.SwapGenderedWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces words in each text based on a provided mapping, which can be either an already-constructed dict mapping words from one gender to another or a file path to a JSON file containing such a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
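
The two accepted forms of the mapping (a dict, or a path to a JSON file holding the dict) can be handled with a small loader, sketched below. This uses plain `open` on a local path as a simplification, with hypothetical helper names and a one-entry mapping; it swaps every matching word rather than sampling with aug_word_p.

```python
import json
import os
import tempfile


def load_mapping(mapping):
    """Accept either a dict or a path to a JSON file containing the dict."""
    if isinstance(mapping, dict):
        return mapping
    with open(mapping, encoding="utf-8") as f:
        return json.load(f)


def swap_words_sketch(text, mapping):
    table = load_mapping(mapping)
    return " ".join(table.get(w.lower(), w) for w in text.split())


# The same mapping supplied both ways gives the same result.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mapping.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"king": "queen"}, f)
    from_file = swap_words_sketch("the king", path)
from_dict = swap_words_sketch("the king", {"king": "queen"})
```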

Module contents

class augly.text.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)

Bases: augly.text.transforms.BaseTransform

__init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)
Parameters
  • aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)

  • p (float) – the probability of the transform being applied; default value is 1.0

  • **kwargs – the input attributes to be passed into the augmentation function to be applied

apply_transform(texts, metadata=None, **aug_kwargs)

Apply a user-defined lambda on a list of text documents

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
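
The contract described above (a function that takes a list of texts plus keyword arguments and returns a list of texts) can be sketched with a small stand-in class. `ApplyLambdaSketch` and `add_suffix` are hypothetical names; the sketch always returns a list and does not record metadata, so it should be read as an illustration of the calling pattern only.

```python
import random


class ApplyLambdaSketch:
    """Stand-in that stores a function and its kwargs, applied with probability p."""

    def __init__(self, aug_function=lambda texts: texts, p=1.0, **kwargs):
        self.aug_function = aug_function
        self.p = p
        self.kwargs = kwargs

    def __call__(self, texts, seed=None):
        if isinstance(texts, str):
            texts = [texts]  # normalize a single string to a list
        rng = random.Random(seed)
        if rng.random() >= self.p:
            return texts  # transform skipped
        return self.aug_function(texts, **self.kwargs)


def add_suffix(texts, suffix="!"):
    return [t + suffix for t in texts]


shout = ApplyLambdaSketch(add_suffix, suffix="!!")
```

Keyword arguments given at construction time, such as `suffix` here, are forwarded to the function on every call.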

class augly.text.ChangeCase(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)
Parameters
  • granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)

  • cadence (float) – how frequently to change the case (i.e. one change per this many characters/words). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’

  • case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case every word will be randomly changed to one of the 3 cases)

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
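
How granularity and cadence interact can be sketched as below. The sketch makes several assumptions: the cadence is truncated to an integer, the first unit is always changed, and the case is fixed rather than ‘random’. `change_case_sketch` is a hypothetical name.

```python
def change_case_sketch(text, granularity="word", cadence=1.0, case="upper"):
    """Change the case of every `cadence`-th word or char, or of the whole text."""
    convert = {"lower": str.lower, "upper": str.upper, "title": str.title}[case]
    if granularity == "all":
        return convert(text)  # cadence is not used here
    if granularity == "word":
        units, sep = text.split(" "), " "
    else:  # 'char'
        units, sep = list(text), ""
    step = int(cadence)  # assumption: integer cadence only
    return sep.join(convert(u) if i % step == 0 else u for i, u in enumerate(units))
```

For example, a cadence of 2.0 at word granularity changes every other word, while at char granularity it alternates individual characters.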

class augly.text.Compose(transforms, p=1.0)

Bases: augly.text.composition.BaseComposition

__call__(texts, seed=None, metadata=None)

Applies the list of transforms in order to the text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
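
Applying transforms in order while appending to a caller-supplied metadata list can be sketched as follows. `ComposeSketch` is a hypothetical stand-in, the metadata keys (`name`, `src_length`, `dst_length`) are illustrative guesses at the kind of information recorded, and the p and seed parameters are left out.

```python
class ComposeSketch:
    """Stand-in that applies each transform in order to the running result."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, texts, metadata=None):
        if isinstance(texts, str):
            texts = [texts]
        for transform in self.transforms:
            before = texts
            texts = transform(texts)
            if metadata is not None:
                # Append one record per transform to the caller's list.
                metadata.append({
                    "name": getattr(transform, "__name__", type(transform).__name__),
                    "src_length": len(before),
                    "dst_length": len(texts),
                })
        return texts


def strip_all(texts):
    return [t.strip() for t in texts]


def upper_all(texts):
    return [t.upper() for t in texts]


meta = []
result = ComposeSketch([strip_all, upper_all])(" hi ", metadata=meta)
```

Because the list is mutated in place, the caller can inspect `meta` afterwards; passing `metadata=None` skips the bookkeeping entirely.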

class augly.text.Contractions(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)
Parameters
  • aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces pairs (or longer strings) of words with contractions given a mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.GetBaseline(p=1.0)

Bases: augly.text.transforms.BaseTransform

apply_transform(texts, metadata=None, **aug_kwargs)

Generates a baseline by tokenizing and detokenizing the text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.InsertPunctuationChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Parameters
  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a punctuation character (i.e. one insertion per this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Inserts punctuation characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.InsertWhitespaceChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Parameters
  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how frequently to insert a whitespace character (i.e. one insertion per this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Inserts whitespace characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.InsertZeroWidthChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Parameters
  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how often to insert a zero-width character (i.e. once every this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Inserts zero-width characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
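A short sketch of why zero-width insertion is useful as an augmentation: the string changes for a program while it typically looks unchanged when rendered. The character used here (U+200B, ZERO WIDTH SPACE) and the helper name are assumptions; the set of characters AugLy actually draws from may differ.

```python
ZERO_WIDTH_SPACE = "\u200b"  # U+200B; assumed for illustration only


def interleave_zero_width(text: str, char: str = ZERO_WIDTH_SPACE) -> str:
    """Place an invisible character between every pair of adjacent characters."""
    return char.join(text)


original = "hello"
augmented = interleave_zero_width(original)
print(len(original), len(augmented))                         # 5 9
print(augmented == original)                                 # False
print(augmented.replace(ZERO_WIDTH_SPACE, "") == original)   # True
```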

class augly.text.MergeWords(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word to be merged

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Merges words in the text together

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
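The aug_word_p, aug_word_min and aug_word_max parameters recur across many transforms in this module. One plausible reading of how they interact is sketched below: a target count is derived from the probability and then clamped to the minimum and maximum. The helper name, the rounding direction (ceiling) and the final cap at the number of words are assumptions, not confirmed library behaviour.

```python
import math


def words_to_augment(num_words: int, aug_word_p: float = 0.3,
                     aug_word_min: int = 1, aug_word_max: int = 1000) -> int:
    """Estimate how many words would be augmented (hypothetical helper)."""
    target = math.ceil(aug_word_p * num_words)         # assumed rounding rule
    clamped = max(aug_word_min, min(aug_word_max, target))
    return min(clamped, num_words)                     # cannot exceed available words


print(words_to_augment(20, aug_word_p=0.5))                   # 10
print(words_to_augment(20, aug_word_p=0.5, aug_word_max=4))   # 4
print(words_to_augment(3, aug_word_p=0.0))                    # 1 (raised to aug_word_min)
```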

class augly.text.OneOf(transforms, p=1.0)

Bases: augly.text.composition.BaseComposition

__call__(texts, force=False, seed=None, metadata=None)
Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set

  • seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

__init__(transforms, p=1.0)
Parameters
  • transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the text

  • p (float) – the probability of the transform being applied; default value is 1.0
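The interplay of p, force and seed described for OneOf can be sketched in plain Python. This is a simplified stand-in, not the class itself: the function one_of is hypothetical, it works on plain callables rather than BaseTransform objects, and it uses a local random.Random instead of setting a global seed.

```python
import random
from typing import Callable, List, Optional


def one_of(transforms: List[Callable[[str], str]], text: str, p: float = 1.0,
           force: bool = False, seed: Optional[int] = None) -> str:
    """Apply exactly one randomly chosen transform, with probability p."""
    rng = random.Random(seed)
    # Skip the augmentation unless forced or the probability check passes.
    if not force and rng.random() >= p:
        return text
    return rng.choice(transforms)(text)


print(one_of([str.upper], "hello"))                      # HELLO
print(one_of([str.upper], "hello", p=0.0))               # hello
print(one_of([str.upper], "hello", p=0.0, force=True))   # HELLO
```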

class augly.text.ReplaceBidirectional(granularity='all', split_word=False, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(granularity='all', split_word=False, p=1.0)
Parameters
  • granularity (str) – the level at which the text is reversed; this must be either ‘word’ or ‘all’

  • split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
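The idea of reversing text and then using bidirectional marks so that it still renders in its original order can be sketched as follows. The code assumes the Unicode RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C) characters; whether AugLy uses exactly these marks is not stated in this reference, and both helper names are hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def reverse_with_bidi(text: str) -> str:
    """Store the characters reversed, wrapped so a bidi-aware renderer flips them back."""
    return RLO + text[::-1] + PDF


def reverse_each_word(text: str) -> str:
    """Word-level variant: each word is reversed separately, so word order is preserved."""
    return " ".join(reverse_with_bidi(word) for word in text.split(" "))


print(repr(reverse_with_bidi("abc")))      # '\u202ecba\u202c'
print(repr(reverse_each_word("ab cd")))    # '\u202eba\u202c \u202edc\u202c'
```

Reversing word by word is what keeps the words in reading order when a line wraps, as the description above notes.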

class augly.text.ReplaceFunFonts(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)
Parameters
  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the font is applied; this must be either word, char, or all

  • vary_fonts (bool) – whether or not to switch font in each replacement

  • fonts_path (str) – iopath uri where the fonts are stored

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces words or characters depending on the granularity with fun fonts applied

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.ReplaceSimilarChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)
Parameters
  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (Optional[str]) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces letters in each text with similar characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.ReplaceSimilarUnicodeChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)
Parameters
  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (str) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces letters in each text with similar Unicode characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.ReplaceUpsideDown(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)
Parameters
  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the text is flipped; this must be either word, char, or all

  • n (int) – number of augmentations to be performed for each text

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Flips words in the text upside down depending on the granularity

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.ReplaceWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces words in each text based on a given mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
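A minimal sketch of mapping-based word replacement with an ignore list. It is deterministic for clarity, replacing every mapped word, whereas the documented transform selects words according to aug_word_p, aug_word_min and aug_word_max. The function name, the whitespace tokenization and the case-insensitive lookup are assumptions.

```python
from typing import Dict, List, Optional


def replace_mapped_words(text: str, mapping: Dict[str, str],
                         ignore_words: Optional[List[str]] = None) -> str:
    """Replace each word found in `mapping`, skipping any word in `ignore_words`."""
    ignored = {word.lower() for word in (ignore_words or [])}
    replaced = []
    for word in text.split():
        key = word.lower()
        if key in mapping and key not in ignored:
            replaced.append(mapping[key])
        else:
            replaced.append(word)
    return " ".join(replaced)


mapping = {"big": "large", "dog": "hound"}
print(replace_mapped_words("I love the big dog", mapping, ignore_words=["dog"]))
# I love the large dog
```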

class augly.text.SimulateTypos(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)
Parameters
  • aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping

  • aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”

  • misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None

  • max_typo_length (int) – the words in each text will be checked for matches in the misspelling dictionary up to this length; i.e. if ‘max_typo_length’ is 3 then every sequence of up to 3 consecutive words will be checked

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of these typo types

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

class augly.text.SplitWords(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word for a split

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Splits words in the text into subwords

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
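The role of min_char in splitting can be sketched as below. The split point (the midpoint of the word) and the helper names are assumptions chosen to keep the example deterministic; the documented transform also samples which words to split via aug_word_p.

```python
def split_word(word: str, min_char: int = 4) -> str:
    """Split a word into two subwords if it has at least `min_char` characters."""
    if len(word) < min_char:
        return word
    middle = len(word) // 2  # assumed split position
    return word[:middle] + " " + word[middle:]


def split_words(text: str, min_char: int = 4) -> str:
    return " ".join(split_word(word, min_char) for word in text.split())


print(split_words("the quick fox"))  # the qu ick fox
```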

class augly.text.SwapGenderedWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)

Bases: augly.text.transforms.BaseTransform

__init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)
Parameters
  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(texts, metadata=None, **aug_kwargs)

Replaces words in each text based on a provided mapping, which can be either an already constructed dict mapping words from one gender to another or a file path to a JSON file containing such a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

  • aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.apply_lambda(texts, aug_function=<function <lambda>>, metadata=None, **kwargs)

Apply a user-defined lambda on a list of text documents

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)

  • **kwargs – the input attributes to be passed into the augmentation function to be applied

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
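As described above, aug_function should accept a list of text documents and return a list of text documents, with any extra keyword arguments forwarded to it. The sketch below defines one such function and calls it directly; the function name and its separator argument are hypothetical. The commented call at the end follows the signature documented here but has not been run as part of this example.

```python
from typing import List


def reverse_word_order(texts: List[str], separator: str = " ") -> List[str]:
    """A user-defined augmentation: reverse the order of words in each document."""
    return [separator.join(reversed(text.split(separator))) for text in texts]


print(reverse_word_order(["hello big world"]))  # ['world big hello']

# With AugLy installed, the documented signature suggests a call such as:
#   import augly.text as txtaugs
#   txtaugs.apply_lambda(["hello big world"], aug_function=reverse_word_order, separator=" ")
```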

augly.text.apply_lambda_intensity(aug_function, **kwargs)
Return type

float

augly.text.change_case(texts, granularity='word', cadence=1.0, case='random', seed=10, metadata=None)

Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)

  • cadence (float) – how often to change the case (i.e. once every this many characters/words). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’

  • case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case the case will randomly be changed to one of the previous three)

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.change_case_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.contractions(texts, aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, metadata=None)

Replaces pairs (or longer strings) of words with contractions given a mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked

  • seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
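The effect of max_contraction_length can be illustrated with a small window-matching sketch. It replaces every match (as if aug_p were 1.0), tries the longest window first, and uses a tiny hand-written mapping; the function name, the tokenization and the mapping entries are assumptions, and the bundled contractions.json is not read here.

```python
from typing import Dict, List


def apply_contractions(text: str, mapping: Dict[str, str],
                       max_contraction_length: int = 2) -> str:
    """Replace runs of 2..max_contraction_length words that appear in `mapping`."""
    words = text.split()
    out: List[str] = []
    i = 0
    while i < len(words):
        replaced = False
        # Try the longest window that fits first, down to pairs of words.
        for size in range(min(max_contraction_length, len(words) - i), 1, -1):
            phrase = " ".join(words[i:i + size]).lower()
            if phrase in mapping:
                out.append(mapping[phrase])
                i += size
                replaced = True
                break
        if not replaced:
            out.append(words[i])
            i += 1
    return " ".join(out)


mapping = {"i will": "i'll", "do not": "don't", "i would have": "i'd've"}
print(apply_contractions("i will go but do not wait", mapping))   # i'll go but don't wait
print(apply_contractions("i would have gone", mapping, 3))        # i'd've gone
```

With the default length of 2, the three-word entry is never matched, so "i would have gone" is returned unchanged.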

augly.text.contractions_intensity(aug_p, **kwargs)
Return type

float

augly.text.get_baseline(texts, metadata=None)

Generates a baseline by tokenizing and detokenizing the text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
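The metadata parameter shared by the functions in this module follows an out-parameter pattern: the caller passes a list, and the function appends a dict describing the call. A generic sketch of that pattern is shown below. The function is hypothetical and the dict keys are illustrative; the exact keys AugLy records may differ.

```python
from typing import Any, Dict, List, Optional


def lowercase_with_metadata(text: str,
                            metadata: Optional[List[Dict[str, Any]]] = None) -> str:
    """Toy augmentation that records details of the call when a list is supplied."""
    result = text.lower()
    if metadata is not None:
        # Appended in place; nothing extra is returned to the caller.
        metadata.append({
            "name": "lowercase_with_metadata",
            "src_length": len(text),
            "dest_length": len(result),
        })
    return result


log: List[Dict[str, Any]] = []
lowercase_with_metadata("Hello World", metadata=log)
print(log)
```

Passing metadata=None (the default) leaves no record, matching the behaviour described for the metadata parameter.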

augly.text.get_baseline_intensity(**kwargs)
Return type

float

augly.text.insert_punctuation_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)

Inserts punctuation characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how often to insert a punctuation character (i.e. once every this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts

augly.text.insert_punctuation_chars_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.insert_whitespace_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)

Inserts whitespace characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how often to insert a whitespace character (i.e. once every this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts

augly.text.insert_whitespace_chars_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.insert_zero_width_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)

Inserts zero-width characters in each input text

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text

  • cadence (float) – how often to insert a zero-width character (i.e. once every this many characters). Must be at least 1.0. Non-integer values are used as an ‘average’ cadence

  • vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts

augly.text.insert_zero_width_chars_intensity(granularity, cadence, **kwargs)
Return type

float

augly.text.merge_words(texts, aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)

Merges words in the text together

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word to be merged

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.merge_words_intensity(aug_word_p, aug_word_max, **kwargs)
Return type

float

augly.text.replace_bidirectional(texts, granularity='all', split_word=False, metadata=None)

Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • granularity (str) – the level at which the text is reversed; this must be either ‘word’ or ‘all’

  • split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented texts

augly.text.replace_bidirectional_intensity(**kwargs)
Return type

float

augly.text.replace_fun_fonts(texts, aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, metadata=None)

Replaces words or characters depending on the granularity with fun fonts applied

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the font is applied; this must be either word, char, or all

  • vary_fonts (bool) – whether or not to switch font in each replacement

  • fonts_path (str) – iopath uri where the fonts are stored

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
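
As a rough illustration of what a "fun font" replacement means, the sketch below maps ASCII letters to the Unicode Mathematical Bold block. This is an assumption chosen for illustration: AugLy reads its fonts from the file at fonts_path, and the helper names here are hypothetical.

```python
BOLD_UPPER_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_bold(text: str) -> str:
    """Map ASCII letters to Mathematical Bold; leave everything else as-is."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def fun_font_words(text: str, targets: set) -> str:
    """Word-level granularity: restyle only the words listed in targets."""
    return " ".join(to_bold(w) if w in targets else w for w in text.split(" "))


whole_text = to_bold("Big news")                        # comparable to 'all'
one_word = fun_font_words("big news today", {"news"})   # comparable to 'word'
```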

augly.text.replace_fun_fonts_intensity(aug_p, aug_max, granularity, **kwargs)
Return type

float

augly.text.replace_similar_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, metadata=None)

Replaces letters in each text with similar characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (Optional[str]) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
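
The interaction of aug_word_p, aug_char_p, and min_char can be sketched in plain Python. This is a simplified illustration, not AugLy's code: the look-alike table is hypothetical, the seed argument is added only to make the sketch reproducible, and the min/max count parameters and priority_words are ignored.

```python
import random

# Hypothetical look-alike table; AugLy loads its own mapping instead.
SIMILAR = {"a": ["@", "4"], "e": ["3"], "i": ["1", "!"], "o": ["0"], "s": ["$", "5"]}


def replace_similar(text, aug_char_p=0.3, aug_word_p=0.3, min_char=2, seed=0):
    """Replace letters with look-alikes, word by word and then letter by letter."""
    rng = random.Random(seed)
    words = []
    for word in text.split(" "):
        # A word is eligible only if it is long enough and wins the word-level draw.
        if len(word) >= min_char and rng.random() < aug_word_p:
            word = "".join(
                rng.choice(SIMILAR[c])
                if c in SIMILAR and rng.random() < aug_char_p
                else c
                for c in word
            )
        words.append(word)
    return " ".join(words)


sample = replace_similar("see the ocean", aug_char_p=1.0, aug_word_p=1.0)
```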

augly.text.replace_similar_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
Return type

float

augly.text.replace_similar_unicode_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, metadata=None)

Replaces letters in each text with similar-looking Unicode characters

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_char_p (float) – probability of letters to be replaced in each word

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation

  • aug_char_min (int) – minimum # of letters to be replaced in each word

  • aug_char_max (int) – maximum # of letters to be replaced in each word

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping_path (str) – iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents

augly.text.replace_similar_unicode_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
Return type

float

augly.text.replace_upside_down(texts, aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, metadata=None)

Flips words in the text upside down depending on the granularity

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_p (float) – probability of words to be augmented

  • aug_min (int) – minimum # of words to be augmented

  • aug_max (int) – maximum # of words to be augmented

  • granularity (str) – the level at which the flip is applied; this must be either word, char, or all

  • n (int) – number of augmentations to be performed for each text

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
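
Flipping text upside down usually combines two steps: substituting each character with a rotated look-alike and reversing the character order. The sketch below shows that technique with a small, hypothetical mapping. It is not AugLy's table or implementation, and the helper names are illustrative.

```python
# Hypothetical, partial table of rotated look-alikes.
FLIP = {
    "a": "ɐ", "b": "q", "d": "p", "e": "ǝ", "h": "ɥ", "m": "ɯ", "n": "u",
    "p": "d", "q": "b", "r": "ɹ", "t": "ʇ", "u": "n", "w": "ʍ", "y": "ʎ",
}


def flip(text: str) -> str:
    """Reverse the text and swap each character for its rotated look-alike."""
    return "".join(FLIP.get(c, c) for c in reversed(text))


def flip_words(text: str, targets: set) -> str:
    """Word-level granularity: flip only the words listed in targets."""
    return " ".join(flip(w) if w in targets else w for w in text.split(" "))


whole_text = flip("hello world")                   # comparable to 'all'
one_word = flip_words("hello world", {"world"})    # comparable to 'word'
```

Characters without an entry in the table (such as "l" and "o") are kept unchanged, since they already read acceptably when rotated.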

augly.text.replace_upside_down_intensity(aug_p, aug_max, granularity, **kwargs)
Return type

float

augly.text.replace_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, metadata=None)

Replaces words in each text based on a given mapping

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
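
A minimal sketch of mapping-based replacement with an ignore list, in plain Python. It assumes exact, case-sensitive matching and replaces every eligible word (as if aug_word_p were 1.0); both are simplifications for illustration, and `replace_mapped` is a hypothetical helper, not part of AugLy.

```python
from typing import Dict, List, Optional


def replace_mapped(
    text: str,
    mapping: Dict[str, str],
    ignore_words: Optional[List[str]] = None,
) -> str:
    """Replace each word found in mapping unless it is listed in ignore_words."""
    ignore = set(ignore_words or [])
    out = []
    for word in text.split(" "):
        if word in mapping and word not in ignore:
            out.append(mapping[word])
        else:
            out.append(word)
    return " ".join(out)


replaced = replace_mapped(
    "the quick fox",
    {"quick": "fast", "fox": "dog"},
    ignore_words=["fox"],
)
```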

augly.text.replace_words_intensity(aug_word_p, aug_word_max, mapping, **kwargs)
Return type

float

augly.text.simulate_typos(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, metadata=None)

Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which applies a combination of character-level modifications (delete, insert, substitute, and swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which applies a random combination of all of the above

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping

  • aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”

  • misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None

  • max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
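
The four character-level modifications named for charmix (delete, insert, substitute, swap) can be sketched in plain Python as follows. This is an illustrative assumption about how such edits might be applied to one word, not AugLy's implementation; `charmix` here is a hypothetical helper that takes an explicit random generator for reproducibility.

```python
import random
import string


def charmix(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: delete, insert, substitute, or swap."""
    if len(word) < 2:
        return word
    op = rng.choice(["delete", "insert", "substitute", "swap"])
    i = rng.randrange(len(word) - 1)  # leaves room for a right-hand neighbour
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]
    # swap: exchange the character at i with the one after it
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


typo = charmix("augment", random.Random(0))
```

Each edit changes the word length by at most one character, which matches the default max_typo_length-style intuition that simulated typos stay close to the original word.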

augly.text.simulate_typos_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
Return type

float

augly.text.split_words(texts, aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)

Splits words in the text into subwords

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • min_char (int) – minimum # of characters in a word for a split

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
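
A minimal sketch of the splitting step, assuming a word is split by inserting a single space at a random interior position and that words shorter than min_char are left alone. The helper name and the explicit random generator are hypothetical; AugLy's own split logic may differ.

```python
import random
from typing import Optional


def split_word(word: str, min_char: int = 4, rng: Optional[random.Random] = None) -> str:
    """Insert one space inside the word if it has at least min_char characters."""
    rng = rng or random.Random(0)
    # Words of length < 2 cannot be split into two non-empty parts.
    if len(word) < max(min_char, 2):
        return word
    cut = rng.randrange(1, len(word))  # 1..len-1, so both parts are non-empty
    return word[:cut] + " " + word[cut:]


pieces = split_word("augment")
```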

augly.text.split_words_intensity(aug_word_p, aug_word_max, **kwargs)
Return type

float

augly.text.swap_gendered_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, metadata=None)

Replaces words in each text based on a provided mapping, which can be either an already constructed dict mapping words from one gender to another or a file path to a JSON file containing such a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf

Parameters
  • texts (Union[str, List[str]]) – a string or a list of text documents to be augmented

  • aug_word_p (float) – probability of words to be augmented

  • aug_word_min (int) – minimum # of words to be augmented

  • aug_word_max (int) – maximum # of words to be augmented

  • n (int) – number of augmentations to be performed for each text

  • mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict

  • priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first

  • ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned

Return type

Union[str, List[str]]

Returns

the list of augmented text documents
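
The mapping parameter accepts either a dict or a path to a JSON file. The sketch below shows one way such a dual-typed argument could be resolved and applied. It reads the file with the built-in open rather than iopath, swaps every matching word, and uses hypothetical helper names and a tiny example mapping, so treat it as an illustration rather than AugLy's behavior.

```python
import json
from typing import Dict, Union


def load_mapping(mapping: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the mapping itself if it is a dict, else load it from a JSON file."""
    if isinstance(mapping, dict):
        return mapping
    with open(mapping, encoding="utf-8") as f:
        return json.load(f)


def swap_words(text: str, mapping: Union[str, Dict[str, str]]) -> str:
    """Swap every word that has an entry in the mapping."""
    table = load_mapping(mapping)
    return " ".join(table.get(w, w) for w in text.split(" "))


swapped = swap_words(
    "he met his aunt",
    {"he": "she", "his": "her", "aunt": "uncle"},
)
```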

augly.text.swap_gendered_words_intensity(aug_word_p, aug_word_max, **kwargs)
Return type

float