augly.text package
Submodules
augly.text.composition module
- class augly.text.composition.BaseComposition(transforms, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(transforms, p=1.0)
- Parameters
transforms (List[BaseTransform]) – a list of transforms
p (float) – the probability of the transform being applied; default value is 1.0
- class augly.text.composition.Compose(transforms, p=1.0)
Bases:
augly.text.composition.BaseComposition
- __call__(texts, seed=None, metadata=None)
Applies the list of transforms in order to the text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
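The in-order behaviour described for Compose can be pictured with a small stand-in. This is a minimal sketch that does not import AugLy: `compose_sketch`, the two toy transforms, and the metadata keys (`name`, `src_length`, `dest_length`) are hypothetical names chosen for illustration, and seeding once before the first transform is an assumption.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Union

Transform = Callable[[List[str]], List[str]]


def compose_sketch(
    transforms: List[Transform],
    texts: Union[str, List[str]],
    seed: Optional[int] = None,
    metadata: Optional[List[Dict[str, Any]]] = None,
) -> List[str]:
    """Apply each transform in order (hypothetical stand-in, not AugLy code)."""
    if seed is not None:
        random.seed(seed)  # assumption: the seed is set once, before any transform runs
    docs = [texts] if isinstance(texts, str) else list(texts)
    for transform in transforms:
        src_length = len(docs)
        docs = transform(docs)
        if metadata is not None:
            # Assumed metadata shape: one dict appended per applied transform.
            metadata.append(
                {
                    "name": transform.__name__,
                    "src_length": src_length,
                    "dest_length": len(docs),
                }
            )
    return docs


def upper(docs: List[str]) -> List[str]:
    return [d.upper() for d in docs]


def exclaim(docs: List[str]) -> List[str]:
    return [d + "!" for d in docs]


meta: List[Dict[str, Any]] = []
result = compose_sketch([upper, exclaim], "hello world", metadata=meta)
print(result)  # ['HELLO WORLD!']
print([m["name"] for m in meta])  # ['upper', 'exclaim']
```

Passing `metadata=None` leaves the caller's state untouched, which mirrors the "no metadata will be appended or returned" wording above.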
- class augly.text.composition.OneOf(transforms, p=1.0)
Bases:
augly.text.composition.BaseComposition
- __call__(texts, force=False, seed=None, metadata=None)
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set
seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- __init__(transforms, p=1.0)
- Parameters
transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the text
p (float) – the probability of the transform being applied; default value is 1.0
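How `p` and `force` interact in OneOf can be sketched without AugLy. The helper below is a hypothetical illustration: `one_of_sketch` is not a library function, and it uses a local `random.Random` instance rather than setting the global seed as the documentation describes.

```python
import random
from typing import Callable, List, Optional

Transform = Callable[[List[str]], List[str]]


def one_of_sketch(
    transforms: List[Transform],
    texts: List[str],
    p: float = 1.0,
    force: bool = False,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick one transform at random and apply it with probability p.

    Hypothetical stand-in for illustration; not AugLy's implementation.
    """
    rng = random.Random(seed)  # local generator, so the sketch has no global side effects
    if not force and rng.random() > p:
        return texts  # the probability check failed, so nothing is applied
    chosen = rng.choice(transforms)
    return chosen(texts)


def upper(docs: List[str]) -> List[str]:
    return [d.upper() for d in docs]


def reverse(docs: List[str]) -> List[str]:
    return [d[::-1] for d in docs]


docs = ["abc"]
applied = one_of_sketch([upper, reverse], docs, p=1.0, seed=1)
skipped = one_of_sketch([upper, reverse], docs, p=0.0, seed=0)
forced = one_of_sketch([upper, reverse], docs, p=0.0, force=True, seed=0)
```

With `p=1.0` exactly one of the two transforms runs; with `p=0.0` the input comes back unchanged unless `force=True` is given.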
augly.text.functional module
- augly.text.functional.apply_lambda(texts, aug_function=<function <lambda>>, metadata=None, **kwargs)
Apply a user-defined lambda on a list of text documents
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)
**kwargs – the input attributes to be passed into the augmentation function to be applied
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
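The contract for `aug_function` (a list of documents in, a list of documents out, with extra keyword arguments forwarded) can be illustrated as follows. This is a sketch under assumptions: `apply_lambda_sketch` and `add_suffix` are hypothetical names, and metadata handling is left out for brevity.

```python
from typing import Any, Callable, List, Union


def apply_lambda_sketch(
    texts: Union[str, List[str]],
    aug_function: Callable[..., List[str]] = lambda docs, **kwargs: docs,
    **kwargs: Any,
) -> List[str]:
    """Normalize the input to a list, then forward it and the kwargs (illustration only)."""
    docs = [texts] if isinstance(texts, str) else list(texts)
    return aug_function(docs, **kwargs)


def add_suffix(docs: List[str], suffix: str = "") -> List[str]:
    # A user-defined augmentation: expects a list and returns a list of the same length.
    return [d + suffix for d in docs]


out = apply_lambda_sketch(["a", "b"], add_suffix, suffix="?")
print(out)  # ['a?', 'b?']
```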
- augly.text.functional.change_case(texts, granularity='word', cadence=1.0, case='random', seed=10, metadata=None)
Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)
cadence (float) – how frequent (i.e. between this many characters/words) to change the case. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’
case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case the case will randomly be changed to one of the previous three)
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.contractions(texts, aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, metadata=None)
Replaces pairs (or longer strings) of words with contractions given a mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
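The role of `max_contraction_length` (checking every run of 2 up to that many words against the mapping) can be sketched as a sliding window. This is a hypothetical illustration: `contract_sketch` and `toy_mapping` are invented here, whitespace splitting stands in for AugLy's tokenizer, and the flat phrase-to-contraction shape of the mapping is an assumption.

```python
import random
from typing import Dict, List


def contract_sketch(
    text: str,
    mapping: Dict[str, str],
    aug_p: float = 0.3,
    max_contraction_length: int = 2,
    seed: int = 10,
) -> str:
    """Replace runs of 2..max_contraction_length words that appear in `mapping`.

    Illustration only; AugLy's tokenization and matching may differ.
    """
    rng = random.Random(seed)
    words = text.split()
    out: List[str] = []
    i = 0
    while i < len(words):
        replaced = False
        # Try the longest window first so a 3-word match wins over a 2-word one.
        for length in range(max_contraction_length, 1, -1):
            window = " ".join(words[i : i + length]).lower()
            has_room = len(words) - i >= length
            if has_room and window in mapping and rng.random() < aug_p:
                out.append(mapping[window])
                i += length
                replaced = True
                break
        if not replaced:
            out.append(words[i])
            i += 1
    return " ".join(out)


toy_mapping = {"i would": "I'd", "do not": "don't"}  # hypothetical mapping
contracted = contract_sketch("I would stay but do not ask", toy_mapping, aug_p=1.0)
print(contracted)  # I'd stay but don't ask
```

Setting `aug_p=0.0` returns the text unchanged, since no candidate window passes the probability check.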
- augly.text.functional.get_baseline(texts, metadata=None)
Generates a baseline by tokenizing and detokenizing the text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.insert_punctuation_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)
Inserts punctuation characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a punctuation character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
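One plausible reading of `cadence` and `granularity` is sketched below: the text is cut into chunks of about `cadence` characters and a character is inserted between chunks, with the count restarting per word when granularity is ‘word’. The helper names are hypothetical, a single fixed character replaces the random choice (so `vary_chars` is not modelled), and the chunking rule is an assumption rather than AugLy's confirmed algorithm.

```python
import math


def insert_every(text: str, char: str = ".", cadence: float = 1.0) -> str:
    """Insert `char` between chunks of roughly `cadence` characters (illustration only)."""
    if cadence < 1.0:
        raise ValueError("cadence must be at least 1.0")
    n_chunks = math.ceil(len(text) / cadence)
    # With a fractional cadence the chunk sizes alternate, so the average size is `cadence`.
    chunks = [text[int(i * cadence) : int((i + 1) * cadence)] for i in range(n_chunks)]
    return char.join(chunks)


def insert_chars_sketch(
    text: str, char: str = ".", granularity: str = "all", cadence: float = 1.0
) -> str:
    if granularity == "word":
        # The cadence restarts at the beginning of every word.
        return " ".join(insert_every(word, char, cadence) for word in text.split())
    return insert_every(text, char, cadence)


print(insert_chars_sketch("hello"))  # h.e.l.l.o
print(insert_chars_sketch("abcdef", cadence=1.5))  # a.bc.d.ef
print(insert_chars_sketch("hi there", granularity="word", cadence=2.0))  # hi th.er.e
```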
- augly.text.functional.insert_whitespace_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)
Inserts whitespace characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a whitespace character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
- augly.text.functional.insert_zero_width_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)
Inserts zero-width characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a zero-width character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
- augly.text.functional.merge_words(texts, aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)
Merges words in the text together
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word to be merged
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.replace_bidirectional(texts, granularity='all', split_word=False, metadata=None)
Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – the level at which the text is reversed; this must be either ‘word’ or ‘all’
split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
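The reverse-then-mark idea can be shown with Unicode directional formatting characters. This is a minimal sketch under assumptions: which marks AugLy actually emits is not stated above, so the use of RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C) is a guess, and `bidi_sketch` is a hypothetical helper covering only the per-word case without `split_word`.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE (assumed mark)
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (assumed mark)


def reverse_with_marks(word: str) -> str:
    """Store the characters reversed, wrapped so a bidi-aware renderer shows the original order."""
    return RLO + word[::-1] + PDF


def bidi_sketch(text: str) -> str:
    # Each word is reversed separately, so the word order survives line wrapping.
    return " ".join(reverse_with_marks(word) for word in text.split(" "))


augmented = bidi_sketch("hello world")
# Stripping the marks exposes the stored (reversed) characters.
stored = augmented.replace(RLO, "").replace(PDF, "")
print(stored)  # olleh dlrow
```

The augmented string differs from the input at the code-point level even though a bidi-aware display is expected to render it the same way.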
- augly.text.functional.replace_fun_fonts(texts, aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, metadata=None)
Replaces words or characters depending on the granularity with fun fonts applied
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the font is applied; this must be either word, char, or all
vary_fonts (bool) – whether or not to switch font in each replacement
fonts_path (str) – iopath uri where the fonts are stored
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.replace_similar_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, metadata=None)
Replaces letters in each text with similar characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (Optional[str]) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.replace_similar_unicode_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, metadata=None)
Replaces letters in each text with similar Unicode characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (str) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.replace_upside_down(texts, aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, metadata=None)
Flips words in the text upside down depending on the granularity
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the flip is applied; this must be either word, char, or all
n (int) – number of augmentations to be performed for each text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.replace_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, metadata=None)
Replaces words in each text based on a given mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
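The interplay of `mapping`, `aug_word_p`, and `ignore_words` can be sketched with a per-word lookup. This is a hypothetical illustration: `replace_words_sketch` is not AugLy code, whitespace splitting replaces the real tokenizer, the mapping is assumed to be a flat word-to-word dict, and `aug_word_min`, `aug_word_max`, `n`, and `priority_words` are left out.

```python
import random
from typing import Dict, List, Optional


def replace_words_sketch(
    text: str,
    mapping: Dict[str, str],
    aug_word_p: float = 0.3,
    ignore_words: Optional[List[str]] = None,
    seed: int = 0,
) -> str:
    """Swap mapped words with probability aug_word_p, skipping ignored ones (illustration only)."""
    rng = random.Random(seed)
    ignored = {word.lower() for word in (ignore_words or [])}
    out: List[str] = []
    for word in text.split():
        key = word.lower()
        if key in mapping and key not in ignored and rng.random() < aug_word_p:
            out.append(mapping[key])
        else:
            out.append(word)
    return " ".join(out)


toy_mapping = {"quick": "fast", "dog": "hound"}  # hypothetical mapping
print(replace_words_sketch("the quick brown dog", toy_mapping, aug_word_p=1.0))
# the fast brown hound
print(replace_words_sketch("the quick brown dog", toy_mapping, aug_word_p=1.0, ignore_words=["dog"]))
# the fast brown dog
```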
- augly.text.functional.simulate_typos(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, metadata=None)
Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of these
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping
aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping
aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”
misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None
max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.split_words(texts, aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)
Splits words in the text into subwords
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word for a split
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.functional.swap_gendered_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, metadata=None)
Replaces words in each text based on a provided mapping, which can either be a dict already constructed mapping words from one gender to another or a file path to a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
augly.text.intensity module
- augly.text.intensity.apply_lambda_intensity(aug_function, **kwargs)
- Return type
float
- augly.text.intensity.change_case_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.intensity.char_insertion_intensity_helper(granularity, cadence)
- Return type
float
- augly.text.intensity.contractions_intensity(aug_p, **kwargs)
- Return type
float
- augly.text.intensity.get_baseline_intensity(**kwargs)
- Return type
float
- augly.text.intensity.insert_punctuation_chars_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.intensity.insert_whitespace_chars_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.intensity.insert_zero_width_chars_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.intensity.merge_words_intensity(aug_word_p, aug_word_max, **kwargs)
- Return type
float
- augly.text.intensity.replace_bidirectional_intensity(**kwargs)
- Return type
float
- augly.text.intensity.replace_fun_fonts_intensity(aug_p, aug_max, granularity, **kwargs)
- Return type
float
- augly.text.intensity.replace_intensity_helper(aug_p, aug_max)
- Return type
float
- augly.text.intensity.replace_similar_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
- Return type
float
- augly.text.intensity.replace_similar_unicode_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
- Return type
float
- augly.text.intensity.replace_upside_down_intensity(aug_p, aug_max, granularity, **kwargs)
- Return type
float
- augly.text.intensity.replace_words_intensity(aug_word_p, aug_word_max, mapping, **kwargs)
- Return type
float
- augly.text.intensity.simulate_typos_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
- Return type
float
- augly.text.intensity.split_words_intensity(aug_word_p, aug_word_max, **kwargs)
- Return type
float
- augly.text.intensity.swap_gendered_words_intensity(aug_word_p, aug_word_max, **kwargs)
- Return type
float
augly.text.transforms module
- class augly.text.transforms.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)
- Parameters
aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)
p (float) – the probability of the transform being applied; default value is 1.0
**kwargs – the input attributes to be passed into the augmentation function to be applied
- apply_transform(texts, metadata=None, **aug_kwargs)
Apply a user-defined lambda on a list of text documents
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.BaseTransform(p=1.0)
Bases:
object
- __call__(texts, force=False, metadata=None, **kwargs)
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- __init__(p=1.0)
- Parameters
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
This function is to be implemented in the child classes. From this function, call the augmentation function, passing in ‘texts’, ‘metadata’, & the given ‘aug_kwargs’
- Return type
Union[str, List[str]]
- get_aug_kwargs(**kwargs)
- Parameters
kwargs – any kwargs that were passed into __call__() intended to override the instance variables set in __init__() when calling the augmentation function in apply_transform()
- Return type
Dict[str, Any]
- Returns
the kwargs that should be passed into the augmentation function apply_transform() – this will be the instance variables set in __init__(), potentially overridden by anything passed in as kwargs
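The override semantics of get_aug_kwargs (instance values set in `__init__`, replaced by anything passed at call time) amount to a dictionary merge. The class below is a hypothetical stand-in: storing the arguments in an `aug_kwargs` dict is an assumption made for clarity, not a description of how BaseTransform keeps its state.

```python
from typing import Any, Dict


class TransformSketch:
    """Hypothetical stand-in for a BaseTransform subclass; not AugLy code."""

    def __init__(self, granularity: str = "word", cadence: float = 1.0, p: float = 1.0):
        self.p = p
        # Assumption: augmentation arguments are kept together so they can be overridden per call.
        self.aug_kwargs: Dict[str, Any] = {"granularity": granularity, "cadence": cadence}

    def get_aug_kwargs(self, **kwargs: Any) -> Dict[str, Any]:
        merged = dict(self.aug_kwargs)  # copy, so the instance defaults stay untouched
        merged.update(kwargs)  # call-time values win over values set in __init__
        return merged


transform = TransformSketch(cadence=2.0)
overridden = transform.get_aug_kwargs(granularity="char")
print(overridden)  # {'granularity': 'char', 'cadence': 2.0}
print(transform.aug_kwargs)  # {'granularity': 'word', 'cadence': 2.0}
```

Copying before merging keeps one call's overrides from leaking into the next call on the same instance.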
- class augly.text.transforms.ChangeCase(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)
- Parameters
granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)
cadence (float) – how frequent (i.e. between this many characters/words) to change the case. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’
case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case every word will be randomly changed to one of the 3 cases)
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.Contractions(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)
- Parameters
aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces pairs (or longer strings) of words with contractions given a mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.GetBaseline(p=1.0)
Bases:
augly.text.transforms.BaseTransform
- apply_transform(texts, metadata=None, **aug_kwargs)
Generates a baseline by tokenizing and detokenizing the text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.InsertPunctuationChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
- Parameters
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a punctuation character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Inserts punctuation characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
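One way to read the cadence and granularity parameters is sketched below in plain Python. This is an assumption about the behaviour, not AugLy's code: the helper names are hypothetical, and a fractional accumulator is just one plausible way to treat a non-integer cadence as an average.

```python
def _insert(text: str, char: str, cadence: float) -> str:
    """Insert `char` after roughly every `cadence` source characters."""
    if cadence < 1.0:
        raise ValueError("cadence must be at least 1.0")
    out = []
    next_at = cadence  # source position at which the next insertion is due
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        # Never append after the final character.
        if i >= next_at and i < len(text):
            out.append(char)
            next_at += cadence
    return "".join(out)


def insert_chars(text: str, char: str = ".", cadence: float = 1.0,
                 granularity: str = "all") -> str:
    """Illustrative only: 'word' restarts the cadence for every word."""
    if granularity == "word":
        return " ".join(_insert(w, char, cadence) for w in text.split(" "))
    return _insert(text, char, cadence)


print(insert_chars("abcdef", cadence=1.5))               # ab.c.de.f
print(insert_chars("ab cd", granularity="word"))         # a.b c.d
```

With cadence=1.5 the insertions land after 2, 3, and 5 characters, so the gaps alternate between 2 and 1 and average out to 1.5.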
- class augly.text.transforms.InsertWhitespaceChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
- Parameters
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a whitespace character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Inserts whitespace characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.InsertZeroWidthChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
- Parameters
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a zero-width character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Inserts zero-width characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
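The effect of zero-width insertion is easiest to see by comparing string lengths. The snippet below is a standalone illustration, not AugLy code: it uses U+200B (ZERO WIDTH SPACE) as an assumed example character, and `insert_zero_width` is a hypothetical helper.

```python
ZERO_WIDTH_SPACE = "\u200b"  # one of several zero-width code points


def insert_zero_width(text: str, char: str = ZERO_WIDTH_SPACE) -> str:
    """Illustrative only: place `char` between every pair of characters."""
    return char.join(text)


original = "hello"
augmented = insert_zero_width(original)

print(len(original), len(augmented))  # 5 9
print(augmented == original)          # False
```

The two strings usually render identically, yet they no longer compare equal, which is why this kind of augmentation is useful for testing the robustness of text matching.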
- class augly.text.transforms.MergeWords(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word to be merged
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Merges words in the text together
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.ReplaceBidirectional(granularity='all', split_word=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', split_word=False, p=1.0)
- Parameters
granularity (str) – the level at which the transform is applied; this must be either ‘word’ or ‘all’
split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
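The reversal trick can be sketched with two Unicode control characters. The code below is an illustration under assumptions, not AugLy's implementation: it assumes U+202E (RIGHT-TO-LEFT OVERRIDE) and U+202C (POP DIRECTIONAL FORMATTING) are the marks in play, and the function names are hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE: following characters render right to left
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: ends the override


def _reverse_word(word: str, split_word: bool) -> str:
    if split_word:
        # Leave the first half alone and reverse only the second half.
        mid = len(word) // 2
        return word[:mid] + RLO + word[mid:][::-1] + PDF
    return RLO + word[::-1] + PDF


def reverse_with_bidi(text: str, granularity: str = "all",
                      split_word: bool = False) -> str:
    """Illustrative only: store text reversed, render it in original order."""
    if granularity == "word":
        return " ".join(_reverse_word(w, split_word) for w in text.split(" "))
    return RLO + text[::-1] + PDF


print(repr(reverse_with_bidi("abc")))  # '\u202ecba\u202c'
```

A renderer that honours the override displays the stored "cba" as "abc", while the underlying code points differ from the original string.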
- class augly.text.transforms.ReplaceFunFonts(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)
- Parameters
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the font is applied; this must be either word, char, or all
vary_fonts (bool) – whether or not to switch font in each replacement
fonts_path (str) – iopath uri where the fonts are stored
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces words or characters depending on the granularity with fun fonts applied
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.ReplaceSimilarChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)
- Parameters
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (Optional[str]) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces letters in each text with similar characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.ReplaceSimilarUnicodeChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)
- Parameters
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (str) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces letters in each text with similar Unicode characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
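Mapping-driven character replacement, as described for the two ReplaceSimilar* transforms, can be sketched as follows. This is a simplified illustration rather than AugLy's algorithm: the mapping is an in-memory dict instead of a file at mapping_path, the function name is hypothetical, and the word-level parameters (aug_word_p, aug_word_min, aug_word_max) are left out.

```python
import random
from typing import Dict, List, Optional


def replace_similar(text: str, mapping: Dict[str, List[str]],
                    aug_char_p: float = 0.3, min_char: int = 2,
                    seed: Optional[int] = 0) -> str:
    """Illustrative only: swap letters for look-alikes from `mapping`."""
    rng = random.Random(seed)
    out_words = []
    for word in text.split(" "):
        if len(word) < min_char:
            # Words shorter than min_char are not valid augmentation targets.
            out_words.append(word)
            continue
        chars = []
        for c in word:
            if c in mapping and rng.random() < aug_char_p:
                chars.append(rng.choice(mapping[c]))
            else:
                chars.append(c)
        out_words.append("".join(chars))
    return " ".join(out_words)


lookalikes = {"a": ["@"], "o": ["0"]}  # tiny hypothetical mapping
print(replace_similar("a foo bar", lookalikes, aug_char_p=1.0))  # a f00 b@r
```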
- class augly.text.transforms.ReplaceUpsideDown(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)
- Parameters
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the transform is applied; this must be either word, char, or all
n (int) – number of augmentations to be performed for each text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Flips words in the text upside down depending on the granularity
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.transforms.ReplaceWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces words in each text based on a given mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
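How mapping and ignore_words interact can be shown with a short sketch. It is not AugLy's implementation: the function name is hypothetical, every eligible word is replaced (as if aug_word_p were 1.0), and case-insensitive lookup is an assumption.

```python
from typing import Dict, List, Optional


def replace_words(text: str, mapping: Dict[str, str],
                  ignore_words: Optional[List[str]] = None) -> str:
    """Illustrative only: replace mapped words unless they are ignored."""
    ignore = {w.lower() for w in (ignore_words or [])}
    out = []
    for word in text.split():
        key = word.lower()
        if key in mapping and key not in ignore:
            out.append(mapping[key])
        else:
            out.append(word)
    return " ".join(out)


word_map = {"hello": "hi", "big": "large"}  # hypothetical mapping
print(replace_words("hello big world", word_map, ignore_words=["big"]))  # hi big world
```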
- class augly.text.transforms.SimulateTypos(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)
- Parameters
aug_char_p (float) – probability of letters to be replaced in each word; This is only applicable for keyboard distance and swapping
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation; This is only applicable for keyboard distance and swapping
aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; This is only applicable for keyboard distance and swapping
aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; This is only applicable for keyboard distance and swapping
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”
misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None
max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of these typo types
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
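Two of the character-level ideas mentioned above, keyboard-neighbour substitution and adjacent-character swapping, can be sketched in a few lines. This is illustrative only and not AugLy's code: the `NEIGHBORS` table is a tiny hypothetical subset of a QWERTY layout, and both function names are made up for the example.

```python
import random

# Tiny hypothetical subset of QWERTY neighbours, for illustration only.
NEIGHBORS = {"a": "qsz", "e": "wrd", "o": "ipl"}


def keyboard_typo(word: str, rng: random.Random) -> str:
    """Replace one character with a key that sits next to it."""
    candidates = [i for i, c in enumerate(word) if c in NEIGHBORS]
    if not candidates:
        return word
    i = rng.choice(candidates)
    return word[:i] + rng.choice(NEIGHBORS[word[i]]) + word[i + 1:]


def swap_typo(word: str, rng: random.Random) -> str:
    """Swap one pair of adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


rng = random.Random(10)
print(keyboard_typo("cat", rng))  # "c", then one of q/s/z, then "t"
print(swap_typo("ab", rng))       # ba
```

Both helpers change at most a single position or a single adjacent pair per call, which is one way to picture the documented default of aug_char_max=1.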
- class augly.text.transforms.SplitWords(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word for a split
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Splits words in the text into subwords
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
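The role of min_char in splitting can be illustrated with a minimal helper. This is a sketch under assumptions, not AugLy's logic: the name `split_word` and the choice of the midpoint as the default split position are hypothetical.

```python
from typing import Optional


def split_word(word: str, min_char: int = 4,
               position: Optional[int] = None) -> str:
    """Illustrative only: split `word` in two if it is long enough."""
    if len(word) < min_char:
        return word  # too short to split
    pos = len(word) // 2 if position is None else position
    return word[:pos] + " " + word[pos:]


print(split_word("hello"))  # he llo
print(split_word("cat"))    # cat
```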
- class augly.text.transforms.SwapGenderedWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces words in each text based on a provided mapping, which can either be a dict already constructed mapping words from one gender to another or a file path to a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
Module contents
- class augly.text.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0, **kwargs)
- Parameters
aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)
p (float) – the probability of the transform being applied; default value is 1.0
**kwargs – the input attributes to be passed into the augmentation function to be applied
- apply_transform(texts, metadata=None, **aug_kwargs)
Apply a user-defined lambda on a list of text documents
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
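A function that satisfies the documented aug_function contract (a list of text documents in, a list of text documents out, plus optional keyword arguments) might look like the sketch below. The name `add_suffix` and its `suffix` parameter are hypothetical examples, not part of AugLy.

```python
from typing import List


def add_suffix(texts: List[str], suffix: str = "!") -> List[str]:
    """Hypothetical augmentation: append `suffix` to every document."""
    return [t + suffix for t in texts]


print(add_suffix(["hello", "world"], suffix="?"))  # ['hello?', 'world?']
```

Going by the signature above, such a function would presumably be supplied as aug_function=add_suffix, with suffix="?" passed through **kwargs.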
- class augly.text.ChangeCase(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='word', cadence=1.0, case='random', seed=10, p=1.0)
- Parameters
granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)
cadence (float) – how frequent (i.e. between this many characters/words) to change the case. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’
case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case every word will be randomly changed to one of the 3 cases)
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
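The interplay of granularity, cadence, and case can be pictured with the following sketch. It is not AugLy's implementation: the function name is hypothetical, the ‘random’ case is omitted, and starting with the first unit is an assumption.

```python
def change_case(text: str, granularity: str = "word",
                cadence: float = 1.0, case: str = "upper") -> str:
    """Illustrative only: change the case of every `cadence`-th unit."""
    ops = {"lower": str.lower, "upper": str.upper, "title": str.title}
    fn = ops[case]
    if granularity == "all":
        return fn(text)  # cadence is not used for 'all'
    if granularity == "word":
        units, sep = text.split(" "), " "
    else:  # 'char'
        units, sep = list(text), ""
    out = []
    next_at = 0.0  # index of the next unit whose case is changed
    for i, unit in enumerate(units):
        if i >= next_at:
            out.append(fn(unit))
            next_at += cadence
        else:
            out.append(unit)
    return sep.join(out)


print(change_case("a b c d", cadence=2.0))         # A b C d
print(change_case("Hello", "all", case="lower"))   # hello
```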
- class augly.text.Compose(transforms, p=1.0)
Bases:
augly.text.composition.BaseComposition
- __call__(texts, seed=None, metadata=None)
Applies the list of transforms in order to the text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
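The idea of applying transforms in order while optionally collecting metadata can be sketched without AugLy at all. Everything here is hypothetical: plain functions stand in for transforms, and the metadata keys `name`, `src_length`, and `dest_length` are illustrative guesses, not AugLy's actual schema.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Transform = Callable[[List[str]], List[str]]


def compose(transforms: List[Transform], texts: Union[str, List[str]],
            metadata: Optional[List[Dict[str, Any]]] = None) -> List[str]:
    """Illustrative only: run each transform on the previous one's output."""
    docs = [texts] if isinstance(texts, str) else list(texts)
    for transform in transforms:
        before = docs
        docs = transform(docs)
        if metadata is not None:
            # The caller's list is extended in place, one entry per transform.
            metadata.append({
                "name": transform.__name__,
                "src_length": len(before),
                "dest_length": len(docs),
            })
    return docs


def upper(texts: List[str]) -> List[str]:
    return [t.upper() for t in texts]


def exclaim(texts: List[str]) -> List[str]:
    return [t + "!" for t in texts]


log: List[Dict[str, Any]] = []
print(compose([upper, exclaim], "hi", metadata=log))  # ['HI!']
print([entry["name"] for entry in log])               # ['upper', 'exclaim']
```

Passing metadata=None skips the bookkeeping entirely, which matches the documented behaviour that nothing is appended or returned in that case.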
- class augly.text.Contractions(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, p=1.0)
- Parameters
aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces pairs (or longer strings) of words with contractions given a mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.GetBaseline(p=1.0)
Bases:
augly.text.transforms.BaseTransform
- apply_transform(texts, metadata=None, **aug_kwargs)
Generates a baseline by tokenizing and detokenizing the text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.InsertPunctuationChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
- Parameters
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a punctuation character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Inserts punctuation characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.InsertWhitespaceChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
- Parameters
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a whitespace character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Inserts whitespace characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.InsertZeroWidthChars(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', cadence=1.0, vary_chars=False, p=1.0)
- Parameters
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a zero-width character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Inserts zero-width characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
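Zero-width characters leave the rendered text looking unchanged while altering the underlying string. The sketch below shows that effect and a simple way to strip such characters again. The `ZERO_WIDTH` set is a hypothetical selection of code points, not the list AugLy ships, and `strip_zero_width` is an illustrative helper.

```python
# Hypothetical set of zero-width code points; AugLy's actual list may differ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060"}


def strip_zero_width(text: str) -> str:
    """Remove the zero-width characters listed above."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


original = "hello"
# Roughly what inserting one zero-width space between every character looks like.
augmented = "\u200b".join(original)

print(len(original), len(augmented))      # the strings differ in length
print(strip_zero_width(augmented) == original)
```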
- class augly.text.MergeWords(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word to be merged
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Merges words in the text together
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
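The parameters aug_word_p, aug_word_min, and aug_word_max jointly bound how many words are touched. The sketch below shows one plausible way they could combine: a proportion clamped between the minimum and maximum. The rounding rule and the helper name `words_to_augment` are assumptions, not AugLy's confirmed logic.

```python
import math


def words_to_augment(n_words: int, aug_word_p: float = 0.3,
                     aug_word_min: int = 1, aug_word_max: int = 1000) -> int:
    """Illustrative count of words to augment (assumed rule, not AugLy's)."""
    count = math.ceil(aug_word_p * n_words)            # proportion of the text
    count = max(aug_word_min, min(count, aug_word_max))  # clamp to [min, max]
    return min(count, n_words)                          # never more than exist


print(words_to_augment(10, aug_word_p=0.5))
print(words_to_augment(5000, aug_word_p=0.5))
```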
- class augly.text.OneOf(transforms, p=1.0)
Bases:
augly.text.composition.BaseComposition
- __call__(texts, force=False, seed=None, metadata=None)
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set
seed (Optional[int]) – if provided, the random seed will be set to this before calling the transform
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- __init__(transforms, p=1.0)
- Parameters
transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the text
p (float) – the probability of the transform being applied; default value is 1.0
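The selection behaviour described for OneOf can be sketched without AugLy, using plain callables in place of transforms. The function `one_of` below is a hypothetical stand-in that mirrors the documented p, force, and seed parameters; how AugLy actually draws its random numbers is not shown in this reference.

```python
import random
from typing import Callable, List, Optional


def one_of(transforms: List[Callable[[str], str]], text: str, p: float = 1.0,
           force: bool = False, seed: Optional[int] = None) -> str:
    """Apply exactly one randomly chosen transform, with probability p."""
    rng = random.Random(seed)
    # Skip the augmentation unless forced or the probability check passes.
    if not force and rng.random() > p:
        return text
    return rng.choice(transforms)(text)


print(one_of([str.upper, str.lower], "Hello", seed=0))
print(one_of([str.upper, str.lower], "Hello", p=0.0, seed=0))
```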
- class augly.text.ReplaceBidirectional(granularity='all', split_word=False, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(granularity='all', split_word=False, p=1.0)
- Parameters
granularity (str) – the level at which the reversal is applied; this must be either ‘word’ or ‘all’
split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
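The idea of reversing a word and wrapping it in bidirectional marks can be illustrated as follows. This is a minimal sketch that assumes the RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C) characters are used; the exact marks AugLy emits are not stated in this reference, and the helper names are hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def bidi_word(word: str) -> str:
    """Store the word reversed; the override makes it render in original order."""
    return RLO + word[::-1] + PDF


def bidi_text(text: str) -> str:
    """Wrap each word separately so word order survives line wrapping."""
    return " ".join(bidi_word(w) for w in text.split(" "))


def undo_bidi(text: str) -> str:
    """Recover the original string by stripping the marks and reversing back."""
    return " ".join(w.strip(RLO + PDF)[::-1] for w in text.split(" "))


augmented = bidi_text("hello world")
print(repr(augmented))
print(undo_bidi(augmented))
```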
- class augly.text.ReplaceFunFonts(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, p=1.0)
- Parameters
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the font is applied; this must be either word, char, or all
vary_fonts (bool) – whether or not to switch font in each replacement
fonts_path (str) – iopath uri where the fonts are stored
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces words or characters depending on the granularity with fun fonts applied
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
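A "fun font" is in practice a mapping from ASCII letters to look-alike Unicode code points. The sketch below uses the Unicode Mathematical Bold block as a stand-in font; whether fun_fonts.json contains this particular font is an assumption, and `to_math_bold` is a hypothetical helper.

```python
def to_math_bold(text: str) -> str:
    """Map ASCII letters to Unicode Mathematical Bold; leave other chars alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))  # bold capitals start at U+1D400
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))  # bold small letters start at U+1D41A
        else:
            out.append(ch)
    return "".join(out)


print(to_math_bold("Fun fonts!"))
```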
- class augly.text.ReplaceSimilarChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, p=1.0)
- Parameters
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (Optional[str]) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces letters in each text with similar characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
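Character replacement of this kind boils down to a lookup table plus a per-letter probability. The sketch below uses a tiny made-up table (`SIMILAR`) rather than AugLy's mapping file, and only models aug_char_p; the word-level parameters are left out for brevity.

```python
import random
from typing import Optional

# Hypothetical look-alike table; AugLy's real mapping is loaded from mapping_path.
SIMILAR = {"a": ["@", "4"], "e": ["3"], "i": ["1", "!"], "o": ["0"], "s": ["$", "5"]}


def replace_similar(text: str, aug_char_p: float = 0.3, seed: Optional[int] = None) -> str:
    """Replace each mappable letter with probability aug_char_p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        options = SIMILAR.get(ch.lower())
        if options and rng.random() < aug_char_p:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)


print(replace_similar("season passes", aug_char_p=0.5, seed=1))
```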
- class augly.text.ReplaceSimilarUnicodeChars(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, p=1.0)
- Parameters
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (str) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces letters in each text with similar Unicode characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- class augly.text.ReplaceUpsideDown(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, p=1.0)
- Parameters
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the flip is applied; this must be either word, char, or all
n (int) – number of augmentations to be performed for each text
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Flips words in the text upside down depending on the granularity
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
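Flipping text upside down typically combines two steps: replace each letter with a rotated look-alike, then reverse the order. The sketch below shows that idea with a small hypothetical table (`FLIP`); it is not the mapping AugLy uses and covers only a handful of letters.

```python
# Hypothetical partial table of rotated look-alikes.
FLIP = {
    "a": "\u0250", "b": "q", "d": "p", "e": "\u01dd", "h": "\u0265",
    "m": "\u026f", "n": "u", "p": "d", "q": "b", "u": "n", "w": "\u028d",
}


def flip(text: str) -> str:
    """Reverse the text and swap each letter for its rotated counterpart."""
    return "".join(FLIP.get(ch, ch) for ch in reversed(text))


print(flip("bud"))
print(flip("hen"))
```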
- class augly.text.ReplaceWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces words in each text based on a given mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
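The interplay of mapping and ignore_words can be sketched in a few lines. The helper `replace_words` below is a simplified stand-in that replaces every mapped word (as if aug_word_p were 1.0) and splits on whitespace only; AugLy's tokenization and sampling are not reproduced here.

```python
from typing import Dict, List, Optional


def replace_words(text: str, mapping: Dict[str, str],
                  ignore_words: Optional[List[str]] = None) -> str:
    """Replace mapped words, leaving any word listed in ignore_words untouched."""
    ignore = set(ignore_words or [])
    words = text.split()
    return " ".join(w if w in ignore else mapping.get(w, w) for w in words)


mapping = {"quick": "fast", "fox": "dog"}
print(replace_words("the quick fox", mapping))
print(replace_words("the quick fox", mapping, ignore_words=["fox"]))
```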
- class augly.text.SimulateTypos(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, p=1.0)
- Parameters
aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping
aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping
aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”
misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None
max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of these types
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
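The keyboard typo type can be pictured as swapping one character for a neighbouring key. The sketch below is a simplified illustration with a hypothetical, partial QWERTY adjacency table (`NEIGHBORS`); it changes exactly one character per word and does not model aug_char_p or the other typo types.

```python
import random

# Hypothetical partial adjacency table for a QWERTY layout.
NEIGHBORS = {
    "a": "qwsz", "s": "awedxz", "d": "serfcx",
    "e": "wsdr", "o": "iklp", "t": "rfgy",
}


def keyboard_typo(word: str, rng: random.Random, min_char: int = 2) -> str:
    """Replace one character of `word` with a key adjacent to it."""
    candidates = [i for i, ch in enumerate(word) if ch in NEIGHBORS]
    if len(word) < min_char or not candidates:
        return word  # too short, or nothing we know how to mistype
    i = rng.choice(candidates)
    return word[:i] + rng.choice(NEIGHBORS[word[i]]) + word[i + 1:]


rng = random.Random(0)
print(keyboard_typo("data", rng))
```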
- class augly.text.SplitWords(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word for a split
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Splits words in the text into subwords
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
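The role of min_char in splitting can be shown with a short sketch: words below the threshold are left alone, longer ones are cut at a random interior position. Where AugLy actually places the cut is not documented here, so the uniform choice in the hypothetical helper `split_word` is an assumption.

```python
import random


def split_word(word: str, rng: random.Random, min_char: int = 4) -> str:
    """Split `word` into two subwords at a random interior position."""
    if len(word) < min_char:
        return word
    cut = rng.randint(1, len(word) - 1)  # never cut before the first or after the last char
    return word[:cut] + " " + word[cut:]


rng = random.Random(1)
print(split_word("augment", rng))
print(split_word("cat", rng))
```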
- class augly.text.SwapGenderedWords(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)
Bases:
augly.text.transforms.BaseTransform
- __init__(aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, p=1.0)
- Parameters
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(texts, metadata=None, **aug_kwargs)
Replaces words in each text based on a provided mapping, which can either be a dict already constructed mapping words from one gender to another or a file path to a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
aug_kwargs – kwargs to pass into the augmentation that will override values set in __init__
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.apply_lambda(texts, aug_function=<function <lambda>>, metadata=None, **kwargs)
Apply a user-defined lambda on a list of text documents
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_function (Callable[..., List[str]]) – the augmentation function to be applied onto the text (should expect a list of text documents as input and return a list of text documents)
**kwargs – the input attributes to be passed into the augmentation function to be applied
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
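A function suitable for aug_function takes a list of text documents (plus any keyword arguments) and returns a list of text documents. The sketch below defines one such function, `add_suffix`, which is a hypothetical example; the commented call at the end uses only the parameters documented above and requires AugLy to be installed.

```python
from typing import List


def add_suffix(texts: List[str], suffix: str = "!") -> List[str]:
    """Example augmentation: append `suffix` to every document."""
    return [t + suffix for t in texts]


print(add_suffix(["hello", "world"], suffix="?"))

# With AugLy installed, the same function could be passed through apply_lambda;
# extra keyword arguments are forwarded to the augmentation function:
#   import augly.text as textaugs
#   textaugs.apply_lambda(["hello", "world"], aug_function=add_suffix, suffix="?")
```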
- augly.text.apply_lambda_intensity(aug_function, **kwargs)
- Return type
float
- augly.text.change_case(texts, granularity='word', cadence=1.0, case='random', seed=10, metadata=None)
Changes the case (e.g. upper, lower, title) of random chars, words, or the entire text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ (case of the entire text is changed), ‘word’ (case of random words is changed), or ‘char’ (case of random chars is changed)
cadence (float) – how frequent (i.e. between this many characters/words) to change the case. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence. Not used for granularity ‘all’
case (str) – the case to change words to; valid values are ‘lower’, ‘upper’, ‘title’, or ‘random’ (in which case the case will randomly be changed to one of the previous three)
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
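The four values of case can be illustrated with Python's built-in string methods. The sketch below applies the chosen case to the whole text, which corresponds loosely to granularity ‘all’; the helper `apply_case` is hypothetical and does not model cadence or per-word selection.

```python
import random
from typing import Optional

CASES = {"lower": str.lower, "upper": str.upper, "title": str.title}


def apply_case(text: str, case: str = "random", seed: Optional[int] = 10) -> str:
    """Change the case of the whole text; 'random' picks one of the three cases."""
    if case == "random":
        case = random.Random(seed).choice(sorted(CASES))
    return CASES[case](text)


print(apply_case("hello World", case="upper"))
print(apply_case("hello World", case="title"))
```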
- augly.text.change_case_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.contractions(texts, aug_p=0.3, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/contractions.json', max_contraction_length=2, seed=10, metadata=None)
Replaces pairs (or longer strings) of words with contractions given a mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_p (float) – the probability that each pair (or longer string) of words will be replaced with the corresponding contraction, if there is one in the mapping
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
max_contraction_length (int) – the words in each text will be checked for matches in the mapping up to this length; i.e. if ‘max_contraction_length’ is 3 then every substring of 2 and 3 words will be checked
seed (Optional[int]) – if provided, this will set the random seed to ensure consistency between runs
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
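The role of max_contraction_length can be illustrated with a sliding-window match. The sketch below is a simplified stand-in that replaces every match (as if aug_p were 1.0), tries longer windows first, and uses a hand-written mapping rather than contractions.json; the helper name `contract` and the longest-first order are assumptions.

```python
from typing import Dict


def contract(text: str, mapping: Dict[str, str], max_contraction_length: int = 2) -> str:
    """Replace word sequences of up to max_contraction_length words found in mapping."""
    words = text.split()
    out, i = [], 0
    while i < len(words):
        # Try the longest window first, down to pairs of words.
        for n in range(max_contraction_length, 1, -1):
            window = words[i:i + n]
            if len(window) == n and " ".join(window) in mapping:
                out.append(mapping[" ".join(window)])
                i += n
                break
        else:
            out.append(words[i])  # no window matched at this position
            i += 1
    return " ".join(out)


print(contract("I will not go", {"I will": "I'll", "will not": "won't"}))
```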
- augly.text.contractions_intensity(aug_p, **kwargs)
- Return type
float
- augly.text.get_baseline(texts, metadata=None)
Generates a baseline by tokenizing and detokenizing the text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.get_baseline_intensity(**kwargs)
- Return type
float
- augly.text.insert_punctuation_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)
Inserts punctuation characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a punctuation character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different punctuation char each time one is used instead of just one per word/text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
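The metadata parameter follows an "append to the caller's list" pattern: the function mutates the list passed in rather than returning the metadata. The sketch below mimics that pattern in plain Python. The dictionary keys (`name`, `src_length`, `dest_length`) and the choice to measure length as the number of documents are assumptions made for illustration, not AugLy's confirmed schema.

```python
from typing import Any, Callable, Dict, List, Optional


def with_metadata(texts: List[str], aug: Callable[[List[str]], List[str]], name: str,
                  metadata: Optional[List[Dict[str, Any]]] = None) -> List[str]:
    """Run `aug` and, if a list was supplied, append a record describing the call."""
    result = aug(texts)
    if metadata is not None:
        metadata.append(
            {"name": name, "src_length": len(texts), "dest_length": len(result)}
        )
    return result


meta: List[Dict[str, Any]] = []
out = with_metadata(["a.b", "c"], lambda ts: [t + "!" for t in ts], "exclaim", meta)
print(out)
print(meta)
```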
- augly.text.insert_punctuation_chars_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.insert_whitespace_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)
Inserts whitespace characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a whitespace character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different whitespace char each time one is used instead of just one per word/text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
- augly.text.insert_whitespace_chars_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.insert_zero_width_chars(texts, granularity='all', cadence=1.0, vary_chars=False, metadata=None)
Inserts zero-width characters in each input text
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – ‘all’ or ‘word’ – if ‘word’, a new char is picked and the cadence resets for each word in the text
cadence (float) – how frequent (i.e. between this many characters) to insert a zero-width character. Must be at least 1.0. Non-integer values are used as an ‘average’ cadence
vary_chars (bool) – if true, picks a different zero-width char each time one is used instead of just one per word/text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
- augly.text.insert_zero_width_chars_intensity(granularity, cadence, **kwargs)
- Return type
float
- augly.text.merge_words(texts, aug_word_p=0.3, min_char=2, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)
Merges words in the text together
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word to be merged
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.merge_words_intensity(aug_word_p, aug_word_max, **kwargs)
- Return type
float
- augly.text.replace_bidirectional(texts, granularity='all', split_word=False, metadata=None)
Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
granularity (str) – the level at which the reversal is applied; this must be either ‘word’ or ‘all’
split_word (bool) – if true and granularity is ‘word’, reverses only the second half of each word
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented texts
- augly.text.replace_bidirectional_intensity(**kwargs)
- Return type
float
- augly.text.replace_fun_fonts(texts, aug_p=0.3, aug_min=1, aug_max=10000, granularity='all', vary_fonts=False, fonts_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/fun_fonts.json', n=1, priority_words=None, metadata=None)
Replaces words or characters depending on the granularity with fun fonts applied
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the font is applied; this must be either word, char, or all
vary_fonts (bool) – whether or not to switch font in each replacement
fonts_path (str) – iopath uri where the fonts are stored
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
- augly.text.replace_fun_fonts_intensity(aug_p, aug_max, granularity, **kwargs)
- Return type
float
- augly.text.replace_similar_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path=None, priority_words=None, metadata=None)
Replaces letters in each text with similar characters
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (Optional[str]) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
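The interaction between aug_word_p, aug_char_p, and min_char can be sketched in a few lines of plain Python. This is an assumed reading of the parameter descriptions, not AugLy's code: the `SIMILAR` table, the function name, and the `seed` argument are hypothetical, and the min/max count parameters are left out for brevity.

```python
import random

# Hypothetical look-alike table; AugLy reads its own mapping from mapping_path.
SIMILAR = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def replace_similar_sketch(text, aug_char_p=0.3, aug_word_p=0.3,
                           min_char=2, seed=0):
    rng = random.Random(seed)
    out_words = []
    for word in text.split(" "):
        # Words shorter than min_char are never eligible; others are
        # picked with probability aug_word_p.
        if len(word) < min_char or rng.random() >= aug_word_p:
            out_words.append(word)
            continue
        # Inside a chosen word, each mappable letter is swapped with
        # probability aug_char_p.
        out_words.append("".join(
            SIMILAR[c] if c in SIMILAR and rng.random() < aug_char_p else c
            for c in word
        ))
    return " ".join(out_words)


print(replace_similar_sketch("a see", aug_char_p=1.0, aug_word_p=1.0))  # a $33
```

Setting both probabilities to 1.0 makes the result deterministic, which is why the one-letter word "a" survives while "see" is fully replaced.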
- augly.text.replace_similar_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
- Return type
float
- augly.text.replace_similar_unicode_chars(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1000, aug_word_min=1, aug_word_max=1000, n=1, mapping_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/letter_unicode_mapping.json', priority_words=None, metadata=None)
Replaces letters in each text with similar unicodes
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_char_p (float) – probability of letters to be replaced in each word
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation
aug_char_min (int) – minimum # of letters to be replaced in each word
aug_char_max (int) – maximum # of letters to be replaced in each word
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping_path (str) – iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
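The probability and min/max parameters jointly bound how many items are augmented. This reference does not spell out the exact rule, so the helper below is only one plausible reading: take the probability times the number of items, round up, then clamp to the minimum and maximum. The name `augment_count` is hypothetical.

```python
import math


def augment_count(num_items: int, aug_p: float, aug_min: int, aug_max: int) -> int:
    """Assumed rule: ceil(aug_p * num_items), clamped to [aug_min, aug_max]."""
    if num_items == 0:
        return 0
    count = math.ceil(aug_p * num_items)
    count = max(count, aug_min)   # at least aug_min
    count = min(count, aug_max)   # at most aug_max
    return min(count, num_items)  # never more than what exists


print(augment_count(4, 0.3, 1, 1000))   # 2
print(augment_count(100, 0.5, 1, 10))   # 10
```

Under this reading, the default aug_word_min=1 means at least one word is touched even when the probability is very small.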
- augly.text.replace_similar_unicode_chars_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
- Return type
float
- augly.text.replace_upside_down(texts, aug_p=0.3, aug_min=1, aug_max=1000, granularity='all', n=1, metadata=None)
Flips words in the text upside down depending on the granularity
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_p (float) – probability of words to be augmented
aug_min (int) – minimum # of words to be augmented
aug_max (int) – maximum # of words to be augmented
granularity (str) – the level at which the text is flipped; this must be either word, char, or all
n (int) – number of augmentations to be performed for each text
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
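A rough illustration of what flipping text upside down means: reverse the character order and swap each character for a rotated look-alike. The small `FLIP` table and both function names are hypothetical, and whether AugLy reverses the whole text for granularity='all' is an assumption of this sketch.

```python
# Partial, hypothetical table of rotated look-alikes (escapes keep the file ASCII).
FLIP = {
    "a": "\u0250", "b": "q", "d": "p", "e": "\u01dd", "h": "\u0265",
    "m": "\u026f", "n": "u", "p": "d", "q": "b", "r": "\u0279",
    "t": "\u0287", "u": "n", "w": "\u028d", "y": "\u028e",
}


def flip_text(text: str) -> str:
    """Rotate the whole string: reverse it, then map each character."""
    return "".join(FLIP.get(ch, ch) for ch in reversed(text))


def flip_words(text: str) -> str:
    """Word-level variant: flip each word in place, keeping word order."""
    return " ".join(flip_text(w) for w in text.split(" "))


flipped_all = flip_text("bad pun")     # words end up in reverse order
flipped_words = flip_words("bad pun")  # word order preserved
```

The two helpers differ only in scope, which mirrors the distinction the granularity parameter draws between the whole text and individual words.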
- augly.text.replace_upside_down_intensity(aug_p, aug_max, granularity, **kwargs)
- Return type
float
- augly.text.replace_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping=None, priority_words=None, ignore_words=None, metadata=None)
Replaces words in each text based on a given mapping
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, Any], None]) – either a dictionary representing the mapping or an iopath uri where the mapping is stored
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
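The roles of priority_words and ignore_words are easier to see in code. The sketch below is an assumed interpretation, not AugLy's implementation: ignored words are dropped from the candidate set, a budget is derived from aug_word_p and the min/max bounds, and priority words fill the budget before any other candidate. The function name and `seed` argument are hypothetical.

```python
import math
import random


def replace_words_sketch(text, mapping, aug_word_p=0.3, aug_word_min=1,
                         aug_word_max=1000, priority_words=None,
                         ignore_words=None, seed=0):
    rng = random.Random(seed)
    words = text.split(" ")
    priority = {w.lower() for w in (priority_words or [])}
    ignore = {w.lower() for w in (ignore_words or [])}
    # Only words present in the mapping and not ignored can be replaced.
    candidates = [i for i, w in enumerate(words)
                  if w.lower() in mapping and w.lower() not in ignore]
    budget = min(max(math.ceil(aug_word_p * len(words)), aug_word_min),
                 aug_word_max, len(candidates))
    # Priority words go first; the remaining candidates are shuffled.
    first = [i for i in candidates if words[i].lower() in priority]
    rest = [i for i in candidates if words[i].lower() not in priority]
    rng.shuffle(rest)
    chosen = set((first + rest)[:budget])
    return " ".join(mapping[w.lower()] if i in chosen else w
                    for i, w in enumerate(words))


mapping = {"quick": "fast", "brown": "red", "fox": "cat"}
print(replace_words_sketch("the quick brown fox", mapping,
                           aug_word_p=0.25, priority_words=["fox"]))
# the quick brown cat
```

With a budget of one word, the priority word "fox" is the one replaced regardless of the shuffle.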
- augly.text.replace_words_intensity(aug_word_p, aug_word_max, mapping, **kwargs)
- Return type
float
- augly.text.simulate_typos(texts, aug_char_p=0.3, aug_word_p=0.3, min_char=2, aug_char_min=1, aug_char_max=1, aug_word_min=1, aug_word_max=1000, n=1, typo_type='all', misspelling_dict_path='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/misspelling.json', max_typo_length=1, priority_words=None, metadata=None)
Simulates typos in each text using misspellings, keyboard distance, and swapping. You can specify a typo_type: charmix, which does a combination of character-level modifications (delete, insert, substitute, & swap); keyboard, which swaps characters with those close to each other on the QWERTY keyboard; misspelling, which replaces words with misspellings defined in a dictionary file; or all, which will apply a random combination of these types
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_char_p (float) – probability of letters to be replaced in each word; this is only applicable for keyboard distance and swapping
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of letters in a word for a valid augmentation; this is only applicable for keyboard distance and swapping
aug_char_min (int) – minimum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping
aug_char_max (int) – maximum # of letters to be replaced/swapped in each word; this is only applicable for keyboard distance and swapping
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
typo_type (str) – the type of typos to apply to the text; valid values are “misspelling”, “keyboard”, “charmix”, or “all”
misspelling_dict_path (Optional[str]) – iopath uri where the misspelling dictionary is stored; must be specified if typo_type is “misspelling” or “all”, but otherwise can be None
max_typo_length (int) – the words in the misspelling dictionary will be checked for matches in the mapping up to this length; i.e. if ‘max_typo_length’ is 3 then every substring of 2 and 3 words will be checked
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
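The max_typo_length parameter controls how many consecutive words are joined into a phrase before the misspelling dictionary is consulted. The sketch below illustrates that lookup with a tiny hypothetical dictionary. It is deterministic (it always takes the first listed misspelling and ignores the probability parameters), so it shows the matching window only, not AugLy's sampling. Preferring the longest match is also an assumption.

```python
# Hypothetical dictionary; AugLy loads the real one from misspelling_dict_path.
MISSPELLINGS = {"a lot": ["alot"], "definitely": ["definately"]}


def misspell_sketch(text: str, max_typo_length: int = 1) -> str:
    words = text.split(" ")
    out, i = [], 0
    while i < len(words):
        # Try the longest phrase starting at position i first,
        # then progressively shorter ones, down to a single word.
        for n in range(min(max_typo_length, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n]).lower()
            if phrase in MISSPELLINGS:
                out.append(MISSPELLINGS[phrase][0])
                i += n
                break
        else:
            # No phrase matched: keep the word unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


print(misspell_sketch("I definitely like it a lot", max_typo_length=2))
# I definately like it alot
```

With max_typo_length=1 the two-word entry "a lot" can never match, so only "definitely" is replaced.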
- augly.text.simulate_typos_intensity(aug_char_p, aug_word_p, aug_char_max, aug_word_max, **kwargs)
- Return type
float
- augly.text.split_words(texts, aug_word_p=0.3, min_char=4, aug_word_min=1, aug_word_max=1000, n=1, priority_words=None, metadata=None)
Splits words in the text into subwords
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
min_char (int) – minimum # of characters in a word for a split
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
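As a small illustration of splitting words into subwords, the sketch below inserts a space at a random position inside each selected word and skips words shorter than min_char. The function name, the `seed` argument, and the choice of a single space as the separator are assumptions, and the min/max word counts are omitted.

```python
import random


def split_words_sketch(text, aug_word_p=0.3, min_char=4, seed=0):
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # A word needs at least two characters to have a split point.
        long_enough = len(word) >= max(min_char, 2)
        if long_enough and rng.random() < aug_word_p:
            cut = rng.randint(1, len(word) - 1)  # both pieces stay non-empty
            out.append(word[:cut] + " " + word[cut:])
        else:
            out.append(word)
    return " ".join(out)


result = split_words_sketch("the augmentation", aug_word_p=1.0)
print(result.split(" ")[0])        # the
print(result.replace(" ", ""))     # theaugmentation
```

Removing the spaces from the result recovers the original characters, since the sketch only inserts separators and never alters letters.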
- augly.text.split_words_intensity(aug_word_p, aug_word_max, **kwargs)
- Return type
float
- augly.text.swap_gendered_words(texts, aug_word_p=0.3, aug_word_min=1, aug_word_max=1000, n=1, mapping='/home/docs/checkouts/readthedocs.org/user_builds/augly/checkouts/latest/augly/assets/text/gendered_words_mapping.json', priority_words=None, ignore_words=None, metadata=None)
Replaces words in each text based on a provided mapping, which can be either an already-constructed dict mapping words from one gender to another or a file path to such a dict. Note: the logic in this augmentation was originally written by Adina Williams and has been used in influential work, e.g. https://arxiv.org/pdf/2005.00614.pdf
- Parameters
texts (Union[str, List[str]]) – a string or a list of text documents to be augmented
aug_word_p (float) – probability of words to be augmented
aug_word_min (int) – minimum # of words to be augmented
aug_word_max (int) – maximum # of words to be augmented
n (int) – number of augmentations to be performed for each text
mapping (Union[str, Dict[str, str]]) – a mapping of words from one gender to another; a mapping can be supplied either directly as a dict or as a filepath to a json file containing the dict
priority_words (Optional[List[str]]) – list of target words that the augmenter should prioritize to augment first
ignore_words (Optional[List[str]]) – list of words that the augmenter should not augment
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest length, etc. will be appended to the inputted list. If set to None, no metadata will be appended or returned
- Return type
Union[str, List[str]]
- Returns
the list of augmented text documents
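The metadata parameter follows an append-to-caller's-list pattern that is shared by the functions in this module. The sketch below shows that pattern alongside a dict-based word swap. It is not AugLy's code: the function name and the metadata keys (`name`, `src_length`, `dst_length`) are hypothetical stand-ins for "its name, the source & dest length", and every mapped word is swapped (as if aug_word_p were 1.0) to keep the output deterministic.

```python
def swap_gendered_sketch(text, mapping, metadata=None):
    # Swap each word found in the mapping; lookup is case-insensitive
    # and, in this simplified sketch, original capitalization is lost.
    swapped = " ".join(mapping.get(w.lower(), w) for w in text.split(" "))
    if metadata is not None:
        # Append a record to the list the caller passed in.
        metadata.append({
            "name": "swap_gendered_sketch",  # hypothetical key names
            "src_length": len(text),
            "dst_length": len(swapped),
        })
    return swapped


log = []
mapping = {"he": "she", "his": "her", "sister": "brother"}
print(swap_gendered_sketch("he thanked his sister", mapping, metadata=log))
# she thanked her brother
print(len(log))  # 1
```

Passing metadata=None skips the bookkeeping entirely, which matches the documented behaviour that nothing is appended or returned in that case.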
- augly.text.swap_gendered_words_intensity(aug_word_p, aug_word_max, **kwargs)
- Return type
float