augly.audio package

Submodules

augly.audio.composition module

class augly.audio.composition.BaseComposition(transforms, p=1.0)

Bases: object

__init__(transforms, p=1.0)

Parameters

transforms (List[BaseTransform]) – a list of transforms
p (float) – the probability of the transform being applied; default value is 1.0

class augly.audio.composition.Compose(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies the list of transforms in order to the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
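A minimal sketch of the `Compose` semantics described above, written as standalone Python rather than AugLy's own classes. It assumes that each transform is a callable taking `(audio, sample_rate, metadata)` and returning `(audio, sample_rate)`, and that `p` gates the whole list at once. `compose_sketch`, `halve` and `reverse` are hypothetical names used only for illustration.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np

# Hypothetical transform signature: (audio, sample_rate, metadata) -> (audio, sample_rate)
Transform = Callable[[np.ndarray, int, Optional[List[Dict[str, Any]]]], Tuple[np.ndarray, int]]


def compose_sketch(
    transforms: List[Transform],
    audio: np.ndarray,
    sample_rate: int,
    p: float = 1.0,
    metadata: Optional[List[Dict[str, Any]]] = None,
) -> Tuple[np.ndarray, int]:
    # With probability 1 - p, skip the whole composition and return the input unchanged
    if random.random() >= p:
        return audio, sample_rate
    # Otherwise apply every transform in list order, feeding each output into the next
    for transform in transforms:
        audio, sample_rate = transform(audio, sample_rate, metadata)
    return audio, sample_rate


def halve(audio, sr, metadata=None):
    return audio * 0.5, sr


def reverse(audio, sr, metadata=None):
    return audio[::-1], sr


# Halve first, then reverse: [1, 2, 3] -> [0.5, 1, 1.5] -> [1.5, 1, 0.5]
out, sr = compose_sketch([halve, reverse], np.array([1.0, 2.0, 3.0]), 16000)
```

Because the transforms are chained, order matters whenever two transforms do not commute (for example, clipping before or after looping).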

class augly.audio.composition.OneOf(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies one of the transforms to the audio (with probability p)

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

__init__(transforms, p=1.0)

Parameters

transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the audio
p (float) – the probability of the transform being applied; default value is 1.0
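A minimal sketch of the `OneOf` behavior, assuming simplified transforms of the form `(audio, sample_rate) -> (audio, sample_rate)`. The uniform choice below is an assumption made for illustration; the library may weight the selection differently (for instance by each transform's own `p`). `one_of_sketch`, `negate` and `double` are hypothetical names.

```python
import random
from typing import Callable, List, Optional, Tuple

import numpy as np

# Hypothetical simplified signature: (audio, sample_rate) -> (audio, sample_rate)
Transform = Callable[[np.ndarray, int], Tuple[np.ndarray, int]]


def one_of_sketch(
    transforms: List[Transform],
    audio: np.ndarray,
    sample_rate: int,
    p: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[np.ndarray, int]:
    rng = random.Random(seed)
    # With probability 1 - p, leave the audio unchanged
    if rng.random() >= p:
        return audio, sample_rate
    # Otherwise pick exactly one transform (uniformly, by assumption) and apply only that one
    transform = rng.choice(transforms)
    return transform(audio, sample_rate)


def negate(audio, sr):
    return -audio, sr


def double(audio, sr):
    return audio * 2, sr


out, sr = one_of_sketch([negate, double], np.array([1.0, 2.0]), 16000, seed=0)
```

Unlike `Compose`, at most one transform runs per call, which is useful for drawing a single augmentation from a pool of alternatives.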

augly.audio.functional module

augly.audio.functional.add_background_noise(audio, sample_rate=44100, background_audio=None, snr_level_db=10.0, seed=None, output_path=None, metadata=None)

Mixes a background sound into the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
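The `snr_level_db` parameter fixes the power ratio between the audio and the added background. A common way to achieve a target SNR is to rescale the noise before adding it; the sketch below shows that technique, assuming mono audio and a background of the same length. It is not necessarily how AugLy handles length mismatches or resampling, and `mix_at_snr` is a hypothetical name.

```python
import numpy as np


def mix_at_snr(audio: np.ndarray, noise: np.ndarray, snr_level_db: float) -> np.ndarray:
    """Scale `noise` so the audio-to-noise power ratio equals snr_level_db, then add it."""
    # Average power (mean squared amplitude) of each signal
    signal_power = np.mean(audio ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise_scaled); solve for the required noise power
    target_noise_power = signal_power / (10 ** (snr_level_db / 10))
    # Amplitude gain is the square root of the power ratio
    gain = np.sqrt(target_noise_power / noise_power)
    return audio + gain * noise


rng = np.random.default_rng(0)
sample_rate = 16000
audio = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
noise = rng.standard_normal(sample_rate)  # white noise, as when background_audio=None
mixed = mix_at_snr(audio, noise, snr_level_db=10.0)
```

A higher `snr_level_db` makes the background quieter relative to the audio; 0 dB means equal power.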

augly.audio.functional.apply_lambda(audio, sample_rate=44100, aug_function=<function <lambda>>, output_path=None, metadata=None, **kwargs)

Applies a user-defined function (e.g., a lambda) to the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
**kwargs – the input attributes to be passed into aug_function

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
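An example of an `aug_function` that satisfies the contract described above: it accepts the audio array and sample rate and returns the transformed pair. The extra `gain` keyword stands in for attributes forwarded through `**kwargs`. `reverse_and_scale` is a hypothetical function, and the channels-first layout for multichannel audio is an assumption.

```python
from typing import Tuple

import numpy as np


def reverse_and_scale(audio: np.ndarray, sample_rate: int, gain: float = 1.0) -> Tuple[np.ndarray, int]:
    """A user-defined aug_function: reverse the audio in time and scale its amplitude.

    `gain` is an extra keyword argument of the kind apply_lambda forwards via **kwargs.
    """
    # Reverse along the last (time) axis so mono (n,) and channels-first (c, n) arrays both work
    return audio[..., ::-1] * gain, sample_rate


out, sr = reverse_and_scale(np.array([1.0, 2.0, 3.0]), 16000, gain=0.5)
```

Going by the parameter list above, such a function would presumably be passed as `apply_lambda(audio, sample_rate=16000, aug_function=reverse_and_scale, gain=0.5)`.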

augly.audio.functional.change_volume(audio, sample_rate=44100, volume_db=0.0, output_path=None, metadata=None)

Changes the volume of the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
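`volume_db` is a decibel change, so it maps to a linear amplitude gain of 10^(volume_db / 20). The sketch below shows only that conversion. AugLy's internal implementation may differ, and `db_to_gain` and `change_volume_sketch` are hypothetical names.

```python
import numpy as np


def db_to_gain(volume_db: float) -> float:
    # Amplitude ratio for a decibel change: volume_db = 20 * log10(gain)
    return 10 ** (volume_db / 20)


def change_volume_sketch(audio: np.ndarray, volume_db: float) -> np.ndarray:
    # Scale every sample by the same linear gain
    return audio * db_to_gain(volume_db)


louder = change_volume_sketch(np.array([0.5, -0.5]), 20.0)  # 20 dB = 10x amplitude
```

As a rule of thumb, +6 dB roughly doubles the amplitude and -6 dB roughly halves it; large positive values can push float audio outside [-1, 1].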

augly.audio.functional.clicks(audio, sample_rate=44100, seconds_between_clicks=0.5, snr_level_db=1.0, output_path=None, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
snr_level_db (float) – signal-to-noise ratio in dB
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
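The regular click interval translates into evenly spaced sample positions. The sketch below computes those positions and adds a single-sample impulse at each one. The impulse model and the fixed `click_amplitude` are hypothetical simplifications: the real function may synthesize a short click sound instead and scale it using `snr_level_db`. Mono 1-D audio is assumed.

```python
import numpy as np


def click_positions(num_samples: int, sample_rate: int, seconds_between_clicks: float) -> np.ndarray:
    # Convert the interval from seconds to samples, then step through the clip
    step = int(round(seconds_between_clicks * sample_rate))
    return np.arange(0, num_samples, step)


def add_clicks_sketch(
    audio: np.ndarray, sample_rate: int, seconds_between_clicks: float = 0.5, click_amplitude: float = 1.0
) -> np.ndarray:
    out = audio.astype(float)  # copy, so the input array is left untouched
    # Hypothetical click model: one impulse of fixed amplitude at each position
    out[click_positions(len(audio), sample_rate, seconds_between_clicks)] += click_amplitude
    return out


positions = click_positions(32000, 16000, 0.5)  # 2 s of audio, one click every 0.5 s
clicked = add_clicks_sketch(np.zeros(16000), 16000, 0.5)
```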

augly.audio.functional.clip(audio, sample_rate=44100, offset_factor=0.0, duration_factor=1.0, output_path=None, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
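Both factors are fractions of the total duration, so they convert to sample indices by multiplying by the number of samples. A minimal sketch, assuming truncation toward zero and clamping at the end of the clip (the library may instead validate that the offset plus duration does not exceed 1). `clip_sketch` is a hypothetical name.

```python
import numpy as np


def clip_sketch(audio: np.ndarray, offset_factor: float = 0.0, duration_factor: float = 1.0) -> np.ndarray:
    n = audio.shape[-1]  # number of samples along the time (last) axis
    # Both factors are multiplied by the audio duration, here measured in samples
    start = int(offset_factor * n)
    end = min(n, start + int(duration_factor * n))
    return audio[..., start:end]


# Start 20% of the way in and keep 50% of the duration
segment = clip_sketch(np.arange(10.0), offset_factor=0.2, duration_factor=0.5)
```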

augly.audio.functional.harmonic(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the harmonic part of the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.high_pass_filter(audio, sample_rate=44100, cutoff_hz=3000.0, output_path=None, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – the cutoff frequency (in Hz); signals below this point are attenuated by 6 dB per octave (an octave being a doubling in frequency)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
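A 6 dB-per-octave roll-off is the response of a first-order filter. The sketch below is a discrete first-order RC high-pass; it illustrates the behavior described above and is not necessarily AugLy's exact filter design. Mono 1-D audio is assumed, and `first_order_high_pass` is a hypothetical name.

```python
import math

import numpy as np


def first_order_high_pass(audio: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    # RC time constant for the requested cutoff, and the sampling interval
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = np.zeros(len(audio), dtype=float)
    out[0] = audio[0]
    # y[i] = alpha * (y[i-1] + x[i] - x[i-1]): passes fast changes, bleeds away slow ones
    for i in range(1, len(audio)):
        out[i] = alpha * (out[i - 1] + audio[i] - audio[i - 1])
    return out


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 50 * t)     # well below the cutoff
high_tone = np.sin(2 * np.pi * 5000 * t)  # above the cutoff
filtered_low = first_order_high_pass(low_tone, sample_rate, cutoff_hz=3000.0)
filtered_high = first_order_high_pass(high_tone, sample_rate, cutoff_hz=3000.0)
```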

augly.audio.functional.insert_in_background(audio, sample_rate=44100, offset_factor=0.0, background_audio=None, seed=None, output_path=None, metadata=None)

Inserts audio into a background clip in a non-overlapping manner.

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – insert point relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise, with the same duration as the audio.
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
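"Non-overlapping" means the audio is spliced into the background rather than mixed on top of it, so the output is longer than the background. A minimal sketch for mono 1-D arrays; `insert_sketch` is a hypothetical name, and the rounding of the insert point is an assumption.

```python
import numpy as np


def insert_sketch(audio: np.ndarray, background: np.ndarray, offset_factor: float = 0.0) -> np.ndarray:
    # The insertion point is a fraction of the background duration
    offset = int(offset_factor * len(background))
    # Splice: background before the offset, then the full audio, then the rest of the background
    return np.concatenate([background[:offset], audio, background[offset:]])


spliced = insert_sketch(np.array([9.0, 9.0]), np.array([0.0, 1.0, 2.0, 3.0]), offset_factor=0.5)
```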

augly.audio.functional.invert_channels(audio, sample_rate=44100, output_path=None, metadata=None)

Inverts the channels of the audio. If the audio has only one channel, no change is applied. Otherwise, the order of the channels is reversed; e.g., for 4 channels, the channels are returned in the order [3, 2, 1, 0].

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
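A minimal sketch of the channel reversal, assuming a channels-first layout of shape `(channels, samples)` for multichannel audio and a 1-D array for mono. `invert_channels_sketch` is a hypothetical name.

```python
import numpy as np


def invert_channels_sketch(audio: np.ndarray) -> np.ndarray:
    # Mono (1-D) audio has a single channel, so it is returned unchanged
    if audio.ndim == 1:
        return audio
    # Channels-first layout (channels, samples): reverse the order of the rows
    return audio[::-1]


four_channels = np.arange(8.0).reshape(4, 2)  # channel k holds [2k, 2k + 1]
inverted = invert_channels_sketch(four_channels)
```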

augly.audio.functional.loop(audio, sample_rate=44100, n=1, output_path=None, metadata=None)

Loops the audio ‘n’ times

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n (int) – the number of times the audio will be looped
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
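A minimal sketch of looping with `np.tile`. Whether "looping n times" yields n or n + 1 copies in total is not stated above; this sketch assumes n extra copies are appended to the original, so the output holds n + 1 copies. `loop_sketch` is a hypothetical name.

```python
import numpy as np


def loop_sketch(audio: np.ndarray, n: int = 1) -> np.ndarray:
    # Assumption: the original plus n repetitions, i.e. n + 1 copies in total.
    # np.tile with an integer repeats along the last axis, so channels-first
    # (channels, samples) arrays are looped in time as well.
    return np.tile(audio, n + 1)


looped = loop_sketch(np.array([1.0, 2.0]), n=2)
```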

augly.audio.functional.low_pass_filter(audio, sample_rate=44100, cutoff_hz=500.0, output_path=None, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – the cutoff frequency (in Hz); signals above this point are attenuated by 6 dB per octave (an octave being a doubling in frequency)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
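The low-pass counterpart of a 6 dB-per-octave filter is a first-order RC low-pass (an exponential moving average). As with the high-pass, this illustrates the described behavior under the assumption of mono 1-D audio and is not necessarily AugLy's exact design. `first_order_low_pass` is a hypothetical name.

```python
import math

import numpy as np


def first_order_low_pass(audio: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    # RC time constant for the requested cutoff, and the sampling interval
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = np.zeros(len(audio), dtype=float)
    out[0] = audio[0]
    # y[i] = y[i-1] + alpha * (x[i] - y[i-1]): each output moves a fraction toward the input
    for i in range(1, len(audio)):
        out[i] = out[i - 1] + alpha * (audio[i] - out[i - 1])
    return out


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 50 * t)     # below the cutoff
high_tone = np.sin(2 * np.pi * 5000 * t)  # well above the cutoff
filtered_low = first_order_low_pass(low_tone, sample_rate, cutoff_hz=500.0)
filtered_high = first_order_low_pass(high_tone, sample_rate, cutoff_hz=500.0)
```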

augly.audio.functional.normalize(audio, sample_rate=44100, norm=inf, axis=0, threshold=None, fill=None, output_path=None, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
norm (Optional[float]) – the type of norm to compute:
  - np.inf: maximum absolute value
  - -np.inf: minimum absolute value
  - 0: number of non-zeros (the support)
  - float: corresponding l_p norm
  - None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
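With the default `norm=inf`, normalization divides by the peak absolute value along `axis`, so the loudest sample becomes ±1. The parameter set (norm, axis, threshold, fill) appears to mirror `librosa.util.normalize`; the sketch below covers only the default peak-norm case, does not model `threshold` or `fill`, and simply leaves all-zero slices untouched. `peak_normalize` is a hypothetical name.

```python
import numpy as np


def peak_normalize(audio: np.ndarray, axis: int = 0) -> np.ndarray:
    # norm=np.inf: divide by the maximum absolute value along `axis`
    peak = np.max(np.abs(audio), axis=axis, keepdims=True)
    # Leave all-zero slices unchanged instead of dividing by zero
    peak = np.where(peak == 0, 1.0, peak)
    return audio / peak


mono = peak_normalize(np.array([0.25, -0.5, 0.1]))
columns = peak_normalize(np.array([[1.0, 2.0], [-4.0, 1.0]]), axis=0)
```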

augly.audio.functional.peaking_equalizer(audio, sample_rate=44100, center_hz=500.0, q=1.0, gain_db=-3.0, output_path=None, metadata=None)

Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – the amount of gain (boost) or reduction (cut), in dB, applied at center_hz. Beware of clipping when using positive gain
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
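A two-pole peaking filter is a biquad. The sketch below computes its coefficients with the widely used RBJ "Audio EQ Cookbook" formulas and checks the gain at a given frequency. That AugLy relies on these exact formulas is an assumption; `peaking_eq_coefficients` and `magnitude_response` are hypothetical names.

```python
import cmath
import math


def peaking_eq_coefficients(sample_rate: float, center_hz: float, q: float, gain_db: float):
    # RBJ cookbook peaking EQ: A is the square root of the linear amplitude gain
    a_gain = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)  # larger q -> smaller alpha -> narrower band
    b = [1 + alpha * a_gain, -2 * math.cos(w0), 1 - alpha * a_gain]
    a = [1 + alpha / a_gain, -2 * math.cos(w0), 1 - alpha / a_gain]
    # Normalize so that a[0] == 1
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_response(b, a, freq_hz: float, sample_rate: float) -> float:
    # Evaluate |H(e^{jw})| of the biquad at a single frequency
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


b, a = peaking_eq_coefficients(44100, center_hz=500.0, q=1.0, gain_db=-3.0)
gain_at_center = magnitude_response(b, a, 500.0, 44100)  # equals 10 ** (-3 / 20)
gain_at_dc = magnitude_response(b, a, 0.0, 44100)        # far from center: unity gain
```

At `center_hz` the magnitude equals 10^(gain_db / 20), and it returns to 1 far away from the center, which matches the description that all other frequencies are unchanged.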

augly.audio.functional.percussive(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the percussive part of the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.pitch_shift(audio, sample_rate=44100, n_steps=1.0, output_path=None, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n_steps (float) – the number of steps by which to shift the pitch, where each step is equal to one semitone
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
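Since each step is one semitone, the frequency ratio produced by `n_steps` follows twelve-tone equal temperament: 2^(n_steps / 12). The helpers below only illustrate that arithmetic; they are hypothetical and do not perform the pitch shift itself.

```python
def semitones_to_ratio(n_steps: float) -> float:
    # Equal temperament: 12 semitones per octave, and one octave doubles the frequency
    return 2 ** (n_steps / 12)


def shifted_frequency(freq_hz: float, n_steps: float) -> float:
    # Frequency of a component after shifting by n_steps semitones
    return freq_hz * semitones_to_ratio(n_steps)


octave_up = semitones_to_ratio(12)       # 2.0
a3 = shifted_frequency(440.0, -12)       # 220.0
one_semitone = semitones_to_ratio(1.0)   # about 1.0595
```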

augly.audio.functional.reverb(audio, sample_rate=44100, reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, output_path=None, metadata=None)

Adds reverberation to the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – if True, only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.speed(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
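Changing speed "affecting pitch as well" is what happens when audio is resampled and then played back at the original sample rate: duration scales by 1/factor and every frequency scales by factor. The sketch below uses linear interpolation for mono audio; AugLy's actual resampling method may differ, and `speed_sketch` is a hypothetical name.

```python
import numpy as np


def speed_sketch(audio: np.ndarray, factor: float) -> np.ndarray:
    n = len(audio)
    new_n = int(round(n / factor))  # factor > 1 shortens the clip
    # Read the original signal at `factor` times the normal rate
    positions = np.arange(new_n) * factor
    return np.interp(positions, np.arange(n), audio)


sample_rate = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
faster = speed_sketch(tone, factor=2.0)  # half as long, and the 440 Hz tone becomes 880 Hz
```

By contrast, `tempo` and `time_stretch` change the duration without changing the pitch, which requires a time-scale modification technique such as a phase vocoder rather than plain resampling.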

augly.audio.functional.tempo(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor. In both cases the pitch is not affected
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.time_stretch(audio, sample_rate=44100, rate=1.5, output_path=None, metadata=None)

Time-stretches the audio by a fixed rate

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.to_mono(audio, sample_rate=44100, output_path=None, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
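A minimal sketch of averaging channels into mono, assuming the channels-first `(channels, samples)` layout that librosa also uses. `to_mono_sketch` is a hypothetical name.

```python
import numpy as np


def to_mono_sketch(audio: np.ndarray) -> np.ndarray:
    # 1-D input is already mono
    if audio.ndim == 1:
        return audio
    # Channels-first (channels, samples): average the samples across the channel axis
    return audio.mean(axis=0)


mono = to_mono_sketch(np.array([[1.0, 3.0], [3.0, 5.0]]))
```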

augly.audio.intensity module

augly.audio.intensity.add_background_noise_intensity(snr_level_db=10.0, **kwargs)

Return type: float

augly.audio.intensity.apply_lambda_intensity(aug_function, **kwargs)

Return type: float

augly.audio.intensity.change_volume_intensity(volume_db=0.0, **kwargs)

Return type: float

augly.audio.intensity.clicks_intensity(seconds_between_clicks=0.5, snr_level_db=1.0, **kwargs)

Return type: float

augly.audio.intensity.clip_intensity(duration_factor=1.0, **kwargs)

Return type: float

augly.audio.intensity.harmonic_intensity(**kwargs)

Return type: float

augly.audio.intensity.high_pass_filter_intensity(cutoff_hz=3000.0, **kwargs)

Return type: float

augly.audio.intensity.insert_in_background_intensity(metadata, **kwargs)

Return type: float

augly.audio.intensity.invert_channels_intensity(metadata, **kwargs)

Return type: float

augly.audio.intensity.loop_intensity(n=1, **kwargs)

Return type: float

augly.audio.intensity.low_pass_filter_intensity(cutoff_hz=500.0, **kwargs)

Return type: float

augly.audio.intensity.normalize_intensity(norm=inf, **kwargs)

Return type: float

augly.audio.intensity.peaking_equalizer_intensity(q, gain_db, **kwargs)

Return type: float

augly.audio.intensity.percussive_intensity(**kwargs)

Return type: float

augly.audio.intensity.pitch_shift_intensity(n_steps=2.0, **kwargs)

Return type: float

augly.audio.intensity.reverb_intensity(reverberance=50.0, wet_only=False, room_scale=100.0, **kwargs)

Return type: float

augly.audio.intensity.speed_intensity(factor=2.0, **kwargs)

Return type: float

augly.audio.intensity.tempo_intensity(factor=2.0, **kwargs)

Return type: float

augly.audio.intensity.time_stretch_intensity(rate=1.5, **kwargs)

Return type: float

augly.audio.intensity.to_mono_intensity(metadata, **kwargs)

Return type: float

augly.audio.transforms module

class augly.audio.transforms.AddBackgroundNoise(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)

Parameters

background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Mixes a background sound into the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0)

Parameters

aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a user-defined function (e.g., a lambda) to the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.BaseTransform(p=1.0)

Bases: object

__call__(audio, sample_rate=44100, metadata=None, force=False)

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
force (bool) – if set to True, the transform will always be applied. Otherwise, whether it is applied is determined by the probability p

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

__init__(p=1.0)

Parameters: p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

This method is to be implemented in the child classes; each implementation calls the corresponding augmentation function with the parameters specified at initialization

Return type: Tuple[ndarray, int]
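A minimal stand-in that mirrors the behavior described for `BaseTransform`: `__call__` applies the transform when `force=True` or with probability `p`, and delegates the actual work to `apply_transform` in a child class. `SketchTransform` and `Negate` are hypothetical classes, and the metadata keys they record are illustrative only.

```python
import random

import numpy as np


class SketchTransform:
    """Hypothetical stand-in mirroring the p/force behavior described for BaseTransform."""

    def __init__(self, p: float = 1.0):
        assert 0 <= p <= 1, "p must be a probability"
        self.p = p

    def __call__(self, audio, sample_rate=44100, metadata=None, force=False):
        # Apply when forced, or with probability p; otherwise pass the input through unchanged
        if not force and random.random() >= self.p:
            return audio, sample_rate
        return self.apply_transform(audio, sample_rate, metadata)

    def apply_transform(self, audio, sample_rate, metadata=None):
        raise NotImplementedError("child classes implement the augmentation here")


class Negate(SketchTransform):
    def apply_transform(self, audio, sample_rate, metadata=None):
        if metadata is not None:
            # Illustrative metadata entry; the real keys are defined by the library
            metadata.append({"name": "negate", "sample_rate": sample_rate})
        return -audio, sample_rate


never = Negate(p=0.0)
skipped, _ = never(np.array([1.0]), 16000)              # p=0: unchanged
forced, _ = never(np.array([1.0]), 16000, force=True)   # force overrides p
```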

class augly.audio.transforms.ChangeVolume(volume_db=0.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(volume_db=0.0, p=1.0)

Parameters

volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the volume of the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.Clicks(seconds_between_clicks=0.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(seconds_between_clicks=0.5, p=1.0)

Parameters

seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.Clip(offset_factor=0.0, duration_factor=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, duration_factor=1.0, p=1.0)

Parameters

offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
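Both factors are multiplied by the total audio length, so the clip boundaries reduce to simple index arithmetic. This is a minimal sketch with hypothetical helper names. It assumes time is the last axis and that a clip running past the end is truncated; AugLy may handle that case differently.

```python
import numpy as np


def clip_bounds(num_samples: int, offset_factor: float = 0.0, duration_factor: float = 1.0):
    """Return (start, end) sample indices for a clip described by relative factors.

    The end index is clamped to the audio length (an assumption of this sketch).
    """
    start = int(offset_factor * num_samples)
    end = min(num_samples, start + int(duration_factor * num_samples))
    return start, end


def clip_sketch(audio: np.ndarray, offset_factor: float = 0.0, duration_factor: float = 1.0) -> np.ndarray:
    """Hypothetical helper: slice the audio along its last (time) axis."""
    start, end = clip_bounds(audio.shape[-1], offset_factor, duration_factor)
    return audio[..., start:end]


sr = 1000
audio = np.arange(10 * sr, dtype=np.float32)  # 10 s of dummy samples
middle = clip_sketch(audio, offset_factor=0.25, duration_factor=0.5)  # 2.5 s to 7.5 s
```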

class augly.audio.transforms.Harmonic(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Parameters

kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the harmonic part of the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
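The kernel_size, power and margin parameters match those of median-filtering harmonic/percussive source separation (HPSS), which AugLy most likely delegates to librosa. The NumPy sketch below only illustrates that idea on an already-computed magnitude spectrogram (frequency x time) and omits the STFT/inverse-STFT round trip. It assumes NumPy 1.20 or later for `sliding_window_view` and uses edge padding; `hpss_sketch` is a hypothetical helper. It returns both parts, since the Percussive transform keeps the other one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(S: np.ndarray, kernel_size: int, axis: int) -> np.ndarray:
    """Median-filter a 2-D array along one axis (edge padding, odd kernel_size)."""
    pad = kernel_size // 2
    pad_width = [(0, 0)] * S.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    windows = sliding_window_view(padded, kernel_size, axis=axis)
    return np.median(windows, axis=-1)


def hpss_sketch(S: np.ndarray, kernel_size: int = 31, power: float = 2.0,
                margin: float = 1.0, eps: float = 1e-10):
    """Split a magnitude spectrogram S (freq x time) into harmonic and percussive parts.

    Harmonic energy is smooth along time, so it survives a median filter over
    time; percussive energy is smooth along frequency. Soft (Wiener-style)
    masks built from the two filtered spectrograms share out each bin.
    """
    H = median_filter_1d(S, kernel_size, axis=1)  # filter along time
    P = median_filter_1d(S, kernel_size, axis=0)  # filter along frequency
    mask_h = H ** power / (H ** power + (margin * P) ** power + eps)
    mask_p = P ** power / (P ** power + (margin * H) ** power + eps)
    return S * mask_h, S * mask_p


# Toy spectrogram: a steady tone (row 4) crossed by a click (column 4).
S = np.zeros((9, 9))
S[4, :] = 1.0
S[:, 4] = 1.0
harmonic, percussive = hpss_sketch(S, kernel_size=5)
```

In this toy input the tone row ends up almost entirely in `harmonic` and the click column almost entirely in `percussive`.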

class augly.audio.transforms.HighPassFilter(cutoff_hz=3000.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=3000.0, p=1.0)

Parameters

cutoff_hz (float) – frequency (in Hz) below which signals are attenuated, with the reduction growing by 6 dB per octave (each halving of frequency) below this point
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
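A 6 dB-per-octave slope is the behavior of a first-order filter. The sketch below is a one-pole (RC-style) high-pass filter written as a plain difference equation; it illustrates the described slope and is not necessarily the filter design AugLy uses. `one_pole_high_pass` is a hypothetical helper, and time is assumed to be the last axis.

```python
import math

import numpy as np


def one_pole_high_pass(audio: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """First-order (RC) high-pass along the last axis; rolls off about 6 dB/octave below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant of the equivalent RC circuit
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    x = np.asarray(audio, dtype=np.float64)
    y = np.zeros_like(x)
    y[..., 0] = x[..., 0]
    for n in range(1, x.shape[-1]):
        # y[n] = a * (y[n-1] + x[n] - x[n-1]): passes changes, forgets constant offsets
        y[..., n] = a * (y[..., n - 1] + x[..., n] - x[..., n - 1])
    return y


sr = 44100
n = np.arange(4410)                        # 0.1 s of samples
dc = np.ones(n.size)                       # 0 Hz: should be removed
nyquist = np.where(n % 2 == 0, 1.0, -1.0)  # sr/2: should mostly pass
dc_out = one_pole_high_pass(dc, sr, cutoff_hz=3000.0)
nyquist_out = one_pole_high_pass(nyquist, sr, cutoff_hz=3000.0)
```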

class augly.audio.transforms.InsertInBackground(offset_factor=0.0, background_audio=None, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, background_audio=None, seed=None, p=1.0)

Parameters

offset_factor (float) – insertion point relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Inserts the audio into the background audio in a non-overlapping manner.

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
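"Non-overlapping" means the audio is spliced into the background rather than mixed with it. The sketch below assumes the background is cut at `offset_factor` times its length and the audio is placed between the two halves, with a white-noise background of the same shape when none is given; `insert_in_background_sketch` is a hypothetical helper and time is assumed to be the last axis.

```python
import numpy as np


def insert_in_background_sketch(audio, background=None, offset_factor=0.0, seed=None):
    """Hypothetical helper: splice `audio` into `background` at a relative offset.

    Nothing is mixed: the output is the background before the insertion point,
    then the audio, then the rest of the background.
    """
    rng = np.random.default_rng(seed)  # accepts an int seed or an existing Generator
    if background is None:
        # Assumption: white-noise background with the same shape as the audio.
        background = rng.standard_normal(audio.shape)
    offset = int(offset_factor * background.shape[-1])
    return np.concatenate([background[..., :offset], audio, background[..., offset:]], axis=-1)


speech = np.ones(4)
bg = np.zeros(10)
out = insert_in_background_sketch(speech, bg, offset_factor=0.5)
```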

class augly.audio.transforms.InvertChannels(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Inverts the channels of the audio.

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
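Inverting the channels amounts to reversing the channel axis. This sketch assumes a (channels, samples) layout and assumes mono (1-D) input is returned unchanged; both are assumptions about AugLy's conventions, and `invert_channels_sketch` is a hypothetical helper.

```python
import numpy as np


def invert_channels_sketch(audio: np.ndarray) -> np.ndarray:
    """Reverse the channel order of a (channels, samples) array; mono is returned as is."""
    if audio.ndim < 2:
        return audio
    return audio[::-1, ...]


stereo = np.array([[0.1, 0.2, 0.3],   # left
                   [0.9, 0.8, 0.7]])  # right
swapped = invert_channels_sketch(stereo)
```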

class augly.audio.transforms.Loop(n=1, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n=1, p=1.0)

Parameters

n (int) – the number of times the audio will be looped
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Loops the audio ‘n’ times

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
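Looping is concatenation along the time axis. This sketch assumes that n counts the extra copies appended after the original, so the default n=1 doubles the length; check AugLy's behavior before relying on that. `loop_sketch` is a hypothetical helper.

```python
import numpy as np


def loop_sketch(audio: np.ndarray, n: int = 1) -> np.ndarray:
    """Hypothetical helper: append n extra copies of the audio along the last (time) axis."""
    return np.concatenate([audio] * (n + 1), axis=-1)


clip = np.array([1.0, 2.0])
looped = loop_sketch(clip, n=1)  # [1.0, 2.0, 1.0, 2.0]
```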

class augly.audio.transforms.LowPassFilter(cutoff_hz=500.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=500.0, p=1.0)

Parameters

cutoff_hz (float) – frequency (in Hz) above which signals are attenuated, with the reduction growing by 6 dB per octave (each doubling of frequency) above this point
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
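The low-pass counterpart of a 6 dB-per-octave slope is a first-order smoothing filter. The sketch below is a one-pole (RC-style) low-pass filter, shown to illustrate the described behavior rather than AugLy's actual filter design; `one_pole_low_pass` is a hypothetical helper, and time is assumed to be the last axis.

```python
import math

import numpy as np


def one_pole_low_pass(audio: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """First-order (RC) low-pass along the last axis; rolls off about 6 dB/octave above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant of the equivalent RC circuit
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    x = np.asarray(audio, dtype=np.float64)
    y = np.zeros_like(x)
    y[..., 0] = alpha * x[..., 0]
    for n in range(1, x.shape[-1]):
        # exponential smoothing: move a fraction alpha toward the new input sample
        y[..., n] = y[..., n - 1] + alpha * (x[..., n] - y[..., n - 1])
    return y


sr = 44100
n = np.arange(4410)                        # 0.1 s of samples
dc = np.ones(n.size)                       # 0 Hz: should pass
nyquist = np.where(n % 2 == 0, 1.0, -1.0)  # sr/2: should be strongly attenuated
dc_out = one_pole_low_pass(dc, sr, cutoff_hz=500.0)
nyquist_out = one_pole_low_pass(nyquist, sr, cutoff_hz=500.0)
```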

class augly.audio.transforms.Normalize(norm=inf, axis=0, threshold=None, fill=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(norm=inf, axis=0, threshold=None, fill=None, p=1.0)

Parameters

norm (Optional[float]) – the type of norm to compute:
- np.inf: maximum absolute value
- -np.inf: minimum absolute value
- 0: number of non-zeros (the support)
- float: corresponding l_p norm
- None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
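The norm options listed above can be sketched directly in NumPy. This minimal version covers the norm and axis parameters only; it does not model threshold or fill, apart from leaving all-zero slices untouched instead of dividing by zero. The parameter set resembles `librosa.util.normalize`, which AugLy likely wraps, but `normalize_sketch` is a hypothetical helper.

```python
import numpy as np


def normalize_sketch(audio: np.ndarray, norm=np.inf, axis: int = 0) -> np.ndarray:
    """Divide `audio` by its norm along `axis` (threshold and fill are not modelled)."""
    if norm is None:
        return audio  # no normalization
    mag = np.abs(audio).astype(np.float64)
    if norm == np.inf:
        length = mag.max(axis=axis, keepdims=True)  # peak absolute value
    elif norm == -np.inf:
        length = mag.min(axis=axis, keepdims=True)  # smallest absolute value
    elif norm == 0:
        length = (mag > 0).sum(axis=axis, keepdims=True).astype(np.float64)  # support size
    else:
        length = (mag ** norm).sum(axis=axis, keepdims=True) ** (1.0 / norm)  # l_p norm
    # Leave all-zero slices as they are instead of dividing by zero.
    length = np.where(length > 0, length, 1.0)
    return audio / length


peak_normalized = normalize_sketch(np.array([0.5, -0.25]))  # [1.0, -0.5]
```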

class augly.audio.transforms.PeakingEqualizer(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)

Parameters

center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a two-pole peaking equalization filter. The signal level at and around center_hz can be increased or decreased, while all other frequencies are left unchanged

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
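A two-pole peaking EQ is commonly implemented as a biquad using the coefficients from Robert Bristow-Johnson's Audio EQ Cookbook, where the gain at center_hz equals gain_db and the gain at DC and at Nyquist is 0 dB. The sketch below assumes that standard design; AugLy probably relies on a library biquad of this kind, but the helper names here are hypothetical.

```python
import cmath
import math


def peaking_eq_coeffs(sample_rate, center_hz, q, gain_db):
    """Biquad coefficients (b, a) for a peaking EQ (Audio EQ Cookbook form), normalized so a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)  # higher q -> smaller alpha -> narrower band
    b = [1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def response_db(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at one frequency, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20.0 * math.log10(abs(num / den))


def biquad_filter(x, b, a):
    """Direct-form I difference equation applied to a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


b, a = peaking_eq_coeffs(44100, center_hz=500.0, q=1.0, gain_db=-3.0)
```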

class augly.audio.transforms.Percussive(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Parameters

kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the percussive part of the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.PitchShift(n_steps=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n_steps=1.0, p=1.0)

Parameters

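n_steps (float) – the number of semitones by which to shift the pitch; each step is equal to one semitone (positive values shift up, negative values shift down)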
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
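In twelve-tone equal temperament, a shift of n_steps semitones multiplies every frequency by 2 ** (n_steps / 12). The arithmetic below illustrates that relationship only; it is not AugLy's pitch-shifting algorithm, and `semitone_ratio` is a hypothetical helper.

```python
def semitone_ratio(n_steps: float) -> float:
    """Frequency multiplier for a shift of n_steps semitones in 12-tone equal temperament."""
    return 2.0 ** (n_steps / 12.0)


a4 = 440.0
print(a4 * semitone_ratio(1.0))    # ~466.16 Hz (A#4)
print(a4 * semitone_ratio(12.0))   # 880.0 Hz, one octave up
print(a4 * semitone_ratio(-12.0))  # 220.0 Hz, one octave down
```

Because n_steps is a float, fractional values (for example 0.5 for a quarter tone) follow the same formula.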

class augly.audio.transforms.Reverb(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)

Parameters

reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – if True, only the wet signal (added reverberation) will be in the resulting output, and the original (dry) audio will be removed
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds reverberation to the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
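Three of the Reverb parameters act on the wet signal after the reverberation itself has been generated: pre_delay shifts it in time (milliseconds converted to samples), wet_gain scales it (decibels converted to a linear gain), and wet_only drops the dry signal. The sketch below models only those three steps for a caller-supplied 1-D wet signal. Generating the reverb tail (reverberance, hf_damping, room_scale, stereo_depth) is not modelled, and `mix_wet_dry_sketch` is a hypothetical helper.

```python
import numpy as np


def mix_wet_dry_sketch(dry, wet, sample_rate, pre_delay_ms=0.0, wet_gain_db=0.0, wet_only=False):
    """Hypothetical helper: apply pre-delay, wet gain and wet-only mixing to a given wet signal.

    `dry` and `wet` are 1-D arrays; the output has the same length as `dry`.
    """
    delay = int(round(pre_delay_ms * sample_rate / 1000.0))  # ms -> samples
    delayed = np.zeros_like(dry, dtype=np.float64)
    n = max(0, min(len(dry) - delay, len(wet)))
    delayed[delay:delay + n] = wet[:n]
    wet_scaled = delayed * 10.0 ** (wet_gain_db / 20.0)      # dB -> linear gain
    return wet_scaled if wet_only else dry + wet_scaled


dry = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
wet = np.full(5, 0.5)  # stand-in for a reverb tail
wet_part = mix_wet_dry_sketch(dry, wet, 1000, pre_delay_ms=2.0, wet_only=True)
```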

class augly.audio.transforms.Speed(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)

Parameters

factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
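Changing speed without preserving pitch is equivalent to resampling: reading the waveform at a step of `factor` samples shortens it by that factor and multiplies every frequency by the same factor. The sketch below uses linear interpolation on a 1-D signal. `change_speed_sketch` is a hypothetical helper, and AugLy may resample differently (for example with a higher-quality filter).

```python
import numpy as np


def change_speed_sketch(audio: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical helper: play a 1-D signal `factor` times faster by resampling."""
    n_out = int(round(len(audio) / factor))
    positions = np.arange(n_out) * factor  # fractional read positions
    return np.interp(positions, np.arange(len(audio)), audio)  # linear interpolation


sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t)     # 1 s of a 100 Hz tone
fast = change_speed_sketch(tone, 2.0)  # 0.5 s, now a 200 Hz tone
```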

class augly.audio.transforms.Tempo(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)

Parameters

factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor. In both cases the pitch is not affected
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.TimeStretch(rate=1.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(rate=1.5, p=1.0)

Parameters

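rate (float) – the time-stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor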
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Time-stretches the audio by a fixed rate

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
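Time-stretching changes duration while keeping each short segment's pitch content. The simplest instance is naive overlap-add (OLA): windowed frames are read every hop_out * rate samples but written every hop_out samples. This is a swapped-in illustration, not AugLy's method. AugLy likely uses a phase-vocoder stretch (as librosa does), which avoids the phase artifacts plain OLA produces. `ola_time_stretch`, `frame_len` and `hop_out` are hypothetical.

```python
import numpy as np


def ola_time_stretch(audio: np.ndarray, rate: float,
                     frame_len: int = 1024, hop_out: int = 256) -> np.ndarray:
    """Naive overlap-add time stretch of a 1-D signal.

    The output lasts roughly len(audio) / rate samples; rate > 1 shortens it.
    """
    hop_in = max(1, int(round(hop_out * rate)))
    window = np.hanning(frame_len)
    n_frames = max(1, (len(audio) - frame_len) // hop_in + 1)
    out_len = (n_frames - 1) * hop_out + frame_len
    out = np.zeros(out_len)
    weight = np.zeros(out_len)
    for i in range(n_frames):
        frame = audio[i * hop_in:i * hop_in + frame_len]
        frame = np.pad(frame, (0, frame_len - len(frame)))  # zero-pad a short final frame
        start = i * hop_out
        out[start:start + frame_len] += frame * window
        weight[start:start + frame_len] += window
    return out / np.maximum(weight, 1e-8)  # undo the window overlap gain


sr = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(sr) / sr)  # 1 s of a 100 Hz tone
faster = ola_time_stretch(tone, 2.0)  # about 0.5 s long
slower = ola_time_stretch(tone, 0.5)  # about 2 s long
```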

class augly.audio.transforms.ToMono(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
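Averaging across channels is a mean over the channel axis. This sketch assumes a (channels, samples) layout and treats 1-D input as already mono; `to_mono_sketch` is a hypothetical helper.

```python
import numpy as np


def to_mono_sketch(audio: np.ndarray) -> np.ndarray:
    """Average a (channels, samples) array across channels; 1-D input is already mono."""
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=0)


stereo = np.array([[0.2, 0.4, -1.0],
                   [0.6, 0.0, 1.0]])
mono = to_mono_sketch(stereo)  # [0.4, 0.2, 0.0]
```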

Module contents

class augly.audio.AddBackgroundNoise(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)

Parameters

background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Mixes a background sound into the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
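A target signal-to-noise ratio fixes how loud the background must be: its mean power is scaled to P_audio / 10 ** (snr_level_db / 10) before it is added. This is a minimal sketch assuming the background already has the same length as the audio and non-zero power; how AugLy trims or repeats a background of a different length is not modelled. `mix_at_snr` is a hypothetical helper.

```python
import numpy as np


def mix_at_snr(audio: np.ndarray, noise: np.ndarray, snr_level_db: float) -> np.ndarray:
    """Scale `noise` so that 10*log10(P_audio / P_noise) == snr_level_db, then add it."""
    p_audio = np.mean(audio ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_audio / (10.0 ** (snr_level_db / 10.0))
    return audio + noise * np.sqrt(target_p_noise / p_noise)


rng = np.random.default_rng(0)
sr = 8000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noise = rng.standard_normal(sr)  # white noise, as used when background_audio is None
noisy = mix_at_snr(clean, noise, snr_level_db=10.0)
```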

class augly.audio.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0)

Parameters

aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a user-defined lambda to the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
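Per the aug_function description, any callable that takes the audio array and sample rate and returns a (audio, sample_rate) tuple fits. The example function below reverses the audio. Accepting extra keyword arguments is a defensive assumption, since the Callable[..., ...] type does not pin down what else may be passed; `reverse_audio` is a hypothetical name.

```python
from typing import Any, Tuple

import numpy as np


def reverse_audio(audio: np.ndarray, sample_rate: int, **kwargs: Any) -> Tuple[np.ndarray, int]:
    """Example aug_function: play the audio backwards and keep the sample rate.

    Extra keyword arguments are accepted and ignored (a defensive assumption).
    """
    return audio[..., ::-1].copy(), sample_rate


audio = np.array([0.1, 0.2, 0.3])
out, sr = reverse_audio(audio, 16000)
```

With AugLy installed, this function would be passed as ApplyLambda(aug_function=reverse_audio).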

class augly.audio.ChangeVolume(volume_db=0.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(volume_db=0.0, p=1.0)

Parameters

volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the volume of the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Clicks(seconds_between_clicks=0.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(seconds_between_clicks=0.5, p=1.0)

Parameters

seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Clip(offset_factor=0.0, duration_factor=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, duration_factor=1.0, p=1.0)

Parameters

offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Compose(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies the list of transforms in order to the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
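Applying transforms "in order" means each transform's output audio and sample rate feed the next one, so the order changes the result. The stand-in class below shows that control flow. It assumes p gates the whole pipeline; `ComposeSketch` and the toy transforms are hypothetical and simpler than AugLy's classes (no metadata handling).

```python
import random
from typing import Callable, List, Optional, Tuple

import numpy as np

AudioFn = Callable[[np.ndarray, int], Tuple[np.ndarray, int]]


class ComposeSketch:
    """Hypothetical stand-in for Compose: run transforms in order, each feeding the next."""

    def __init__(self, transforms: List[AudioFn], p: float = 1.0,
                 rng: Optional[random.Random] = None):
        self.transforms = transforms
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, audio: np.ndarray, sample_rate: int) -> Tuple[np.ndarray, int]:
        # Assumption: p gates the whole pipeline; with probability 1 - p nothing runs.
        if self.rng.random() >= self.p:
            return audio, sample_rate
        for transform in self.transforms:
            audio, sample_rate = transform(audio, sample_rate)
        return audio, sample_rate


def double(audio, sr):
    return audio * 2, sr


def add_one(audio, sr):
    return audio + 1, sr


x = np.array([1.0])
print(ComposeSketch([double, add_one])(x, 16000)[0])  # [3.]
print(ComposeSketch([add_one, double])(x, 16000)[0])  # [4.]
```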

class augly.audio.Harmonic(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Parameters

kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the harmonic part of the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.HighPassFilter(cutoff_hz=3000.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=3000.0, p=1.0)

Parameters

cutoff_hz (float) – frequency (in Hz) below which signals are attenuated, with the reduction growing by 6 dB per octave (each halving of frequency) below this point
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.InsertInBackground(offset_factor=0.0, background_audio=None, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, background_audio=None, seed=None, p=1.0)

Parameters

offset_factor (float) – insertion point relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Inserts the audio into the background audio in a non-overlapping manner.

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.InvertChannels(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Inverts the channels of the audio.

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Loop(n=1, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n=1, p=1.0)

Parameters

n (int) – the number of times the audio will be looped
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Loops the audio ‘n’ times

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.LowPassFilter(cutoff_hz=500.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=500.0, p=1.0)

Parameters

cutoff_hz (float) – frequency (in Hz) above which signals are attenuated, with the reduction growing by 6 dB per octave (each doubling of frequency) above this point
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Normalize(norm=inf, axis=0, threshold=None, fill=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(norm=inf, axis=0, threshold=None, fill=None, p=1.0)

Parameters

norm (Optional[float]) – the type of norm to compute:
- np.inf: maximum absolute value
- -np.inf: minimum absolute value
- 0: number of non-zeros (the support)
- float: corresponding l_p norm
- None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.OneOf(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies one of the transforms to the audio (with probability p)

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

__init__(transforms, p=1.0)

Parameters

transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the audio
p (float) – the probability of the transform being applied; default value is 1.0
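OneOf differs from Compose in that, when it fires, exactly one of its transforms runs. The stand-in class below shows that control flow. It assumes a uniform choice unless selection weights are given; AugLy's OneOf may instead weight the choice by each child transform's own p. `OneOfSketch` and the toy transforms are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np

AudioFn = Callable[[np.ndarray, int], Tuple[np.ndarray, int]]


class OneOfSketch:
    """Hypothetical stand-in for OneOf: with probability p, apply exactly one transform."""

    def __init__(self, transforms: List[AudioFn], p: float = 1.0,
                 weights: Optional[Sequence[float]] = None,
                 rng: Optional[random.Random] = None):
        self.transforms = transforms
        self.p = p
        # Assumption: uniform choice unless explicit selection weights are given.
        self.weights = list(weights) if weights is not None else [1.0] * len(transforms)
        self.rng = rng or random.Random()

    def __call__(self, audio: np.ndarray, sample_rate: int) -> Tuple[np.ndarray, int]:
        if self.rng.random() >= self.p:
            return audio, sample_rate  # skipped with probability 1 - p
        transform = self.rng.choices(self.transforms, weights=self.weights, k=1)[0]
        return transform(audio, sample_rate)


def negate(audio, sr):
    return -audio, sr


def halve(audio, sr):
    return audio * 0.5, sr


pick = OneOfSketch([negate, halve], rng=random.Random(0))
out, _ = pick(np.array([2.0]), 16000)  # either [-2.] or [1.]
```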

class augly.audio.PeakingEqualizer(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)

Parameters

center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a two-pole peaking equalization filter. The signal level at and around center_hz can be increased or decreased, while all other frequencies are left unchanged

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
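A two-pole peaking filter of this kind is commonly built from the biquad formulas in the RBJ Audio EQ Cookbook. The sketch below assumes those formulas; the coefficients the library actually uses may differ. It shows how center_hz, q and gain_db enter the filter and checks that the gain at center_hz equals gain_db while DC is left unchanged. `peaking_eq_coeffs`, `biquad` and `gain_at` are hypothetical helpers.

```python
import numpy as np


def peaking_eq_coeffs(sample_rate, center_hz=500.0, q=1.0, gain_db=-3.0):
    """Biquad coefficients (b, a) for a peaking EQ, following the RBJ Audio EQ Cookbook."""
    a_lin = 10.0 ** (gain_db / 40.0)  # square root of the linear amplitude gain
    w0 = 2.0 * np.pi * center_hz / sample_rate
    alpha = np.sin(w0) / (2.0 * q)  # larger q -> smaller alpha -> narrower band
    cos_w0 = np.cos(w0)
    b = np.array([1.0 + alpha * a_lin, -2.0 * cos_w0, 1.0 - alpha * a_lin])
    a = np.array([1.0 + alpha / a_lin, -2.0 * cos_w0, 1.0 - alpha / a_lin])
    return b / a[0], a / a[0]  # normalize so that a[0] == 1


def biquad(x, b, a):
    """Direct-form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def gain_at(freq_hz, sample_rate, b, a):
    """Magnitude response |H(e^{jw})| of the biquad at a single frequency."""
    z_inv = np.exp(-1j * 2.0 * np.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


sr = 16000
b, a = peaking_eq_coeffs(sr, center_hz=500.0, q=1.0, gain_db=-3.0)
center_gain_db = 20 * np.log10(gain_at(500.0, sr, b, a))  # equals gain_db at center_hz
t = np.arange(sr) / sr
filtered = biquad(np.sin(2 * np.pi * 500.0 * t), b, a)  # a 500 Hz tone comes out about 3 dB quieter
```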

class augly.audio.Percussive(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Parameters

kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the percussive part of the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.PitchShift(n_steps=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n_steps=1.0, p=1.0)

Parameters

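n_steps (float) – the number of steps by which to shift the pitch (positive values raise it, negative values lower it); each step is equal to one semitone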
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Reverb(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)

Parameters

reverberance (float) – (%) sets the length of the reverberation tail, i.e., how long the reverberation continues after the original sound has ended, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – if True, only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds reverberation to the audio

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Speed(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)

Parameters

factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
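Why a speed change also moves the pitch: reading the samples `factor` times faster shortens the audio by `factor` and multiplies every frequency by `factor`. The sketch below illustrates this with linear-interpolation resampling of a mono array. It is not the library's implementation (which may use a different resampler), and `speed_sketch` is a hypothetical name.

```python
import numpy as np


def speed_sketch(audio: np.ndarray, sample_rate: int, factor: float = 2.0):
    """Resample a mono signal so it plays `factor` times faster at the same sample rate."""
    n_out = int(round(len(audio) / factor))
    # Position in the original signal that each output sample reads from
    src_positions = np.arange(n_out) * factor
    out = np.interp(src_positions, np.arange(len(audio)), audio)
    return out, sample_rate


sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s at 440 Hz
fast, _ = speed_sketch(tone, sr, factor=2.0)
# fast lasts 0.5 s and its tone is at 880 Hz: duration and pitch change together
```

Tempo, by contrast, changes the duration while keeping the pitch, which requires a time-stretching algorithm rather than plain resampling.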

class augly.audio.Tempo(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)

Parameters

factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor. In either case, the pitch is not affected
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.TimeStretch(rate=1.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(rate=1.5, p=1.0)

Parameters

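rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor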
p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Time-stretches the audio by a fixed rate

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.ToMono(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters

audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
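Averaging across channels amounts to a mean over the channel axis. The sketch below assumes a channel-first `(channels, samples)` layout and treats 1-D input as already mono; both are assumptions about the array layout, and `to_mono_sketch` is a hypothetical name.

```python
import numpy as np


def to_mono_sketch(audio: np.ndarray, sample_rate: int):
    """Average a (channels, samples) array into a single (samples,) channel."""
    if audio.ndim == 1:
        return audio, sample_rate  # already mono
    return audio.mean(axis=0), sample_rate


stereo = np.array([[1.0, 0.0, -1.0],   # left channel
                   [0.0, 0.0, 1.0]])   # right channel
mono, sr = to_mono_sketch(stereo, 44100)
# mono holds [0.5, 0.0, 0.0]: opposite-signed samples cancel out
```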

augly.audio.add_background_noise(audio, sample_rate=44100, background_audio=None, snr_level_db=10.0, seed=None, output_path=None, metadata=None)

Mixes a background sound into the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
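snr_level_db sets the ratio between the power of the audio and the power of the mixed-in background, SNR_dB = 10 * log10(P_audio / P_background). The sketch below scales the background to hit that ratio before adding it. It assumes equal-length mono arrays and uses Gaussian white noise when no background is given; the library may measure power or generate noise differently. `mix_at_snr` is a hypothetical name.

```python
import numpy as np


def mix_at_snr(audio, background=None, snr_level_db=10.0, seed=None):
    """Scale `background` to the requested SNR relative to `audio`, then add it."""
    # default_rng accepts an int seed, None, or an existing np.random.Generator
    rng = np.random.default_rng(seed)
    if background is None:
        background = rng.standard_normal(audio.shape)  # Gaussian white noise (assumption)
    power_audio = np.mean(audio ** 2)
    power_bg = np.mean(background ** 2)
    # Target noise power is P_audio / 10^(SNR_dB / 10); amplitudes scale with its square root
    scale = np.sqrt(power_audio / (power_bg * 10 ** (snr_level_db / 10)))
    return audio + scale * background


sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)
noisy = mix_at_snr(tone, snr_level_db=10.0, seed=0)
```

Lower snr_level_db values therefore give louder background noise; passing the same seed reproduces the same noise.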

augly.audio.add_background_noise_intensity(snr_level_db=10.0, **kwargs)

Return type: float

augly.audio.apply_lambda(audio, sample_rate=44100, aug_function=<function <lambda>>, output_path=None, metadata=None, **kwargs)

Applies a user-defined lambda to the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
**kwargs – the input attributes to be passed into aug_function

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
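A function suitable for aug_function takes the audio array and sample rate and returns both, and may accept extra keyword arguments that are forwarded from **kwargs. The example below calls such a function directly, the way apply_lambda presumably would; `reverse_and_scale` and its `gain` parameter are hypothetical.

```python
from typing import Tuple

import numpy as np


def reverse_and_scale(
    audio: np.ndarray, sample_rate: int, gain: float = 1.0
) -> Tuple[np.ndarray, int]:
    """Play the audio backwards (along the last axis) and scale it by `gain`."""
    return gain * audio[..., ::-1], sample_rate


# Under the documented contract, a call such as
#   apply_lambda(audio, 16000, aug_function=reverse_and_scale, gain=0.5)
# would forward gain=0.5 and end up evaluating roughly this:
audio = np.array([0.1, 0.2, 0.3])
out, sr = reverse_and_scale(audio, 16000, gain=0.5)
```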

augly.audio.apply_lambda_intensity(aug_function, **kwargs)

Return type: float

augly.audio.change_volume(audio, sample_rate=44100, volume_db=0.0, output_path=None, metadata=None)

Changes the volume of the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
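A change of volume_db decibels corresponds to multiplying the amplitude by 10^(volume_db / 20). The sketch assumes this standard amplitude convention; `db_to_gain` and `change_volume_sketch` are hypothetical names.

```python
import numpy as np


def db_to_gain(volume_db: float) -> float:
    """Linear amplitude multiplier for a change of volume_db decibels."""
    return 10.0 ** (volume_db / 20.0)


def change_volume_sketch(audio: np.ndarray, sample_rate: int, volume_db: float = 0.0):
    return audio * db_to_gain(volume_db), sample_rate


boost = db_to_gain(6.0)    # about 1.995: +6 dB roughly doubles the amplitude
cut = db_to_gain(-20.0)    # 0.1: -20 dB divides the amplitude by 10
quieter, sr = change_volume_sketch(np.array([0.5, -0.5]), 8000, volume_db=-20.0)
```

The default volume_db=0.0 gives a multiplier of exactly 1, i.e., no change.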

augly.audio.change_volume_intensity(volume_db=0.0, **kwargs)

Return type: float

augly.audio.clicks(audio, sample_rate=44100, seconds_between_clicks=0.5, snr_level_db=1.0, output_path=None, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
snr_level_db (float) – signal-to-noise ratio in dB
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
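The spacing parameter translates into sample positions as multiples of seconds_between_clicks * sample_rate. The sketch below places a one-sample impulse at each position in a mono signal. The fixed `click_amplitude` is a hypothetical stand-in for the snr_level_db-based scaling, and starting the first click at t = 0 is an assumption; `add_clicks_sketch` is a hypothetical name.

```python
import numpy as np


def add_clicks_sketch(audio, sample_rate, seconds_between_clicks=0.5, click_amplitude=1.0):
    """Add a one-sample impulse every `seconds_between_clicks` seconds to a mono signal."""
    step = max(1, int(round(seconds_between_clicks * sample_rate)))
    positions = np.arange(0, len(audio), step)
    out = np.array(audio, dtype=float)  # copy so the input is left untouched
    out[positions] += click_amplitude
    return out, positions


sr = 1000
silence = np.zeros(2 * sr)  # 2 s of silence
clicked, where = add_clicks_sketch(silence, sr, seconds_between_clicks=0.5)
# where holds 0, 500, 1000 and 1500: one click every half second
```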

augly.audio.clicks_intensity(seconds_between_clicks=0.5, snr_level_db=1.0, **kwargs)

Return type: float

augly.audio.clip(audio, sample_rate=44100, offset_factor=0.0, duration_factor=1.0, output_path=None, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
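Because both factors are multiplied by the audio duration, they map directly to sample indices. The sketch below assumes a trailing sample axis, truncation to whole samples, and clamping at the end of the audio; the library's rounding may differ. `clip_sketch` is a hypothetical name.

```python
import numpy as np


def clip_sketch(audio, sample_rate, offset_factor=0.0, duration_factor=1.0):
    """Keep the segment starting at offset_factor * length and lasting duration_factor * length."""
    n = audio.shape[-1]  # number of samples (last axis)
    start = int(offset_factor * n)
    end = min(n, start + int(duration_factor * n))
    return audio[..., start:end], sample_rate


sr = 1000
audio = np.arange(10 * sr, dtype=float)  # 10 s of "audio" whose values equal their indices
clipped, _ = clip_sketch(audio, sr, offset_factor=0.25, duration_factor=0.5)
# clipped covers 2.5 s to 7.5 s, i.e., samples 2500 through 7499
```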

augly.audio.clip_intensity(duration_factor=1.0, **kwargs)

Return type: float

augly.audio.harmonic(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the harmonic part of the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
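The kernel_size, power and margin parameters read like those of median-filtering harmonic/percussive separation. In that method, the magnitude spectrogram is median-filtered along time (enhancing harmonics) and along frequency (enhancing percussion), and the two results are combined into soft masks. Assuming that approach, the sketch below shows only the masking step on toy magnitudes; the STFT and median filters are omitted, and `soft_mask` is a hypothetical helper.

```python
import numpy as np


def soft_mask(x, y, power=2.0):
    """Wiener-style soft mask x^p / (x^p + y^p); bins where both are zero get 0."""
    xp, yp = x ** power, y ** power
    denom = xp + yp
    return np.divide(xp, denom, out=np.zeros_like(xp), where=denom > 0)


# Toy magnitudes for three time-frequency bins, standing in for the outputs of the
# horizontal (harmonic-enhancing) and vertical (percussive-enhancing) median filters
harm_enhanced = np.array([4.0, 1.0, 2.0])
perc_enhanced = np.array([1.0, 4.0, 2.0])

margin = 1.0
mask_harmonic = soft_mask(harm_enhanced, margin * perc_enhanced, power=2.0)
# mask_harmonic is [16/17, 1/17, 0.5]; multiplying the STFT by it keeps the harmonic part

# A larger margin makes the mask stricter about what counts as harmonic
stricter = soft_mask(harm_enhanced, 2.0 * perc_enhanced, power=2.0)
```

Raising power pushes the mask values toward 0 or 1, approaching a hard (binary) separation.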

augly.audio.harmonic_intensity(**kwargs)

Return type: float

augly.audio.high_pass_filter(audio, sample_rate=44100, cutoff_hz=3000.0, output_path=None, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – the frequency (in Hz) below which signals are attenuated at 6 dB per octave (an octave being a doubling of frequency)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
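A 6 dB-per-octave slope is the behavior of a first-order filter. The sketch below evaluates an idealized analog first-order high-pass response to show the roughly -3 dB gain at cutoff_hz and the roughly 6 dB of extra attenuation per halving of frequency well below it. It is a model of the stated slope, not of the digital filter the library applies, and `first_order_highpass_gain_db` is a hypothetical name.

```python
import math


def first_order_highpass_gain_db(freq_hz: float, cutoff_hz: float = 3000.0) -> float:
    """Gain (dB) of an idealized analog first-order high-pass filter at freq_hz."""
    ratio = freq_hz / cutoff_hz
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio ** 2))


at_cutoff = first_order_highpass_gain_db(3000.0)  # about -3 dB
# Well below the cutoff, each halving of frequency costs close to 6 dB more attenuation
slope_db = first_order_highpass_gain_db(375.0) - first_order_highpass_gain_db(187.5)
```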

augly.audio.high_pass_filter_intensity(cutoff_hz=3000.0, **kwargs)

Return type: float

augly.audio.insert_in_background(audio, sample_rate=44100, offset_factor=0.0, background_audio=None, seed=None, output_path=None, metadata=None)

Inserts audio into a background clip in a non-overlapping manner.

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – insert point relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise, with the same duration as the audio.
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
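One way to read "non-overlapping": the background is split at offset_factor times its length and the audio is spliced into the gap, so no samples are summed. The sketch below assumes that reading and uses mono arrays at a shared sample rate; any resampling or loudness matching the library performs is not shown. `insert_in_background_sketch` is a hypothetical name.

```python
import numpy as np


def insert_in_background_sketch(audio, background, offset_factor=0.0):
    """Split `background` at offset_factor * len(background) and place `audio` in the gap."""
    split = int(offset_factor * len(background))
    return np.concatenate([background[:split], audio, background[split:]])


bg = np.zeros(8)
fg = np.ones(3)
out = insert_in_background_sketch(fg, bg, offset_factor=0.5)
# out is [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]: the result is longer, and nothing is mixed
```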

augly.audio.insert_in_background_intensity(metadata, **kwargs)

Return type: float

augly.audio.invert_channels(audio, sample_rate=44100, output_path=None, metadata=None)

Inverts the channel order of the audio. If the audio has only one channel, no change is applied. Otherwise, the channels are returned in reverse order, e.g., for 4 channels, in the order [3, 2, 1, 0].

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
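Reversing the channel order is a flip along the channel axis. The sketch below assumes a channel-first `(channels, samples)` layout; `invert_channels_sketch` is a hypothetical name.

```python
import numpy as np


def invert_channels_sketch(audio: np.ndarray, sample_rate: int):
    """Reverse the channel order of a (channels, samples) array; mono input is returned unchanged."""
    if audio.ndim == 1 or audio.shape[0] == 1:
        return audio, sample_rate
    return np.flip(audio, axis=0), sample_rate


four_channels = np.arange(4)[:, None] * np.ones((4, 2))  # channel c holds the value c
flipped, _ = invert_channels_sketch(four_channels, 48000)
# flipped[:, 0] is [3, 2, 1, 0], the ordering described above
```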

augly.audio.invert_channels_intensity(metadata, **kwargs)

Return type: float

augly.audio.loop(audio, sample_rate=44100, n=1, output_path=None, metadata=None)

Loops the audio ‘n’ times

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n (int) – the number of times the audio will be looped
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.loop_intensity(n=1, **kwargs)

Return type: float

augly.audio.low_pass_filter(audio, sample_rate=44100, cutoff_hz=500.0, output_path=None, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – the frequency (in Hz) above which signals are attenuated at 6 dB per octave (an octave being a doubling of frequency)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
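A 6 dB-per-octave low-pass can be approximated in the time domain by a one-pole IIR filter that moves its output a fixed fraction toward each new input sample. The sketch below uses that filter as a stand-in and shows DC passing through while a tone at 8x the cutoff is strongly attenuated. It is not the library's filter, and the coefficient formula is a common approximation; `one_pole_lowpass` is a hypothetical name.

```python
import numpy as np


def one_pole_lowpass(audio: np.ndarray, sample_rate: int, cutoff_hz: float = 500.0) -> np.ndarray:
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    out = np.empty(len(audio))
    state = 0.0
    for i, sample in enumerate(audio):
        state += alpha * (sample - state)  # move a fraction alpha toward the input
        out[i] = state
    return out


sr = 16000
t = np.arange(sr) / sr
high_tone = np.sin(2 * np.pi * 4000.0 * t)  # 8x the cutoff
filtered = one_pole_lowpass(high_tone, sr, cutoff_hz=500.0)
settled_dc = one_pole_lowpass(np.ones(2000), sr, cutoff_hz=500.0)[-1]  # converges to 1
```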

augly.audio.low_pass_filter_intensity(cutoff_hz=500.0, **kwargs)

Return type: float

augly.audio.normalize(audio, sample_rate=44100, norm=inf, axis=0, threshold=None, fill=None, output_path=None, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
norm (Optional[float]) – the type of norm to compute:
    - np.inf: maximum absolute value
    - -np.inf: minimum absolute value
    - 0: number of non-zeros (the support)
    - float: corresponding l_p norm
    - None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
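Normalization divides the audio by its norm along `axis`, so with the default norm=inf the largest absolute sample becomes 1. The sketch below covers the norm options listed above via `np.linalg.norm`; threshold and fill are left out, and all-zero slices are left untouched to avoid dividing by zero (an assumption). `normalize_sketch` is a hypothetical name.

```python
import numpy as np


def normalize_sketch(audio: np.ndarray, norm=np.inf, axis: int = 0) -> np.ndarray:
    """Scale `audio` by its norm along `axis` (threshold and fill omitted)."""
    if norm is None:
        return audio  # no normalization is performed
    length = np.linalg.norm(audio, ord=norm, axis=axis, keepdims=True)
    safe = np.where(length > 0, length, 1.0)  # leave all-zero slices as they are
    return audio / safe


x = np.array([0.25, -0.5, 0.125])
peak_normalized = normalize_sketch(x)       # max |x| becomes 1: [0.5, -1.0, 0.25]
unit_energy = normalize_sketch(x, norm=2)   # Euclidean (l_2) norm becomes 1
```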

augly.audio.normalize_intensity(norm=inf, **kwargs)

Return type: float

augly.audio.peaking_equalizer(audio, sample_rate=44100, center_hz=500.0, q=1.0, gain_db=-3.0, output_path=None, metadata=None)

Applies a two-pole peaking equalization filter. The signal level at and around center_hz can be increased or decreased, while all other frequencies are left unchanged

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
center_hz (float) – the center frequency (in Hz) of the band to which the EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.peaking_equalizer_intensity(q, gain_db, **kwargs)

Return type: float

augly.audio.percussive(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the percussive part of the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.percussive_intensity(**kwargs)

Return type: float

augly.audio.pitch_shift(audio, sample_rate=44100, n_steps=1.0, output_path=None, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
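n_steps (float) – the number of steps by which to shift the pitch (positive values raise it, negative values lower it); each step is equal to one semitone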
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
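Because each step is one semitone and an octave has 12 semitones, shifting by n_steps multiplies every frequency by 2^(n_steps / 12). The snippet below works out that ratio; `semitones_to_ratio` is a hypothetical helper.

```python
def semitones_to_ratio(n_steps: float) -> float:
    """Frequency multiplier for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)


octave_up = semitones_to_ratio(12.0)              # 2.0
default_shift = semitones_to_ratio(1.0)           # about 1.0595 for the default n_steps=1.0
a3_hz = 440.0 * semitones_to_ratio(-12.0)         # 220.0: one octave below A4
```

Speed with factor=semitones_to_ratio(n) would move the pitch by the same amount, but it would also change the duration, which pitch_shift leaves intact.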

augly.audio.pitch_shift_intensity(n_steps=2.0, **kwargs)

Return type: float

augly.audio.reverb(audio, sample_rate=44100, reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, output_path=None, metadata=None)

Adds reverberation to the audio

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
reverberance (float) – (%) sets the length of the reverberation tail, i.e., how long the reverberation continues after the original sound has ended, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – if True, only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.reverb_intensity(reverberance=50.0, wet_only=False, room_scale=100.0, **kwargs)

Return type: float

augly.audio.speed(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.speed_intensity(factor=2.0, **kwargs)

Return type: float

augly.audio.tempo(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor. In either case, the pitch is not affected
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.tempo_intensity(factor=2.0, **kwargs)

Return type: float

augly.audio.time_stretch(audio, sample_rate=44100, rate=1.5, output_path=None, metadata=None)

Time-stretches the audio by a fixed rate

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.time_stretch_intensity(rate=1.5, **kwargs)

Return type: float

augly.audio.to_mono(audio, sample_rate=44100, output_path=None, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters

audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.to_mono_intensity(metadata, **kwargs)

Return type: float