augly.audio package

Submodules

augly.audio.composition module

class augly.audio.composition.BaseComposition(transforms, p=1.0)

Bases: object

__init__(transforms, p=1.0)
Parameters
  • transforms (List[BaseTransform]) – a list of transforms

  • p (float) – the probability of the transform being applied; default value is 1.0

class augly.audio.composition.Compose(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies the list of transforms in order to the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
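
As a rough illustration of ordered composition, the sketch below chains plain callables that each take and return an (audio, sample_rate) pair. The names compose_sketch, double_gain and drop_first_sample are hypothetical, audio is modeled as a Python list rather than an ndarray, and the way p gates the whole chain is an assumption. It is not AugLy's implementation.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Audio = List[float]
Transform = Callable[[Audio, int], Tuple[Audio, int]]


def compose_sketch(
    transforms: Sequence[Transform],
    audio: Audio,
    sample_rate: int,
    p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Audio, int]:
    """Apply each transform in order, feeding every output into the next."""
    rng = rng or random.Random()
    # Assumption: with probability 1 - p the whole chain is skipped.
    if rng.random() > p:
        return audio, sample_rate
    for transform in transforms:
        audio, sample_rate = transform(audio, sample_rate)
    return audio, sample_rate


def double_gain(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    # Hypothetical transform: multiply every sample by 2.
    return [2.0 * x for x in audio], sample_rate


def drop_first_sample(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    # Hypothetical transform: remove the first sample.
    return audio[1:], sample_rate


result, rate = compose_sketch([double_gain, drop_first_sample], [0.1, 0.2, 0.3], 16000)
```

Because each transform receives the previous transform's output, the order of the list matters: here the gain is applied before the first sample is dropped.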

class augly.audio.composition.OneOf(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies one of the transforms to the audio (with probability p)

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

__init__(transforms, p=1.0)
Parameters
  • transforms (List[BaseTransform]) – a list of transforms to select from; one of them will be chosen and applied to the audio

  • p (float) – the probability of the transform being applied; default value is 1.0
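
The selection behaviour described above can be sketched as follows. This is a minimal stand-in, not the library's code: one_of_sketch, negate and reverse are hypothetical names, audio is a plain list, and choosing uniformly among the transforms is an assumption (the library may weight the choice differently).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Audio = List[float]
Transform = Callable[[Audio, int], Tuple[Audio, int]]


def one_of_sketch(
    transforms: Sequence[Transform],
    audio: Audio,
    sample_rate: int,
    p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Audio, int]:
    """With probability p, apply exactly one randomly chosen transform."""
    rng = rng or random.Random()
    if rng.random() > p:
        # Skipped: the input is returned unchanged.
        return audio, sample_rate
    chosen = rng.choice(list(transforms))  # assumption: uniform choice
    return chosen(audio, sample_rate)


def negate(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return [-x for x in audio], sample_rate


def reverse(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return audio[::-1], sample_rate


out, rate = one_of_sketch([negate, reverse], [1.0, 2.0], 8000, rng=random.Random(0))
skipped, _ = one_of_sketch([negate, reverse], [1.0, 2.0], 8000, p=0.0, rng=random.Random(0))
```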

augly.audio.functional module

augly.audio.functional.add_background_noise(audio, sample_rate=44100, background_audio=None, snr_level_db=10.0, seed=None, output_path=None, metadata=None)

Mixes a background sound into the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise

  • snr_level_db (float) – signal-to-noise ratio in dB

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
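
To make the snr_level_db parameter concrete, the sketch below scales a noise signal so that the ratio of signal power to noise power matches a target in decibels, using the standard definition SNR = 10 * log10(P_signal / P_noise). The helper names mean_power and scale_noise_to_snr are hypothetical, the signals are plain lists, and the library's internal mixing may differ in detail.

```python
import math
import random
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(
    signal: List[float], noise: List[float], snr_level_db: float
) -> List[float]:
    """Scale noise so that 10*log10(P_signal / P_noise) == snr_level_db."""
    target_noise_power = mean_power(signal) / (10 ** (snr_level_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


rng = random.Random(0)
# One second of a 440 Hz sine at an 8 kHz sample rate, plus Gaussian noise.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

scaled_noise = scale_noise_to_snr(signal, noise, snr_level_db=10.0)
mixed = [s + n for s, n in zip(signal, scaled_noise)]
achieved_snr_db = 10 * math.log10(mean_power(signal) / mean_power(scaled_noise))
```

A higher snr_level_db therefore means quieter noise relative to the signal.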

augly.audio.functional.apply_lambda(audio, sample_rate=44100, aug_function=<function <lambda>>, output_path=None, metadata=None, **kwargs)

Applies a user-defined function (such as a lambda) to the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

  • **kwargs

    the input attributes to be passed into aug_function

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.change_volume(audio, sample_rate=44100, volume_db=0.0, output_path=None, metadata=None)

Changes the volume of the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
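
For intuition about volume_db, a decibel change maps to a linear amplitude factor of 10^(dB/20). The sketch below applies that standard conversion to a plain list of samples; db_to_amplitude_ratio and change_volume_sketch are hypothetical names, and the library's actual processing path is not shown here.

```python
from typing import List


def db_to_amplitude_ratio(volume_db: float) -> float:
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10 ** (volume_db / 20)


def change_volume_sketch(audio: List[float], volume_db: float) -> List[float]:
    """Scale every sample by the amplitude ratio implied by volume_db."""
    gain = db_to_amplitude_ratio(volume_db)
    return [gain * x for x in audio]


# -20 dB corresponds to one tenth of the original amplitude.
quieter = change_volume_sketch([0.5, -0.5], volume_db=-20.0)
unchanged_gain = db_to_amplitude_ratio(0.0)
```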

augly.audio.functional.clicks(audio, sample_rate=44100, seconds_between_clicks=0.5, snr_level_db=1.0, output_path=None, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds

  • snr_level_db (float) – signal-to-noise ratio in dB

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.clip(audio, sample_rate=44100, offset_factor=0.0, duration_factor=1.0, output_path=None, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
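
Since both factors are multiplied by the audio duration, they can be turned into sample indices as in the sketch below. The function clip_bounds is hypothetical, and the range checks and the truncation to integers are assumptions about reasonable behaviour rather than a statement of what the library does.

```python
from typing import Tuple


def clip_bounds(
    num_samples: int, offset_factor: float = 0.0, duration_factor: float = 1.0
) -> Tuple[int, int]:
    """Return (start, end) sample indices for a relative offset and duration."""
    # Assumption: both factors are fractions of the total duration and the
    # clip must end no later than the end of the audio.
    if not (0.0 <= offset_factor <= 1.0 and 0.0 <= duration_factor <= 1.0):
        raise ValueError("factors must lie in [0, 1]")
    if offset_factor + duration_factor > 1.0:
        raise ValueError("offset_factor + duration_factor must not exceed 1")
    start = int(offset_factor * num_samples)
    end = int((offset_factor + duration_factor) * num_samples)
    return start, end


# Ten seconds at 44.1 kHz; keep the middle half (2.5 s to 7.5 s).
start, end = clip_bounds(44100 * 10, offset_factor=0.25, duration_factor=0.5)
```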

augly.audio.functional.harmonic(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the harmonic part of the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.high_pass_filter(audio, sample_rate=44100, cutoff_hz=3000.0, output_path=None, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • cutoff_hz (float) – the cutoff frequency (in Hz); signals below this point begin to be attenuated by 6 dB per octave (an octave being a doubling in frequency)

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.insert_in_background(audio, sample_rate=44100, offset_factor=0.0, background_audio=None, seed=None, output_path=None, metadata=None)

Inserts audio into a background clip in a non-overlapping manner.

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • offset_factor (float) – insert point relative to the background duration (this parameter is multiplied by the background duration)

  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise, with the same duration as the audio.

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.invert_channels(audio, sample_rate=44100, output_path=None, metadata=None)

Inverts the channels of the audio. If the audio has only one channel, no change is applied. Otherwise, the order of the channels is reversed; e.g., for 4 channels, the channels are returned in the order [3, 2, 1, 0].

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
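
The channel reordering described above amounts to reversing the channel axis. The sketch below shows this on a list of channels (one inner list per channel); invert_channels_sketch is a hypothetical name and the channels-first layout is an assumption made for the illustration.

```python
from typing import List


def invert_channels_sketch(channels: List[List[float]]) -> List[List[float]]:
    """Reverse the channel order; single-channel audio is returned as is."""
    if len(channels) <= 1:
        return channels
    return channels[::-1]


# Four channels, each holding one sample equal to its channel index.
four_channels = [[0.0], [1.0], [2.0], [3.0]]
inverted = invert_channels_sketch(four_channels)
mono_unchanged = invert_channels_sketch([[0.5, 0.25]])
```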

augly.audio.functional.loop(audio, sample_rate=44100, n=1, output_path=None, metadata=None)

Loops the audio ‘n’ times

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • n (int) – the number of times the audio will be looped

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.low_pass_filter(audio, sample_rate=44100, cutoff_hz=500.0, output_path=None, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • cutoff_hz (float) – the cutoff frequency (in Hz); signals above this point begin to be attenuated by 6 dB per octave (an octave being a doubling in frequency)

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.normalize(audio, sample_rate=44100, norm=inf, axis=0, threshold=None, fill=None, output_path=None, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • norm (Optional[float]) – the type of norm to compute:

    - np.inf: maximum absolute value
    - -np.inf: minimum absolute value
    - 0: number of non-zeros (the support)
    - float: corresponding l_p norm
    - None: no normalization is performed

  • axis (int) – axis along which to compute the norm

  • threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized

  • fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
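
The norm options listed above can be illustrated on a single 1-D signal. This is a simplified sketch under stated assumptions: vector_norm and normalize_sketch are hypothetical helpers, the threshold and fill options are omitted, and returning an all-zero signal unchanged is an assumption.

```python
import math
from typing import List, Optional


def vector_norm(samples: List[float], norm: float = math.inf) -> float:
    """Compute the norm types listed for the norm parameter."""
    magnitudes = [abs(x) for x in samples]
    if norm == math.inf:
        return max(magnitudes)
    if norm == -math.inf:
        return min(magnitudes)
    if norm == 0:
        return float(sum(1 for m in magnitudes if m > 0))
    return sum(m ** norm for m in magnitudes) ** (1.0 / norm)


def normalize_sketch(
    samples: List[float], norm: Optional[float] = math.inf
) -> List[float]:
    """Divide by the chosen norm so that the norm of the result is 1."""
    if norm is None:
        return list(samples)
    length = vector_norm(samples, norm)
    if length == 0:
        # Assumption: leave an all-zero signal untouched.
        return list(samples)
    return [x / length for x in samples]


peak_normalized = normalize_sketch([0.25, -0.5, 0.125])
l2_of_3_4 = vector_norm([3.0, 4.0], 2)
support = vector_norm([0.0, 2.0, 0.0, -1.0], 0)
```

With the default norm of infinity, the loudest sample ends up with an absolute value of 1.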

augly.audio.functional.peaking_equalizer(audio, sample_rate=44100, center_hz=500.0, q=1.0, gain_db=-3.0, output_path=None, metadata=None)

Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • center_hz (float) – point in the frequency spectrum at which EQ is applied

  • q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth

  • gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.percussive(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the percussive part of the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.pitch_shift(audio, sample_rate=44100, n_steps=1.0, output_path=None, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • n_steps (float) – each step is equal to one semitone

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
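
Because each step is one semitone, n_steps corresponds to a frequency ratio of 2^(n_steps / 12) in twelve-tone equal temperament. The sketch below only computes that ratio and does not perform any pitch shifting; semitones_to_frequency_ratio is a hypothetical helper name.

```python
def semitones_to_frequency_ratio(n_steps: float) -> float:
    """Frequency multiplier for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)


octave_up = semitones_to_frequency_ratio(12.0)
# A 440 Hz tone shifted up by one semitone.
shifted_hz = 440.0 * semitones_to_frequency_ratio(1.0)
octave_down = semitones_to_frequency_ratio(-12.0)
```

Negative values of n_steps lower the pitch and fractional values shift by less than a semitone.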

augly.audio.functional.reverb(audio, sample_rate=44100, reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, output_path=None, metadata=None)

Adds reverberation to the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics

  • hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies

  • room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room

  • stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels

  • pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail

  • wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix

  • wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.speed(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
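
To show why a speed change also affects pitch, the sketch below computes the new duration and the implied pitch shift for a given factor. It assumes the speed change behaves like simple resampled playback, where all frequencies are multiplied by the factor; speed_change_effects is a hypothetical helper and no audio is processed.

```python
import math
from typing import Tuple


def speed_change_effects(duration_s: float, factor: float) -> Tuple[float, float]:
    """Return (new duration in seconds, pitch shift in semitones)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    new_duration_s = duration_s / factor
    # Assumption: frequencies scale by `factor`, i.e. 12 * log2(factor) semitones.
    pitch_shift_semitones = 12 * math.log2(factor)
    return new_duration_s, pitch_shift_semitones


faster_duration, faster_shift = speed_change_effects(10.0, 2.0)
slower_duration, slower_shift = speed_change_effects(10.0, 0.5)
```

Under this assumption, doubling the speed halves the duration and raises the pitch by one octave.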

augly.audio.functional.tempo(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor. In both cases the pitch is not affected

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.time_stretch(audio, sample_rate=44100, rate=1.5, output_path=None, metadata=None)

Time-stretches the audio by a fixed rate

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.functional.to_mono(audio, sample_rate=44100, output_path=None, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
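
Averaging across channels can be sketched as below, with one inner list per channel. The function to_mono_sketch is a hypothetical name and the channels-first list layout is an assumption made for the illustration.

```python
from typing import List


def to_mono_sketch(channels: List[List[float]]) -> List[float]:
    """Average the samples of all channels at each time step."""
    num_channels = len(channels)
    # zip(*channels) yields one tuple per time step, holding every channel's sample.
    return [sum(frame) / num_channels for frame in zip(*channels)]


left = [1.0, 0.0, -1.0]
right = [0.0, 0.0, 1.0]
mono = to_mono_sketch([left, right])
```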

augly.audio.intensity module

augly.audio.intensity.add_background_noise_intensity(snr_level_db=10.0, **kwargs)
Return type

float

augly.audio.intensity.apply_lambda_intensity(aug_function, **kwargs)
Return type

float

augly.audio.intensity.change_volume_intensity(volume_db=0.0, **kwargs)
Return type

float

augly.audio.intensity.clicks_intensity(seconds_between_clicks=0.5, snr_level_db=1.0, **kwargs)
Return type

float

augly.audio.intensity.clip_intensity(duration_factor=1.0, **kwargs)
Return type

float

augly.audio.intensity.harmonic_intensity(**kwargs)
Return type

float

augly.audio.intensity.high_pass_filter_intensity(cutoff_hz=3000.0, **kwargs)
Return type

float

augly.audio.intensity.insert_in_background_intensity(metadata, **kwargs)
Return type

float

augly.audio.intensity.invert_channels_intensity(metadata, **kwargs)
Return type

float

augly.audio.intensity.loop_intensity(n=1, **kwargs)
Return type

float

augly.audio.intensity.low_pass_filter_intensity(cutoff_hz=500.0, **kwargs)
Return type

float

augly.audio.intensity.normalize_intensity(norm=inf, **kwargs)
Return type

float

augly.audio.intensity.peaking_equalizer_intensity(q, gain_db, **kwargs)
Return type

float

augly.audio.intensity.percussive_intensity(**kwargs)
Return type

float

augly.audio.intensity.pitch_shift_intensity(n_steps=2.0, **kwargs)
Return type

float

augly.audio.intensity.reverb_intensity(reverberance=50.0, wet_only=False, room_scale=100.0, **kwargs)
Return type

float

augly.audio.intensity.speed_intensity(factor=2.0, **kwargs)
Return type

float

augly.audio.intensity.tempo_intensity(factor=2.0, **kwargs)
Return type

float

augly.audio.intensity.time_stretch_intensity(rate=1.5, **kwargs)
Return type

float

augly.audio.intensity.to_mono_intensity(metadata, **kwargs)
Return type

float

augly.audio.transforms module

class augly.audio.transforms.AddBackgroundNoise(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)
Parameters
  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise

  • snr_level_db (float) – signal-to-noise ratio in dB

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Mixes a background sound into the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0)
Parameters
  • aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a user-defined function (such as a lambda) to the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.BaseTransform(p=1.0)

Bases: object

__call__(audio, sample_rate=44100, metadata=None, force=False)
Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

  • force (bool) – if set to True, the transform will always be applied. Otherwise, application is determined by the probability p

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

__init__(p=1.0)
Parameters

p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

This function is to be implemented in the child classes. From this function, call the augmentation function with the parameters specified

Return type

Tuple[ndarray, int]
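
The subclassing pattern described above might look like the following stand-alone sketch. BaseTransformSketch and GainSketch are hypothetical stand-ins, not the AugLy classes: audio is a plain list, the metadata keys are invented for illustration, and the way p and force interact is an assumption based on the parameter descriptions.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

Audio = List[float]
Metadata = Optional[List[Dict[str, Any]]]


class BaseTransformSketch:
    def __init__(self, p: float = 1.0):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be a value in the range [0, 1]")
        self.p = p

    def __call__(self, audio: Audio, sample_rate: int = 44100,
                 metadata: Metadata = None, force: bool = False) -> Tuple[Audio, int]:
        # Assumption: unless forced, the transform is skipped with probability 1 - p.
        if not force and random.random() > self.p:
            return audio, sample_rate
        return self.apply_transform(audio, sample_rate, metadata)

    def apply_transform(self, audio: Audio, sample_rate: int,
                        metadata: Metadata = None) -> Tuple[Audio, int]:
        raise NotImplementedError  # implemented by child classes


class GainSketch(BaseTransformSketch):
    def __init__(self, gain: float, p: float = 1.0):
        super().__init__(p)
        self.gain = gain

    def apply_transform(self, audio: Audio, sample_rate: int,
                        metadata: Metadata = None) -> Tuple[Audio, int]:
        out = [self.gain * x for x in audio]
        if metadata is not None:
            # Hypothetical metadata record appended to the caller's list.
            metadata.append({"name": "gain_sketch", "src_num_samples": len(audio),
                             "dst_num_samples": len(out)})
        return out, sample_rate


random.seed(0)  # make the probabilistic skip reproducible
log: List[Dict[str, Any]] = []
never_applied = GainSketch(gain=2.0, p=0.0)
skipped, _ = never_applied([1.0, 2.0])
forced, _ = never_applied([1.0, 2.0], metadata=log, force=True)
```

Keeping the probability check in the base class means a child class only has to describe the augmentation itself.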

class augly.audio.transforms.ChangeVolume(volume_db=0.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(volume_db=0.0, p=1.0)
Parameters
  • volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the volume of the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.Clicks(seconds_between_clicks=0.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(seconds_between_clicks=0.5, p=1.0)
Parameters
  • seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
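
To make the interval parameter concrete, the sketch below computes the sample indices at which evenly spaced clicks would begin. The helper name is hypothetical, and placing the first click at sample 0 is an assumption rather than documented behaviour.

```python
def click_positions(num_samples, sample_rate, seconds_between_clicks=0.5):
    """Return the sample indices at which evenly spaced clicks would start."""
    step = int(round(seconds_between_clicks * sample_rate))
    if step <= 0:
        raise ValueError("the click interval must span at least one sample")
    # Assumption: the first click sits at the very start of the audio.
    return list(range(0, num_samples, step))
```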

class augly.audio.transforms.Clip(offset_factor=0.0, duration_factor=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, duration_factor=1.0, p=1.0)
Parameters
  • offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
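
The two factors can be read as fractions of the total length. A minimal sketch of the resulting sample range, assuming truncation to whole samples and clamping at the end of the audio (both assumptions; clip_bounds is a hypothetical helper):

```python
def clip_bounds(num_samples, offset_factor=0.0, duration_factor=1.0):
    """Return (start, end) sample indices for a crop given relative factors."""
    start = int(offset_factor * num_samples)
    # Assumption: a crop that would run past the end is clamped to the end.
    end = min(num_samples, start + int(duration_factor * num_samples))
    return start, end
```

For 1000 samples, offset_factor=0.25 and duration_factor=0.5 select samples 250 up to (but not including) 750.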

class augly.audio.transforms.Harmonic(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Parameters
  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the harmonic part of the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.HighPassFilter(cutoff_hz=3000.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=3000.0, p=1.0)
Parameters
  • cutoff_hz (float) – the cutoff frequency (in Hz); signals at lower frequencies are attenuated by 6 dB per octave (one octave is a doubling in frequency) below this point

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
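
The 6 dB per octave slope can be turned into a rough attenuation estimate. The sketch below uses the idealized straight-line (asymptotic) response, so it ignores the gradual roll-off a real filter shows near the cutoff; the function name is hypothetical.

```python
import math


def highpass_attenuation_db(frequency_hz, cutoff_hz=3000.0, slope_db_per_octave=6.0):
    """Idealized attenuation (in dB) of a tone at frequency_hz below the cutoff."""
    if frequency_hz >= cutoff_hz:
        return 0.0  # frequencies at or above the cutoff pass through
    octaves_below = math.log2(cutoff_hz / frequency_hz)
    return slope_db_per_octave * octaves_below
```

With the default cutoff of 3000 Hz, a 1500 Hz tone (one octave below) is reduced by about 6 dB and a 750 Hz tone (two octaves below) by about 12 dB.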

class augly.audio.transforms.InsertInBackground(offset_factor=0.0, background_audio=None, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, background_audio=None, seed=None, p=1.0)
Parameters
  • offset_factor (float) – start point of the crop relative to the background duration (this parameter is multiplied by the background duration)

  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Inserts the audio into a background audio without overlapping it

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
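
One way to picture a non-overlapping insert is to split the background at the offset and place the audio in the gap. The sketch below works on plain lists and assumes exactly that behaviour; it is an illustration with a hypothetical function name, not AugLy's implementation.

```python
def insert_in_background_sketch(audio, background, offset_factor=0.0):
    """Place audio inside background at a relative offset, without mixing them."""
    offset = int(offset_factor * len(background))
    # The background is split at the offset and the audio sits in between.
    return background[:offset] + audio + background[offset:]
```

The output length is the sum of both inputs, since no samples are overlaid.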

class augly.audio.transforms.InvertChannels(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Inverts the channels of the audio.

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.Loop(n=1, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n=1, p=1.0)
Parameters
  • n (int) – the number of times the audio will be looped

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Loops the audio ‘n’ times

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.LowPassFilter(cutoff_hz=500.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=500.0, p=1.0)
Parameters
  • cutoff_hz (float) – the cutoff frequency (in Hz); signals at higher frequencies are attenuated by 6 dB per octave (one octave is a doubling in frequency) above this point

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
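
A first-order (one-pole) low-pass filter is the textbook filter with a 6 dB per octave roll-off. The sketch below implements one in plain Python to show the behaviour; it is an illustrative stand-in with a hypothetical name, and AugLy's own filter may be implemented differently.

```python
import math


def one_pole_lowpass(samples, sample_rate, cutoff_hz=500.0):
    """Apply a simple first-order RC low-pass filter to a list of samples."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing coefficient between 0 and 1
    out = []
    previous = 0.0
    for x in samples:
        # Move the output a fraction alpha of the way towards the input.
        previous = previous + alpha * (x - previous)
        out.append(previous)
    return out
```

A constant (0 Hz) input passes through almost unchanged once the filter settles, while a signal alternating at the Nyquist frequency is strongly attenuated.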

class augly.audio.transforms.Normalize(norm=inf, axis=0, threshold=None, fill=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(norm=inf, axis=0, threshold=None, fill=None, p=1.0)
Parameters
  • norm (Optional[float]) – the type of norm to compute: np.inf (maximum absolute value), -np.inf (minimum absolute value), 0 (number of non-zeros, i.e. the support), any other float (the corresponding l_p norm), or None (no normalization is performed)

  • axis (int) – axis along which to compute the norm

  • threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized

  • fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
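
The norm options listed above can be illustrated on a single channel. The sketch below is a simplified, list-based illustration with hypothetical function names; it omits the threshold and fill options, and leaving an all-zero input unchanged is an assumption.

```python
import math


def vector_norm(values, norm=math.inf):
    """Compute the norm selected by `norm` for a list of samples."""
    magnitudes = [abs(v) for v in values]
    if norm == math.inf:
        return max(magnitudes)  # maximum absolute value
    if norm == -math.inf:
        return min(magnitudes)  # minimum absolute value
    if norm == 0:
        return float(sum(1 for m in magnitudes if m > 0))  # size of the support
    return sum(m ** norm for m in magnitudes) ** (1.0 / norm)  # l_p norm


def normalize_sketch(values, norm=math.inf):
    """Scale values so that the chosen norm equals 1 (None disables scaling)."""
    if norm is None:
        return list(values)
    length = vector_norm(values, norm)
    if length == 0:
        return list(values)  # assumption: silent input is left as is
    return [v / length for v in values]
```

With the default norm (infinity), the loudest sample ends up at an absolute value of 1.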

class augly.audio.transforms.PeakingEqualizer(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)
Parameters
  • center_hz (float) – point in the frequency spectrum at which EQ is applied

  • q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth

  • gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
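
Since q is described as the ratio of center frequency to bandwidth, the affected bandwidth follows directly from the two parameters. A minimal worked example (eq_bandwidth_hz is a hypothetical helper, not part of AugLy):

```python
def eq_bandwidth_hz(center_hz=500.0, q=1.0):
    """Return the bandwidth implied by q = center_hz / bandwidth."""
    if q <= 0:
        raise ValueError("q must be positive")
    return center_hz / q
```

With the defaults the band is 500 Hz wide; doubling q to 2.0 narrows it to 250 Hz.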

class augly.audio.transforms.Percussive(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Parameters
  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the percussive part of the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.PitchShift(n_steps=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n_steps=1.0, p=1.0)
Parameters
  • n_steps (float) – the number of steps by which to shift the pitch; each step is equal to one semitone

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
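
For intuition about step sizes, the sketch below converts a number of semitones into a frequency ratio, assuming twelve-tone equal temperament (12 semitones per octave). The helper name is hypothetical.

```python
def semitone_ratio(n_steps):
    """Frequency ratio produced by shifting n_steps semitones."""
    return 2.0 ** (n_steps / 12.0)
```

Shifting by 12 steps doubles every frequency, so a 440 Hz tone moves to 880 Hz; a single step raises it by about 5.9%.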

class augly.audio.transforms.Reverb(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)
Parameters
  • reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics

  • hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies

  • room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room

  • stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels

  • pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail

  • wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix

  • wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds reverberation to the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.Speed(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)
Parameters
  • factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
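
Because the speed change also affects pitch, both effects can be estimated from the factor. The sketch below assumes a plain resampling-style speed change, where every frequency is multiplied by the factor; the helper name is hypothetical.

```python
import math


def speed_effects(duration_seconds, factor=2.0):
    """Estimate the new duration and the pitch shift (in semitones) for a speed factor."""
    new_duration = duration_seconds / factor
    # Assumption: frequencies scale with the factor, i.e. 12 semitones per doubling.
    pitch_shift_semitones = 12.0 * math.log2(factor)
    return new_duration, pitch_shift_semitones
```

Under this assumption, a factor of 2.0 halves the duration and raises the pitch by an octave, while 0.5 doubles the duration and lowers the pitch by an octave.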

class augly.audio.transforms.Tempo(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)
Parameters
  • factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.TimeStretch(rate=1.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(rate=1.5, p=1.0)
Parameters
  • rate (float) – the time stretch factor

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Time-stretches the audio by a fixed rate

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.transforms.ToMono(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
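
Averaging across channels can be sketched with plain lists. The example assumes a channels-first layout (one list of samples per channel), which is an assumption about the input shape; the function name is hypothetical.

```python
def to_mono_sketch(channels):
    """Average equally long channels sample by sample into one mono channel."""
    num_channels = len(channels)
    # zip(*channels) yields one tuple per time step holding every channel's sample.
    return [sum(frame) / num_channels for frame in zip(*channels)]
```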

Module contents

class augly.audio.AddBackgroundNoise(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)
Parameters
  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise

  • snr_level_db (float) – signal-to-noise ratio in dB

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Mixes a background sound into the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
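
The role of snr_level_db can be shown by computing how much the background has to be scaled before mixing. This sketch uses the standard power-based SNR definition, 10 * log10(signal power / noise power); the function names are hypothetical and AugLy's internal mixing may differ in detail.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def noise_gain_for_snr(signal, noise, snr_level_db=10.0):
    """Gain to apply to noise so that the signal-to-noise ratio equals snr_level_db."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_level_db / 10.0))
    return math.sqrt(target_noise_power / mean_power(noise))


def mix_at_snr(signal, noise, snr_level_db=10.0):
    """Add the scaled noise to the signal, sample by sample."""
    gain = noise_gain_for_snr(signal, noise, snr_level_db)
    return [s + gain * n for s, n in zip(signal, noise)]
```

A higher snr_level_db yields a quieter background relative to the original audio.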

class augly.audio.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0)
Parameters
  • aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a user-defined lambda to the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
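
A function suitable for aug_function takes the audio and sample rate and returns the transformed audio and sample rate. The example below is a hypothetical function that matches that contract; it uses slicing, so it works on both lists and one-dimensional arrays.

```python
def reverse_audio(audio, sample_rate):
    """Hypothetical augmentation: play the audio backwards, keep the sample rate."""
    return audio[::-1], sample_rate
```

Such a function could then be supplied as the aug_function argument described above.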

class augly.audio.ChangeVolume(volume_db=0.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(volume_db=0.0, p=1.0)
Parameters
  • volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the volume of the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Clicks(seconds_between_clicks=0.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(seconds_between_clicks=0.5, p=1.0)
Parameters
  • seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Clip(offset_factor=0.0, duration_factor=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, duration_factor=1.0, p=1.0)
Parameters
  • offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Compose(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies the list of transforms in order to the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
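
Applying transforms in order means each transform receives the output of the previous one. The sketch below shows that chaining with hypothetical helper functions on plain lists; skipping the whole chain based on p is an assumption about how the probability is used.

```python
import random


def compose_sketch(transforms, audio, sample_rate, p=1.0):
    """Feed the audio through each transform in turn, with probability p."""
    if random.random() > p:
        return audio, sample_rate
    for transform in transforms:
        audio, sample_rate = transform(audio, sample_rate)
    return audio, sample_rate


def append_silence(audio, sample_rate):
    """Hypothetical transform: add one silent sample at the end."""
    return audio + [0.0], sample_rate


def reverse(audio, sample_rate):
    """Hypothetical transform: play the audio backwards."""
    return audio[::-1], sample_rate
```

The order of the list matters: appending silence and then reversing puts the silent sample first, while the opposite order leaves it at the end.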

class augly.audio.Harmonic(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Parameters
  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the harmonic part of the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.HighPassFilter(cutoff_hz=3000.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=3000.0, p=1.0)
Parameters
  • cutoff_hz (float) – the cutoff frequency (in Hz); signals at lower frequencies are attenuated by 6 dB per octave (one octave is a doubling in frequency) below this point

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.InsertInBackground(offset_factor=0.0, background_audio=None, seed=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(offset_factor=0.0, background_audio=None, seed=None, p=1.0)
Parameters
  • offset_factor (float) – start point of the crop relative to the background duration (this parameter is multiplied by the background duration)

  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Inserts the audio into a background audio without overlapping it

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.InvertChannels(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Inverts the channels of the audio.

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Loop(n=1, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n=1, p=1.0)
Parameters
  • n (int) – the number of times the audio will be looped

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Loops the audio ‘n’ times

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.LowPassFilter(cutoff_hz=500.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(cutoff_hz=500.0, p=1.0)
Parameters
  • cutoff_hz (float) – the cutoff frequency (in Hz); signals at higher frequencies are attenuated by 6 dB per octave (one octave is a doubling in frequency) above this point

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Normalize(norm=inf, axis=0, threshold=None, fill=None, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(norm=inf, axis=0, threshold=None, fill=None, p=1.0)
Parameters
  • norm (Optional[float]) – the type of norm to compute: np.inf (maximum absolute value), -np.inf (minimum absolute value), 0 (number of non-zeros, i.e. the support), any other float (the corresponding l_p norm), or None (no normalization is performed)

  • axis (int) – axis along which to compute the norm

  • threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized

  • fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
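
As a rough illustration of how norm, threshold, and fill interact, the following pure-Python sketch normalizes a single 1-D channel with the maximum-absolute-value norm (the norm=inf default). The helper normalize_peak is hypothetical and is not part of AugLy. It only mirrors the documented behavior for one column, and it assumes that an all-zero channel counts as "below threshold" when no threshold is given.

```python
from typing import List, Optional


def normalize_peak(
    samples: List[float],
    threshold: Optional[float] = None,
    fill: Optional[bool] = None,
) -> List[float]:
    """Hypothetical helper: scale one channel so its max absolute value is 1."""
    peak = max((abs(s) for s in samples), default=0.0)
    # Assumption: with no explicit threshold, only an all-zero channel is "small".
    too_small = peak == 0.0 if threshold is None else peak < threshold
    if not too_small:
        return [s / peak for s in samples]
    if fill is None:
        return list(samples)  # left as is
    if fill is False:
        return [0.0] * len(samples)  # set to 0
    return [1.0] * len(samples)  # filled uniformly so the inf-norm is 1


print(normalize_peak([0.5, -0.25, 0.125]))  # [1.0, -0.5, 0.25]
```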

class augly.audio.OneOf(transforms, p=1.0)

Bases: augly.audio.composition.BaseComposition

__call__(audio, sample_rate, metadata=None)

Applies one of the transforms to the audio (with probability p)

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

__init__(transforms, p=1.0)
Parameters
  • transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the audio

  • p (float) – the probability of the transform being applied; default value is 1.0
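
The selection behavior described for OneOf can be sketched in plain Python as below. This is an illustration, not the AugLy implementation: one_of_sketch, halve, and negate are hypothetical names, audio is represented as a list of floats rather than an ndarray, and the uniform choice between transforms is an assumption (the real class may weight the transforms differently).

```python
import random
from typing import Callable, List, Optional, Tuple

Audio = List[float]
Transform = Callable[[Audio, int], Tuple[Audio, int]]


def one_of_sketch(
    transforms: List[Transform],
    audio: Audio,
    sample_rate: int,
    p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Audio, int]:
    """Hypothetical stand-in for OneOf: apply one randomly chosen transform."""
    rng = rng or random.Random()
    # random() lies in [0, 1), so p=1.0 always applies and p=0.0 never does.
    if rng.random() >= p:
        return audio, sample_rate
    chosen = rng.choice(transforms)  # assumption: uniform choice
    return chosen(audio, sample_rate)


def halve(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return [0.5 * s for s in audio], sample_rate


def negate(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return [-s for s in audio], sample_rate


out, sr = one_of_sketch([halve, negate], [1.0, -2.0], 44100, rng=random.Random(0))
print(out, sr)
```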

class augly.audio.PeakingEqualizer(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)
Parameters
  • center_hz (float) – point in the frequency spectrum at which EQ is applied

  • q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth

  • gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Applies a two-pole peaking equalization filter. The signal level at and around center_hz can be increased or decreased, while all other frequencies are left unchanged

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
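
Since q is described as the ratio of center frequency to bandwidth, the affected bandwidth can be estimated by rearranging that ratio. The short sketch below does only this arithmetic; bandwidth_hz is a hypothetical helper, not an AugLy function.

```python
def bandwidth_hz(center_hz: float, q: float) -> float:
    """Hypothetical helper: bandwidth implied by q = center_hz / bandwidth."""
    if q <= 0:
        raise ValueError("q must be positive")
    return center_hz / q


# Raising q narrows the band around the same center frequency.
for q in (0.5, 1.0, 2.0, 4.0):
    print(f"q={q}: bandwidth={bandwidth_hz(500.0, q)} Hz")
```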

class augly.audio.Percussive(kernel_size=31, power=2.0, margin=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Parameters
  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Extracts the percussive part of the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.PitchShift(n_steps=1.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(n_steps=1.0, p=1.0)
Parameters
  • n_steps (float) – each step is equal to one semitone

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
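
To get a feel for n_steps, the sketch below converts a number of semitones into a frequency ratio. It assumes twelve-tone equal temperament, where each semitone multiplies frequency by 2^(1/12). semitone_ratio is a hypothetical helper and not part of AugLy.

```python
def semitone_ratio(n_steps: float) -> float:
    """Hypothetical helper: frequency ratio for a shift of n_steps semitones."""
    # Assumption: equal temperament, 12 semitones per octave.
    return 2.0 ** (n_steps / 12.0)


# A 440 Hz tone shifted up by one semitone and by one octave.
print(440.0 * semitone_ratio(1.0))
print(440.0 * semitone_ratio(12.0))
```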

class augly.audio.Reverb(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)
Parameters
  • reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues after the original sound ends, and so simulates the “liveliness” of the room acoustics

  • hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies

  • room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room

  • stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels

  • pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail

  • wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix

  • wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adds reverberation to the audio

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.Speed(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)
Parameters
  • factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
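
The sketch below estimates what a given speed factor does to duration and pitch. It assumes the speed change behaves like playing the recording back at a different rate, so the duration is divided by the factor and the pitch moves by 12 * log2(factor) semitones. speed_effect is a hypothetical helper, not an AugLy function.

```python
import math
from typing import Tuple


def speed_effect(duration_s: float, factor: float) -> Tuple[float, float]:
    """Hypothetical helper: (new duration in seconds, pitch change in semitones)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Assumption: playback-rate model, so pitch follows the speed change.
    return duration_s / factor, 12.0 * math.log2(factor)


print(speed_effect(10.0, 2.0))  # (5.0, 12.0)
print(speed_effect(10.0, 0.5))  # (20.0, -12.0)
```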

class augly.audio.Tempo(factor=2.0, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(factor=2.0, p=1.0)
Parameters
  • factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.TimeStretch(rate=1.5, p=1.0)

Bases: augly.audio.transforms.BaseTransform

__init__(rate=1.5, p=1.0)
Parameters
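  • rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor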

  • p (float) – the probability of the transform being applied; default value is 1.0

apply_transform(audio, sample_rate, metadata=None)

Time-stretches the audio by a fixed rate

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

class augly.audio.ToMono(p=1.0)

Bases: augly.audio.transforms.BaseTransform

apply_transform(audio, sample_rate, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters
  • audio (ndarray) – the audio array to be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
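
Averaging across channels can be sketched in a few lines of plain Python. The example assumes a (channels, samples) layout expressed as a list of equal-length lists; to_mono_sketch is a hypothetical name, and the real transform operates on an ndarray.

```python
from typing import List


def to_mono_sketch(channels: List[List[float]]) -> List[float]:
    """Hypothetical helper: average each sample position across all channels."""
    n_channels = len(channels)
    # zip(*channels) yields one tuple per sample position, across channels.
    return [sum(frame) / n_channels for frame in zip(*channels)]


left = [1.0, 0.0, -1.0]
right = [0.0, 1.0, -1.0]
print(to_mono_sketch([left, right]))  # [0.5, 0.5, -1.0]
```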

augly.audio.add_background_noise(audio, sample_rate=44100, background_audio=None, snr_level_db=10.0, seed=None, output_path=None, metadata=None)

Mixes in a background sound into the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise

  • snr_level_db (float) – signal-to-noise ratio in dB

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
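
To illustrate what snr_level_db controls, the sketch below scales a noise signal so that the ratio of signal power to noise power matches a target in dB. It assumes the usual power-based definition, SNR_dB = 10 * log10(P_signal / P_noise). mean_power and scale_noise_for_snr are hypothetical helpers and may differ from how AugLy computes the mix internally.

```python
import math
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average of the squared samples."""
    return sum(s * s for s in samples) / len(samples)


def scale_noise_for_snr(
    signal: List[float], noise: List[float], snr_level_db: float
) -> List[float]:
    """Hypothetical helper: rescale noise to reach the requested SNR in dB."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_level_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]
scaled = scale_noise_for_snr(signal, noise, snr_level_db=10.0)
achieved_db = 10.0 * math.log10(mean_power(signal) / mean_power(scaled))
print(round(achieved_db, 6))
```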

augly.audio.add_background_noise_intensity(snr_level_db=10.0, **kwargs)
Return type

float

augly.audio.apply_lambda(audio, sample_rate=44100, aug_function=<function <lambda>>, output_path=None, metadata=None, **kwargs)

Apply a user-defined lambda to the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

  • **kwargs

    the input attributes to be passed into aug_function

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
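
The contract for aug_function (take the audio and sample rate, plus any extra keyword arguments, and return the transformed audio and sample rate) can be sketched as follows. Both scale_samples and apply_lambda_sketch are hypothetical stand-ins written for illustration. Audio is a plain list here rather than an ndarray, and the sketch omits output_path and metadata handling.

```python
from typing import Any, Callable, List, Tuple

Audio = List[float]


def scale_samples(audio: Audio, sample_rate: int, gain: float = 1.0) -> Tuple[Audio, int]:
    """Example user-defined augmentation: multiply every sample by gain."""
    return [gain * s for s in audio], sample_rate


def apply_lambda_sketch(
    audio: Audio,
    sample_rate: int,
    aug_function: Callable[..., Tuple[Audio, int]],
    **kwargs: Any,
) -> Tuple[Audio, int]:
    """Hypothetical stand-in: forward the extra kwargs to aug_function."""
    return aug_function(audio, sample_rate, **kwargs)


print(apply_lambda_sketch([1.0, 2.0], 44100, scale_samples, gain=0.5))
```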

augly.audio.apply_lambda_intensity(aug_function, **kwargs)
Return type

float

augly.audio.change_volume(audio, sample_rate=44100, volume_db=0.0, output_path=None, metadata=None)

Changes the volume of the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
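
As a reference point for choosing volume_db, the sketch below converts a decibel change into a linear amplitude multiplier. It assumes the common amplitude convention of 20 * log10(ratio). db_to_amplitude is a hypothetical helper, not an AugLy function.

```python
def db_to_amplitude(volume_db: float) -> float:
    """Hypothetical helper: linear amplitude ratio for a change in decibels."""
    # Assumption: amplitude (not power) convention, i.e. 20 dB per factor of 10.
    return 10.0 ** (volume_db / 20.0)


for db in (-6.0, 0.0, 6.0, 20.0):
    print(f"{db:+.1f} dB -> x{db_to_amplitude(db):.4f}")
```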

augly.audio.change_volume_intensity(volume_db=0.0, **kwargs)
Return type

float

augly.audio.clicks(audio, sample_rate=44100, seconds_between_clicks=0.5, snr_level_db=1.0, output_path=None, metadata=None)

Adds clicks to the audio at a given regular interval

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds

  • snr_level_db (float) – signal-to-noise ratio in dB

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.clicks_intensity(seconds_between_clicks=0.5, snr_level_db=1.0, **kwargs)
Return type

float

augly.audio.clip(audio, sample_rate=44100, offset_factor=0.0, duration_factor=1.0, output_path=None, metadata=None)

Clips the audio using the specified offset and duration factors

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
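
Because both factors are multiplied by the audio duration, they translate directly into sample indices. The sketch below shows that arithmetic on a plain list; clip_sketch is a hypothetical helper, and the truncation toward zero and clamping at the end of the audio are assumptions that may not match AugLy exactly.

```python
from typing import List


def clip_sketch(
    samples: List[float], offset_factor: float = 0.0, duration_factor: float = 1.0
) -> List[float]:
    """Hypothetical helper: crop by fractions of the total length."""
    n = len(samples)
    start = int(offset_factor * n)
    # Assumption: a crop that would run past the end is clamped to the end.
    end = min(n, start + int(duration_factor * n))
    return samples[start:end]


# Skip the first 20% of the audio and keep the following 50%.
print(clip_sketch(list(range(10)), offset_factor=0.2, duration_factor=0.5))
```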

augly.audio.clip_intensity(duration_factor=1.0, **kwargs)
Return type

float

augly.audio.harmonic(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the harmonic part of the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.harmonic_intensity(**kwargs)
Return type

float

augly.audio.high_pass_filter(audio, sample_rate=44100, cutoff_hz=3000.0, output_path=None, metadata=None)

Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • cutoff_hz (float) – frequency (in Hz) where signals with lower frequencies will begin to be reduced by 6dB per octave (doubling in frequency) below this point

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
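
The "6dB per octave" slope in the cutoff_hz description can be turned into a rough attenuation estimate. The sketch below uses the idealized straight-line slope only: it treats frequencies at or above the cutoff as unattenuated and ignores the gradual transition a real filter has near the cutoff. approx_highpass_attenuation_db is a hypothetical helper, not part of AugLy.

```python
import math


def approx_highpass_attenuation_db(freq_hz: float, cutoff_hz: float = 3000.0) -> float:
    """Hypothetical helper: idealized attenuation (in dB) below the cutoff."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    if freq_hz >= cutoff_hz:
        return 0.0  # idealization: passband is left untouched
    octaves_below = math.log2(cutoff_hz / freq_hz)
    return 6.0 * octaves_below


for f in (6000.0, 1500.0, 750.0):
    print(f"{f} Hz -> about {approx_highpass_attenuation_db(f)} dB of attenuation")
```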

augly.audio.high_pass_filter_intensity(cutoff_hz=3000.0, **kwargs)
Return type

float

augly.audio.insert_in_background(audio, sample_rate=44100, offset_factor=0.0, background_audio=None, seed=None, output_path=None, metadata=None)

Inserts audio into a background clip in a non-overlapping manner.

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • offset_factor (float) – insert point relative to the background duration (this parameter is multiplied by the background duration)

  • background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise, with the same duration as the audio.

  • seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.insert_in_background_intensity(metadata, **kwargs)
Return type

float

augly.audio.invert_channels(audio, sample_rate=44100, output_path=None, metadata=None)

Inverts the channels of the audio. If the audio has only one channel, no change is applied. Otherwise, the order of the channels is reversed; e.g., for 4 channels, the channels are returned in the order [3, 2, 1, 0].

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
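
The channel reordering described above amounts to reversing the channel axis. A minimal sketch on a list of channels (a hypothetical stand-in for the ndarray that AugLy operates on) looks like this:

```python
from typing import List


def invert_channels_sketch(channels: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: reverse channel order; mono input is unchanged."""
    if len(channels) <= 1:
        return list(channels)
    return channels[::-1]


four_channels = [[0.0], [1.0], [2.0], [3.0]]
print(invert_channels_sketch(four_channels))  # [[3.0], [2.0], [1.0], [0.0]]
```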

augly.audio.invert_channels_intensity(metadata, **kwargs)
Return type

float

augly.audio.loop(audio, sample_rate=44100, n=1, output_path=None, metadata=None)

Loops the audio n times

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • n (int) – the number of times the audio will be looped

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.loop_intensity(n=1, **kwargs)
Return type

float

augly.audio.low_pass_filter(audio, sample_rate=44100, cutoff_hz=500.0, output_path=None, metadata=None)

Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • cutoff_hz (float) – frequency (in Hz) where signals with higher frequencies will begin to be reduced by 6dB per octave (doubling in frequency) above this point

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.low_pass_filter_intensity(cutoff_hz=500.0, **kwargs)
Return type

float

augly.audio.normalize(audio, sample_rate=44100, norm=inf, axis=0, threshold=None, fill=None, output_path=None, metadata=None)

Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • norm (Optional[float]) – the type of norm to compute:

      - np.inf: maximum absolute value
      - -np.inf: minimum absolute value
      - 0: number of non-zeros (the support)
      - float: corresponding l_p norm
      - None: no normalization is performed

  • axis (int) – axis along which to compute the norm

  • threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized

  • fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.normalize_intensity(norm=inf, **kwargs)
Return type

float

augly.audio.peaking_equalizer(audio, sample_rate=44100, center_hz=500.0, q=1.0, gain_db=-3.0, output_path=None, metadata=None)

Applies a two-pole peaking equalization filter. The signal level at and around center_hz can be increased or decreased, while all other frequencies are left unchanged

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • center_hz (float) – point in the frequency spectrum at which EQ is applied

  • q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth

  • gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.peaking_equalizer_intensity(q, gain_db, **kwargs)
Return type

float

augly.audio.percussive(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)

Extracts the percussive part of the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • kernel_size (int) – kernel size for the median filters

  • power (float) – exponent for the Wiener filter when constructing soft mask matrices

  • margin (float) – margin size for the masks

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.percussive_intensity(**kwargs)
Return type

float

augly.audio.pitch_shift(audio, sample_rate=44100, n_steps=1.0, output_path=None, metadata=None)

Shifts the pitch of the audio by n_steps

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • n_steps (float) – each step is equal to one semitone

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.pitch_shift_intensity(n_steps=2.0, **kwargs)
Return type

float

augly.audio.reverb(audio, sample_rate=44100, reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, output_path=None, metadata=None)

Adds reverberation to the audio

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues after the original sound ends, and so simulates the “liveliness” of the room acoustics

  • hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies

  • room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room

  • stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels

  • pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail

  • wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix

  • wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.reverb_intensity(reverberance=50.0, wet_only=False, room_scale=100.0, **kwargs)
Return type

float

augly.audio.speed(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Changes the speed of the audio, affecting pitch as well

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.speed_intensity(factor=2.0, **kwargs)
Return type

float

augly.audio.tempo(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)

Adjusts the tempo of the audio by a given factor

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate

augly.audio.tempo_intensity(factor=2.0, **kwargs)
Return type

float

augly.audio.time_stretch(audio, sample_rate=44100, rate=1.5, output_path=None, metadata=None)

Time-stretches the audio by a fixed rate

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
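
As a rough illustration of changing duration without resampling, the sketch below implements a naive overlap-add (OLA) time stretch in NumPy. It is an assumption-laden teaching example, not the algorithm AugLy uses (the source does not name one); ola_time_stretch and its frame_len parameter are hypothetical. Plain OLA produces audible artifacts, so production code typically uses a phase vocoder or WSOLA instead.

```python
import numpy as np


def ola_time_stretch(audio: np.ndarray, rate: float, frame_len: int = 1024) -> np.ndarray:
    """Naive overlap-add time stretch of a mono signal.

    Hypothetical helper for illustration only, not part of AugLy.
    rate > 1 shortens the signal, rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    synth_hop = frame_len // 2           # spacing of frames in the output
    analysis_hop = synth_hop * rate      # spacing of frames in the input
    window = np.hanning(frame_len)

    n_frames = max(1, int((len(audio) - frame_len) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)

    for k in range(n_frames):
        start_in = int(round(k * analysis_hop))
        frame = audio[start_in:start_in + frame_len]
        start_out = k * synth_hop
        # Frames are copied unchanged, only their spacing differs, so the
        # local waveform (and roughly the pitch) is kept.
        out[start_out:start_out + len(frame)] += frame * window[:len(frame)]
        norm[start_out:start_out + len(frame)] += window[:len(frame)]

    norm[norm < 1e-8] = 1.0              # avoid dividing by zero at the edges
    return out / norm


sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

stretched = ola_time_stretch(tone, rate=1.5)
print(len(tone) / sample_rate, len(stretched) / sample_rate)
```

The output duration is approximately the input duration divided by rate, within about two frame lengths because only whole frames are placed.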

augly.audio.time_stretch_intensity(rate=1.5, **kwargs)
Return type

float

augly.audio.to_mono(audio, sample_rate=44100, output_path=None, metadata=None)

Converts the audio from stereo to mono by averaging samples across channels

Parameters
  • audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented

  • sample_rate (int) – the audio sample rate of the inputted audio

  • output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned

  • metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended

Return type

Tuple[ndarray, int]

Returns

the augmented audio array and sample rate
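
The averaging described above can be sketched with NumPy as follows. The (channels, samples) array layout is an assumption made for this example, and average_channels is a hypothetical helper rather than AugLy's own code.

```python
import numpy as np


def average_channels(audio: np.ndarray) -> np.ndarray:
    """Average a multi-channel signal down to one channel.

    Hypothetical helper for illustration only, not part of AugLy.
    Assumes `audio` has shape (channels, samples); a 1-D array is treated
    as already mono and returned unchanged.
    """
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=0)


# Two channels, three samples each.
stereo = np.array([[1.0, 0.0, -1.0],
                   [0.0, 0.0, 1.0]])
mono = average_channels(stereo)
print(mono.shape)     # (3,)
print(mono.tolist())  # [0.5, 0.0, 0.0]
```

Note that samples of opposite sign cancel when averaged, as in the last sample above, so out-of-phase stereo content can become quieter in the mono result.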

augly.audio.to_mono_intensity(metadata, **kwargs)
Return type

float