augly.audio package
Submodules
augly.audio.composition module
- class augly.audio.composition.BaseComposition(transforms, p=1.0)
Bases: object
- __init__(transforms, p=1.0)
- Parameters
transforms (List[BaseTransform]) – a list of transforms
p (float) – the probability of the transform being applied; default value is 1.0
- class augly.audio.composition.Compose(transforms, p=1.0)
Bases: augly.audio.composition.BaseComposition
- __call__(audio, sample_rate, metadata=None)
Applies the list of transforms in order to the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
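The in-order behaviour described for Compose can be pictured with a small stand-in written in plain Python. This is a sketch only: `compose_sketch`, `halve` and `reverse` are hypothetical names, audio is a plain list rather than an ndarray, and the way AugLy handles `p` and `metadata` is not reproduced.

```python
from typing import Callable, List, Tuple

Audio = List[float]
Transform = Callable[[Audio, int], Tuple[Audio, int]]


def compose_sketch(
    transforms: List[Transform], audio: Audio, sample_rate: int
) -> Tuple[Audio, int]:
    """Apply each transform in list order, feeding each output into the next."""
    for transform in transforms:
        audio, sample_rate = transform(audio, sample_rate)
    return audio, sample_rate


def halve(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    # Hypothetical transform: scale every sample by 0.5.
    return [0.5 * x for x in audio], sample_rate


def reverse(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    # Hypothetical transform: play the samples backwards.
    return audio[::-1], sample_rate


# halve runs first, then reverse receives the halved samples.
composed_audio, composed_rate = compose_sketch([halve, reverse], [1.0, 2.0, 4.0], 44100)
```

Each transform takes and returns an (audio, sample rate) pair, which is what lets the output of one step be passed directly to the next.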
- class augly.audio.composition.OneOf(transforms, p=1.0)
Bases: augly.audio.composition.BaseComposition
- __call__(audio, sample_rate, metadata=None)
Applies one of the transforms to the audio (with probability p)
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- __init__(transforms, p=1.0)
- Parameters
transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the audio
p (float) – the probability of the transform being applied; default value is 1.0
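The selection behaviour of OneOf can be sketched in plain Python as follows. The function `one_of_sketch` and the two toy transforms are hypothetical; choosing uniformly among the transforms and the exact way the probability check is drawn are assumptions, not details confirmed by this reference.

```python
import random
from typing import Callable, List, Optional, Tuple

Audio = List[float]
Transform = Callable[[Audio, int], Tuple[Audio, int]]


def one_of_sketch(
    transforms: List[Transform],
    audio: Audio,
    sample_rate: int,
    p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Audio, int]:
    """With probability p, apply exactly one randomly chosen transform."""
    rng = rng or random.Random()
    # random() lies in [0, 1), so p=1.0 always applies and p=0.0 never does.
    if rng.random() >= p:
        return audio, sample_rate
    chosen = rng.choice(transforms)  # assumption: uniform choice
    return chosen(audio, sample_rate)


def double(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return [2.0 * x for x in audio], sample_rate


def negate(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return [-x for x in audio], sample_rate
```

Passing a seeded `random.Random` instance as `rng` makes the sketch reproducible.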
augly.audio.functional module
- augly.audio.functional.add_background_noise(audio, sample_rate=44100, background_audio=None, snr_level_db=10.0, seed=None, output_path=None, metadata=None)
Mixes a background sound into the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
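To make the role of snr_level_db concrete, the following sketch computes the gain to apply to a noise signal so that the mix reaches a target signal-to-noise ratio. It assumes the SNR is defined on mean power, which this reference does not state; `mean_power` and `noise_gain_for_snr` are hypothetical helpers, not AugLy functions.

```python
import math
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average of the squared samples."""
    return sum(v * v for v in samples) / len(samples)


def noise_gain_for_snr(signal: List[float], noise: List[float], snr_level_db: float) -> float:
    """Gain g such that 10*log10(P_signal / P_(g*noise)) equals snr_level_db."""
    target_ratio = 10 ** (snr_level_db / 10)  # dB -> power ratio
    return math.sqrt(mean_power(signal) / (mean_power(noise) * target_ratio))


signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
gain = noise_gain_for_snr(signal, noise, 10.0)
scaled_noise = [gain * n for n in noise]
achieved_snr_db = 10 * math.log10(mean_power(signal) / mean_power(scaled_noise))
```

A higher snr_level_db yields a smaller gain, so the background is quieter relative to the original audio.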
- augly.audio.functional.apply_lambda(audio, sample_rate=44100, aug_function=<function <lambda>>, output_path=None, metadata=None, **kwargs)
Apply a user-defined lambda to the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
**kwargs – the input attributes to be passed into aug_function
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
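A function suitable for aug_function takes the audio and sample rate (plus any keyword arguments) and returns the transformed audio and sample rate. The sketch below shows that shape with a hypothetical `fade_in` function; it uses a plain list where AugLy would pass an np.ndarray, and `fade_seconds` stands in for an attribute forwarded through **kwargs.

```python
from typing import List, Tuple


def fade_in(
    audio: List[float], sample_rate: int, fade_seconds: float = 0.01
) -> Tuple[List[float], int]:
    """Linearly ramp the first fade_seconds of audio up from silence."""
    n_fade = min(len(audio), int(fade_seconds * sample_rate))
    faded = [x * (i / n_fade) if i < n_fade else x for i, x in enumerate(audio)]
    # Return the same (audio, sample_rate) pair shape that was received.
    return faded, sample_rate


faded_audio, faded_rate = fade_in([1.0] * 6, sample_rate=8, fade_seconds=0.5)
```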
- augly.audio.functional.change_volume(audio, sample_rate=44100, volume_db=0.0, output_path=None, metadata=None)
Changes the volume of the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
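For intuition about volume_db, a decibel change maps to a multiplicative amplitude factor of 10^(dB/20) under the usual amplitude convention. The sketch below applies that factor to a list of samples; it illustrates the arithmetic only and is not taken from AugLy's implementation, and both helper names are hypothetical.

```python
from typing import List


def db_to_amplitude(volume_db: float) -> float:
    """Convert a decibel change into a linear amplitude factor."""
    return 10 ** (volume_db / 20)


def change_volume_sketch(audio: List[float], volume_db: float) -> List[float]:
    """Scale every sample by the factor corresponding to volume_db."""
    factor = db_to_amplitude(volume_db)
    return [factor * x for x in audio]


louder = change_volume_sketch([0.1, -0.2], 20.0)    # +20 dB is a factor of 10
quieter_factor = db_to_amplitude(-6.0)              # roughly half the amplitude
```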
- augly.audio.functional.clicks(audio, sample_rate=44100, seconds_between_clicks=0.5, snr_level_db=1.0, output_path=None, metadata=None)
Adds clicks to the audio at a given regular interval
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
snr_level_db (float) – signal-to-noise ratio in dB
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.clip(audio, sample_rate=44100, offset_factor=0.0, duration_factor=1.0, output_path=None, metadata=None)
Clips the audio using the specified offset and duration factors
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
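Since both factors are multiplied by the audio duration, they translate into sample indices as in the sketch below. The helper `clip_bounds` is hypothetical, and the truncating rounding and the clamping to the end of the audio are assumptions rather than documented behaviour.

```python
from typing import List, Tuple


def clip_bounds(
    num_samples: int, offset_factor: float = 0.0, duration_factor: float = 1.0
) -> Tuple[int, int]:
    """Map relative offset and duration factors to [start, end) sample indices."""
    start = int(offset_factor * num_samples)
    # Assumption: a crop that would run past the end is clamped to the end.
    end = min(num_samples, start + int(duration_factor * num_samples))
    return start, end


samples: List[int] = list(range(8))
start, end = clip_bounds(len(samples), offset_factor=0.25, duration_factor=0.5)
clipped = samples[start:end]
```

With the defaults (offset_factor=0.0, duration_factor=1.0) the bounds cover the whole audio, so nothing is removed.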
- augly.audio.functional.harmonic(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)
Extracts the harmonic part of the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.high_pass_filter(audio, sample_rate=44100, cutoff_hz=3000.0, output_path=None, metadata=None)
Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – the cutoff frequency (in Hz); signals below this frequency are reduced by 6dB per octave (an octave being a doubling in frequency)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
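The 6dB-per-octave figure can be checked numerically. The sketch below assumes a first-order (single-pole) high-pass response, which is the filter order that a 6dB-per-octave slope corresponds to; the filter AugLy actually uses is not shown in this reference, and the function name is hypothetical.

```python
import math


def first_order_high_pass_gain_db(frequency_hz: float, cutoff_hz: float) -> float:
    """Gain in dB of an ideal first-order high-pass filter at frequency_hz."""
    ratio = frequency_hz / cutoff_hz
    magnitude = ratio / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(magnitude)


cutoff = 3000.0
gain_at_cutoff = first_order_high_pass_gain_db(cutoff, cutoff)
# Well below the cutoff, halving the frequency costs close to 6 dB.
octave_drop = (
    first_order_high_pass_gain_db(cutoff / 8, cutoff)
    - first_order_high_pass_gain_db(cutoff / 16, cutoff)
)
```

Under this assumption the gain at the cutoff itself is about -3 dB, and the slope approaches 6 dB per octave further below it.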
- augly.audio.functional.insert_in_background(audio, sample_rate=44100, offset_factor=0.0, background_audio=None, seed=None, output_path=None, metadata=None)
Inserts audio into a background clip in a non-overlapping manner.
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – insert point relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise, with the same duration as the audio.
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.invert_channels(audio, sample_rate=44100, output_path=None, metadata=None)
Inverts the channels of the audio. If the audio has only one channel, no change is applied. Otherwise, the order of the channels is reversed; e.g. for 4 channels, the channels are returned in the order [3, 2, 1, 0].
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
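The channel reordering can be illustrated with audio represented as a list of channels, each a list of samples. This layout and the helper name `invert_channels_sketch` are assumptions made for the example; AugLy operates on np.ndarray input.

```python
from typing import List


def invert_channels_sketch(channels: List[List[float]]) -> List[List[float]]:
    """Reverse the channel order; mono audio is returned unchanged."""
    if len(channels) <= 1:
        return channels
    return channels[::-1]


# Four single-sample channels labelled by their original index.
four_channels = [[0.0], [1.0], [2.0], [3.0]]
inverted = invert_channels_sketch(four_channels)
mono = invert_channels_sketch([[0.5, -0.5]])
```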
- augly.audio.functional.loop(audio, sample_rate=44100, n=1, output_path=None, metadata=None)
Loops the audio ‘n’ times
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n (int) – the number of times the audio will be looped
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.low_pass_filter(audio, sample_rate=44100, cutoff_hz=500.0, output_path=None, metadata=None)
Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – the cutoff frequency (in Hz); signals above this frequency are reduced by 6dB per octave (an octave being a doubling in frequency)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.normalize(audio, sample_rate=44100, norm=inf, axis=0, threshold=None, fill=None, output_path=None, metadata=None)
Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
norm (Optional[float]) – the type of norm to compute:
  - np.inf: maximum absolute value
  - -np.inf: minimum absolute value
  - 0: number of non-zeros (the support)
  - float: corresponding l_p norm
  - None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
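The norm options listed for normalize can be worked through on a single one-dimensional signal. The sketch below is plain Python with hypothetical helper names; it does not model axis, threshold or fill, and leaving an all-zero signal unchanged is an assumption.

```python
import math
from typing import List, Optional


def vector_norm(values: List[float], norm: float) -> float:
    """Compute the norm variants described for the norm parameter."""
    magnitudes = [abs(v) for v in values]
    if norm == math.inf:
        return max(magnitudes)              # maximum absolute value
    if norm == -math.inf:
        return min(magnitudes)              # minimum absolute value
    if norm == 0:
        return float(sum(1 for m in magnitudes if m > 0))  # the support
    return sum(m ** norm for m in magnitudes) ** (1.0 / norm)  # l_p norm


def normalize_sketch(values: List[float], norm: Optional[float] = math.inf) -> List[float]:
    """Scale values so that vector_norm(result, norm) == 1."""
    if norm is None:
        return list(values)                 # no normalization is performed
    length = vector_norm(values, norm)
    if length == 0:
        return list(values)                 # assumption: leave silence as is
    return [v / length for v in values]


peak_normalized = normalize_sketch([0.5, -2.0, 1.0])
```

With the default norm of infinity, the result is peak normalization: the largest absolute sample becomes 1.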
- augly.audio.functional.peaking_equalizer(audio, sample_rate=44100, center_hz=500.0, q=1.0, gain_db=-3.0, output_path=None, metadata=None)
Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
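Because q is the ratio of center frequency to bandwidth, the bandwidth affected by the equalizer follows directly from center_hz and q. The helper below is a hypothetical illustration of that relationship only; it does not implement the filter.

```python
def eq_bandwidth_hz(center_hz: float, q: float) -> float:
    """Bandwidth implied by q = center_hz / bandwidth."""
    return center_hz / q


wide = eq_bandwidth_hz(500.0, 1.0)    # default center_hz and q
narrow = eq_bandwidth_hz(500.0, 4.0)  # raising q narrows the band
```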
- augly.audio.functional.percussive(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)
Extracts the percussive part of the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.pitch_shift(audio, sample_rate=44100, n_steps=1.0, output_path=None, metadata=None)
Shifts the pitch of the audio by n_steps
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n_steps (float) – each step is equal to one semitone
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
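Since each step is one semitone, a shift of n_steps multiplies every frequency by 2^(n_steps/12), assuming twelve-tone equal temperament. The sketch below shows that arithmetic with hypothetical helper names; it does not perform any pitch shifting itself.

```python
def semitone_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps semitones (equal temperament)."""
    return 2.0 ** (n_steps / 12.0)


def shifted_frequency(frequency_hz: float, n_steps: float) -> float:
    """Frequency that a tone at frequency_hz moves to after the shift."""
    return frequency_hz * semitone_ratio(n_steps)


one_octave_up = shifted_frequency(440.0, 12.0)   # 12 semitones doubles the frequency
one_step_up = shifted_frequency(440.0, 1.0)      # the default n_steps=1.0
```

Negative values of n_steps lower the pitch by the same ratio.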
- augly.audio.functional.reverb(audio, sample_rate=44100, reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, output_path=None, metadata=None)
Adds reverberation to the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.speed(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)
Changes the speed of the audio, affecting pitch as well
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
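The statement that speed affects pitch as well can be quantified under one assumption: that the speed change behaves like playing the same samples back faster or slower. In that case the duration is divided by the factor and the pitch moves by 12 * log2(factor) semitones. The helper `speed_effect` is hypothetical and only computes these two numbers.

```python
import math
from typing import Tuple


def speed_effect(duration_seconds: float, factor: float) -> Tuple[float, float]:
    """Return (new duration in seconds, pitch change in semitones)."""
    new_duration = duration_seconds / factor
    pitch_change_semitones = 12.0 * math.log2(factor)
    return new_duration, pitch_change_semitones


doubled = speed_effect(10.0, 2.0)   # default factor=2.0
halved = speed_effect(10.0, 0.5)
```

Under this assumption the default factor of 2.0 halves the duration and raises the pitch by one octave.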
- augly.audio.functional.tempo(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)
Adjusts the tempo of the audio by a given factor
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.time_stretch(audio, sample_rate=44100, rate=1.5, output_path=None, metadata=None)
Time-stretches the audio by a fixed rate
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.functional.to_mono(audio, sample_rate=44100, output_path=None, metadata=None)
Converts the audio from stereo to mono by averaging samples across channels
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
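Averaging samples across channels can be sketched as follows, with audio represented as a list of equal-length channels. The layout and the helper name `to_mono_sketch` are assumptions for illustration; AugLy works on np.ndarray input.

```python
from typing import List


def to_mono_sketch(channels: List[List[float]]) -> List[float]:
    """Average the samples at each time step across all channels."""
    num_channels = len(channels)
    # zip(*channels) yields one tuple per time step, holding one sample per channel.
    return [sum(frame) / num_channels for frame in zip(*channels)]


left = [1.0, 0.5, -1.0]
right = [0.0, -0.5, -1.0]
mono_signal = to_mono_sketch([left, right])
```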
augly.audio.intensity module
- augly.audio.intensity.add_background_noise_intensity(snr_level_db=10.0, **kwargs)
- Return type
float
- augly.audio.intensity.apply_lambda_intensity(aug_function, **kwargs)
- Return type
float
- augly.audio.intensity.change_volume_intensity(volume_db=0.0, **kwargs)
- Return type
float
- augly.audio.intensity.clicks_intensity(seconds_between_clicks=0.5, snr_level_db=1.0, **kwargs)
- Return type
float
- augly.audio.intensity.clip_intensity(duration_factor=1.0, **kwargs)
- Return type
float
- augly.audio.intensity.harmonic_intensity(**kwargs)
- Return type
float
- augly.audio.intensity.high_pass_filter_intensity(cutoff_hz=3000.0, **kwargs)
- Return type
float
- augly.audio.intensity.insert_in_background_intensity(metadata, **kwargs)
- Return type
float
- augly.audio.intensity.invert_channels_intensity(metadata, **kwargs)
- Return type
float
- augly.audio.intensity.loop_intensity(n=1, **kwargs)
- Return type
float
- augly.audio.intensity.low_pass_filter_intensity(cutoff_hz=500.0, **kwargs)
- Return type
float
- augly.audio.intensity.normalize_intensity(norm=inf, **kwargs)
- Return type
float
- augly.audio.intensity.peaking_equalizer_intensity(q, gain_db, **kwargs)
- Return type
float
- augly.audio.intensity.percussive_intensity(**kwargs)
- Return type
float
- augly.audio.intensity.pitch_shift_intensity(n_steps=2.0, **kwargs)
- Return type
float
- augly.audio.intensity.reverb_intensity(reverberance=50.0, wet_only=False, room_scale=100.0, **kwargs)
- Return type
float
- augly.audio.intensity.speed_intensity(factor=2.0, **kwargs)
- Return type
float
- augly.audio.intensity.tempo_intensity(factor=2.0, **kwargs)
- Return type
float
- augly.audio.intensity.time_stretch_intensity(rate=1.5, **kwargs)
- Return type
float
- augly.audio.intensity.to_mono_intensity(metadata, **kwargs)
- Return type
float
augly.audio.transforms module
- class augly.audio.transforms.AddBackgroundNoise(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)
Bases: augly.audio.transforms.BaseTransform
- __init__(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)
- Parameters
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Mixes a background sound into the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0)
Bases: augly.audio.transforms.BaseTransform
- __init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0)
- Parameters
aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Apply a user-defined lambda to the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.BaseTransform(p=1.0)
Bases: object
- __call__(audio, sample_rate=44100, metadata=None, force=False)
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
force (bool) – if set to True, the transform will be applied. Otherwise, application is determined by the probability set
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- __init__(p=1.0)
- Parameters
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
This function is to be implemented in the child classes. From this function, call the augmentation function with the parameters specified
- Return type
Tuple[ndarray, int]
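The relationship between __call__, p, force and apply_transform can be pictured with a stand-in class hierarchy. This is a sketch under assumptions: `BaseTransformSketch` and `Negate` are hypothetical, metadata handling is omitted, audio is a plain list, and the exact probability check inside AugLy may differ.

```python
import random
from typing import List, Tuple


class BaseTransformSketch:
    """Stand-in for a base transform: decides whether to apply, then delegates."""

    def __init__(self, p: float = 1.0):
        assert 0.0 <= p <= 1.0, "p must be a probability"
        self.p = p

    def __call__(
        self, audio: List[float], sample_rate: int = 44100, force: bool = False
    ) -> Tuple[List[float], int]:
        # force=True bypasses the probability check entirely.
        if not force and random.random() >= self.p:
            return audio, sample_rate
        return self.apply_transform(audio, sample_rate)

    def apply_transform(self, audio: List[float], sample_rate: int) -> Tuple[List[float], int]:
        # Child classes provide the actual augmentation.
        raise NotImplementedError


class Negate(BaseTransformSketch):
    """Hypothetical child class that flips the sign of every sample."""

    def apply_transform(self, audio: List[float], sample_rate: int) -> Tuple[List[float], int]:
        return [-x for x in audio], sample_rate
```

With p=0.0 the sketch never applies the transform unless force=True is passed, which mirrors the description of the force parameter.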
- class augly.audio.transforms.ChangeVolume(volume_db=0.0, p=1.0)
Bases: augly.audio.transforms.BaseTransform
- __init__(volume_db=0.0, p=1.0)
- Parameters
volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Changes the volume of the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.Clicks(seconds_between_clicks=0.5, p=1.0)
Bases: augly.audio.transforms.BaseTransform
- __init__(seconds_between_clicks=0.5, p=1.0)
- Parameters
seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Adds clicks to the audio at a given regular interval
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.Clip(offset_factor=0.0, duration_factor=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(offset_factor=0.0, duration_factor=1.0, p=1.0)
- Parameters
offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Clips the audio using the specified offset and duration factors
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
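Since both factors are multiplied by the audio duration, the crop boundaries can be worked out directly from the number of samples. The sketch below shows that arithmetic; the helper clip_bounds is hypothetical, and truncating to integers and clamping the end to the audio length are assumptions rather than documented AugLy behavior.

```python
from typing import Tuple


def clip_bounds(
    num_samples: int, offset_factor: float, duration_factor: float
) -> Tuple[int, int]:
    """Return (start, end) sample indices for a relative crop."""
    start = int(offset_factor * num_samples)
    length = int(duration_factor * num_samples)
    # Assumption: a crop that would run past the end is clamped.
    end = min(num_samples, start + length)
    return start, end


# 10 seconds of audio at 44.1 kHz: keep the middle half.
sample_rate = 44100
num_samples = 10 * sample_rate
start, end = clip_bounds(num_samples, offset_factor=0.25, duration_factor=0.5)
```

With the defaults (offset_factor=0.0, duration_factor=1.0) the bounds cover the whole clip.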
- class augly.audio.transforms.Harmonic(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
- Parameters
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Extracts the harmonic part of the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.HighPassFilter(cutoff_hz=3000.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(cutoff_hz=3000.0, p=1.0)
- Parameters
cutoff_hz (float) – frequency (in Hz) where signals with lower frequencies will begin to be reduced by 6dB per octave (doubling in frequency) below this point
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
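A 6 dB per octave slope is what an idealized single-pole (first-order) filter produces. As a hedged illustration, the sketch below evaluates the textbook magnitude response of such a high-pass filter; it assumes the analog first-order model and does not reproduce the filter AugLy actually applies. The helper highpass_gain_db is hypothetical.

```python
import math


def highpass_gain_db(freq_hz: float, cutoff_hz: float) -> float:
    """Gain in dB of an ideal first-order high-pass filter at freq_hz."""
    ratio = freq_hz / cutoff_hz
    magnitude = ratio / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(magnitude)


cutoff = 3000.0  # the documented default cutoff_hz

# Well below the cutoff, each halving of frequency (one octave down)
# costs close to 6 dB more attenuation.
for freq in (3000.0, 375.0, 187.5):
    print(f"{freq:7.1f} Hz: {highpass_gain_db(freq, cutoff):6.2f} dB")
```

At the cutoff itself this model gives about 3 dB of attenuation; the 6 dB per octave figure describes the slope further below that point.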
- class augly.audio.transforms.InsertInBackground(offset_factor=0.0, background_audio=None, seed=None, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(offset_factor=0.0, background_audio=None, seed=None, p=1.0)
- Parameters
offset_factor (float) – start point of the crop relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Inserts the audio into a background audio in a non-overlapping manner.
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.InvertChannels(p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- apply_transform(audio, sample_rate, metadata=None)
Inverts the channels of the audio.
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.Loop(n=1, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(n=1, p=1.0)
- Parameters
n (int) – the number of times the audio will be looped
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Loops the audio ‘n’ times
- Parameters
audio (ndarray) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.LowPassFilter(cutoff_hz=500.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(cutoff_hz=500.0, p=1.0)
- Parameters
cutoff_hz (float) – frequency (in Hz) where signals with higher frequencies will begin to be reduced by 6dB per octave (doubling in frequency) above this point
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.Normalize(norm=inf, axis=0, threshold=None, fill=None, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(norm=inf, axis=0, threshold=None, fill=None, p=1.0)
- Parameters
norm (Optional[float]) – the type of norm to compute:
  - np.inf: maximum absolute value
  - -np.inf: minimum absolute value
  - 0: number of non-zeros (the support)
  - float: corresponding l_p norm
  - None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
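The norm options listed above can be illustrated on a single channel. The following plain-Python sketch computes each kind of norm for a 1-D list and divides by it; it ignores the axis, threshold and fill parameters and is an illustration of the definitions only, not the library's code. The helpers vector_norm and normalize are hypothetical names.

```python
import math
from typing import List, Optional


def vector_norm(values: List[float], norm: Optional[float]) -> Optional[float]:
    """Compute the norm selected by `norm` for a 1-D sequence."""
    if norm is None:
        return None  # no normalization requested
    magnitudes = [abs(v) for v in values]
    if norm == math.inf:
        return max(magnitudes)  # maximum absolute value
    if norm == -math.inf:
        return min(magnitudes)  # minimum absolute value
    if norm == 0:
        return float(sum(1 for m in magnitudes if m > 0))  # the support
    # Assumption: any other value is a positive p for the l_p norm.
    return sum(m ** norm for m in magnitudes) ** (1.0 / norm)


def normalize(values: List[float], norm: Optional[float] = math.inf) -> List[float]:
    """Scale values so that the chosen norm becomes 1."""
    size = vector_norm(values, norm)
    if size is None or size == 0:
        return list(values)  # nothing to scale by
    return [v / size for v in values]


peak_normalized = normalize([0.5, -2.0, 1.0])  # default: peak (inf) norm
```

With the default norm of infinity this amounts to peak normalization: the largest absolute sample ends up at 1.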
- class augly.audio.transforms.PeakingEqualizer(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)
- Parameters
center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
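Because q is defined as the ratio of center frequency to bandwidth, the affected bandwidth follows directly from the two parameters. The short sketch below shows that relationship; eq_bandwidth_hz is a hypothetical helper for illustration and says nothing about how the filter coefficients are computed internally.

```python
def eq_bandwidth_hz(center_hz: float, q: float) -> float:
    """Bandwidth implied by Q = center frequency / bandwidth."""
    return center_hz / q


# With the documented defaults the band is as wide as the center frequency.
default_band = eq_bandwidth_hz(center_hz=500.0, q=1.0)

# Raising Q narrows the band around the same center.
narrow_band = eq_bandwidth_hz(center_hz=500.0, q=4.0)
```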
- class augly.audio.transforms.Percussive(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
- Parameters
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Extracts the percussive part of the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.PitchShift(n_steps=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(n_steps=1.0, p=1.0)
- Parameters
n_steps (float) – each step is equal to one semitone
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Shifts the pitch of the audio by n_steps
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
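As a worked example of what a semitone step means, the sketch below converts n_steps into a frequency ratio, assuming twelve-tone equal temperament (each semitone multiplies frequency by 2^(1/12)). The helper names are hypothetical and the code only illustrates the arithmetic, not the pitch-shifting algorithm itself.

```python
def semitone_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps semitones (equal temperament)."""
    return 2.0 ** (n_steps / 12.0)


def shifted_frequency(freq_hz: float, n_steps: float) -> float:
    """Frequency a pure tone would land on after the shift."""
    return freq_hz * semitone_ratio(n_steps)


one_octave_up = shifted_frequency(440.0, 12.0)    # twelve semitones: 880 Hz
one_semitone_up = shifted_frequency(440.0, 1.0)   # the default n_steps=1.0
```

Negative values of n_steps give ratios below 1, that is, a shift down in pitch.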
- class augly.audio.transforms.Reverb(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)
- Parameters
reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Adds reverberation to the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.Speed(factor=2.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(factor=2.0, p=1.0)
- Parameters
factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Changes the speed of the audio, affecting pitch as well
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
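The coupling between speed and pitch can be made concrete with a little arithmetic. The sketch below assumes the usual behavior of a playback-rate change, where duration shrinks by the factor and every frequency is multiplied by it; that model is an assumption for illustration, and speed_effect is a hypothetical helper rather than part of AugLy.

```python
import math
from typing import Tuple


def speed_effect(duration_s: float, factor: float) -> Tuple[float, float]:
    """Return (new duration in seconds, pitch change in semitones)."""
    new_duration = duration_s / factor
    # Assumption: frequencies scale with the factor, so the pitch moves
    # by 12 * log2(factor) semitones.
    semitones = 12.0 * math.log2(factor)
    return new_duration, semitones


# The documented default factor=2.0 on a 10 second clip.
new_duration, semitones = speed_effect(10.0, 2.0)
```

Under this model the default factor of 2.0 halves the duration and raises the pitch by one octave, which is the side effect that Tempo is documented to avoid.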
- class augly.audio.transforms.Tempo(factor=2.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(factor=2.0, p=1.0)
- Parameters
factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Adjusts the tempo of the audio by a given factor
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.TimeStretch(rate=1.5, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(rate=1.5, p=1.0)
- Parameters
rate (float) – the time stretch factor
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Time-stretches the audio by a fixed rate
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.transforms.ToMono(p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- apply_transform(audio, sample_rate, metadata=None)
Converts the audio from stereo to mono by averaging samples across channels
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
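Averaging samples across channels can be sketched in a few lines of plain Python. The example assumes the audio is laid out as one list of samples per channel; that layout and the helper name to_mono are assumptions made for illustration.

```python
from typing import List


def to_mono(channels: List[List[float]]) -> List[float]:
    """Average sample-by-sample across all channels."""
    # zip(*channels) yields one tuple per time step, holding that
    # step's sample from every channel.
    return [sum(frame) / len(frame) for frame in zip(*channels)]


left = [0.2, 0.4, -0.6]
right = [0.0, 0.2, -0.2]
mono = to_mono([left, right])  # approximately [0.1, 0.3, -0.4]
```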
Module contents
- class augly.audio.AddBackgroundNoise(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(background_audio=None, snr_level_db=10.0, seed=None, p=1.0)
- Parameters
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Mixes a background sound into the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
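To show what a signal-to-noise ratio in dB implies for the mix, the sketch below scales a noise signal so that 10 * log10(signal power / noise power) equals the requested level, then adds it to the signal. It is a plain-Python illustration of the standard SNR definition under the assumption of equal-length inputs; the helper names are hypothetical and this is not AugLy's implementation.

```python
import math
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average of the squared samples."""
    return sum(s * s for s in samples) / len(samples)


def scale_noise_for_snr(
    signal: List[float], noise: List[float], snr_level_db: float
) -> List[float]:
    """Scale noise so the signal-to-noise ratio equals snr_level_db."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_level_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [n * scale for n in noise]


def snr_db(signal: List[float], noise: List[float]) -> float:
    """Measure the signal-to-noise ratio in dB."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
scaled_noise = scale_noise_for_snr(signal, noise, snr_level_db=10.0)
mixed = [s + n for s, n in zip(signal, scaled_noise)]
```

A higher snr_level_db therefore means quieter background noise relative to the original audio.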
- class augly.audio.ApplyLambda(aug_function=<function ApplyLambda.<lambda>>, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(aug_function=<function ApplyLambda.<lambda>>, p=1.0)
- Parameters
aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Apply a user-defined lambda to the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
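A function suitable for aug_function only needs to accept the audio and the sample rate and return both. The sketch below defines one such function, reverse_audio, which is a hypothetical example; it is exercised on a plain list here so that it runs without AugLy or NumPy installed, and the same slicing also works on a 1-D np.ndarray.

```python
from typing import Sequence, Tuple


def reverse_audio(audio: Sequence[float], sample_rate: int) -> Tuple[Sequence[float], int]:
    """Play the audio backwards; the sample rate is passed through."""
    return audio[::-1], sample_rate


reversed_audio, rate = reverse_audio([0.1, 0.2, 0.3], 16000)

# With AugLy installed, the function would presumably be passed as
# ApplyLambda(aug_function=reverse_audio), per the signature above.
```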
- class augly.audio.ChangeVolume(volume_db=0.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(volume_db=0.0, p=1.0)
- Parameters
volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Changes the volume of the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.Clicks(seconds_between_clicks=0.5, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(seconds_between_clicks=0.5, p=1.0)
- Parameters
seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Adds clicks to the audio at a given regular interval
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.Clip(offset_factor=0.0, duration_factor=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(offset_factor=0.0, duration_factor=1.0, p=1.0)
- Parameters
offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Clips the audio using the specified offset and duration factors
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.Compose(transforms, p=1.0)
Bases:
augly.audio.composition.BaseComposition
- __call__(audio, sample_rate, metadata=None)
Applies the list of transforms in order to the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
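Applying a list of transforms in order means each transform receives the output of the previous one, so the order of the list matters. The sketch below mimics that pattern with plain functions; it is a simplified stand-in written for illustration, not AugLy's Compose class, and the names compose, halve and reverse are hypothetical. Treating p as the chance that the whole chain runs is also an assumption.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Audio = List[float]
Transform = Callable[[Audio, int], Tuple[Audio, int]]


def compose(
    transforms: Sequence[Transform],
    audio: Audio,
    sample_rate: int,
    p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Audio, int]:
    """Run every transform in order, feeding each the previous output."""
    rng = rng or random.Random()
    # Assumption: with probability 1 - p the input is returned untouched.
    if rng.random() > p:
        return audio, sample_rate
    for transform in transforms:
        audio, sample_rate = transform(audio, sample_rate)
    return audio, sample_rate


def halve(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return [s * 0.5 for s in audio], sample_rate


def reverse(audio: Audio, sample_rate: int) -> Tuple[Audio, int]:
    return audio[::-1], sample_rate


result, rate = compose([halve, reverse], [1.0, 2.0], 8000)
```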
- class augly.audio.Harmonic(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
- Parameters
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Extracts the harmonic part of the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.HighPassFilter(cutoff_hz=3000.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(cutoff_hz=3000.0, p=1.0)
- Parameters
cutoff_hz (float) – frequency (in Hz) where signals with lower frequencies will begin to be reduced by 6dB per octave (doubling in frequency) below this point
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.InsertInBackground(offset_factor=0.0, background_audio=None, seed=None, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(offset_factor=0.0, background_audio=None, seed=None, p=1.0)
- Parameters
offset_factor (float) – start point of the crop relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that these results remain reproducible
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Inserts the audio into a background audio in a non-overlapping manner.
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.InvertChannels(p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- apply_transform(audio, sample_rate, metadata=None)
Inverts the channels of the audio.
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.Loop(n=1, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(n=1, p=1.0)
- Parameters
n (int) – the number of times the audio will be looped
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Loops the audio ‘n’ times
- Parameters
audio (ndarray) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.LowPassFilter(cutoff_hz=500.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(cutoff_hz=500.0, p=1.0)
- Parameters
cutoff_hz (float) – frequency (in Hz) where signals with higher frequencies will begin to be reduced by 6dB per octave (doubling in frequency) above this point
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.Normalize(norm=inf, axis=0, threshold=None, fill=None, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(norm=inf, axis=0, threshold=None, fill=None, p=1.0)
- Parameters
norm (Optional[float]) – the type of norm to compute:
  - np.inf: maximum absolute value
  - -np.inf: minimum absolute value
  - 0: number of non-zeros (the support)
  - float: corresponding l_p norm
  - None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.OneOf(transforms, p=1.0)
Bases:
augly.audio.composition.BaseComposition
- __call__(audio, sample_rate, metadata=None)
Applies one of the transforms to the audio (with probability p)
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- __init__(transforms, p=1.0)
- Parameters
transforms (List[BaseTransform]) – a list of transforms to select from; one of which will be chosen to be applied to the audio
p (float) – the probability of the transform being applied; default value is 1.0
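The selection behaviour described for OneOf can be pictured with a small stand-alone sketch. It does not call AugLy: `one_of`, `halve` and `negate` are hypothetical names, audio is a plain list of floats, and picking uniformly among the transforms is an assumption (the library may weight the choice differently).

```python
import random


def one_of(transforms, audio, sample_rate, p=1.0, rng=None):
    """Apply exactly one randomly chosen transform, with probability p."""
    if rng is None:
        rng = random.Random()
    # With probability 1 - p the input is returned untouched.
    if rng.random() >= p:
        return audio, sample_rate
    # Assumption: every candidate transform is equally likely.
    transform = rng.choice(transforms)
    return transform(audio, sample_rate)


def halve(audio, sample_rate):
    # Toy transform: scale every sample by 0.5.
    return [x * 0.5 for x in audio], sample_rate


def negate(audio, sample_rate):
    # Toy transform: flip the sign of every sample.
    return [-x for x in audio], sample_rate


out, sr = one_of([halve, negate], [0.2, -0.4], 16000, p=1.0, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the choice reproducible, which is handy when comparing augmented outputs across runs.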
- class augly.audio.PeakingEqualizer(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(center_hz=500.0, q=1.0, gain_db=-3.0, p=1.0)
- Parameters
center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
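Since q is described as the ratio of center frequency to bandwidth, the affected bandwidth follows directly from the two parameters. The helper below is a hypothetical illustration of that relationship only; it does not implement the filter itself.

```python
def bandwidth_hz(center_hz: float, q: float) -> float:
    """Bandwidth implied by q = center_hz / bandwidth."""
    if q <= 0:
        raise ValueError("q must be positive")
    return center_hz / q


# Raising q narrows the band around the same center frequency.
wide = bandwidth_hz(500.0, 1.0)
narrow = bandwidth_hz(500.0, 4.0)
```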
- class augly.audio.Percussive(kernel_size=31, power=2.0, margin=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(kernel_size=31, power=2.0, margin=1.0, p=1.0)
- Parameters
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Extracts the percussive part of the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.PitchShift(n_steps=1.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(n_steps=1.0, p=1.0)
- Parameters
n_steps (float) – each step is equal to one semitone
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Shifts the pitch of the audio by n_steps
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
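Because each step is one semitone, n_steps maps to a frequency ratio of 2^(n_steps / 12) in twelve-tone equal temperament. The sketch below only shows that arithmetic; the function names are hypothetical and no audio is processed.

```python
def semitones_to_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps equal-tempered semitones."""
    return 2.0 ** (n_steps / 12.0)


def shifted_frequency(freq_hz: float, n_steps: float) -> float:
    """Frequency a pure tone at freq_hz would have after the shift."""
    return freq_hz * semitones_to_ratio(n_steps)


# Twelve semitones up is one octave: 440 Hz becomes 880 Hz.
octave_up = shifted_frequency(440.0, 12)
```

Negative values of n_steps give ratios below 1, i.e. a downward shift.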
- class augly.audio.Reverb(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, p=1.0)
- Parameters
reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Adds reverberation to the audio
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.Speed(factor=2.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(factor=2.0, p=1.0)
- Parameters
factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Changes the speed of the audio, affecting pitch as well
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
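To get a feel for what a given factor does, the sketch below estimates the resulting duration and pitch change. It assumes the speed change behaves like playing the samples back factor times faster, which is what "affecting pitch as well" suggests; `speed_effect` is a hypothetical helper, not part of the library.

```python
import math


def speed_effect(duration_s: float, factor: float):
    """Return (new duration in seconds, pitch change in semitones)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Playing back factor times faster shortens the clip by the same factor.
    new_duration_s = duration_s / factor
    # Every frequency is multiplied by factor, i.e. 12 * log2(factor) semitones.
    pitch_change = 12.0 * math.log2(factor)
    return new_duration_s, pitch_change


# The default factor of 2.0 halves the duration and raises pitch by an octave.
new_duration_s, pitch_change = speed_effect(10.0, 2.0)
```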
- class augly.audio.Tempo(factor=2.0, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(factor=2.0, p=1.0)
- Parameters
factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch
p (float) – the probability of the transform being applied; default value is 1.0
- apply_transform(audio, sample_rate, metadata=None)
Adjusts the tempo of the audio by a given factor
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.TimeStretch(rate=1.5, p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- __init__(rate=1.5, p=1.0)
- Parameters
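rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor
p (float) – the probability of the transform being applied; default value is 1.0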
- apply_transform(audio, sample_rate, metadata=None)
Time-stretches the audio by a fixed rate
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- class augly.audio.ToMono(p=1.0)
Bases:
augly.audio.transforms.BaseTransform
- apply_transform(audio, sample_rate, metadata=None)
Converts the audio from stereo to mono by averaging samples across channels
- Parameters
audio (ndarray) – the audio array to be augmented
sample_rate (int) – the audio sample rate of the inputted audio
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
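Averaging across channels can be sketched in a few lines. This stand-alone version represents audio as a list of per-channel sample lists rather than an ndarray, and the function name simply mirrors the description above; it is an illustration, not the library's code.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample."""
    num_channels = len(channels)
    # zip(*channels) yields one tuple per time step holding every channel's sample.
    return [sum(frame) / num_channels for frame in zip(*channels)]


left = [1.0, 0.0, -1.0]
right = [0.0, 0.0, 1.0]
mono = to_mono([left, right])
```

Note that samples with opposite sign in the two channels cancel out, so out-of-phase stereo content becomes quieter in the mono result.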
- augly.audio.add_background_noise(audio, sample_rate=44100, background_audio=None, snr_level_db=10.0, seed=None, output_path=None, metadata=None)
Mixes a background sound into the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise
snr_level_db (float) – signal-to-noise ratio in dB
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
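The role of snr_level_db can be illustrated with the standard definition SNR = 10 * log10(P_signal / P_noise). The sketch below scales a white-noise track so that the mix hits a requested SNR; it uses only the standard library, and `mean_power` and `mix_at_snr` are hypothetical helpers rather than the library's implementation.

```python
import math
import random


def mean_power(samples):
    """Average of the squared samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_level_db):
    """Scale noise so that signal power / noise power matches snr_level_db, then mix."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_level_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled_noise)]
    return mixed, scaled_noise


rng = random.Random(0)  # seeded so the noise is reproducible
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]  # white noise stand-in
mixed, scaled_noise = mix_at_snr(signal, noise, snr_level_db=10.0)
```

A lower snr_level_db leaves more noise relative to the signal; at 0 dB the two have equal power.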
- augly.audio.add_background_noise_intensity(snr_level_db=10.0, **kwargs)
- Return type
float
- augly.audio.apply_lambda(audio, sample_rate=44100, aug_function=<function <lambda>>, output_path=None, metadata=None, **kwargs)
Apply a user-defined lambda to the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
aug_function (Callable[..., Tuple[ndarray, int]]) – the augmentation function to be applied onto the audio (should expect the audio np.ndarray & sample rate int as input, and return the transformed audio & sample rate)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
**kwargs – the input attributes to be passed into aug_function
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
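The contract for aug_function is easiest to see with two toy callables. In this sketch plain lists stand in for np.ndarray, and `call_aug_function` is a hypothetical stand-in that only mimics how the extra keyword arguments are forwarded; it is not the library function.

```python
def reverse(audio, sample_rate):
    """Conforming aug_function: takes audio and sample rate, returns both."""
    return audio[::-1], sample_rate


def scale(audio, sample_rate, factor=1.0):
    """Conforming aug_function with an extra attribute supplied via **kwargs."""
    return [x * factor for x in audio], sample_rate


def call_aug_function(audio, sample_rate, aug_function, **kwargs):
    # Hypothetical driver: forwards the extra attributes to aug_function.
    return aug_function(audio, sample_rate, **kwargs)


reversed_audio, sr = call_aug_function([1, 2, 3], 8000, reverse)
scaled_audio, _ = call_aug_function([1.0, 2.0], 8000, scale, factor=0.5)
```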
- augly.audio.apply_lambda_intensity(aug_function, **kwargs)
- Return type
float
- augly.audio.change_volume(audio, sample_rate=44100, volume_db=0.0, output_path=None, metadata=None)
Changes the volume of the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
volume_db (float) – the decibel amount by which to either increase (positive value) or decrease (negative value) the volume of the audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
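A decibel change corresponds to an amplitude multiplier of 10^(volume_db / 20). The sketch below shows that conversion on a list of samples; the helper names are hypothetical and this is not necessarily how the library applies the gain.

```python
def db_to_gain(volume_db: float) -> float:
    """Amplitude multiplier equivalent to a change of volume_db decibels."""
    return 10.0 ** (volume_db / 20.0)


def apply_gain(samples, volume_db):
    """Scale every sample by the multiplier for volume_db."""
    gain = db_to_gain(volume_db)
    return [gain * s for s in samples]


# 0 dB leaves the signal unchanged; about -6 dB roughly halves the amplitude.
quieter = apply_gain([0.5, -0.5], -6.0)
```

Positive values can push samples outside the representable range, so it may be worth checking the peak after boosting.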
- augly.audio.change_volume_intensity(volume_db=0.0, **kwargs)
- Return type
float
- augly.audio.clicks(audio, sample_rate=44100, seconds_between_clicks=0.5, snr_level_db=1.0, output_path=None, metadata=None)
Adds clicks to the audio at a given regular interval
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
seconds_between_clicks (float) – the amount of time between each click that will be added to the audio, in seconds
snr_level_db (float) – signal-to-noise ratio in dB
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.clicks_intensity(seconds_between_clicks=0.5, snr_level_db=1.0, **kwargs)
- Return type
float
- augly.audio.clip(audio, sample_rate=44100, offset_factor=0.0, duration_factor=1.0, output_path=None, metadata=None)
Clips the audio using the specified offset and duration factors
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – start point of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
duration_factor (float) – the length of the crop relative to the audio duration (this parameter is multiplied by the audio duration)
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
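Both factors are fractions of the total length, so they translate into sample indices by multiplication. The helper below is a hypothetical sketch of that arithmetic; truncating to integers and clamping the end to the clip length are assumptions, not documented behaviour.

```python
def clip_bounds(num_samples: int, offset_factor: float = 0.0, duration_factor: float = 1.0):
    """Return (start, end) sample indices for the crop."""
    start = int(offset_factor * num_samples)
    length = int(duration_factor * num_samples)
    # Assumption: a crop that would run past the end is cut short.
    end = min(num_samples, start + length)
    return start, end


# A 10 s clip at 44100 Hz, starting a quarter of the way in and keeping half.
start, end = clip_bounds(44100 * 10, offset_factor=0.25, duration_factor=0.5)
```

With these values the crop covers seconds 2.5 to 7.5 of the original clip.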
- augly.audio.clip_intensity(duration_factor=1.0, **kwargs)
- Return type
float
- augly.audio.harmonic(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)
Extracts the harmonic part of the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.harmonic_intensity(**kwargs)
- Return type
float
- augly.audio.high_pass_filter(audio, sample_rate=44100, cutoff_hz=3000.0, output_path=None, metadata=None)
Allows audio signals with a frequency higher than the given cutoff to pass through and attenuates signals with frequencies lower than the cutoff frequency
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – frequency (in Hz) where signals with lower frequencies will begin to be reduced by 6dB per octave (doubling in frequency) below this point
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
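A roll-off of 6 dB per octave is the signature of a first-order filter. Assuming an idealized first-order high-pass response (an assumption; the filter the library actually applies may differ in detail), the attenuation at a given frequency can be estimated as follows. The function name is hypothetical.

```python
import math


def first_order_highpass_gain_db(freq_hz: float, cutoff_hz: float) -> float:
    """Gain in dB of an ideal first-order high-pass filter at freq_hz."""
    ratio = freq_hz / cutoff_hz
    # Magnitude response |H| = ratio / sqrt(1 + ratio^2).
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio * ratio))


# Well below the default 3000 Hz cutoff, each halving of frequency costs about 6 dB.
drop_per_octave = (
    first_order_highpass_gain_db(30.0, 3000.0)
    - first_order_highpass_gain_db(15.0, 3000.0)
)
```

At the cutoff itself this idealized response is about 3 dB down, and it approaches 0 dB for frequencies far above the cutoff.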
- augly.audio.high_pass_filter_intensity(cutoff_hz=3000.0, **kwargs)
- Return type
float
- augly.audio.insert_in_background(audio, sample_rate=44100, offset_factor=0.0, background_audio=None, seed=None, output_path=None, metadata=None)
Inserts audio into a background clip in a non-overlapping manner.
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
offset_factor (float) – insert point relative to the background duration (this parameter is multiplied by the background duration)
background_audio (Union[str, ndarray, None]) – the path to the background audio or a variable of type np.ndarray containing the background audio. If set to None, the background audio will be white noise, with the same duration as the audio.
seed (Union[int, Any, None]) – a NumPy random generator (or seed) such that the results remain reproducible
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
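One way to read "non-overlapping" is that the background is split at the insert point and resumes after the inserted audio, so the result is as long as both clips combined. The sketch below illustrates that reading with plain lists; the splitting behaviour and the helper itself are assumptions, not the library's code.

```python
def insert_in_background(audio, background, offset_factor=0.0):
    """Splice audio into background at offset_factor * len(background)."""
    offset = int(offset_factor * len(background))
    # Background before the insert point, then the audio, then the rest.
    return background[:offset] + audio + background[offset:]


result = insert_in_background([9, 9], [1, 2, 3, 4], offset_factor=0.5)
```

With offset_factor=0.0 the audio comes first and the whole background follows it.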
- augly.audio.insert_in_background_intensity(metadata, **kwargs)
- Return type
float
- augly.audio.invert_channels(audio, sample_rate=44100, output_path=None, metadata=None)
Inverts channels of the audio. If the audio has only one channel, no change is applied. Otherwise, it inverts the order of the channels, e.g. for 4 channels, it returns channels in order [3, 2, 1, 0].
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
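The channel reordering described above amounts to reversing the channel axis. A stand-alone sketch, with audio held as a list of per-channel lists instead of an ndarray and a function name that merely mirrors the description:

```python
def invert_channels(channels):
    """Reverse channel order; single-channel audio is returned unchanged."""
    if len(channels) <= 1:
        return channels
    return channels[::-1]


# Four one-sample channels labelled by their original index.
four_channels = [[0], [1], [2], [3]]
inverted = invert_channels(four_channels)
```

For stereo audio this is simply a left/right swap.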
- augly.audio.invert_channels_intensity(metadata, **kwargs)
- Return type
float
- augly.audio.loop(audio, sample_rate=44100, n=1, output_path=None, metadata=None)
Loops the audio ‘n’ times
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n (int) – the number of times the audio will be looped
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.loop_intensity(n=1, **kwargs)
- Return type
float
- augly.audio.low_pass_filter(audio, sample_rate=44100, cutoff_hz=500.0, output_path=None, metadata=None)
Allows audio signals with a frequency lower than the given cutoff to pass through and attenuates signals with frequencies higher than the cutoff frequency
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
cutoff_hz (float) – frequency (in Hz) where signals with higher frequencies will begin to be reduced by 6dB per octave (doubling in frequency) above this point
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.low_pass_filter_intensity(cutoff_hz=500.0, **kwargs)
- Return type
float
- augly.audio.normalize(audio, sample_rate=44100, norm=inf, axis=0, threshold=None, fill=None, output_path=None, metadata=None)
Normalizes the audio array along the chosen axis (norm(audio, axis=axis) == 1)
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
norm (Optional[float]) – the type of norm to compute:
  - np.inf: maximum absolute value
  - -np.inf: minimum absolute value
  - 0: number of non-zeros (the support)
  - float: corresponding l_p norm
  - None: no normalization is performed
axis (int) – axis along which to compute the norm
threshold (Optional[float]) – if provided, only the columns (or rows) with norm of at least threshold are normalized
fill (Optional[bool]) – if None, then columns (or rows) with norm below threshold are left as is. If False, then columns (rows) with norm below threshold are set to 0. If True, then columns (rows) with norm below threshold are filled uniformly such that the corresponding norm is 1
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
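The norm options listed above can be made concrete for a single 1-D vector. This is a minimal sketch using plain lists: `vector_norm` and `normalize_vector` are hypothetical helpers, the threshold and fill options are not modelled, and leaving an all-zero vector untouched is an assumption.

```python
import math


def vector_norm(x, norm):
    """Compute the norm variants described for the norm parameter (p > 0 for l_p)."""
    mags = [abs(v) for v in x]
    if norm == math.inf:
        return max(mags)  # maximum absolute value
    if norm == -math.inf:
        return min(mags)  # minimum absolute value
    if norm == 0:
        return float(sum(1 for m in mags if m > 0))  # number of non-zeros
    return sum(m ** norm for m in mags) ** (1.0 / norm)  # l_p norm


def normalize_vector(x, norm=math.inf):
    """Scale x so that vector_norm(result, norm) == 1."""
    if norm is None:
        return list(x)  # no normalization is performed
    length = vector_norm(x, norm)
    if length == 0:
        return list(x)  # assumption: nothing to scale
    return [v / length for v in x]


peak_normalized = normalize_vector([0.5, -0.25, 0.0])
```

With the default norm of infinity this is peak normalization: the largest absolute sample ends up at exactly 1.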
- augly.audio.normalize_intensity(norm=inf, **kwargs)
- Return type
float
- augly.audio.peaking_equalizer(audio, sample_rate=44100, center_hz=500.0, q=1.0, gain_db=-3.0, output_path=None, metadata=None)
Applies a two-pole peaking equalization filter. The signal-level at and around center_hz can be increased or decreased, while all other frequencies are unchanged
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
center_hz (float) – point in the frequency spectrum at which EQ is applied
q (float) – ratio of center frequency to bandwidth; bandwidth is inversely proportional to Q, meaning that as you raise Q, you narrow the bandwidth
gain_db (float) – amount of gain (boost) or reduction (cut) that is applied at a given frequency. Beware of clipping when using positive gain
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.peaking_equalizer_intensity(q, gain_db, **kwargs)
- Return type
float
- augly.audio.percussive(audio, sample_rate=44100, kernel_size=31, power=2.0, margin=1.0, output_path=None, metadata=None)
Extracts the percussive part of the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
kernel_size (int) – kernel size for the median filters
power (float) – exponent for the Wiener filter when constructing soft mask matrices
margin (float) – margin size for the masks
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.percussive_intensity(**kwargs)
- Return type
float
- augly.audio.pitch_shift(audio, sample_rate=44100, n_steps=1.0, output_path=None, metadata=None)
Shifts the pitch of the audio by n_steps
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
n_steps (float) – each step is equal to one semitone
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.pitch_shift_intensity(n_steps=2.0, **kwargs)
- Return type
float
- augly.audio.reverb(audio, sample_rate=44100, reverberance=50.0, hf_damping=50.0, room_scale=100.0, stereo_depth=100.0, pre_delay=0.0, wet_gain=0.0, wet_only=False, output_path=None, metadata=None)
Adds reverberation to the audio
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
reverberance (float) – (%) sets the length of the reverberation tail. This determines how long the reverberation continues for after the original sound being reverbed comes to an end, and so simulates the “liveliness” of the room acoustics
hf_damping (float) – (%) increasing the damping produces a more “muted” effect. The reverberation does not build up as much, and the high frequencies decay faster than the low frequencies
room_scale (float) – (%) sets the size of the simulated room. A high value will simulate the reverberation effect of a large room and a low value will simulate the effect of a small room
stereo_depth (float) – (%) sets the apparent “width” of the reverb effect for stereo tracks only. Increasing this value applies more variation between left and right channels, creating a more “spacious” effect. When set at zero, the effect is applied independently to left and right channels
pre_delay (float) – (ms) delays the onset of the reverberation for the set time after the start of the original input. This also delays the onset of the reverb tail
wet_gain (float) – (dB) applies volume adjustment to the reverberation (“wet”) component in the mix
wet_only (bool) – only the wet signal (added reverberation) will be in the resulting output, and the original audio will be removed
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.reverb_intensity(reverberance=50.0, wet_only=False, room_scale=100.0, **kwargs)
- Return type
float
- augly.audio.speed(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)
Changes the speed of the audio, affecting pitch as well
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the speed factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.speed_intensity(factor=2.0, **kwargs)
- Return type
float
- augly.audio.tempo(audio, sample_rate=44100, factor=2.0, output_path=None, metadata=None)
Adjusts the tempo of the audio by a given factor
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
factor (float) – the tempo factor. If factor > 1 the audio will be sped up by that factor; if factor < 1 the audio will be slowed down by that factor, without affecting the pitch
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.tempo_intensity(factor=2.0, **kwargs)
- Return type
float
- augly.audio.time_stretch(audio, sample_rate=44100, rate=1.5, output_path=None, metadata=None)
Time-stretches the audio by a fixed rate
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
rate (float) – the time stretch factor. If rate > 1 the audio will be sped up by that factor; if rate < 1 the audio will be slowed down by that factor
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.time_stretch_intensity(rate=1.5, **kwargs)
- Return type
float
- augly.audio.to_mono(audio, sample_rate=44100, output_path=None, metadata=None)
Converts the audio from stereo to mono by averaging samples across channels
- Parameters
audio (Union[str, ndarray]) – the path to the audio or a variable of type np.ndarray that will be augmented
sample_rate (int) – the audio sample rate of the inputted audio
output_path (Optional[str]) – the path in which the resulting audio will be stored. If None, the resulting np.ndarray will still be returned
metadata (Optional[List[Dict[str, Any]]]) – if set to be a list, metadata about the function execution including its name, the source & dest duration, sample rates, etc. will be appended to the inputted list. If set to None, no metadata will be appended
- Return type
Tuple[ndarray, int]
- Returns
the augmented audio array and sample rate
- augly.audio.to_mono_intensity(metadata, **kwargs)
- Return type
float