Random Data
Generating random numbers
This is a random data tutorial.
NumPy's random module (np.random) provides a broad set of functions for generating pseudo-random numbers.
We explore the generation of random floats and random integers.
    import numpy as np

    # Generate a random floating-point number between 0 and 1
    random_float = np.random.rand()
    print(random_float)

    # Generate an array of random floating-point numbers between 0 and 1
    random_float = np.random.rand(10)
    print(random_float)

    # Generate an array of random floating-point numbers between 0 and 1 of shape (10, 2)
    random_float_array = np.random.rand(10, 2)
    print(random_float_array)

    # Generate a random integer between 0 and 9
    random_integer = np.random.randint(10)
    print(random_integer)

    # Generate an array of random integers between 0 and 9
    random_integer = np.random.randint(10, size=10)
    print(random_integer)

    # Generate an array of random integers between 0 and 9 of shape (3, 3)
    random_array = np.random.randint(10, size=(3, 3))
    print(random_array)
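As a quick sanity check on the ranges mentioned in the comments, the sketch below draws a large batch of values and tests their bounds. It assumes nothing beyond np.random.rand and np.random.randint; the batch size of 10,000 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Draw a large batch so the range checks are meaningful
floats = np.random.rand(10_000)
integers = np.random.randint(10, size=10_000)

# rand() samples from [0, 1): 0 is possible, 1 is not
print(floats.min() >= 0.0, floats.max() < 1.0)

# randint(10) samples from 0..9: the upper bound 10 is exclusive
print(integers.min() >= 0, integers.max() <= 9)
```

Both lines should report that the checks hold, whatever values happen to be drawn.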
Seeding
In NumPy, np.random.seed() initializes the state of the global pseudo-random number generator. In other words, it makes the sequence of generated numbers repeatable.
Seeding ensures reproducibility: running the code with the same seed produces the same sequence of random numbers every time. Let’s explore the process with an example.
Generation without seeding.
    import numpy as np

    random_float = np.random.rand(3)
    print(random_float)
    # Output: [0.01290456 0.03397538 0.55942502]

    random_float = np.random.rand(3)
    print(random_float)
    # Output: [0.91551117 0.3495945 0.24628688]
The outcomes are different, and they change on every run.
Generation with seeding.
    import numpy as np

    # Set the seed value
    np.random.seed(42)
    random_float = np.random.rand(3)
    print(random_float)
    # Output: [0.37454012 0.95071431 0.73199394]

    # Set the seed value
    np.random.seed(42)
    random_float = np.random.rand(3)
    print(random_float)
    # Output: [0.37454012 0.95071431 0.73199394]
The outcomes are the same, because the seed is reset to 42 before each call.
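The comparison can also be made programmatically instead of by eye. The following is a minimal sketch: draw is a hypothetical helper written for this example (it is not part of NumPy), and the seed 7 is an arbitrary second value.

```python
import numpy as np

def draw(seed, n=3):
    """Hypothetical helper: reseed the global generator, then draw n floats."""
    np.random.seed(seed)
    return np.random.rand(n)

first = draw(42)
second = draw(42)
other = draw(7)

# Same seed: the two arrays are identical
print(np.array_equal(first, second))

# Different seed: the arrays differ
print(np.array_equal(first, other))
```

Note that the seed has to be set again before each draw. Seeding once and calling np.random.rand twice continues the same sequence, so the second call returns the next values rather than repeating the first ones.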
Random sampling
Random sampling refers to the process of selecting a subset of items or data points from a larger population. In simple random sampling, each item in the population has an equal chance of being selected. It is a fundamental concept in statistics and probability theory.
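To make the definition concrete, here is a minimal sketch of drawing an equal-chance subset with np.random.choice. The population of integers 0 to 99 and the subset size of 10 are assumptions made up for this example; replace=False is the np.random.choice parameter that prevents an item from being picked twice.

```python
import numpy as np

population = np.arange(100)  # hypothetical population: the integers 0..99

# replace=False means no item can be selected twice;
# with no p argument, every item is equally likely
subset = np.random.choice(population, size=10, replace=False)
print(subset)
```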
NumPy has several functions for random sampling. We explore random sample, random choice, and random shuffle.
Let’s dive into the coding part.
- sample() – outputs random floats in the half-open interval [0, 1), so 0 is possible but 1 is not.
    import numpy as np

    # outputs 1 random datapoint
    random_sample = np.random.sample()
    print(random_sample)
    # Output: 0.7434698063973832

    # outputs multiple random datapoints
    random_sample = np.random.sample(3)
    print(random_sample)
    # Output: [0.89479096 0.09273952 0.6416487]
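Values from another range can be obtained by scaling and shifting the output of sample(). The sketch below assumes a target range of 5 to 10; the names low and high are chosen only for illustration.

```python
import numpy as np

low, high = 5.0, 10.0  # hypothetical bounds chosen for illustration

# sample() returns values in [0, 1); multiplying by the width of the
# range and adding the lower bound maps them into [low, high)
scaled = (high - low) * np.random.sample(5) + low
print(scaled)
```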
- choice() – outputs random datapoints drawn from a set of items.
    import numpy as np

    arr = np.array([1, 2, 3, 4, 5])

    # outputs 1 random datapoint
    random_choice = np.random.choice(arr)
    print(random_choice)
    # Output: 2

    # 'size' sets the number of datapoints drawn from the set of items.
    random_choice = np.random.choice(arr, size=3)
    print(random_choice)
    # Output: [3 1 5]

    # 'p' sets the weight/probability of each datapoint; the values must sum to 1.
    random_samples = np.random.choice(arr, size=10, p=[0.1, 0.2, 0.2, 0.4, 0.1])
    print(random_samples)
    # Output: [1 4 4 2 2 4 4 1 1 2]
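One way to see the effect of p is to draw many samples and compare how often each item appears with its weight. This is a rough sketch: the draw count of 100,000 and the seed 0 are arbitrary assumptions, and the observed fractions only approximate the weights.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
weights = [0.1, 0.2, 0.2, 0.4, 0.1]  # one weight per item, summing to 1

np.random.seed(0)  # arbitrary seed so the run is repeatable
draws = np.random.choice(arr, size=100_000, p=weights)

# Count how often each item appears and convert the counts to fractions
values, counts = np.unique(draws, return_counts=True)
frequencies = counts / draws.size

for value, freq in zip(values, frequencies):
    print(value, round(freq, 3))
```

With this many draws, each printed fraction should land close to the corresponding weight, with item 4 appearing about twice as often as items 2 and 3.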
- shuffle() – randomly reorders the original set of items in place and returns None.
    import numpy as np

    arr = np.array([1, 2, 3, 4, 5])

    # shuffle() modifies arr directly, so the original order of the
    # datapoints is replaced; there is no return value to capture
    np.random.shuffle(arr)
    print(arr)
    # Output: [1 5 2 3 4]
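When the original order needs to be kept, np.random.permutation is one alternative: it returns a shuffled copy and leaves its input unchanged. The sketch below contrasts the two functions; the variable names are chosen only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# shuffle() reorders arr in place and returns None
result = np.random.shuffle(arr)
print(result)
print(arr)

# permutation() leaves its input untouched and returns a shuffled copy
original = np.array([1, 2, 3, 4, 5])
shuffled_copy = np.random.permutation(original)
print(original)
print(shuffled_copy)
```

The first print shows None, which is why assigning the result of shuffle() to a variable is rarely useful.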
This is an original random data educational material created by aicorr.com.