How can I quickly generate a (large) list of random numbers given a list of seeds as input?


I need to make a function that takes in an array of integers and returns a list of random numbers of the same length as the array, with the restriction that the random number corresponding to a given entry of the input array must always be the same for that entry.
For example, input_a given below returns the following (called twice to show the output is reproducible):
input_a
random_array(input_a)
[0.51689016 0.62747792 0.16585436 0.63928942 0.30514275]

random_array(input_a)
[0.51689016 0.62747792 0.16585436 0.63928942 0.30514275]

Then input_b given below should return the following (again identical across calls):
input_b
random_array(input_b)
[0.16585436 0.62747792 0.16585436]

random_array(input_b)
[0.16585436 0.62747792 0.16585436]

Note that the output numbers that correspond to an input of 3 are all the same, and likewise for those that correspond to an input of 2. In effect, the values of the input array are used as seeds for the output array.
The main issue is that the input arrays may be very big, so I'd need something that can do the operation efficiently.
My naive implementation is as follows: it makes one random number generator per entry, using the input array as seeds.
import numpy as np

def random_array(input_array):
    # One generator per entry, seeded by that entry; draw one number
    # from each. (Function body reconstructed from the description above.)
    rngs = [np.random.default_rng(seed) for seed in input_array]
    return [rng.random() for rng in rngs]

input_a = [1, 2, 3]  # inferred from the printed outputs below
input_b = [3, 2, 3]

print(random_array(input_a)) # [0.5118216247002567, 0.2616121342493164, 0.08564916714362436]
print(random_array(input_b)) # [0.08564916714362436, 0.2616121342493164, 0.08564916714362436]

It works as intended, but it's terribly slow for what I need it to do - unsurprising, given that it's doing a loop over array entries. This implementation takes 5 seconds or so to run on an input array of length 100,000, and I'll need to do this for much larger inputs than that.
How can I do this more efficiently?
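For reference, one common way to vectorize this kind of per-seed draw (my suggestion, not from the source) is to replace the per-entry generators with a vectorized integer hash. The splitmix64 finalizer used below is an illustrative choice; it maps each integer ID to a reproducible value in [0, 1), but the values differ from those produced by NumPy's per-entry Generators.

```python
import numpy as np

def random_array_vectorized(input_array):
    """Deterministically map each integer entry to a float in [0, 1).

    Sketch only: uses the splitmix64 finalizer as a vectorized hash,
    so all work happens in a handful of whole-array operations.
    """
    x = np.asarray(input_array, dtype=np.uint64)
    x = x + np.uint64(0x9E3779B97F4A7C15)                # wraps mod 2**64
    x = (x ^ (x >> np.uint64(30))) * np.uint64(0xBF58476D1CE4E5B9)
    x = (x ^ (x >> np.uint64(27))) * np.uint64(0x94D049BB133111EB)
    x = x ^ (x >> np.uint64(31))
    return x / np.float64(2**64)                         # scale to [0, 1)
```

Because every operation is an elementwise array operation, this should scale to very large inputs without a Python-level loop, while still giving the same output for the same entry.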
What I'm specifically trying to do is take a set of particles in the output of a simulation (totalling about 200 billion particles) and make a smaller set of particles randomly sampled ("downsampled") from the large set (the downsampled output should have about 1% of the total particles). The particles are each labelled with an ID, which is nonnegative but can be very large. The simulation output is split into a dozen or so "snapshots", each of which store each particle's position (among other things); each snapshot is split into "subsnapshots", individual files that store the IDs/positions/etc. of about 200 million particles each.
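For the downsampling itself, the per-ID idea can be applied one subsnapshot at a time: hash each ID to [0, 1) and keep the particle when the value falls below the sampling fraction, so a given ID receives the same keep/drop decision in every snapshot and subsnapshot. A minimal self-contained sketch (the splitmix64-style hash constants, the function name, and the example arrays are illustrative assumptions, not from the source):

```python
import numpy as np

def keep_mask(ids, fraction=0.01):
    # splitmix64 finalizer as a vectorized per-ID hash (illustrative choice)
    x = np.asarray(ids, dtype=np.uint64) + np.uint64(0x9E3779B97F4A7C15)
    x = (x ^ (x >> np.uint64(30))) * np.uint64(0xBF58476D1CE4E5B9)
    x = (x ^ (x >> np.uint64(27))) * np.uint64(0x94D049BB133111EB)
    x = x ^ (x >> np.uint64(31))
    # keep the particle iff its hashed value falls below the fraction
    return x < np.uint64(fraction * 2**64)

# Per-subsnapshot use, with hypothetical `ids` and `positions` arrays:
ids = np.array([7, 123456789, 7, 42], dtype=np.uint64)
positions = np.arange(12).reshape(4, 3)
mask = keep_mask(ids, fraction=0.5)
downsampled = positions[mask]
```

Since each subsnapshot is processed independently and the decision depends only on the ID, the subsnapshots can be streamed or processed in parallel without any shared state.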
