Asked  7 Months ago    Answers:  5   Viewed   35 times

How safe is it to use UUID to uniquely identify something (I'm using it for files uploaded to the server)? As I understand it, it is based off random numbers. However, it seems to me that given enough time, it would eventually repeat it self, just by pure chance. Is there a better system or a pattern of some type to alleviate this issue?

 Answers

70

Very safe:

the annual risk of a given person being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10?11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.

Caveat:

However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.

Source: The Random UUID probability of duplicates section of the Wikipedia article on Universally unique identifiers (link leads to a revision from December 2016 before editing reworked the section).

Also see the current section on the same subject on the same Universally unique identifier article, Collisions.

Tuesday, June 1, 2021
 
ajreal
answered 7 Months ago
12

The uuid module provides immutable UUID objects (the UUID class) and the functions uuid1(), uuid3(), uuid4(), uuid5() for generating version 1, 3, 4, and 5 UUIDs as specified in RFC 4122.

If all you want is a unique ID, you should probably call uuid1() or uuid4(). Note that uuid1() may compromise privacy since it creates a UUID containing the computer’s network address. uuid4() creates a random UUID.

Docs:

  • Python 2
  • Python 3

Examples (for both Python 2 and 3):

>>> import uuid

>>> # make a random UUID
>>> uuid.uuid4()
UUID('bd65600d-8669-4903-8a14-af88203add38')

>>> # Convert a UUID to a string of hex digits in standard form
>>> str(uuid.uuid4())
'f50ec0b7-f960-400d-91f0-c42a6d44e3d0'

>>> # Convert a UUID to a 32-character hexadecimal string
>>> uuid.uuid4().hex
'9fe2c4e93f654fdbb24c02b15259716c'
Tuesday, June 1, 2021
 
mario
answered 7 Months ago
56

I assume that UUIDs get generated equally through the whole range of the 128 Bit range of the UUID.

First off, your assumption may be incorrect, depending on the UUID type (1, 2, 3, or 4). From the Java UUID docs:

There exist different variants of these global identifiers. The methods of this class are for manipulating the Leach-Salz variant, although the constructors allow the creation of any variant of UUID (described below).

The layout of a variant 2 (Leach-Salz) UUID is as follows: The most significant long consists of the following unsigned fields:

0xFFFFFFFF00000000 time_low 
0x00000000FFFF0000 time_mid 
0x000000000000F000 version 
0x0000000000000FFF time_hi  

The least significant long consists of the following unsigned fields:

0xC000000000000000 variant 
0x3FFF000000000000 clock_seq 
0x0000FFFFFFFFFFFF node  

The variant field contains a value which identifies the layout of the UUID. The bit layout described above is valid only for a UUID with a variant value of 2, which indicates the Leach-Salz variant.

The version field holds a value that describes the type of this UUID. There are four different basic types of UUIDs: time-based, DCE security, name-based, and randomly generated UUIDs. These types have a version value of 1, 2, 3 and 4, respectively.

The best way to do what you're doing is to generate a random string with code that looks something like this (source):

public class RandomString {

          public static String randomstring(int lo, int hi){
                  int n = rand(lo, hi);
                  byte b[] = new byte[n];
                  for (int i = 0; i < n; i++)
                          b[i] = (byte)rand('a', 'z');
                  return new String(b, 0);
          }

          private static int rand(int lo, int hi){
                      java.util.Random rn = new java.util.Random();
                  int n = hi - lo + 1;
                  int i = rn.nextInt(n);
                  if (i < 0)
                          i = -i;
                  return lo + i;
          }

          public static String randomstring(){
                  return randomstring(5, 25);
          }

        /**
         * @param args
         */
        public static void main(String[] args) {
                System.out.println(randomstring());

        }

}

If you're incredibly worried about collisions or something, I suggest you base64 encode your UUID which should cut down on its size.

Moral of the story: don't rely on individual parts of UUIDs as they are holistically designed. If you do need to rely on individual parts of a UUID, make sure you familiarize yourself with the particular UUID type and implementation.

Tuesday, August 3, 2021
 
Packy
answered 4 Months ago
52

I would rely on built in functionality:

ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
bb.putLong(uuid.getMostSignificantBits());
bb.putLong(uuid.getLeastSignificantBits());
return bb.array();

or something like,

ByteArrayOutputStream ba = new ByteArrayOutputStream(16);
DataOutputStream da = new DataOutputStream(ba);
da.writeLong(uuid.getMostSignificantBits());
da.writeLong(uuid.getLeastSignificantBits());
return ba.toByteArray();

(Note, untested code!)

Tuesday, August 3, 2021
 
Akarun
answered 4 Months ago
14

As others have pointed out, uniqueness is not guaranteed. However you are probably seeing repeated numbers because you are using srand() and rand() incorrectly.

srand() is used to seed the random number generator. that means a series of calls to rand() after a call to srand will produce a particular series of values. If you call srand() with the same value then rand() will produce the same series of values (for a given implementation, there's no guarantee between different implementations)

int main() {
    srand(100);
    for(int i = 0; i<5; ++i)
        printf("%dn",rand());

    printf("nresetnn");

    srand(100);
    for(int i = 0; i<5; ++i)
        printf("%dn",rand());

}

for me this produces:

365
1216
5415
16704
24504

reset

365
1216
5415
16704
24504

time() and clock() return the time, but if you call them quickly enough then the value returned will be the same, so you will get the same series of values out of rand().

Additionally rand() is generally not a very good random number generator and using it usually means you have to transform the series of numbers to the distribution you actually need. You should find a different source of randomness and either learn the proper ways to produce the distribution you want or use a library that can do it for you. (for example one common method of producing a 'random' number between 0 and N is to do rand() % N but this is not really the best method.

C++ provides a much better random number library in <random>. It provides different PRNG algorithms, such as linear_congruential, mersennne_twister, and possibly even a cryptographically secure RNG (depending on the implementation). It also provides objects for producing a variety of distributions, such as uniform_int_distribution which should avoid the mistake make in rand() % N.

Saturday, August 28, 2021
 
BartmanEH
answered 3 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share