python – Why is "1000000000000000 in range (1000000000000001)" so fast?

Question:

As I understand it, the range() function, which is actually an object type in Python 3, generates its content on the fly, just like a generator.

In this case, I expected the following line to take an inordinate amount of time, because a quadrillion values ​​would have to be generated to determine if 1 quadrillion is in the range:

1000000000000000 in range(1000000000000001)

Moreover, it seems that no matter how many zeros I add, the computation more or less takes the same amount of time (mostly instant).

I've also tried things like this, but the calculation is still almost instantaneous:

1000000000000000000000 in range(0,1000000000000000000001,10) # С шагом в десять

If I try to implement my own range function, the result is not so nice !!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What's going on under the hood of the range() object that makes it so fast?

translation of the question why is 1000000000000000 in range(1000000000000001) so fast in python 3 from @RicksupportsMonica

Answer:

In Python 3, the range() object does not immediately produce numbers. It is a smart sequence object that produces numbers on demand. All it contains is the start, stop, and step values, and then as you iterate over the object, the next integer is calculated in each iteration.

Object also implements object.__contains__ and calculates if your number is part of its range. Calculation is an operation with (almost) constant time *. It is not necessary to look at all possible integers in the range.

From the documentation on the range object

The advantage of the range type over a regular list or tuple is that a range object will always occupy the same (small) amount of memory, regardless of the size of the range it represents (since it only stores the start , stop and step values, evaluating separate elements and subranges as needed).

Your range object should look like this:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

There are still some things missing here that real range supports (like .index or .count , hashing, equality checking, or slicing), but they should give you an idea.

I've also simplified the __contains__ implementation to focus on integer tests only; if you give a real range object a non-integer value (including subclasses of int ), a slow scan is triggered to see if there is a match, just as if you were using a containment test for a list of all contained values. This was done to continue to support other numeric types that simply support equality with integers, but should not also support integer arithmetic. See the original Python issue that implemented a containment test.


* Nearly constant time because Python integers are unbounded and so the math operations also grow over time as N grows, making this operation O(log N) . Since this is all done in optimized C code, and Python stores integer values ​​in 30-bit chunks, you run out of memory before you notice any performance impact due to the size of the integers involved.

translation of answer about participant @MartijnPieters

Scroll to Top