Quick way to compare two python lists

***Please read the entire post if you are looking for a fast way to compare two lists***

If you need to compare two lists, you can use a list comprehension such as the following:

>>> a=[1,3,5]
>>> b=[1,2,3,4,5,6]
>>> print [c for c in b if c not in a]
[2, 4, 6]
>>>
    

…which is functionally equivalent to…

>>> for c in b:
...   if c not in a:
...     print c
...
2
4
6
>>>

List comprehensions are generally thought to be faster, even though I don’t think they are as readable. As always, test for performance and code for maintainability. If the performance difference is negligible, use the more readable version.

For example, in this case, the performance difference is actually negligible. We use the cProfile module to time our code when comparing much larger lists at 100,000 and 50,000 elements, respectively.

>>> import cProfile
>>> a=[]
>>> b=[]
>>> for i in range(100000):
...   a.append(i)
...
>>> for i in range(0,100000,2):
...   b.append(i)
...
>>> cProfile.run('for c in a:\n  if c not in b:\n    pass')
         2 function calls in 86.338 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   86.338   86.338   86.338   86.338 :1()
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


>>> cProfile.run('[c for c in a if c not in b]')
         2 function calls in 86.508 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   86.508   86.508   86.508   86.508 :1()
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


>>>

In this case, the more readable version is actually faster.

Now is why I asked you to read the entire post before grabbing a solution and using it. I realize there are faster algorithms for comparing lists, such as…

>>> cProfile.run('set(a) ^ set(b)')
         2 function calls in 0.020 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.020    0.020    0.020    0.020 :1()
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


>>>

…so, in this case, it is a no brainer, comparing two lists using sets is far faster than either of the two methods mentioned earlier, so use sets where applicable.

However, the axiom holds true. Test for performance, code for maintainability.

1 comment for “Quick way to compare two python lists

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.