Rewriting Python Benchmarks
To understand the performance of RPython, we rewrite some Python benchmarks in RPython. In this section, we use several examples (Fibonacci number, binary tree, and spectral norm) to illustrate how to rewrite Python code in RPython. As you will see, rewriting Python in RPython doesn't require much effort, and it yields large performance improvements.
Fibonacci Number
The following code calculates the Fibonacci number recursively.
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def main():
    fib(40)

if __name__ == '__main__':
    main()
To run this benchmark, we need to add a target function to tell RPython the entry point. Also, we take n from the command-line arguments instead of hard-coding a constant, so that RPython cannot optimize the computation away. The following code can be compiled by RPython.
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

# n should not be a constant, otherwise RPython will optimize it
def main(n):
    ret = fib(n)
    # output the result, otherwise RPython will optimize out the above call
    # (the `transform_dead_op_vars` pass)
    print ret

def target(*args):  # we need a target function
    return entry_point, None

def entry_point(argv):
    main(int(argv[1]))  # get n from the arguments
    return 0
Note
Several passes simplify the flow graph for optimization. These include transform_dead_op_vars, eliminate_empty_blocks, remove_assertion_errors, remove_identical_vars_SSA, constfold_exitswitch, remove_trivial_links, SSA_to_SSI, coalesce_bool, transform_ovfcheck, simplify_exceptions, transform_xxxitem, and remove_dead_exceptions.
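For example, if main did not print the result, the fib call would be dead code and could be removed entirely. A minimal sketch of this pitfall (it compiles, but it is useless as a benchmark):

# pitfall: the result of fib() is never used
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def main(n):
    ret = fib(n)  # `ret` is never used afterwards
    # without a `print ret`, the transform_dead_op_vars pass can drop the
    # call, so the compiled binary would measure nothing

def target(*args):
    return entry_point, None

def entry_point(argv):
    main(int(argv[1]))
    return 0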
Here, we use the GNU time command to measure the execution time and the maximum resident set size over the process's lifetime. In the format string, %C prints the command being timed, %U the user-mode CPU time in seconds, and %M the maximum resident set size in kilobytes.
$ /usr/bin/time -f "%C\t%U\t%M" python fib.py > /dev/null
python fib.py 18.21 5956
$ /usr/bin/time -f "%C\t%U\t%M" pypy fib.py > /dev/null
pypy fib.py 4.77 112960
$ rpython fib_rpy.py
$ /usr/bin/time -f "%C\t%U\t%M" ./fib_rpy-c 40 > /dev/null
./fib_rpy-c 40 0.40 1704
Binary Tree
Rewriting the binary tree benchmark takes a little more effort. The Python version uses a tuple of tuples to represent the nodes of the tree. Since RPython doesn't allow variable-length tuples or tuples of mixed types, we write a new Node class to represent the nodes of a binary tree. In addition, we replace the sum built-in function with an explicit loop. The following code shows only the diff between the original Python code and the modified RPython version.
--- /Users/mssun/git/mssun/rpython-by-example/code/benchmarks/binary-tree.py
+++ /Users/mssun/git/mssun/rpython-by-example/code/benchmarks/binary-tree_rpy.py
@@ -5,18 +5,29 @@
 # contributed by Antoine Pitrou
 # modified by Dominique Wahli and Daniel Nanz
+import math
+
+class Node(object):
+    """
+    Represent nodes in a binary tree.
+    """
+    def __init__(self, value, left, right):
+        self.value = value
+        self.left = left    # left node
+        self.right = right  # right node
+
 def make_tree(i, d):
     if d > 0:
         i2 = i + i
         d -= 1
-        return (i, make_tree(i2 - 1, d), make_tree(i2, d))
-    return (i, None, None)
+        return Node(i, make_tree(i2 - 1, d), make_tree(i2, d))
+    return Node(i, None, None)  # use Node instead of tuple
 def check_tree(node):
-    (i, l, r) = node
+    (i, l, r) = (node.value, node.left, node.right)
     if l is None:
         return i
     else:
@@ -54,15 +65,24 @@
     mmd = max_depth + min_depth
     for d in range(min_depth, stretch_depth, 2):
-        i = 2 ** (mmd - d)
+        i = int(math.pow(2, (mmd - d)))  # use math.pow instead of built-in pow
         cs = 0
-        for argchunk in get_argchunks(i,d):
-            cs += sum(map(make_check, argchunk))
-        print '%d\t trees of depth %d\t check: %d' % (i * 2, d, cs)
+        for argchunk in get_argchunks(i,d):  # rewrite the sum built-in function
+            s = 0
+            for a in argchunk:
+                s += make_check(a)
+            cs += s
+        print '%d\t trees of depth %d\t check: %d' % (i*2, d, cs)
     print 'long lived tree of depth %d\t check: %d' % (
         max_depth, check_tree(long_lived_tree))
-
 if __name__ == '__main__':
     main(17)
+
+def entry_point(argv):
+    main(int(argv[1]))  # get argument from input to avoid optimization
+    return 0
+
+# add a target
+def target(*args): return entry_point, None
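Pieced together from the diff, the Node-based tree can be exercised on its own. The sketch below is only illustrative; the else branch of check_tree is reproduced from the standard benchmark source, which the diff above truncates:

# minimal sketch: build a small Node-based tree and walk it (plain Python 2 / RPython subset)
class Node(object):
    def __init__(self, value, left, right):
        self.value = value
        self.left = left
        self.right = right

def make_tree(i, d):
    if d > 0:
        i2 = i + i
        d -= 1
        return Node(i, make_tree(i2 - 1, d), make_tree(i2, d))
    return Node(i, None, None)

def check_tree(node):
    (i, l, r) = (node.value, node.left, node.right)
    if l is None:
        return i
    else:
        return i + check_tree(l) - check_tree(r)  # standard benchmark formula

print check_tree(make_tree(0, 4))  # tiny tree of depth 4, just to see it run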
Let us measure the execution time of the binary tree benchmark with Python, PyPy, and the RPython rewrite:
$ /usr/bin/time -f "%C\t%U\t%M" python binary-tree.py > /dev/null
python binary-tree.py 10.45 60432
$ /usr/bin/time -f "%C\t%U\t%M" pypy binary-tree.py > /dev/null
pypy binary-tree.py 1.60 187256
$ /usr/bin/time -f "%C\t%U\t%M" ./binary-tree_rpy-c 17 > /dev/null
./binary-tree_rpy-c 17 0.38 68312
Spectral Norm
For the spectral norm benchmark, we made the following changes (a generic sketch of these rewrites follows the diff below):
- rewrite the map function as an explicit loop
- rewrite generator expressions as list-building loops
- use lists instead of tuples
- replace the izip function with an index-based loop
--- /Users/mssun/git/mssun/rpython-by-example/code/benchmarks/spectral-norm.py
+++ /Users/mssun/git/mssun/rpython-by-example/code/benchmarks/spectral-norm_rpy.py
@@ -7,18 +7,27 @@
 # Dirtily sped up by Simon Descarpentries
 # Concurrency by Jason Stitt
-from itertools import izip
-
 def eval_A (i, j):
     return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)
 def eval_A_times_u (u):
-    args = ((i,u) for i in xrange(len(u)))
-    return map(part_A_times_u, args)
+    args = []
+    for i in range(len(u)):
+        args.append((i, u))
+    ret = []
+    for i in args:
+        ret.append(part_A_times_u(i))
+    return ret
 def eval_At_times_u (u):
-    args = ((i,u) for i in xrange(len(u)))
-    return map(part_At_times_u, args)
+    args = []
+    for i in range(len(u)):
+        args.append((i, u))
+    ret = []
+    for i in args:
+        tmp = part_At_times_u(i)
+        ret.append(tmp)
+    return ret
 def eval_AtA_times_u (u):
     return eval_At_times_u (eval_A_times_u (u))
@@ -39,6 +48,7 @@
 def main(n):
     for i in range(n):
+        v = []
         u = [1] * DEFAULT_N
         for dummy in xrange (10):
@@ -47,9 +57,20 @@
         vBv = vv = 0
-        for ue, ve in izip (u, v):
-            vBv += ue * ve
-            vv += ve * ve
+        l = 0
+        if len(u) > len(v): l = len(u)
+        else: l = len(v)
+        for i in range(l):
+            vBv += u[i] * v[i]
+            vv += v[i] * v[i]
+        print vBv, vv
 if __name__ == "__main__":
     main(400)
+
+def entry_point(argv):
+    main(int(argv[1]))
+    return 0
+
+def target(*args):
+    return entry_point, None
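The same pattern recurs whenever generators, map, or izip appear: materialize the arguments into a list and iterate with explicit loops and indices. The sketch below is a generic illustration rather than code from the benchmark (the function names are made up):

# generic sketch (not from the benchmark): rewriting map/generator/izip for RPython
def square(i):
    return i * i

def squares(n):
    # was: map(square, (i for i in xrange(n)))
    args = []
    for i in range(n):
        args.append(i)
    ret = []
    for a in args:
        ret.append(square(a))
    return ret

def dot(xs, ys):
    # was: a loop over izip(xs, ys)
    total = 0
    n = min(len(xs), len(ys))  # izip stops at the shorter sequence
    for i in range(n):
        total += xs[i] * ys[i]
    return total

print squares(5), dot([1, 2, 3], [4, 5, 6])

Because izip pairs elements only up to the shorter list, min is the faithful replacement; in the benchmark u and v always have the same length, so comparing lengths either way gives the same result. As before, we measure with GNU time: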
$ /usr/bin/time -f "%C\t%U\t%M" python spectral-norm.py > /dev/null
python spectral-norm.py 34.83 6140
$ /usr/bin/time -f "%C\t%U\t%M" pypy spectral-norm.py > /dev/null
pypy spectral-norm.py 1.20 81016
$ /usr/bin/time -f "%C\t%U\t%M" ./spectral-norm_rpy-c 400 > /dev/null
./spectral-norm_rpy-c 400 0.26 7892
Benchmark Results
In addition to the Fibonacci number, binary tree, and spectral norm benchmarks, we also rewrote some other benchmarks (source code). The following table summarizes the benchmark results.
| Benchmark | Python 2.7.15 time (s) | PyPy2 v6.0.0 time (s) | PyPy2 speedup | RPython (PyPy2 v6.0.0) time (s) | RPython speedup | RPython binary size (KB) |
|---|---|---|---|---|---|---|
| fib.py | 18.21 | 4.77 | 3.82 | 0.40 | 45.53 | 282 |
| binary-tree.py | 10.45 | 1.60 | 6.53 | 0.38 | 27.50 | 301 |
| float.py | 11.47 | 1.51 | 7.60 | 0.57 | 20.12 | 277 |
| fannkuch.py | 3.91 | 0.54 | 7.24 | 0.32 | 12.22 | 282 |
| buffer.py | 1.23 | 0.64 | 1.92 | 0.52 | 2.37 | 284 |
| spectral-norm.py | 34.83 | 1.20 | 29.03 | 0.26 | 133.96 | 307 |

(Speedups are relative to Python 2.7.15.)
As you can see, the average speedup of RPython over Python 2.7.15 is about 21. Moreover, compared to the very large Python interpreter, the RPython binaries are quite small.
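The figure of about 21 is consistent with the geometric mean of the per-benchmark speedups in the table; a quick sanity check, assuming that is how the average is taken:

# geometric mean of the RPython speedups from the table above
speedups = [45.53, 27.50, 20.12, 12.22, 2.37, 133.96]
product = 1.0
for s in speedups:
    product *= s
print product ** (1.0 / len(speedups))  # prints roughly 21.5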