EPD 6.1 (release date March 2nd, 2010) links NumPy and SciPy dynamically against MKL (Intel Math Kernel Library) on all platforms where MKL is available, namely Windows, OSX and Linux.
The following results were obtained by running a benchmark program.
All results show the execution time (in seconds) of the given function,
using an N x N random matrix as input and taking the fastest of three runs.
We have also included timings for the NumPy included in older EPD versions,
which was linked against the ATLAS library on Windows and Linux,
and against the Accelerate framework on OSX. The tables cover
different functions in numpy.linalg.
The ATLAS benchmark results were obtained by running the benchmark program on an EPD 5.1.1 install.
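The measurement procedure described above (an N x N random matrix, fastest of three runs) can be sketched as follows. This is not the original benchmark program, only a minimal illustration; the helper name best_of and the matrix sizes in the driver loop are assumptions made for this example.

```python
import time

import numpy as np


def best_of(func, n, repeats=3):
    """Return the fastest wall-clock time (seconds) of func on an n x n random matrix."""
    a = np.random.random((n, n))
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(a)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Same numpy.linalg functions as in the tables; sizes kept small here.
    # Note: eigh and eigvalsh read only one triangle of the input matrix.
    for name in ("det", "eig", "eigh", "eigvals", "eigvalsh", "inv", "svd"):
        func = getattr(np.linalg, name)
        row = [best_of(func, n) for n in (200, 500)]
        print("%-10s" % name, "  ".join("%.6f" % t for t in row))
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during one of the repetitions.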
func          threads        500      1000      1500      2000
==============================================================
det
  ATLAS                    0.046     0.327     1.061     2.464
  MKL         1            0.015     0.125     0.390     0.842
              2            0.011     0.078     0.234     0.546
eig
  ATLAS                    3.105    24.772    83.522   195.780
  MKL         1            0.764     4.960    15.756    35.708
              2            0.592     4.056    12.979    28.813
eigh
  ATLAS                    0.530     3.588    11.716    26.972
  MKL         1            0.172     1.076     3.510     8.096
              2            0.109     0.671     2.246     5.350
eigvals
  ATLAS                    1.185     9.484    31.574    74.240
  MKL         1            0.436     2.543     7.426    15.616
              2            0.405     2.184     6.068    12.714
eigvalsh
  ATLAS                    0.437     3.510    12.464    29.983
  MKL         1            0.062     0.405     1.653     3.869
              2            0.032     0.265     1.092     3.026
inv
  ATLAS                    0.187     1.341     4.306     9.937
  MKL         1            0.062     0.452     1.435     3.230
              2            0.031     0.280     0.842     1.887
svd
  ATLAS                    1.045     7.442    24.024    55.427
  MKL         1            0.374     2.683     9.220    21.263
              2            0.265     1.779     6.428    15.740
The Accelerate framework always uses the maximum number of threads, i.e. 2 in the table below.
func          threads          200         500        1000        1500        2000
==================================================================================
det
  Accelerate              0.003881    0.031205    0.159673    0.402085    0.825475
  MKL         1           0.002485    0.026606    0.156718    0.460423    0.996469
              2           0.002082    0.020490    0.111589    0.322130    0.685631
eig
  Accelerate              0.151646    2.034871   16.147479   54.945526  129.610227
  MKL         1           0.100915    0.791869    5.536657   17.653579   40.505470
              2           0.089906    0.684320    4.738974   15.291821   34.538202
eigh
  Accelerate              0.036075    0.278806    1.579896    4.928605   11.026954
  MKL         1           0.021456    0.193353    1.160592    3.860195    9.211294
              2           0.015369    0.133706    0.801191    2.635106    6.269004
eigvals
  Accelerate              0.066050    0.755720    6.156935   20.873578   51.116664
  MKL         1           0.062604    0.441933    2.737576    7.781239   17.362116
              2           0.058361    0.403419    2.445986    6.893551   15.415134
eigvalsh
  Accelerate              0.015713    0.170914    1.295294    5.317904   13.737133
  MKL         1           0.007585    0.071506    0.450339    1.801380    4.544002
              2           0.006159    0.051889    0.315961    1.321673    3.666880
inv
  Accelerate              0.011015    0.087874    0.457264    1.256251    2.642413
  MKL         1           0.008459    0.090927    0.573598    1.673382    3.686808
              2           0.006394    0.067684    0.369199    1.072284    2.349210
svd
  Accelerate              0.063233    0.481699    3.072801    9.891473   22.554218
  MKL         1           0.042538    0.409626    3.039787   10.266544   24.344418
              2           0.036743    0.304195    2.121625    7.680251   18.703893
The ATLAS routines always use one thread only.
func          threads          200         500        1000        1500        2000
==================================================================================
det
  ATLAS                   0.001978    0.019611    0.119601    0.371773    0.805141
  MKL         1           0.001423    0.015214    0.100316    0.319583    0.700174
              2           0.001041    0.010443    0.058342    0.181791    0.395840
              3           0.001087    0.008494    0.044751    0.137921    0.302166
              4           0.001120    0.007905    0.041605    0.120641    0.255480
eig
  ATLAS                   0.146478    2.427373   21.558434   73.906174  186.378927
  MKL         1           0.080537    0.632089    4.160941   13.015139   29.167038
              2           0.087990    0.583108    3.597658   10.953161   24.393856
              3           0.088137    0.542243    3.596853   11.027849   24.375258
              4           0.087050    0.538182    3.282770   10.007089   22.272444
eigh
  ATLAS                   0.028070    0.214199    1.306518    4.253866    9.625678
  MKL         1           0.017454    0.150351    0.904984    2.973798    6.900463
              2           0.012303    0.099450    0.536283    1.604131    3.838004
              3           0.011904    0.088765    0.464113    1.436107    3.457642
              4           0.011014    0.078823    0.395830    1.198582    2.961812
eigvals
  ATLAS                   0.084425    0.969582    9.271397   35.484099   86.982823
  MKL         1           0.055514    0.382756    2.278501    6.140537   13.106408
              2           0.057068    0.380757    2.003261    5.291290   11.005101
              3           0.056770    0.373002    2.067459    5.336213   11.115501
              4           0.060181    0.365830    1.923270    5.063103   10.328758
eigvalsh
  ATLAS                   0.009983    0.105646    0.785348    3.212356    8.172313
  MKL         1           0.006320    0.052221    0.330132    1.301236    3.204751
              2           0.005251    0.040802    0.214501    0.669560    1.795785
              3           0.004931    0.036338    0.177316    0.600432    1.735426
              4           0.005339    0.034990    0.164250    0.522566    1.568605
inv
  ATLAS                   0.006228    0.066323    0.423782    1.392812    3.053785
  MKL         1           0.003913    0.050813    0.355910    1.186602    2.715534
              2           0.002944    0.031845    0.198924    0.663284    1.486224
              3           0.002529    0.024905    0.153636    0.513211    1.114543
              4           0.002375    0.021784    0.131506    0.437813    0.938628
svd
  ATLAS                   0.047632    0.383549    2.719373    8.960831   20.414181
  MKL         1           0.029945    0.283172    2.192700    7.574861   17.597544
              2           0.028022    0.220260    1.305435    4.674676   11.568503
              3           0.027393    0.207454    1.244003    4.425071   11.162057
              4           0.027328    0.199932    1.073903    3.985288   10.409413
Note that these results were obtained on an AMD processor. The ATLAS routines always use one thread only.
func          threads          200         500        1000        1500        2000
==================================================================================
det
  ATLAS                   0.004542    0.052334    0.344503    1.095520    2.448515
  MKL         1           0.003537    0.042843    0.291940    0.933989    2.084115
              2           0.002697    0.026416    0.166778    0.531760    1.181758
              3           0.002251    0.019843    0.123858    0.386968    0.842950
              4           0.002169    0.019846    0.125505    0.386224    0.840673
eig
  ATLAS                   0.259763    4.393425   37.621182  135.050718  323.719892
  MKL         1           0.170918    1.779667   11.314629   33.659668   75.974679
              2           0.167206    1.473480    9.096502   26.676907   60.188044
              3           0.158445    1.386303    8.309512   24.419109   54.721468
              4           0.168030    1.354281    8.414269   24.262779   54.756768
eigh
  ATLAS                   0.055759    0.478036    3.157796    9.529790   21.862535
  MKL         1           0.031404    0.339818    2.456596    7.765839   18.292306
              2           0.020818    0.204601    1.437134    4.721491   10.401662
              3           0.018524    0.165718    1.108386    3.743436    8.138211
              4           0.018394    0.163665    1.093884    3.839852    8.619979
eigvals
  ATLAS                   0.122135    1.721329   15.441173   56.874484  139.097476
  MKL         1           0.100709    0.905109    5.638054   15.726575   34.627801
              2           0.093802    0.836252    4.697184   12.484315   26.458316
              3           0.094680    0.792386    4.620656   11.530036   24.523773
              4           0.094250    0.812030    4.414533   11.633799   24.092832
eigvalsh
  ATLAS                   0.016672    0.248108    1.844125    6.192654   14.587824
  MKL         1           0.010020    0.120006    1.097865    3.446052    8.606070
              2           0.007688    0.078474    0.658583    2.407154    5.103678
              3           0.007492    0.065678    0.496656    2.129318    4.908329
              4           0.007383    0.066361    0.491305    1.814141    4.905298
inv
  ATLAS                   0.016100    0.180923    1.267138    4.280442    9.528451
  MKL         1           0.012346    0.158630    1.106797    3.631172    8.235542
              2           0.007875    0.090208    0.599726    2.020295    4.473653
              3           0.006477    0.067686    0.448351    1.451953    3.236595
              4           0.006439    0.068643    0.452317    1.441642    3.255666
svd
  ATLAS                   0.103712    0.930641    6.289478   19.882772   46.678009
  MKL         1           0.060373    0.779341    6.264379   19.597140   46.323370
              2           0.049049    0.467218    3.705661   11.203994   26.130302
              3           0.045428    0.371707    2.980605    8.911639   21.034134
              4           0.045731    0.377631    2.938632    9.917269   20.900861
Below we show the speed-up over ATLAS (Linux, Windows) and Accelerate
(OSX) offered by MKL. All data are taken from the benchmarks of
the eig function.
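As a rough illustration of how such speed-up figures follow from the timings, the sketch below divides the ATLAS time by the MKL time for each matrix size. The numbers are the eig rows of the first table (the one benchmarked against EPD 5.1.1); the function name speedup is an assumption made for this example.

```python
# eig timings in seconds, copied from the first table (sizes 500..2000).
sizes = [500, 1000, 1500, 2000]
atlas = [3.105, 24.772, 83.522, 195.780]
mkl_1_thread = [0.764, 4.960, 15.756, 35.708]
mkl_2_threads = [0.592, 4.056, 12.979, 28.813]


def speedup(baseline, candidate):
    """Element-wise ratio baseline / candidate; values above 1 mean candidate is faster."""
    return [b / c for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    for n, s1, s2 in zip(sizes,
                         speedup(atlas, mkl_1_thread),
                         speedup(atlas, mkl_2_threads)):
        print("N=%4d  MKL 1 thread: %.2fx  MKL 2 threads: %.2fx" % (n, s1, s2))
```

For these rows, single-threaded MKL comes out roughly four to five and a half times faster than ATLAS, with the ratio growing as the matrix gets larger.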
Along with the MKL runtime libraries, the MKL package in EPD also contains a small interface module, mkl, which exposes some of the MKL service functions. The main reason we added this module is that it allows setting the number of computational threads MKL uses. This is done as follows:
>>> import mkl
>>> mkl.get_max_threads()
2
>>> mkl.set_num_threads(1)
>>> mkl.get_max_threads()
1
The mkl interface module currently contains the
following functions: get_cpu_clocks, get_cpu_frequency,
get_max_threads, get_version_string, set_num_threads,
thread_free_buffers.
These functions call the corresponding MKL service functions,
which are declared in mkl_service.h. For example,
mkl.get_version_string calls mkl_get_version_string.
For more details, see the function docstrings, as well as the MKL
documentation.
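The thread-control calls shown above can be combined with a timing loop to reproduce the per-thread rows of the tables. The following is only a sketch under stated assumptions: the helper name time_with_threads is hypothetical, only mkl.get_max_threads and mkl.set_num_threads from the list above are used, and the code falls back to a plain timing when the mkl module is not installed.

```python
import time

import numpy as np

try:
    import mkl  # interface module described above; absent in non-MKL installs
except ImportError:
    mkl = None


def time_with_threads(func, n, num_threads):
    """Time one call of func on an n x n random matrix with num_threads MKL threads.

    If the mkl module is unavailable, the thread setting is skipped (assumption
    made so that the sketch still runs everywhere).
    """
    previous = None
    if mkl is not None:
        previous = mkl.get_max_threads()
        mkl.set_num_threads(num_threads)
    try:
        a = np.random.random((n, n))
        start = time.perf_counter()
        func(a)
        return time.perf_counter() - start
    finally:
        # Restore the earlier thread count so later code is unaffected.
        if previous is not None:
            mkl.set_num_threads(previous)


if __name__ == "__main__":
    for threads in (1, 2):
        elapsed = time_with_threads(np.linalg.inv, 500, threads)
        print("inv, N=500, threads=%d: %.6f s" % (threads, elapsed))
```

Restoring the previous thread count in the finally block keeps the setting from leaking into unrelated code that runs afterwards in the same process.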