BEAST on BEAGLE Benchmarks

We benchmarked our implementation of BEAST (1.6.1) on the BEAGLE framework. All benchmarking runs were performed on the Trestles supercomputer.
To determine how to run BEAST with the BEAGLE library efficiently, we asked the users who requested BEAST to contribute exemplar data sets. The four data sets we received had too few unique sites per partition to benefit from GPUs, much like the benchmark1.xml data set distributed with BEAST. Accordingly, we configured BEAST with BEAGLE on Trestles at SDSC to use the beagle_SSE option, which runs on CPUs with SSE vectorization rather than on GPUs.
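For reference, a serial beagle_SSE run with the standard BEAST 1.6.1 command-line launcher looks roughly like the following. This is a sketch rather than the exact wrapper installed on Trestles, and data.xml is a placeholder for your own input file:

    beast -beagle -beagle_SSE data.xml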

Two types of threaded parallelization are available, and we used them together to optimize resource use (an example command combining the two follows the list).

  1. BEAST allows the user to run a separate thread for each partition.
  2. BEAGLE allows the user to run an arbitrary number of threads within a given partition.
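As a sketch of how the two levels of threading are specified with the standard BEAST 1.6.1 command-line flags (we assume here that -threads sets the number of BEAST threads and -beagle_instances sets the number of BEAGLE threads used within each partition; data.xml and the thread counts are placeholders), the eight-core DS 1 configuration in the table below would be launched roughly as:

    beast -beagle -beagle_SSE -threads 2 -beagle_instances 4 data.xml

With 2 partitions and 4 BEAGLE threads per partition, this run uses 2 x 4 = 8 cores.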

The following table shows the run times we measured on Trestles using the native BEAST kernel and the beagle_SSE kernel for various combinations of threads and CPU cores. The best performance was generally obtained with the beagle_SSE kernel, adding BEAST threads first and BEAGLE threads second. Because the speed of a given run does not increase linearly with the number of cores used, there is a tradeoff between decreasing the run time and increasing the resource cost. The eight-core runs on Trestles (the eight-core beagle_SSE rows in the table below) seemed to balance these two criteria best, giving speedups of 2- to 7-fold depending on the data set. Higher speedups are possible, but for the data sets we examined they came only at much higher cost (see the Cost column in the table below for examples).
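To make the last two columns concrete: Speedup is the single-core native run time divided by the run time of the row in question, and Cost is the run time multiplied by the number of cores. For DS 3, for example, the eight-core beagle_SSE run finishes in 20.37 minutes, a 141.22 / 20.37 ≈ 6.9-fold speedup at a cost of 20.37 × 8 ≈ 163 cpu minutes; the 32-core run is only modestly faster (9.6-fold) at nearly three times that cost.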

Run times and speedups of BEAST/BEAGLE on Trestles for four user data sets

| Data set | ntax | nchar  | Partitions | Unique sites/partition | Time steps | Kernel     | BEAST threads | BEAGLE threads | Cores | Run time (min) | Speedup | Cost (cpu min) |
|----------|------|--------|------------|------------------------|------------|------------|---------------|----------------|-------|----------------|---------|----------------|
| DS 1     | 48   | 1,577  | 2          | 237-276                | 100k       | native     | 1             | -              | 1     | 2.16           | 1.00    | 2.16           |
|          |      |        |            |                        |            | native     | 2             | -              | 2     | 1.95           | 1.11    | 3.90           |
|          |      |        |            |                        |            | beagle_SSE | 1             | 1              | 1     | 2.60           | 0.83    | 2.60           |
|          |      |        |            |                        |            | beagle_SSE | 2             | 4              | 8     | 1.06           | 2.03    | 8.48           |
|          |      |        |            |                        |            | beagle_SSE | 2             | 8              | 16    | 1.03           | 2.10    | 16.48          |
| DS 2     | 131  | 3,095  | 4          | 122-752                | 10k        | native     | 1             | -              | 1     | 6.15           | 1.00    | 6.15           |
|          |      |        |            |                        |            | native     | 4             | -              | 4     | 3.55           | 1.73    | 14.20          |
|          |      |        |            |                        |            | beagle_SSE | 1             | 1              | 1     | 3.45           | 1.78    | 3.45           |
|          |      |        |            |                        |            | beagle_SSE | 4             | 2              | 8     | 1.14           | 5.41    | 9.09           |
|          |      |        |            |                        |            | beagle_SSE | 4             | 4              | 16    | 0.87           | 7.04    | 13.07          |
| DS 3     | 348  | 6,954  | 16         | 37-814                 | 10k        | native     | 1             | -              | 1     | 141.22         | 1.00    | 141.22         |
|          |      |        |            |                        |            | native     | 16            | -              | 16    | 33.09          | 4.27    | 529.22         |
|          |      |        |            |                        |            | beagle_SSE | 1             | 1              | 1     | 66.83          | 2.11    | 66.83          |
|          |      |        |            |                        |            | beagle_SSE | 8             | 1              | 8     | 20.37          | 6.93    | 162.93         |
|          |      |        |            |                        |            | beagle_SSE | 16            | 2              | 32    | 14.66          | 9.63    | 469.05         |
| DS 4     | 271  | 11,440 | 27         | 26-245                 | 10k        | native     | 1             | -              | 1     | 45.07          | 1.00    | 45.07          |
|          |      |        |            |                        |            | native     | 27            | -              | 27    | 10.81          | 4.17    | 291.81         |
|          |      |        |            |                        |            | beagle_SSE | 1             | 1              | 1     | 31.02          | 1.45    | 31.02          |
|          |      |        |            |                        |            | beagle_SSE | 8             | 1              | 8     | 11.40          | 3.95    | 91.18          |
|          |      |        |            |                        |            | beagle_SSE | 27            | 1              | 27    | 9.81           | 4.59    | 264.89         |

If you feel your data set differs dramatically from those given above, you can send us a copy and we will look into possible new configurations for it. We are always happy to receive feedback on the speedups you see with our BEAST implementation, and advice on how to make BEAST more useful to the community.

If there is a tool or a feature you need, please let us know.