We benchmarked our implementation of BEAST (1.6.1) on the BEAGLE framework. All benchmarking runs were performed on the Trestles supercomputer.
To determine how to run BEAST with the BEAGLE library efficiently, we asked the users who requested BEAST to contribute exemplar data sets. The four data sets we received had too few unique sites per partition to benefit from using GPUs, much like the benchmark1.xml data set distributed with BEAST. Accordingly, we implemented BEAST with BEAGLE on Trestles at SDSC using the beagle_SSE option, which runs on CPUs rather than GPUs.
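For reference, a minimal CPU-only invocation with the SSE kernel might look like the sketch below. The input file name is a placeholder, and while -beagle_CPU and -beagle_SSE are standard BEAST 1.x command-line flags, please confirm the options available in your installation with beast -help.

```
# Minimal sketch (not our exact job script): run BEAST with the BEAGLE CPU/SSE kernel.
# "mydata.xml" is a placeholder for your own input file.
beast -beagle_CPU -beagle_SSE mydata.xml
```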
Two types of threaded parallelization are available, BEAST threads and BEAGLE threads (see the corresponding columns in the table below), and we used them together to optimize resource use.
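As a sketch of how the two levels map onto the BEAST 1.x command line, we assume here that -threads sets the number of BEAST threads and -beagle_instances sets the number of BEAGLE instances per thread (the "BEAGLE threads" column in the table below), so the total number of cores used is the product of the two. Check beast -help for the exact options in your version.

```
# Sketch only: 2 BEAST threads x 4 BEAGLE instances = 8 cores,
# matching one of the eight-core DS 1 runs in the table below.
# The flag-to-column mapping is an assumption; verify with "beast -help".
beast -beagle_CPU -beagle_SSE -threads 2 -beagle_instances 4 mydata.xml
```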
The table below shows the run times we measured on Trestles with the native BEAST kernel and the beagle_SSE kernel for various combinations of threads and CPU cores. The best performance was generally obtained with the beagle_SSE kernel, assigning cores to BEAST threads first and to BEAGLE threads second. Because run speed does not increase linearly with the number of cores used, there is a tradeoff between shortening the run time and consuming more resources. The eight-core runs in the table below appeared to balance these two criteria best, giving speedups of 2- to 7-fold depending on the data set. Higher speedups are possible, but for the data sets we examined they came only at much higher cost (see the Cost column in the table for examples).
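In the table, Speedup is measured relative to the single-core native run for each data set, and Cost is the run time multiplied by the number of cores used. For DS 3, for example, the 32-core beagle_SSE run is about 9.6 times faster than the baseline (141.22 min / 14.66 min) but costs roughly 32 × 14.66 ≈ 469 cpu min, nearly three times the cost of the eight-core run (8 × 20.37 ≈ 163 cpu min).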
Run times and speedups of BEAST/BEAGLE on Trestles for four user data sets

| Data set | ntax | nchar | Partitions | Unique sites/partition | Time steps | Kernel | BEAST threads | BEAGLE threads | Cores | Run time (min) | Speedup | Cost (cpu min) |
|----------|------|-------|------------|------------------------|------------|------------|---------------|----------------|-------|----------------|---------|----------------|
| DS 1 | 48 | 1,577 | 2 | 237–276 | 100k | native | 1 | n/a | 1 | 2.16 | 1.00 | 2.16 |
| | | | | | | native | 2 | n/a | 2 | 1.95 | 1.11 | 3.90 |
| | | | | | | beagle_SSE | 1 | 1 | 1 | 2.60 | 0.83 | 2.60 |
| | | | | | | beagle_SSE | 2 | 4 | 8 | 1.06 | 2.03 | 8.48 |
| | | | | | | beagle_SSE | 2 | 8 | 16 | 1.03 | 2.10 | 16.48 |
| DS 2 | 131 | 3,095 | 4 | 122–752 | 10k | native | 1 | n/a | 1 | 6.15 | 1.00 | 6.15 |
| | | | | | | native | 4 | n/a | 4 | 3.55 | 1.73 | 14.20 |
| | | | | | | beagle_SSE | 1 | 1 | 1 | 3.45 | 1.78 | 3.45 |
| | | | | | | beagle_SSE | 4 | 2 | 8 | 1.14 | 5.41 | 9.09 |
| | | | | | | beagle_SSE | 4 | 4 | 16 | 0.87 | 7.04 | 13.07 |
| DS 3 | 348 | 6,954 | 16 | 37–814 | 10k | native | 1 | n/a | 1 | 141.22 | 1.00 | 141.22 |
| | | | | | | native | 16 | n/a | 16 | 33.09 | 4.27 | 529.22 |
| | | | | | | beagle_SSE | 1 | 1 | 1 | 66.83 | 2.11 | 66.83 |
| | | | | | | beagle_SSE | 8 | 1 | 8 | 20.37 | 6.93 | 162.93 |
| | | | | | | beagle_SSE | 16 | 2 | 32 | 14.66 | 9.63 | 469.05 |
| DS 4 | 271 | 11,440 | 27 | 26–245 | 10k | native | 1 | n/a | 1 | 45.07 | 1.00 | 45.07 |
| | | | | | | native | 27 | n/a | 27 | 10.81 | 4.17 | 291.81 |
| | | | | | | beagle_SSE | 1 | 1 | 1 | 31.02 | 1.45 | 31.02 |
| | | | | | | beagle_SSE | 8 | 1 | 8 | 11.40 | 3.95 | 91.18 |
| | | | | | | beagle_SSE | 27 | 1 | 27 | 9.81 | 4.59 | 264.89 |
If you feel your data set differs dramatically from those given above, you can send us a copy, and we will look at possible new configurations for your data set. We are always happy to receive input on the speedups you see using our BEAST implementation, and advice on how to make BEAST more useful to the community.
If there is a tool or a feature you need, please let us know.