Major TODO Items For Next Release
The DaCapo benchmark suite is a community project, developed by the research community, for the research community. The quality of the workloads depends on community critique and community contributions. There is a large amount of work to be done. Please feel free to help by contributing to one or more of the following tasks (use the mailing list or email Steve Blackburn directly to coordinate your efforts).
Revise This List For Next Release
The workload was originally developed for DaCapo under MacOS X using the default JVM on that platform, Sun's 1.5.0 HotSpot VM. Once completed, we started testing on other platforms with other production JVMs and made two significant observations:
- The workloads are stable on most VMs when using the H2 database, but were noted as unstable with Derby.
- Performance. Under Mac OS X (using Derby), CPU utilization is near 100% (top shows roughly "180%" on a dual-core machine); under Ubuntu, running a similar HotSpot JVM on similar hardware, CPU utilization drops precipitously. However, after shifting to H2 the performance seems stable, so this is no longer a priority.
Feedback On Existing Benchmarks
Feedback on the existing benchmarks will help us determine which benchmarks to drop from the upcoming release (the release will add new benchmarks and drop some of the existing ones). If you have any comments, please let us know.
Current candidates for exclusion from the upcoming suite include bloat (which has some notable idiosyncrasies, and is not extensively deployed), hsqldb (superseded by derby), and possibly
Incorporate New Benchmarks
The new suite includes a number of new benchmarks.
- Done Jun 2009
- Done Dec 2007
- Done Jan 2007
- Done. H2 driven with the Derby TPC-C-like workloads.
- Done Dec 2009
Sergey Salishev of Intel has created a workload that simulates a cloud computing environment, including a wiki. It would be great to evaluate this workload and incorporate it into the next release.
There are a number of ways we could improve the DaCapo benchmark harness. These include:
Thread Creation Callbacks
Andrew Tick of HP suggested on the mailing list in September 2007 that we include a callback on thread creation, along the lines of the callbacks we currently have at the start and end of each iteration.
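The suggestion could look something like the following sketch. The interface and class names here are hypothetical illustrations, not the harness's actual `Callback` API: the point is simply that a `threadCreated` hook would sit alongside the existing per-iteration start and stop hooks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the suggested extension (assumed API, not
// the harness's actual Callback class): alongside the existing hooks
// at iteration start and end, add a hook fired whenever the benchmark
// creates a thread.
public class ThreadCallbackSketch {
    public interface Callback {
        void iterationStart(String benchmark);
        void iterationStop(String benchmark);
        void threadCreated(Thread t); // the proposed new hook
    }

    // Simple callback that records each event for inspection.
    public static class RecordingCallback implements Callback {
        public final List<String> events = new ArrayList<>();
        public void iterationStart(String b) { events.add("start " + b); }
        public void iterationStop(String b)  { events.add("stop " + b); }
        public void threadCreated(Thread t)  { events.add("thread " + t.getName()); }
    }

    public static void main(String[] args) throws InterruptedException {
        RecordingCallback cb = new RecordingCallback();
        cb.iterationStart("demo");
        Thread worker = new Thread(() -> {}, "worker-0");
        cb.threadCreated(worker); // the harness would invoke this hook
        worker.start();
        worker.join();
        cb.iterationStop("demo");
        System.out.println(cb.events);
    }
}
```

A per-thread hook of this shape would let tools (profilers, race detectors, schedulers) attach state to benchmark threads as they appear, rather than discovering them after the fact.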
Inclusion of the Fragger Tool
Cliff Click pointed me to the fragger tool, which injects fragmentation into the Java heap. It would be nice to include fragger as a command-line option in DaCapo.
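The basic idea behind such a tool can be sketched as follows. This is not the fragger tool itself, just a minimal illustration of one way to induce fragmentation: allocate many small chunks and retain only every second one, leaving garbage interleaved between the survivors.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (not the actual fragger tool): fragment
// the heap by allocating many small arrays and keeping every second
// one alive, so freed chunks leave holes between pinned survivors.
public class FragSketch {
    static final List<byte[]> survivors = new ArrayList<>();

    public static void fragment(int chunks, int chunkBytes) {
        for (int i = 0; i < chunks; i++) {
            byte[] chunk = new byte[chunkBytes];
            if (i % 2 == 0) {
                survivors.add(chunk); // pin half the chunks
            }                         // the rest become garbage
        }
    }

    public static void main(String[] args) {
        fragment(10_000, 1024);
        System.out.println("retained chunks: " + survivors.size());
    }
}
```

Note that a compacting collector will defeat this pattern by sliding the survivors together; fragger's value as a harness option would be precisely in stressing collectors and allocators that cannot, or choose not to, compact.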
Each of the new benchmarks needs to be documented, and the documentation on building, etc., needs to be updated to reflect changes over the past two years.
Update Existing Benchmarks
We have been incrementally updating version numbers in the svn head. Each benchmark needs to be rechecked to ensure it is at the most recent stable release for that workload.
We have used a very particular version of Xalan (2.4.1), on advice from Kev Jones of Intel (who provided us with our current version of the workload). Kev's rationale may have become dated with newer releases of Xalan. We should revisit this decision for the next release.
In addition to updating the Eclipse version, it would be great to investigate and, if necessary, address a problem identified by Matt Arnold in June 2007:
I have some info about the Dacapo Eclipse benchmark that you may be
interested in. If I should contact someone else instead, feel free to let
me know.
If you run eclipse for a large number of iterations, the performance forms
a saw tooth, degrading significantly (a factor of 10 or more), then jumping
back to normal. An excel graph of the performance over time is attached
below. (Dacapo version 2006-10.jar). It happens on both Sun's and IBM's
VMs.
During the slow iterations the program is spending most of its time in
jitted code. The problem is the following two methods:
Both of these methods have a linear search through some kind of linked list
of weak references. Here's my guess at what is happening: this list
grows over time and the linear searches eventually become a huge
bottleneck. Eventually some memory threshold is crossed, the VM clears
the weak references, and performance goes back to normal.
This is clearly crappy code (linear searching), but it could also be a
benchmark bug. Is it possible that this data structure should have been
re-initialized between iterations, and the iterating nature of the driver
is creating a problem that is unlikely to exist in the real application?
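The pattern Matt describes can be illustrated in miniature. The class and method names below are hypothetical, not the actual Eclipse code: a registry of weak references that is scanned linearly, so lookup cost grows with every registered object the collector has not yet cleared, producing the saw-tooth when a GC finally clears the referents.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the actual Eclipse code) of the pattern
// described above: a list of weak references that is searched
// linearly and only shrinks when the collector clears the referents.
public class WeakListSketch {
    static final List<WeakReference<Object>> registry = new ArrayList<>();

    static void register(Object o) {
        registry.add(new WeakReference<>(o));
    }

    // O(n) lookup: cost grows with every still-uncleared entry,
    // which is what makes repeated iterations slow down over time.
    static boolean contains(Object target) {
        for (WeakReference<Object> ref : registry) {
            if (ref.get() == target) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Object key = new Object();
        register(key);
        System.out.println(contains(key));          // true: still reachable
        System.out.println(contains(new Object())); // false: never registered
    }
}
```

If the benchmark driver were expected to re-initialize such a structure between iterations, clearing the registry at each iteration boundary would remove the artefact; if the structure is genuinely long-lived in the real application, the fix belongs in the application code instead.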
For questions or comments please use the researchers mailing list.
Copyright 2001-2008 by the DaCapo Project. All Rights Reserved.
Last modified: Sun Jun 21 13:46:36 EST 2009