I'm currently pondering the benefits of GPGPU on AMD's Fusion processors from a performance and power-consumption perspective. To scratch this itch, I have started looking into how well the OpenCL applications in the Phoronix Benchmark Suite run on my AMD E-350 netbook, looking at performance from both a throughput and a power-consumption perspective.
So far, from a few quick tests, performance looks about 60%-100% better on the integrated GPU core than when running the same applications on the CPUs. Power-wise, the system's power consumption (as measured by the battery drain through ACPI) seems to be about 2 watts lower when running on the GPU versus the CPU (overall system power ranges from ~12W to ~19W).
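As a sanity check on what those numbers imply for energy per task (rather than instantaneous power), here's a back-of-the-envelope comparison in Python. The 1.8x speedup (midpoint of 60%-100%) and the 2 W delta come from the rough measurements above; the 100-second task time and 16 W CPU-run system power are made-up illustrative values:

```python
# Back-of-the-envelope energy-per-task comparison using the rough
# numbers from the post; task time and CPU power are hypothetical.

cpu_time = 100.0             # seconds for the benchmark on the CPU (made up)
cpu_power = 16.0             # watts, whole-system power during the CPU run (made up)

gpu_time = cpu_time / 1.8    # ~80% faster on the GPU (midpoint of 60%-100%)
gpu_power = cpu_power - 2.0  # system draws ~2 W less when running on the GPU

cpu_energy = cpu_power * cpu_time  # joules = watts * seconds
gpu_energy = gpu_power * gpu_time

print(f"CPU: {cpu_energy:.0f} J, GPU: {gpu_energy:.0f} J")
print(f"Energy saving: {100 * (1 - gpu_energy / cpu_energy):.0f}%")
```

The interesting point is that the energy saving compounds: the GPU run draws a little less power *and* finishes sooner, so the per-task energy gap is much larger than the 2 W power gap alone suggests.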
I decided to start looking into this for several reasons:
1. AMD's Fusion products have relatively stronger GPUs than CPUs. As a result, GPGPU could potentially have a large benefit on Fusion systems.
2. The mobile Fusion GPUs are designed for relatively low-power systems; as a result, the GPU's power consumption should be relatively modest compared to that of most discrete GPUs. The low power of the GPU opens up the potential to use GPGPU to save power.
3. GPGPU performance is frequently heavily limited by memory bandwidth; as a result, it would be interesting to see if a GPGPU application has an advantage over a CPU-based application when they have access to exactly the same memory.
Before I spend too much time studying this question on my netbook, I'm wondering what other people think.
Tuesday, June 14, 2011
Tuesday, February 1, 2011
If I continue this work beyond my Ph.D., I'm thinking about developing a version of OCCAM that uses OpenCL. Here are some potential benefits (and challenges) that may result from using OCCAM with OpenCL:
- OpenCL is a stream-like language, similar to the programming model that OCCAM already targets. As a result, an OpenCL version of OCCAM would be relatively straightforward to do.
- OpenCL is becoming a widespread system for running parallel code on multicore CPUs as well as other platforms like GPGPUs and Cell. As a result, moving to this platform would greatly improve the applicability of OCCAM.
- OpenCL is somewhat opaque in how it works under the hood, due to vendors' various proprietary implementations of it for their GPUs, etc. As a result, trying to ensure portable real-time performance on a variety of OpenCL platforms will likely be highly challenging.
- OCCAM's modeling of the application's and computer system's behavior as a stochastic system can allow the application designer to work around this issue by adapting the application as needed. Providing such functionality may not require the full, multiple-degrees-of-freedom implementation that OCCAM currently provides. Instead, it may only be necessary to provide a stripped-down version that just adapts the application to meet its timing deadline.
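As a toy illustration of that stripped-down idea (this is not OCCAM's actual controller, and all the numbers are made up), a single-knob proportional adapter might adjust the amount of work per iteration until the iteration time settles at its deadline:

```python
# Toy sketch of the "stripped-down" adaptation idea: one knob (work per
# iteration) is adjusted proportionally to the deadline slack.
# Not OCCAM's actual controller; purely illustrative.

def adapt_work(work, elapsed, deadline, gain=0.5, min_work=1):
    """Proportionally adjust the work level based on deadline slack."""
    # Positive slack -> we can afford more work; negative -> back off.
    slack = (deadline - elapsed) / deadline
    new_work = work * (1.0 + gain * slack)
    return max(min_work, int(round(new_work)))

# Simulate: each unit of work costs 2 ms; the deadline is 33 ms.
cost_per_unit = 0.002
deadline = 0.033
work = 4
for step in range(20):
    elapsed = work * cost_per_unit
    work = adapt_work(work, elapsed, deadline)

print(work, work * cost_per_unit <= deadline)
```

In this simple simulation the controller ramps the work level up until the iteration time sits just under the deadline; the full multi-dimensional version would trade off several such knobs at once.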
Thursday, April 22, 2010
Sunday, April 11, 2010
Right now, I'm trying to formally model OCCAM's behavior as a Markov Decision Process (MDP). Doing so will allow me to use a large variety of standard tools for solving MDPs, such as the various MDP toolboxes for MATLAB. This will allow me to evaluate the suitability of different types of MDP solvers (Q-learning, value iteration, policy iteration, etc.) for OCCAM's controller.
INRA's MDP Toolbox is what I'm using at the moment, and it seems to be working out pretty well. One interesting thing is that describing the problem as an MDP is very memory intensive, but solving the MDP is not particularly CPU intensive. Currently, MATLAB is using about 3.2GB of memory (note that I *am* using sparse matrices to reduce memory consumption), but the tools can find a policy within a few seconds. I'll have to see if this interesting property holds when I start working on problems that truly require a long/infinite-horizon solution.
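For reference, value iteration itself is only a few lines; here's a minimal Python sketch on a made-up 2-state, 2-action MDP (the toolbox solves far larger, sparse versions of exactly this problem):

```python
# Minimal value iteration on a toy 2-state, 2-action MDP, to illustrate
# the kind of solver the MATLAB toolboxes provide. All transition
# probabilities and rewards here are made up.

# P[a][s][s'] = transition probability; R[s][a] = expected reward
P = [
    [[0.9, 0.1], [0.4, 0.6]],    # action 0
    [[0.2, 0.8], [0.05, 0.95]],  # action 1
]
R = [[1.0, 0.0], [0.0, 2.0]]
gamma = 0.9  # discount factor

V = [0.0, 0.0]
for _ in range(500):
    # Bellman backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    V_new = [max(R[s][a] + gamma * sum(P[a][s][sp] * V[sp] for sp in range(2))
                 for a in range(2))
             for s in range(2)]
    done = max(abs(x - y) for x, y in zip(V, V_new)) < 1e-9
    V = V_new
    if done:
        break

# Greedy policy with respect to the converged value function
policy = [max(range(2), key=lambda a: R[s][a] + gamma *
              sum(P[a][s][sp] * V[sp] for sp in range(2)))
          for s in range(2)]
print(V, policy)
```

This matches the memory/CPU observation above: the transition matrices (`P`) dominate storage, while the backup loop itself is cheap per iteration.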
Friday, December 18, 2009
In order to make my work portable between my laptop, desktop, and any other machines I might happen to be using, I put all of my work and a lot of the applications I needed on a 32GB USB thumb drive. After 11 months of use, the drive started acting weird: the partitions/filesystem completely disappeared, and the drive was reported as being only 26GB.
I had the good fortune of having backed up everything that morning, so I only really lost about an hour of work. After replacing the flash drive (it was an OCZ ATV drive) with a 32GB Patriot XT drive that I bought at MicroCenter in the Denver Tech Center, I downloaded the utility OCZ provides for diagnosing and updating the firmware on their drives. Yup, the tests showed lots of bad blocks. Also interesting was that they used MLC (multi-level cell) flash on the drive; MLC is usually only good for about 10k rewrites, while SLC is good for 100k to 1 million+ rewrites.
I'm actually really impressed with the build quality of my old OCZ drive. That thing suffered *a lot* of abuse, and what took it down in the end was flash wearout, not any sort of mechanical breakage.
Sunday, December 13, 2009
Parallel Python (http://www.parallelpython.com/) is a lightweight library for providing task-level parallelism in Python. It supports both SMP systems as well as clusters. Basically, to use it, you schedule a function for execution with some parameters and context and the scheduler farms out the work to Parallel Python "servers" running on other machines. Most of the annoying details of parallel programming (scheduling, etc.) are taken care of by the library.
I used it today to run a particularly long-running Python script on 27 different processor cores.