
PeakStream unveils multicore and CPU/GPU programming solution


Today marks the official launch of PeakStream, a software start-up that has been operating in stealth mode for over a year while developing a new type of software platform aimed at making multiprocessor systems easier to program. PeakStream's product, which I'll describe in more detail in a moment, is basically a set of tools—APIs, a virtual machine, a system profiler, a JIT compiler, etc.—that present a standardized, stream-processing-based programming model with which programmers can develop multithreaded applications. A program written to PeakStream's APIs can be compiled once and run on a variety of multiprocessing platforms, including multicore x86 processors, CPU + GPU combinations, and eventually even IBM's Cell. The PeakStream Platform's VM handles all of the scheduling and resource allocation behind the scenes, fitting the application to each system's particular mix of hardware at runtime.

PeakStream's product is aimed at the high-performance computing market, and in particular at customers in the oil and gas, defense, and finance industries, as well as in academia. As an HPC play, PeakStream hopes to capitalize on the steady encroachment of commodity Linux + x86 clusters into niches once dominated by more expensive and specialized mainframe systems. The idea is to improve the compute performance of each x86 node in an HPC cluster by making it easy to harness the power of the GPU, and of any other commodity coprocessors that can be cheaply placed in the system.

The new company is entering the public eye with $17 million in funding and a leadership roster that includes former executives and tech guys from Sun, VMware, NVIDIA, and NetApp. PeakStream's Chief Scientist is Prof. Pat Hanrahan of Stanford, who was formerly involved with Stanford's stream processing research endeavor, the Brook project. The Brook project's work on using GPUs as stream processors formed the foundation on which PeakStream has built its newly announced product.

Stream processing in a nutshell

The core concept behind the PeakStream Platform is a technique called stream processing. Stream processing is quite similar to SIMD processing, but where SIMD processors use single instructions to operate on vectors, stream processors use kernels to operate on streams.

An input stream is an array of data elements that can be operated on in parallel. Input streams are fed into a stream processor one stream at a time, where they're operated on by collections of instructions called kernels. A kernel is a sequence of instructions that is applied to each element in a stream. Thus a kernel function acts like a small loop that iterates once for each stream element.


SISD, SIMD, and stream processing compared

In the diagram above, I've tried to illustrate the fundamental differences between standard serial (SISD), SIMD, and stream programming models using a type of diagram that I've employed in previous articles. As you can see, the kernels flow into the stream processor along with input streams, while the output streams flow from the processor and back into storage.

The other important difference to note between stream processing and SISD or SIMD processing is that the latter two keep their data and instructions in registers (the program counter and data registers), whereas kernels and streams are both stored in the cache. Thus stream processing exploits locality of reference by explicitly grouping related code and data together for easy fetching into the cache.
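To make the kernel/stream idea concrete, here's a minimal sketch in plain C++ (no actual stream-processing runtime assumed): a kernel is modeled as a small function, and "running" it means applying it to every element of an input stream to produce an output stream.

```cpp
#include <vector>

// A "stream" is just an array of elements that can each be processed
// independently of the others.
using Stream = std::vector<float>;

// A "kernel" is a short routine applied to every stream element.
// This illustrative kernel computes y = 2x + 1 for each element.
float kernel(float x) { return 2.0f * x + 1.0f; }

// Conceptually, the stream processor runs the kernel once per element.
// Because the iterations are independent, a real stream processor can
// execute them across many ALUs at once; the serial loop here only
// models the semantics.
Stream run_kernel(const Stream& input) {
    Stream output;
    output.reserve(input.size());
    for (float x : input) output.push_back(kernel(x));
    return output;
}
```

The key property is that nothing in one iteration depends on another, which is exactly the data parallelism that lets the hardware spread the work out.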

As you might expect, stream processing shows great promise for workloads that exhibit high degrees of data parallelism. Researchers have been able to use GPUs as stream processors by coding kernels in a graphics language as vertex shaders and storing streams as textures. Of course, this kind of coding requires that the programmer know a graphics language and be well acquainted with the details of the particular GPU architecture for which he or she is coding. For programmers who'd like to write stream programs but don't want this kind of hassle, PeakStream feels they have the answer.

The PeakStream platform

The PeakStream platform uses the software tools mentioned above (APIs, a VM, a debugger, etc.) to enable the programmer to easily write programs to a generalized stream processor. These programs can then be run on any supported multiprocessor system, with the PeakStream VM handling the job of scheduling code to run on the CPU, the GPU, and/or other coprocessors. This way, programmers don't have to know anything about graphics programming in order to be able to take advantage of the GPU as a coprocessor.

All of the standard math libraries used in HPC are implemented and give correct results, and the APIs are designed to hew as closely as possible to standard HPC APIs. Programmers can code for the VM in C/C++ using their existing development environments (e.g., gcc, Visual Studio, Eclipse, etc.), linking in the PeakStream libraries where needed.
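PeakStream's actual API isn't shown here, but as a purely hypothetical sketch of the style described—a C++ library linked into an ordinary build, with whole-array operations standing in for kernels so that a VM/JIT can decide where the work runs—such code might look like this (the `Array` type and its operations are my own illustration, not PeakStream's real interface):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical whole-array type in the style of a stream-computing
// library. Illustrative only -- not PeakStream's actual API.
class Array {
public:
    explicit Array(std::vector<float> d) : data_(std::move(d)) {}

    // Element-wise multiply over the whole array. In a real stream
    // library, an operation like this is what the VM would schedule
    // onto the CPU or GPU behind the scenes.
    Array operator*(const Array& o) const {
        std::vector<float> r(data_.size());
        for (std::size_t i = 0; i < data_.size(); ++i)
            r[i] = data_[i] * o.data_[i];
        return Array(r);
    }

    // A reduction that pulls a scalar result back into ordinary C++.
    float sum() const {
        float s = 0.0f;
        for (float v : data_) s += v;
        return s;
    }

private:
    std::vector<float> data_;
};

// A dot product expressed array-at-a-time rather than element-at-a-time:
// the programmer states *what* to compute, not where it executes.
float dot(const Array& a, const Array& b) { return (a * b).sum(); }
```

The point of the style is that the programmer never writes a GPU shader or a thread pool; the array operations themselves are the unit the runtime maps onto whatever hardware is present.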

The initial version of the PeakStream Platform is focused on CPU + GPU integration, which makes sense considering that the product has its roots in a GPU-based research project. In fact, PeakStream worked closely with ATI in developing this product, just as Stanford's earlier Brook project made use of ATI expertise.

Speaking of ATI, I think it's probably a stroke of good luck for the new company that the AMD/ATI merger is taking place. The PeakStream model will work best on hardware platforms with plenty of bandwidth and a more tightly coupled CPU and GPU combination. Also, AMD's Torrenza initiative dovetails well with PeakStream's plans to support new types of coprocessors in the future.

Ultimately, it's way too early to tell what kind of impact PeakStream will have in the HPC market. My guess is that it's likely to be a factor in the continuing uptake of commodity hardware-based clusters in the HPC market, now that each node can gain a massive speedup in some types of data-parallel floating-point codes with a minimum amount of programmer effort.

Speaking strictly in terms of performance, there are two factors to consider. The first factor is PeakStream's raw performance on the kind of code that it does best, and how that performance stands up to more custom-fitted CPU/GPU code. I think it's a safe bet that the PeakStream platform, with its JIT and its VM, will incur some significant overhead that places it well behind hand-coded implementations. So businesses that use PeakStream will be trading programmer time for GFLOPs. If you live and die by a few cycles' worth of speedup on an algorithm, then you're going to want to skip this product, but if you don't, then it seems like it will be worth it due to ease and speed of implementation, reusability, and scalability.

The other part of the performance question is how bringing the GPU into the picture via PeakStream affects the overall competitiveness of commodity hardware clusters vs. more traditional big iron. The answer to that question is going to depend entirely on the number of important HPC applications that are amenable to stream processing, and to floating-point stream processing in particular. If the answer is "quite a few," then PeakStream will make commodity x86 clusters even more attractive for HPC from an overall price/performance standpoint.
