Metaprogramming and Autotuning
Principal Investigators: David Padua, María Garzarán
When the target machine is a multicore, programmer productivity suffers not only because defects become more likely, but also because parallel programs must be made efficient and scalable. Targeting multicores complicates optimization because programmers must deal with issues that do not arise in sequential programming, such as load balancing and communication. The natural way to address this problem is automation. Optimization tools have always been important, but they matter even more now, since they are our only hope of compensating for the added difficulty that parallelism brings. In the spirit of separation of concerns, and following tradition, we are developing tools whose sole objective is performance optimization.
The most important such tool is, of course, the compiler's optimization passes. Our experience indicates, however, that compilers, at least with today's technology, are not sufficient to address the productivity problem: even with the best compilers, developing efficient and scalable programs remains laborious. Library generators implemented using autotuning techniques constitute a promising new class of tools. They produce code that achieves impressive efficiency across a wide range of machines. Library generators can be conceived as metaprograms that embody, in a single code, all the versions to be empirically evaluated. Although most widely known autotuning metaprograms are library routine generators, metaprograms that implement complete applications are also very important when the bulk of the computation cannot be expressed in terms of existing libraries.
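To make the idea of a library generator as a metaprogram more concrete, the Python sketch below uses a single template to generate versions of a tiled matrix multiplication, one per candidate tile size. It times each version on the machine at hand and keeps the fastest. This is an illustrative assumption, not the project's tooling: the names `make_blocked_matmul`, `time_once`, and `autotune` are hypothetical, tile size is the only tuning parameter, and the search is exhaustive over a short candidate list. Production library generators explore much larger spaces and emit compiled code.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def make_blocked_matmul(block: int) -> Callable[[Matrix, Matrix], Matrix]:
    """Instantiate one version of C = A * B, tiled with the given block size.

    Every call to this template yields a different version, which is the
    sense in which a single metaprogram embodies many candidate codes.
    """
    def matmul(a: Matrix, b: Matrix) -> Matrix:
        n, m, p = len(a), len(b), len(b[0])
        c = [[0.0] * p for _ in range(n)]
        # Iterate over tiles, then over elements inside each tile.
        for ii in range(0, n, block):
            for kk in range(0, m, block):
                for jj in range(0, p, block):
                    for i in range(ii, min(ii + block, n)):
                        ci, ai = c[i], a[i]
                        for k in range(kk, min(kk + block, m)):
                            aik, bk = ai[k], b[k]
                            for j in range(jj, min(jj + block, p)):
                                ci[j] += aik * bk[j]
        return c
    return matmul


def time_once(f: Callable[[Matrix, Matrix], Matrix], a: Matrix, b: Matrix) -> float:
    """Wall-clock time of a single call, in seconds."""
    start = time.perf_counter()
    f(a, b)
    return time.perf_counter() - start


def autotune(candidates: Sequence[int], a: Matrix, b: Matrix,
             reps: int = 3) -> Tuple[Optional[int], float]:
    """Empirically evaluate each generated version; return the fastest."""
    best_block: Optional[int] = None
    best_time = float("inf")
    for block in candidates:
        version = make_blocked_matmul(block)
        # The minimum over several runs reduces the effect of timing noise.
        t = min(time_once(version, a, b) for _ in range(reps))
        if t < best_time:
            best_block, best_time = block, t
    return best_block, best_time


if __name__ == "__main__":
    n = 64
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i == j) for j in range(n)] for i in range(n)]
    block, seconds = autotune([4, 8, 16, 32, 64], a, b)
    print(f"fastest block size: {block} ({seconds:.4f} s)")
```

Every version accumulates over `k` in the same order, so all versions compute identical results and differ only in their memory-access pattern. In pure Python, interpreter overhead likely dominates cache effects. The sketch therefore illustrates the structure of empirical search rather than realistic speedups.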
The autotuning approach has an advantage over compilers: it can use semantic information that typically would not be available to a compiler. In some cases, however, the programmer must provide this information, which means extra work. Nevertheless, the effectiveness with which these autotuning systems enable portability across machines and machine generations has made the extra effort worthwhile in the past. Although the initial effort is higher than that required to develop a highly tuned version for a single machine, porting to new machines becomes much simpler. We expect these systems to have an even greater impact on productivity when dealing with parallelism. Furthermore, we believe that many applications can benefit from autotuning without programmer intervention if they are written in terms of routines for which a generator exists, or in terms of other primitives such as codelets or data-parallel operations.
Building on our earlier work on library generators, we are working to make metaprogramming for autotuning a more useful and effective methodology so that it can become one of the foundations of productivity for multicores. In particular, we are studying and developing abstractions and tools that facilitate the implementation of parallel self-tuning codes. This work includes the following:
- Continue advancing our understanding of data-dependent autotuning
- Develop languages for metaprogramming and autotuning
- Implement autotuning versions of parallel operators
- Design and develop a codelet-based optimization strategy
- Develop and implement search strategies to support autotuning