| <!--#include virtual="../../header.incl" --> |
| |
| <div class="www_sectiontitle">LLVM Performance Workshop at CGO</div> |
| |
| <ul> |
| <li><b>What</b>: LLVM Performance Workshop at CGO</li> |
| <li><b>When</b>: Saturday February 4th, 2017</li> |
| <li><b>Where</b>: Austin, Texas, USA</li> |
| </ul> |
| |
| <p> |
| An LLVM Performance Workshop will be held at CGO 2017. The workshop |
| is co-located with CC, HPCA, and PPoPP. If you are interested in |
| attending the workshop, please register at the |
| <a href="http://cgo.org/cgo2017/workshops.html">CGO website</a>. |
| </p> |
| |
| <div class="www_sectiontitle">Program</div> |
| <p> |
| The workshop takes place at the <a href="http://cgo.org/cgo2017/travel-information.html">Hilton Hotel</a> in |
| downtown Austin (500 East 4th St). |
| </p> |
| <p> |
| <font color="red"><b>Update:</b></font> If you indicated this morning that you wanted to join us for dinner, here's the location of the restaurant: <a href="http://www.manuels.com/">Manuel's Downtown</a>, 310 Congress Avenue, Austin, TX 78701. We have a reservation at <b>5pm</b> (dinner is at your own expense). The restaurant is within walking distance from the hotel. |
| </p> |
| <p> |
| <table border="1"> |
| <tr><th>Time</th> <th>Room</th> <th>Speaker</th> <th>Title</th> <th> </th></tr> |
| <tr> |
| <td>7:30-8:30</td> |
| <td>616AB</td> |
| <td colspan=3>Breakfast</td> |
| </tr> |
| <tr> |
| <td> </td> |
| <td> </td> |
| <td colspan=3><b>Session 1: Parallel Code Generation</b></td> |
| </tr> |
| <tr> |
| <td>8:30am</td> |
| <td>400/402</td> |
| <td>Johannes Doerfert (Saarland University)</td> |
| <td>Polyhedral "Driven" Optimizations on Real Codes</td> |
| <td><a href="#doerfert">[Abstract]</a> [<a href="Polyhedral-Driven-Optimizations-on-Real-Codes.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>9:00am</td> |
| <td>400/402</td> |
| <td>Tobias Grosser (ETH Zurich)</td> |
| <td>Polly-ACC: Transparent Compilation to Heterogeneous Hardware</td> |
| <td><a href="#grosser">[Abstract]</a> [<a href="Polly-ACC-Transparent-Compilation-to-Heterogeneous-Hardware.pptx">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>9:30am</td> |
| <td>400/402</td> |
| <td>Tao Schardl and William Moses (MIT)</td> |
| <td>The Tapir Extension to LLVM's Intermediate Representation for Fork-Join Parallelism</td> |
| <td><a href="#schardl">[Abstract]</a> [<a href="tapir-llvm.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>10:00-10:30</td> |
| <td>616AB</td> |
| <td colspan=3>Break</td> |
| </tr> |
| <tr> |
| <td> </td> |
| <td> </td> |
| <td colspan=3><b>Session 2: Performance in Libraries and Languages</b></td> |
| </tr> |
| <tr> |
| <td>10:30am</td> |
| <td>400/402</td> |
| <td>Hal Finkel (Argonne National Laboratory)</td> |
| <td>Modeling restrict-qualified pointers in LLVM</td> |
| <td><a href="#finkel">[Abstract]</a> [<a href="Restrict-Qualified-Pointers-in-LLVM.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>11am</td> |
| <td>400/402</td> |
| <td>Pranav Bhandarkar, Anshuman Dasgupta, Ron Lieberman, Dan Palermo (Qualcomm Innovation Center), Dillon Sharlet and Andrew Adams (Google)</td> |
| <td>Halide for Hexagon DSP with Hexagon Vector eXtensions (HVX) using LLVM</td> |
| <td><a href="#bhandarkar">[Abstract]</a> [<a href="Halide-for-Hexagon-DSP-with-Hexagon-Vector-eXtensions-HVX-using-LLVM.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>11:30am</td> |
| <td>400/402</td> |
| <td>Aditya Kumar and Sebastian Pop (Samsung Austin R&D Center)</td> |
| <td>Performance analysis of libcxx</td> |
| <td><a href="#kumar">[Abstract]</a> [<a href="Performance-analysis-of-libcxx.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>12:00-1:30</td> |
| <td> </td> |
| <td colspan=3>Lunch</td> |
| </tr> |
| <tr> |
| <td> </td> |
| <td> </td> |
| <td colspan=3><b>Session 3: Whole-application performance tuning</b></td> |
| </tr> |
| <tr> |
| <td>1:30pm</td> |
| <td>400/402</td> |
| <td>Brian Railing (CMU)</td> |
| <td>Improving LLVM Instrumentation Overheads</td> |
| <td><a href="#railing">[Abstract]</a> [<a href="Improving-LLVM-Instrumentation-Overheads.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>2pm</td> |
| <td>400/402</td> |
| <td>Sergei Larin, Harsha Jagasia and Tobias Edler von Koch (Qualcomm Innovation Center)</td> |
| <td>Impact of the current LLVM inlining strategy on complex embedded application memory utilization and performance</td> |
| <td><a href="#larin">[Abstract]</a> [<a href="Impact-of-the-current-LLVM-inlining-strategy.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>2:30pm</td> |
| <td>400/402</td> |
| <td>Mehdi Amini (Apple)</td> |
| <td>LTO/ThinLTO BoF</td> |
| <td><a href="#amini">[Abstract]</a></td> |
| </tr> |
| <tr> |
| <td>3:00-3:30</td> |
| <td>616AB</td> |
| <td colspan=3>Break</td> |
| </tr> |
| <tr> |
| <td> </td> |
| <td> </td> |
| <td colspan=3><b>Session 4: Backend optimizations</b></td> |
| </tr> |
| <tr> |
| <td>3:30pm</td> |
| <td>400/402</td> |
| <td>Krzysztof Parzyszek (Qualcomm Innovation Center)</td> |
| <td>Register Data Flow framework</td> |
| <td><a href="#krzy">[Abstract]</a> [<a href="Register-Data-Flow-Framework.pptx">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>4pm</td> |
| <td>400/402</td> |
| <td>Evandro Menezes, Sebastian Pop and Aditya Kumar (Samsung Austin R&D Center)</td> |
| <td>Efficient clustering of case statements for indirect branch predictors</td> |
| <td><a href="#menezes">[Abstract]</a> [<a href="Efficient-clustering-of-case-statements-for-indirect-branch-prediction.pdf">Slides</a>]</td> |
| </tr> |
| <tr> |
| <td>4:30pm</td> |
| <td> </td> |
| <td colspan=3>Workshop ends.</td> |
| </tr> |
| </table> |
| </p> |
| |
| <div class="www_sectiontitle">Abstracts</div> |
| <p> |
| <ul> |
| <li> <a id="krzy"><b>Krzysztof Parzyszek</b>: Register Data Flow framework</a> |
| <p> |
| Register Data Flow is a framework implemented in LLVM that enables |
| data-flow optimizations on machine IR after register allocation. While |
| most of the data-flow optimizations on machine IR take place during the |
| SSA phase, when virtual registers obey the static single assignment |
| form, passes like pseudo-instruction expansion or frame index |
| replacement may expose opportunities for further optimizations. At the |
| same time, data-flow analysis is much more complicated after register |
| allocation, and implementing compiler passes that require it may not |
| seem like a worthwhile investment. The intent of RDF is to abstract this |
| analysis and provide access to it through a familiar and convenient |
| interface. |
| </p> |
| <p> |
| The central concept in RDF is a data-flow graph, which emulates SSA. In |
| contrast to the SSA-based optimization phase where SSA is a part of the |
| program representation, the RDF data-flow graph is a separate, auxiliary |
| structure. It can be built on demand and it does not require any |
| modifications to the program. Traversal of the graph can provide |
| information about reaching definitions of any given register access, as |
| well as reached definitions and reached uses for register |
| definitions. The graph provides connections for easily locating the |
| corresponding elements of the machine IR. A utility class that |
| recalculates basic block live-in information is implemented to make |
| writing whole-function optimizations easier. In this talk, I will give |
| an overview of RDF and its use in the Hexagon backend. |
| </p> |
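<p>
The kind of query RDF answers can be pictured with a toy example. The sketch
below is purely illustrative Python, not LLVM's actual RDF API: for a
straight-line block of pseudo machine instructions it computes which
definition of each register reaches each use.
</p>

```python
# Illustrative sketch (not LLVM's RDF interface): reaching definitions
# for physical registers over a straight-line block, the kind of
# on-demand query RDF provides after register allocation.

def reaching_defs(instructions):
    """For each instruction (defs, uses), map every used register to the
    index of the instruction whose definition of it reaches this use."""
    last_def = {}      # register -> index of its most recent definition
    reaching = []
    for i, (defs_, uses) in enumerate(instructions):
        reaching.append({r: last_def.get(r) for r in uses})
        for r in defs_:
            last_def[r] = i
    return reaching

# A tiny post-RA block:  r0 = ...; r1 = r0 + 1; r0 = r1 * 2; use r0, r1
block = [
    ({"r0"}, set()),        # 0: r0 = ...
    ({"r1"}, {"r0"}),       # 1: r1 = r0 + 1
    ({"r0"}, {"r1"}),       # 2: r0 = r1 * 2
    (set(), {"r0", "r1"}),  # 3: use r0, r1
]
defs = reaching_defs(block)
```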
| </li> |
| <li> <a id="schardl"><b>Tao Schardl and William Moses</b>: The Tapir Extension to LLVM's Intermediate Representation for Fork-Join Parallelism</a> |
| <p> |
| This talk explores how fork-join parallelism, as supported by |
| dynamic-multithreading concurrency platforms such as Cilk and |
| OpenMP, can be embedded into a compiler's intermediate |
| representation (IR). Mainstream compilers typically treat parallel |
| linguistic constructs as syntactic sugar for function calls into a |
| parallel runtime. These calls prevent the compiler from performing |
| optimizations across parallel control flow. Remedying this |
| situation, however, is generally thought to require an extensive |
| reworking of compiler analyses and code transformations to handle |
| parallel semantics. |
| </p> |
| <p> |
| Tapir is a compiler IR that represents logically parallel tasks |
| asymmetrically in the program's control flow graph. Tapir allows |
| the compiler to optimize across parallel control flow with only |
| minor changes to its existing analyses and code transformations. To |
| prototype Tapir in the LLVM compiler, for example, we added or |
| modified approximately 5,000 lines of LLVM's roughly |
| 3-million-line codebase. Tapir enables many traditional compiler |
| optimizations for serial code, including loop-invariant-code motion, |
| common-subexpression elimination, and tail-recursion elimination, to |
| optimize across parallel control flow, as well as purely parallel |
| optimizations. |
| </p> |
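<p>
The fork-join pattern that Tapir encodes directly in the control-flow graph
(detach, continue, sync) can be illustrated outside the compiler. The Python
below is only an analogy using threads, not Tapir IR: the spawned task plays
the role of a detached block, the work after <code>start()</code> is the
continuation, and <code>join()</code> is the sync.
</p>

```python
import threading

# Analogy only: the detach / continue / sync structure of fork-join
# parallelism, sketched with Python threads rather than compiler IR.

def fib(n):
    if n < 2:
        return n
    result = {}

    def task():                 # "detached" subcomputation
        result["x"] = fib(n - 1)

    t = threading.Thread(target=task)
    t.start()                   # detach: fib(n-1) may run in parallel
    y = fib(n - 2)              # continuation runs alongside the spawn
    t.join()                    # sync: rejoin before using the result
    return result["x"] + y
```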
| <p> |
| This work was conducted in collaboration with Charles E. Leiserson. |
| The proposal is a preliminary copy of our paper on Tapir, which will |
| appear at PPoPP 2017. This talk will focus on the technical details |
| of implementing Tapir in LLVM. |
| </p> |
| </li> |
| <li> <a id="kumar"><b>Aditya Kumar, Sebastian Pop, and Laxman Sole</b>: Performance analysis of libcxx</a> |
| <p> |
| We will discuss the improvements and future work on libcxx. This |
| includes the improvements on standard library algorithms like |
| string::find and basic_streambuf::xsgetn. These algorithms were |
| suboptimal, and we obtained large speedups after optimizing |
| them. Similarly, we enabled inlining of the constructor and destructor |
| of std::string. We will present a systematic analysis of function |
| attributes in libc++ and the places where we added missing |
| attributes. We will present a comparative analysis of clang-libc++ |
| vs. gcc-libstdc++ on representative benchmarks. Finally we will talk |
| about our contributions to google-benchmark, which comes with libc++, to |
| help keep track of performance regressions. |
| </p> |
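<p>
As a hedged illustration of the string::find style of improvement (the exact
libc++ change is not reproduced here), the sketch below replaces
position-by-position comparison with jumps between candidate positions of the
needle's first character, which is the role memchr plays in the optimized C++
code.
</p>

```python
# Hedged sketch of the idea behind the string::find speedup: instead of
# comparing the needle at every position, jump between candidate
# positions of the needle's first character (memchr-style), then verify.

def find_substr(haystack, needle, start=0):
    if not needle:
        return start
    first = needle[0]
    i = start
    while True:
        i = haystack.find(first, i)          # stands in for memchr
        if i == -1 or i + len(needle) > len(haystack):
            return -1
        if haystack[i:i + len(needle)] == needle:
            return i
        i += 1
```

The scan skips every position where the first character cannot match, which is
where most of the win comes from on realistic inputs.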
| </li> |
| <li> <a id="finkel"><b>Hal Finkel</b>: Modeling restrict-qualified pointers in LLVM</a> |
| <p> |
| It is not always possible for a compiler to statically determine enough |
| about the pointer-aliasing properties of a program, especially for |
| functions which need to be considered in isolation, to generate the |
| highest-performance code possible. Multiversioning can be employed but |
| its effectiveness is limited by the combinatorially-large number of |
| potential configurations. To address these practical problems, the C |
| standard introduced the restrict keyword which can adorn pointer |
| variables. The restrict keyword can be used by the programmer to convey |
| pointer-aliasing information to the optimizer. Often, this is |
| information that is difficult or impossible for the optimizer to deduce |
| on its own. |
| </p> |
| <p> |
| The semantics of restrict, however, are subtle and rely on source-level |
| constructs that are not generally represented within LLVM's |
| IR. Maximally maintaining the aliasing information correctly in the face |
| of function inlining and other code-motion transformations, without |
| interfering with those transformations, is not trivial. While LLVM has |
| long supported restrict-qualified pointers that are function |
| arguments, and an initial phase of this work provided a way to preserve |
| this information in the face of function inlining, I'll describe a new |
| scheme in LLVM |
| that allows the representation of aliasing information from block-local |
| restrict-qualified pointers as well. This more-general class of |
| restrict-qualified pointers is widely used in scientific code. |
| </p> |
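<p>
The aliasing hazard that restrict rules out can be seen in a few lines. The
Python below is only an illustration of the semantics, not LLVM code: when the
output buffer aliases an input, earlier stores feed later loads, so without a
no-alias guarantee the optimizer cannot safely reorder or vectorize the loop.
</p>

```python
def scale_prev(dst, src):
    """dst[i + 1] = 2 * src[i]; correct only when dst and src do not
    alias -- the guarantee that `restrict` conveys in C."""
    for i in range(len(src) - 1):
        dst[i + 1] = 2 * src[i]

src = [1, 2, 3, 4]
dst = [0, 0, 0, 0]
scale_prev(dst, src)   # distinct buffers: dst becomes [0, 2, 4, 6]

a = [1, 2, 3, 4]
scale_prev(a, a)       # aliased: the store to a[1] feeds the load at i = 1,
                       # so a becomes [1, 2, 4, 8], not [1, 2, 4, 6]
```

Because the two calls produce different results, a compiler that cannot prove
the buffers are distinct must keep the loads and stores in program order.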
| <p> |
| In this talk, I'll cover the use cases for restrict-qualified pointers, |
| the difficulties in representing their semantics at the IR level, why |
| the existing aliasing metadata cannot represent restrict-qualified |
| pointers effectively, how the proposed representation allows the |
| preservation of these semantics with minimal impact to the optimizer, |
| and how the optimizer can use this information to generate |
| higher-performance code. I'll also discuss how this scheme relates to |
| other annotations on pointer variables (e.g. TBAA and alignment |
| assumptions). |
| </p> |
| </li> |
| <li> <a id="amini"><b>Mehdi Amini</b>: LTO/ThinLTO BoF</a> |
| <p> |
| LTO is an important technique for getting the maximum performance from |
| the compiler. We presented the ThinLTO model and implementation in LLVM |
| at the last LLVM Dev Meeting. This provided the audience with a good |
| overview of the high-level flow of ThinLTO and the three-phase split |
| involved. |
| </p> |
| <p> |
| The proposal for this BoF is to gather and discuss the existing |
| user experience, the current limitations, and which features folks are |
| expecting most from ThinLTO. We can also go over the optimizations |
| currently in development upstream. |
| </p> |
| </li> |
| <li> <a id="doerfert"><b>Johannes Doerfert</b>: Polyhedral "Driven" Optimizations on Real Codes</a> |
| <p>In this talk I will present polyhedral "driven" optimizations on real |
| codes. The term polyhedral "driven" is used because there are two |
| flavors of optimization I want to discuss (depending on my progress and |
| the duration of the talk). |
| </p> |
| <p> |
| The first follows the classical approach applied by LLVM/Polly but with |
| special consideration of general benchmarks like SPEC. I will show how |
| LLVM/Polly can be used to perform beneficial optimizations in (at least) |
| libquantum, hmmer, lbm and bzip2. I will also discuss what I think is |
| needed to identify such optimization opportunities automatically. |
| </p> |
| <p> |
| The second polyhedral driven optimization I want to present is a |
| conceptual follow-up of the "Polyhedral Info" GSoC project. This project |
| was the first attempt to augment LLVM analysis and transformation passes |
| with polyhedral information. While that project was built on top of |
| LLVM/Polly, I will present an alternative approach. First I will |
| introduce a modular, demand-driven and caching polyhedral program |
| analysis that natively integrates into the existing LLVM pipeline. Then |
| I will show how to utilize this analysis in existing LLVM optimizations |
| to improve performance. Finally, I will use the polyhedral analysis to |
| derive new, complex control flow optimizations that are not, or only in |
| a simpler form, present in LLVM. |
| </p> |
| </li> |
| <li> <a id="grosser"><b>Tobias Grosser</b>: Polly-ACC: Transparent Compilation to Heterogeneous Hardware</a> |
| <p> |
| Programming today's increasingly complex heterogeneous hardware is |
| difficult, as it commonly requires the use of data-parallel languages, |
| pragma annotations, specialized libraries, or DSL compilers. Adding |
| explicit accelerator support into a larger code base is not only costly, |
| but also introduces additional complexity that hinders long-term |
| maintenance. We propose a new heterogeneous compiler that brings us |
| closer to the dream of automatic accelerator mapping. Starting from a |
| sequential compiler IR, we automatically generate a hybrid executable |
| that - in combination with a new data management system - transparently |
| offloads suitable code regions. Our approach is almost regression free |
| for a wide range of applications while improving a range of compute |
| kernels as well as two full SPEC CPU applications. We expect our work to |
| reduce the initial cost of accelerator usage and to free developer time |
| to investigate algorithmic changes. |
| </p> |
| </li> |
| <li> <a id="railing"><b>Brian Railing</b>: Improving LLVM Instrumentation Overheads</a> |
| <p> |
| The behavior and structure of a shared-memory parallel program can be |
| characterized by a task graph that encodes the instructions, memory |
| accesses, and dependencies of each piece of parallel work. The task |
| graph representation can encode the actions of any threading library and |
| is agnostic to the target architecture. Contech [1] is an LLVM-based |
| tool that generates a task graph representation by instrumenting the |
| program at compile time so that it outputs a task graph |
| when executed. This paper describes several approaches to reducing the |
| overhead of Contech's instrumentation by augmenting the static compiler |
| analysis. |
| </p> |
| <p> |
| The additional analyses are able to first determine similar memory |
| address calculations in the LLVM intermediate representation and elide |
| them from the instrumentation to reduce the data recorded, an approach |
| previously attempted only with dynamic binary instrumentation based on |
| common registers [2] [3]. Second, this analysis is supplemented by |
| performing tail duplication, which increases the number of memory |
| operations in a single basic block and therefore may provide further opportunities to |
| elide instrumentation, without compromising the accuracy or detail of |
| the data recorded. These optimizations reduce the data recorded by 22%, |
| yielding a proportionate decrease in overhead from 3.7x to 3.3x on the |
| PARSEC benchmarks. |
| </p> |
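<p>
The elision idea can be sketched abstractly. The Python below is a
hypothetical encoding, not Contech's actual trace format: when a loop's
accesses share a base address and a constant stride, one (base, stride, count)
record replaces per-access logging, and the full address list remains
recoverable.
</p>

```python
# Hypothetical encoding (not Contech's real format): a strided access
# pattern is recorded once instead of once per access, and the complete
# address trace can still be reconstructed afterwards.

def naive_trace(base, stride, count):
    return [base + i * stride for i in range(count)]

def elided_trace(base, stride, count):
    return ("strided", base, stride, count)  # one record instead of `count`

def expand(record):
    kind, base, stride, count = record
    assert kind == "strided"
    return [base + i * stride for i in range(count)]
```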
| <p> |
| [1] B. P. Railing, E. R. Hein, and T. M. Conte. "Contech: Efficiently |
| Generating Dynamic Task Graphs for Arbitrary Parallel Programs". In: ACM |
| Trans. Archit. Code Optim. 12.2 (July 2015), 25:1-25:24. |
| </p> |
| <p> |
| [2] Q. Zhao, I. Cutcutache, and W.-F. Wong. "Pipa: Pipelined Profiling |
| and Analysis on Multi-core Systems". In: Proceedings of the 6th Annual |
| IEEE/ACM International Symposium on Code Generation and |
| Optimization. CGO '08. Boston, MA, USA: ACM, 2008, pp. 185-194. |
| </p> |
| <p> |
| [3] K. Jee et al. "ShadowReplica: Efficient Parallelization of Dynamic |
| Data Flow Tracking". In: Proceedings of the 2013 ACM SIGSAC Conference |
| on Computer & Communications Security. CCS '13. Berlin, Germany: |
| ACM, 2013, pp. 235-246. |
| </p> |
| </li> |
| <li> <a id="menezes"><b>Evandro Menezes, Sebastian Pop, and Aditya Kumar</b>: Efficient clustering of case statements for indirect branch predictors</a> |
| <p> |
| We present an O(n log n) algorithm as implemented in LLVM to compile a |
| switch statement into jump tables. To generate jump tables that can be |
| efficiently predicted by current hardware branch predictors, we added an |
| upper bound on the number of entries in each generated jump table. This |
| modification of the previously best known algorithm reduces the |
| complexity from O(n^2) to O(n log n). We illustrate the performance |
| achieved by the improved algorithm on the Samsung Exynos-M1 processor |
| running several benchmarks. |
| </p> |
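<p>
To convey the flavor of the clustering (the thresholds here are invented for
illustration, not LLVM's cost model), the sketch below greedily splits sorted
case values into jump-table clusters that stay dense enough and below an entry
cap.
</p>

```python
# Illustrative greedy pass, not LLVM's exact algorithm: bound each
# jump table's size and density so that the resulting tables remain
# friendly to indirect branch predictors.

def cluster_cases(values, max_entries=8, min_density=0.5):
    vs = sorted(values)
    clusters, current = [], [vs[0]]
    for v in vs[1:]:
        span = v - current[0] + 1          # table size if we extend it
        if span <= max_entries and (len(current) + 1) / span >= min_density:
            current.append(v)
        else:
            clusters.append(current)       # close the table, start a new one
            current = [v]
    clusters.append(current)
    return clusters
```

Sparse outliers end up in their own clusters (and would be lowered as compare
chains), while dense runs share one table each.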
| </li> |
| <li> <a id="bhandarkar"><b>Pranav Bhandarkar, Anshuman Dasgupta, Ron Lieberman, Dan Palermo, Dillon Sharlet, and Andrew Adams</b>: Halide for Hexagon DSP with Hexagon Vector eXtensions (HVX) using LLVM</a> |
| <p> |
| Halide is a domain specific language that endeavors to make it easier to |
| construct large and composite image processing applications. Halide is |
| unique in its design approach to decoupling the algorithm from the |
| organization (schedule) of the computation. Algorithms once written and |
| tested for correctness can then be continually tuned for performance as |
| Halide allows for easily changing the schedule - tiling, parallelizing, |
| prefetching, or vectorizing different dimensions of the loop nest that |
| forms the structure of the algorithm. |
| </p> |
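<p>
The decoupling can be mimicked in plain Python (this is an analogy, not Halide
code): the 3-point blur below is the algorithm, while evaluating it
point-by-point or tile-by-tile is the schedule, and every schedule must
produce identical results.
</p>

```python
# Analogy for Halide's algorithm/schedule split, in plain Python.

def blur_at(src, i):                      # the algorithm: one output point
    return (src[i - 1] + src[i] + src[i + 1]) // 3

def schedule_naive(src):                  # schedule 1: point by point
    return [blur_at(src, i) for i in range(1, len(src) - 1)]

def schedule_tiled(src, tile=4):          # schedule 2: tile by tile
    out = []
    for start in range(1, len(src) - 1, tile):
        stop = min(start + tile, len(src) - 1)
        out.extend(blur_at(src, i) for i in range(start, stop))
    return out
```

Only the traversal order changes between the two schedules; the definition of
each output point stays untouched, which is what lets Halide retune schedules
without re-verifying the algorithm.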
| <p> |
| Halide programs are transformed into the Halide Intermediate |
| Representation (IR) by the Halide compiler. This IR is analyzed and |
| optimized before generating LLVM bitcode for the target |
| requested. Halide links with the LLVM optimizer and codegen libraries |
| for supported targets, and uses these to generate object code. |
| </p> |
| <p> |
| In this workshop, we will present our work on retargeting Halide to the |
| Hexagon DSP with focus on the Hexagon Vector eXtensions (HVX). |
| </p> |
| <p> |
| Our workshop will present the Halide constructs used in a simple 5x5 |
| blur, the corresponding Halide IR, and a few of the important LLVM |
| Hexagon passes which generate HVX vector instructions. |
| </p> |
| <p> |
| We will demonstrate compilation using LLVM.org and Halide.org tools, and |
| execution of the blur 5x5 pipeline on a Snapdragon 820 development board |
| using the Halide Hexagon offloader. In particular, we will demonstrate |
| the various improvements which can be realized with scheduling changes. |
| </p> |
| </li> |
| <li> <a id="larin"><b>Sergei Larin, Harsha Jagasia and Tobias Edler von Koch</b>: Impact of the current LLVM inlining strategy on complex embedded application memory utilization and performance</a> |
| <p> |
| Sophisticated embedded applications with extensive, fine-grained memory |
| management present a unique challenge to contemporary tool chains. Like |
| many open-source projects, LLVM tunes its core optimization tradeoffs |
| for common cases and a set of common architectures. Even with |
| backend-specific hooks, it is not always possible to exert an |
| appropriate degree of control over some key optimizations. We propose a |
| case study: an in-depth analysis of LLVM PGO-assisted inlining in a |
| complex embedded application. |
| </p> |
| <p> |
| The program in question is a large-scale embedded networking application |
| designed to be custom-tuned for a variety of embedded platforms with a |
| range of memory and performance constraints. It makes heavy use of |
| linker scripts to configure and fine-tune memory assignment, ultimately |
| guaranteeing optimal performance in constrained memory environments |
| while being extremely power conscious. |
| </p> |
| <p> |
| The moment a tool chain addresses a non-uniform memory model, a "one |
| size fits all" approach to optimizations like inlining stops being |
| optimal. For instance, based on section assignment, which is completely |
| unknown to the compiler, inlining takes place in areas that face |
| different cost/benefit tradeoffs. The contents of the L1 and L2 |
| instruction caches should not be "enlarged" even if performance can |
| theoretically improve. Inlining across such section boundaries is also |
| ill-advised, since a control-flow transfer (jump) between sections |
| destined for different levels of the memory hierarchy can have |
| unexpected performance implications. Finally, tightly budgeted |
| low-level, high-performance memories might swell beyond their physical |
| limits. |
| </p> |
| <p> |
| The current state of LLVM inlining is somewhat transitional, in |
| anticipation of structural updates to the pass manager, and as such it |
| still relies strongly on heuristic- and PGO-based inline cost |
| computation. In this situation, the introduction of backend hooks might |
| allow targets to fine-tune inlining decisions to some degree, but they |
| still fall far short of the degree of control needed by the systems |
| described above. An additional challenge is the high complexity of |
| capturing actual system run-time behavior, and even of collecting |
| appropriate traces to generate meaningful PGO data. Battery-powered |
| embedded chips rarely have sophisticated tracing capabilities, yet |
| present extremely complex run-time environments. |
| </p> |
| </li> |
| </ul> |
| </p> |
| |
| <div class="www_sectiontitle">Call for Speakers</div> |
| |
| <p> |
| We invite speakers from academia and industry to present their work on the |
| following list of topics (including but not limited to): |
| <ul> |
| <li>improving performance and size of code generated by LLVM,</li> |
| <li>improving performance of LLVM's runtime libraries,</li> |
| <li>tools developed with LLVM for performance analysis of compiler generated code,</li> |
| <li>bots and trackers of performance over time,</li> |
| <li>improving the security of generated code,</li> |
| <li>any other topic related to improving and maintaining the performance and quality of LLVM generated code.</li> |
| </ul> |
| While the primary focus of the workshop is on these topics, we welcome any |
| submission related to the LLVM compiler infrastructure, its sub-projects |
| (Clang, Linker, libraries), and its use in industry and academia. |
| </p> |
| |
| <p> |
| We are looking for: |
| </p> |
| <ul> |
| <li>keynote speakers,</li> |
| <li>technical presentations: 30 minutes plus questions and discussion,</li> |
| <li>tutorials,</li> |
| <li>BOFs.</li> |
| </ul> |
| |
| <p> |
| Proposals should provide enough information for the review committee to be |
| able to judge the quality of the submission. Proposals can be submitted in |
| the form of an extended abstract, full paper, or slides. Proposals should be |
| submitted to |
| <a href="https://easychair.org/conferences/?conf=llvmcgo2017">Easychair |
| LLVM-CGO 2017</a>. The deadline for receiving submissions is December 1st, |
| 2016. Speakers will be notified of acceptance or rejection by December 15. |
| </p> |
| |
| <p> |
| Workshop organization: Sebastian Pop, Aditya Kumar, Tobias Edler von Koch, and |
| Tanya Lattner. |
| </p> |
| |
| <!-- *********************************************************************** --> |
| <hr> |
| |
| <!--#include virtual="../../footer.incl" --> |