
Let's talk about: HSA·101 - INTRODUCTION



    Originally posted by my partner f5inet in

    (Apologies for the Bing translation from the original Spanish.)

    What HSA is:
    HSA is a new ISA (Instruction Set Architecture), or computing architecture, whose main purpose is to integrate the CPU and GPU on the same bus, with unified queues and a shared pool of memory.

    The main objective of the HSA platform is to reduce communication latency between the CPU, the GPU, and other HSA-compatible devices, and to make all these devices, so different from each other, look the same to the developer, relieving the developer of the task of planning data movement between devices with different memory pools, as currently has to be done with OpenCL or CUDA.

    HSA does not replace OpenCL or CUDA. On the contrary: since HSA is nothing more than an architecture on which OpenCL and CUDA can run (with the corresponding chain of interpreters and JIT compilers), these languages can rely on HSA to increase performance, because they no longer have to worry about moving memory buffers between different pools.

    A bit of history:
    The ability to share memory directly between several compute units other than the main CPU was originally introduced by the Cell Broadband Engine architecture, and from there it became popular.
    Heterogeneous computing refers to systems containing multiple computational units (CPUs, GPUs, DSPs, ASICs, and others) and to how the memory of those units is unified into a single pool that all of them can access simultaneously.

    However, each compute unit (CU - Compute Unit) needs, for performance reasons, memory close to it where it can operate at full speed, and different compute units (from now on, CUs) may have specific needs regarding speed, bandwidth, or latency. The HSA architecture reconciles this with its goal by offering a UNIFIED VIRTUAL ADDRESS SPACE.

    CPUs already have this unified address space. In fact, even today a CPU can access any memory installed in the computer, wherever it is; for example, it can access GPU memory through the PCI-Express bus.

    The MMUs, the main players:
    This was achieved through an additional element: the memory management unit (from now on, MMU - Memory Management Unit). The MMU offers the CPU a "virtually unlimited" address space, or rather one limited only by the maximum space the CPU can address (32 bits = 4 GB, 64 bits = 16 EB), by 'mapping' into this virtual space all the other memory present in the system.

    For example, on a system with 2 GB of RAM and 1 GB of VRAM (video RAM), the MMU could 'map' the gigabyte of VRAM just above the main RAM. When the CPU then referred to the byte at the '2.5 GB' position, the MMU knew it had to route that request, via PCI-Express, to the GPU.

    However, it does not have to be that way, and in fact it is not. The MMU is programmable, and the operating system can arrange the machine's memory in whatever layout it wants. In the case of 32-bit Windows (from XP onwards), the gigabyte of VRAM is located (more precisely, 'is mapped') between the 3 GB and 4 GB positions, leaving the first 2 GB for main RAM and a one-gigabyte gap between 2 and 3 GB. This is what is known as the 'virtual memory layout'.

    For this, the MMU keeps tables of 'page table entries', or in the original English, PTEs (Page Table Entry). These tables are responsible for 'matching' the logical (or virtual) addresses to the physical (or real) location of the data.

    What if an application tries to access the 'gap' between 2 and 3 GB? The MMU will not know the physical address corresponding to that logical address and will throw an exception to the CPU, which in turn involves the operating system. The operating system will write out to the hard disk some 4 KB blocks of RAM that have gone unused for a long time, tell the MMU that those 4 KB blocks are now free, and map the 'logical' (or virtual) address that was being accessed onto the blocks that were just freed. The MMU then updates its PTEs to reflect the new 'virtual' configuration.

    What if the 4 KB blocks that are now on the hard disk are needed again? The MMU will again throw an exception, because that 'logical/virtual' address is not in RAM. The operating system will handle the exception, find which 4 KB 'pages/blocks' have gone unused the longest, push them out to the hard disk, read the old pages back from disk into RAM, and notify the MMU of the work performed so it can update its PTEs to reflect the new configuration of the RAM.

    This happens millions and billions of times during the normal operation of an operating system. It is what is known as 'virtual memory' or 'virtual addressing', in which all memory accesses are supervised by an MMU, which is what 'matches' the logical address a program wants to access to the physical address where the data is located. In fact, programs today never see physical addresses, only virtual/logical ones, but from the programmer's point of view that does not matter. Only the operating system and the MMU see physical addresses. No one else does.

    As you can imagine, this virtual memory process, with all its writes, reads, and reallocations to and from disk, ends up leaving the 'physical' memory totally disordered, a real mess. That does not matter, because RAM is random access and has the same access speed wherever it is accessed (at least in theory; in practice, accessing within a 4 KB block is faster because it needs only one resolution by the MMU and its PTEs).

    It is important that you understand this whole business of MMUs and virtual addressing, because HSA is very much based on it.

    No MMU, no party:
    However, the GPU is unable to access the CPU's main memory, because it simply does not understand all this rigmarole of physical and logical addresses. The GPU's VRAM has always been its own, a giant sandbox where it does its thing without problems of any kind. In this regard, GPUs have behaved like dumb accelerators: you told them 'do this here' or 'do that there', and they just did it.

    The GPU is simply not able to access main RAM: it does not understand it, and besides, at any moment the operating system, following a request from the CPU's MMU, could write new data anywhere in RAM, causing endless collisions with the GPU's accesses.

    No: quite simply, all access to RAM must be managed by the CPU, with its MMU, supported by the operating system.

    This worked fine while the GPU was just a slave accelerator for the CPU, but with the current ability to run code on the GPU, offered both by DX10 and unified shaders and by CUDA, OpenCL, and DirectCompute, the GPU is little by little ceasing to be a 'slave accelerator' and becoming a complete compute unit.

    HSA to the rescue:
    And that is exactly what HSA addresses: an architecture for a heterogeneous system, where any compute unit (CU) can access any part of RAM, and where one (or more) MMUs have a global view of how memory is configured.

    We will delve deeply into these and other issues in the future; for the moment, I myself have a lot to read and to understand.

    Original Spanish version at
    Last edited by BiG Porras; 03-28-2016, 10:16 AM.

  • #2
    Nice write up.


    • BiG Porras
      Thanks dude ^_^

  • #3
    Great explanation!