What is NUMA?
Short for Non-Uniform Memory Access, a parallel processing architecture in which each processor has its own local memory but can also access memory owned by other processors. It's called non-uniform because a processor accesses its own memory faster than it accesses memory borrowed from another processor. NUMA computers offer the scalability of MPP (massively parallel processing) and the programming ease of SMP (symmetric multiprocessing).
To understand this concept, let's first look at parallel memory architecture.
Parallel memory architecture is of two types:
- Shared memory
- Distributed memory
Shared memory: All CPUs share the same memory and treat it as a global address space (refer to the diagram below). Cache coherence is the main issue in this kind of architecture. This architecture is normally used for general-purpose CPUs, for example in laptops and desktops.
Note: Cache coherence is the consistency of shared data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with inconsistent data, which is particularly the case with CPUs in a multiprocessing system.
Distributed memory: In this architecture each CPU has its own local memory, and the CPUs are connected over a network. Because each CPU has its own local memory, there is no cache coherence and no global address space. When a CPU wants to access memory associated with another CPU, it must communicate explicitly, and this communication costs many memory cycles. This kind of architecture is used in clusters, where the cluster nodes are connected to each other over a network.
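The explicit communication described above can be sketched with two operating-system processes, each owning its own private address space; the pipe stands in for the cluster network, and `remote_double` is a hypothetical helper used only for illustration, not a real API:

```python
from multiprocessing import Pipe, Process

def worker(conn):
    # This process has its own private address space; the only way the
    # parent can use this data is an explicit message exchange.
    request = conn.recv()      # wait for an explicit request
    conn.send(request * 2)     # reply over the pipe (our "network")
    conn.close()

def remote_double(value):
    # Ask another process, which owns its own memory, to compute for us.
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(value)     # explicit communication, not a memory load
    result = parent_end.recv()
    p.join()
    return result

if __name__ == "__main__":
    print(remote_double(21))   # prints 42
```

Note that the parent never reads the worker's memory directly; every access to "remote" data goes through a send/receive pair, which is exactly the extra cost distributed memory imposes.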
There are two types of shared memory:
- UMA (Uniform Memory Access)
- NUMA (Non-Uniform Memory Access)
Below is the diagram of an SMP system. In this diagram we can see that memory is shared by the CPUs, and these CPUs are connected over a single bus. This kind of architecture runs into bus contention beyond roughly 8 to 12 CPUs, because they all share the same bus.
Why NUMA is needed?
Adding more CPUs saturates the shared bus, and it is very difficult to reduce memory latency. These two issues are addressed by the NUMA architecture. Let's understand the architecture of NUMA.
In this architecture, a set of CPUs shares the same memory and I/O; such a set of CPUs, memory, and I/O is called a NUMA node. NUMA nodes are connected to each other over a scalable network. In this scenario a CPU can access memory associated with another CPU in a coherent way. However, a CPU accessing memory of another node takes the long way around; of course it is faster when a node accesses its own local memory. This is the reason why it is called NUMA (Non-Uniform Memory Access).
Terminology Used in NUMA:
- Local memory
- Foreign memory
- NUMA ratio (If NUMA ratio is 1, then the system is SMP)
If the system is running a thread on a CPU of node A, then the memory associated with node A is that thread's local memory. If a CPU of node A accesses the memory of node B, that memory is called remote or foreign memory. The NUMA ratio is the ratio of the cost of accessing foreign memory to the cost of accessing local memory: the greater the ratio, the greater the cost of accessing foreign memory. Hope the concept is clear: if the NUMA ratio is 1, that system is called SMP.
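The NUMA ratio described above is a simple quotient; the latencies in this sketch are made-up illustrative numbers, not measurements from a real machine:

```python
def numa_ratio(foreign_cost_ns, local_cost_ns):
    # Ratio of the cost of a foreign (remote-node) memory access to the
    # cost of a local one; 1.0 means uniform access, i.e. an SMP system.
    return foreign_cost_ns / local_cost_ns

# Illustrative (assumed) latencies: 80 ns local, 160 ns foreign.
# numa_ratio(160, 80) -> 2.0: a remote access costs twice a local one.
# numa_ratio(100, 100) -> 1.0: uniform access, an SMP system.
```

On real hardware these latencies can be measured with tools such as numactl on Linux.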
Note: NUMA implementations use a cache-coherence protocol for this; Intel's NUMA systems, for example, use the MESIF protocol.
Software optimization to improve performance on NUMA-aware systems
Two measures should be considered to improve the performance of systems supporting the NUMA architecture:
- Processor affinity
- Data placement
Processor affinity: In multithreaded systems, the scheduler assigns resources to threads, and threads switch between cores to ensure timely execution. In the case of NUMA, however, a thread that switches from node A to node B then takes longer to access its memory. For example, if a thread starts on node A and later switches to node B, the memory on node A becomes foreign to that thread, and once memory is foreign it takes longer to access. So the system is responsible for ensuring that a thread keeps running within one NUMA node.
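As a sketch of processor affinity, the snippet below pins a process to a fixed CPU set using Linux's affinity syscalls; treating CPUs 0 and 1 as "one node" is an assumption for illustration, since the real CPU-to-node mapping is machine-specific:

```python
import os

def pin_to_cpus(pid, cpus):
    # Pin a process (pid 0 means "the calling process") to a fixed CPU
    # set so the scheduler cannot migrate its threads off those cores,
    # and thus cannot move them away from their node-local memory.
    # sched_setaffinity is Linux-only, so fall back gracefully elsewhere.
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, cpus)
    return sorted(os.sched_getaffinity(pid))

# Example: keep the current process on CPUs 0 and 1, assumed here to
# belong to the same NUMA node (the real layout is machine-specific):
# pin_to_cpus(0, {0, 1})
```

In practice the CPU-to-node mapping comes from the hardware (for example via /sys/devices/system/node or the numactl tool on Linux); the CPU numbers above are only placeholders.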
Data placement: The same applies to data placement. If the system is capable of keeping the data local for as long as possible, it increases the throughput of the system.
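A back-of-the-envelope sketch of why data placement matters, using the same assumed (illustrative) latencies of 80 ns for a local access and 160 ns for a foreign one:

```python
def avg_access_ns(local_fraction, local_ns=80.0, foreign_ns=160.0):
    # Average memory access time when local_fraction of all accesses hit
    # node-local memory; the latencies are illustrative assumptions.
    return local_fraction * local_ns + (1.0 - local_fraction) * foreign_ns

# Keeping 90% of accesses local averages about 88 ns per access,
# while letting half the accesses go foreign averages about 120 ns,
# so better placement directly improves throughput.
```

The exact numbers vary by machine, but the shape of the trade-off is the same: every access that can be served from the local node avoids the foreign-memory penalty.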
OS/Database support for the NUMA architecture:
- Microsoft Windows 7, Windows Vista, etc.
- Oracle 8i, Oracle 10g, SQL Server 2008, etc.