As noted above, the tag used to access the TLB is a part of the virtual address. If the tag has a match in the cache, the final physical address is computed by adding the page offset from the virtual address to the cached value. This is a very fast process; it has to be since the physical address must be available for every instruction using absolute addresses and, in some cases, for L2 look-ups which use the physical address as the key. If the TLB lookup misses the processor has to perform a page table walk; this can be quite costly.
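
To make the tag/offset split concrete, the following sketch decomposes a virtual address for the common case of 4kB pages. The example address, the frame number, and the constants are purely illustrative; they do not describe any particular processor.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12                 /* 4kB pages: lower 12 bits are the offset */
    #define PAGE_MASK  ((1UL << PAGE_SHIFT) - 1)

    int main(void)
    {
        uintptr_t vaddr = 0x00007f3a12345678UL;   /* example virtual address */

        uintptr_t vpn    = vaddr >> PAGE_SHIFT;   /* virtual page number: used as the TLB tag */
        uintptr_t offset = vaddr & PAGE_MASK;     /* page offset: passed through unchanged */

        /* Assume the TLB (or a page table walk) maps this page to physical frame 0x1234. */
        uintptr_t pfn   = 0x1234;
        uintptr_t paddr = (pfn << PAGE_SHIFT) | offset;  /* physical address = frame + offset */

        printf("vpn=%#lx offset=%#lx paddr=%#lx\n",
               (unsigned long)vpn, (unsigned long)offset, (unsigned long)paddr);
        return 0;
    }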

Prefetching code or data through software or hardware could implicitly prefetch entries for the TLB if the address is on another page. This cannot be allowed for hardware prefetching because the hardware could initiate page table walks that are invalid. Programmers therefore cannot rely on hardware prefetching to prefetch TLB entries. It has to be done explicitly using prefetch instructions. TLBs, just like data and instruction caches, can appear in multiple levels. Just as for the data cache, the TLB usually appears in two flavors: an instruction TLB (ITLB) and a data TLB (DTLB). Higher-level TLBs such as the L2TLB are usually unified, as is the case with the other caches.
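
As a rough illustration of such explicit prefetching across a page boundary, the sketch below issues a software prefetch for the start of the next page before the loop reaches it. The 4kB page size and the prefetch distance are assumptions, and whether the prefetch actually populates the DTLB entry depends on the processor.

    #include <stddef.h>

    #define PAGE_SIZE 4096

    /* Sum a large buffer, hinting the next page to the prefetcher ahead of time. */
    long sum_with_prefetch(const char *buf, size_t len)
    {
        long sum = 0;
        for (size_t i = 0; i < len; i++) {
            /* On entering a new page, hint the following page.  __builtin_prefetch
               is a GCC/Clang builtin; depending on the CPU it may also trigger the
               page table walk which fills the TLB entry. */
            if ((i & (PAGE_SIZE - 1)) == 0 && i + PAGE_SIZE < len)
                __builtin_prefetch(buf + i + PAGE_SIZE, 0 /* read */, 0 /* low locality */);
            sum += buf[i];
        }
        return sum;
    }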

4.3.1 Caveats Of Using A TLB

The TLB is a processor-core global resource. All threads and processes executed on the processor core use the same TLB. Since the translation of virtual to physical addresses depends on which page table tree is installed, the CPU cannot blindly reuse the cached entries if the page table is changed. Each process has a different page table tree (but not the threads in the same process) as does the kernel and the VMM (hypervisor) if present. The simplest way to deal with this problem is to flush the TLB whenever the page table tree is changed, for instance as part of a context switch.

Flushing the TLB is effective but expensive. When executing a system call, for instance, the kernel code might be restricted to a few thousand instructions which touch, perhaps, a handful of new pages (or one huge page, as is the case for Linux on some architectures). This work would replace only as many TLB entries as pages are touched. For Intel's Core2 architecture with its 128 ITLB and 256 DTLB entries, a full flush would mean that more than 100 and 200 entries (respectively) would be flushed unnecessarily. When the system call returns to the same process, all those flushed TLB entries could be used again, but they will be gone. The same is true for often-used code in the kernel or VMM. On each entry into the kernel the TLB has to be filled from scratch even though the page tables for the kernel and VMM usually do not change and, therefore, TLB entries could, in theory, be preserved for a very long time. This also explains why the TLB caches in today's processors are not bigger: programs most likely will not run long enough to fill all these entries.

This fact, of course, did not escape the CPU architects. One possibility to optimize the cache flushes is to individually invalidate TLB entries. For instance, if the kernel code and data falls into a specific address range, only the pages falling into this address range have to be evicted from the TLB. This only requires comparing tags and, therefore, is not very expensive. This method is also useful in case a part of the address space is changed, for instance, through a call to munmap.
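
The munmap case can be illustrated with a small user-level sketch: removing a single page from a mapping only invalidates the translations covering that range, so a processor supporting individual invalidation need not discard the rest of the TLB. The mapping size and the page chosen are arbitrary; the actual invalidation is, of course, performed by the kernel.

    #define _DEFAULT_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t len = 16 * 4096;

        /* Map 16 anonymous pages. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }

        p[0] = 1;   /* touch a page so a translation is created */

        /* Remove one page in the middle of the mapping.  Only the TLB entries
           for this single page have to be invalidated; the other 15 pages keep
           their translations. */
        if (munmap(p + 4 * 4096, 4096) != 0) {
            perror("munmap");
            return EXIT_FAILURE;
        }

        munmap(p, len);   /* clean up the rest; already-unmapped holes are not an error */
        return 0;
    }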

A much better solution is to extend the tag used for the TLB access. If, in addition to the part of the virtual address, a unique identifier for each page table tree (i.e., a process's address space) is added, the TLB does not have to be completely flushed at all. The kernel, VMM, and the individual processes all can have unique identifiers. The only issue with this scheme is that the number of bits available for the TLB tag is severely limited, while the number of address spaces is not. This means some identifier reuse is necessary. When this happens the TLB has to be partially flushed (if this is possible). All entries with the reused identifier must be flushed but this is, hopefully, a much smaller set.
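
A conceptual sketch of such a tagged lookup is shown below. The structure layout, the table size, and the identifier width are invented for illustration; real TLBs implement this matching in hardware.

    #include <stdint.h>
    #include <stdbool.h>

    /* Toy model of a fully associative, tagged TLB. */
    struct tlb_entry {
        bool     valid;
        uint16_t asid;    /* identifier of the page table tree (address space) */
        uint64_t vpn;     /* virtual page number (the tag taken from the address) */
        uint64_t pfn;     /* cached physical frame number */
    };

    #define TLB_ENTRIES 64

    /* Look up a translation.  An entry only matches if both the virtual page
       number and the address space identifier agree, so switching address
       spaces does not require flushing the whole table. */
    bool tlb_lookup(const struct tlb_entry tlb[TLB_ENTRIES],
                    uint16_t asid, uint64_t vpn, uint64_t *pfn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
                *pfn = tlb[i].pfn;
                return true;      /* hit */
            }
        }
        return false;             /* miss: a page table walk would be needed */
    }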

This extended TLB tagging is of advantage outside the realm of virtualization when multiple processes are running on the system. If the memory use (and hence TLB entry use) of each of the runnable processes is limited, there is a good chance the most recently used TLB entries for a process are still in the TLB when it gets scheduled again. But there are two additional advantages:

1. Special address spaces, such as those used by the kernel and VMM, are often only entered for a short time; afterward control is often returned to the address space which initiated the entry. Without tags, one or two TLB flushes are performed. With tags the calling address space's cached translations are preserved and, since the kernel and VMM address spaces do not often change TLB entries at all, the translations from previous system calls, etc. can still be used.

2. When switching between two threads of the same process no TLB flush is necessary at all. Without extended TLB tags the entry into the kernel would destroy the first thread's TLB entries, though.

Larger page sizes reduce the pressure on the TLB because a single entry then covers much more memory. The use of large page sizes brings some problems with it, though. The memory regions used for the large pages must be contiguous in physical memory. If the unit size for the administration of physical memory is raised to the size of the virtual memory pages, the amount of wasted memory will grow. All kinds of memory operations (like loading executables) require alignment to page boundaries. This means that, on average, each mapping wastes half the page size in physical memory. This waste can easily add up; it thus puts an upper limit on the reasonable unit size for physical memory allocation.

It is certainly not practical to increase the unit size to 2MB to accommodate large pages on x86-64. This is just too large a size. But this in turn means that each large page has to be comprised of many smaller pages. And these small pages have to be contiguous in physical memory. Allocating 2MB of contiguous physical memory with a unit page size of 4kB can be challenging. It requires finding a free area with 512 contiguous pages. This can be extremely difficult (or impossible) after the system runs for a while and physical memory becomes fragmented.

On Linux it is therefore necessary to allocate these big pages at system start time using the special hugetlbfs filesystem. A fixed number of physical pages are reserved for exclusive use as big virtual pages. This ties down resources which might not always be used. It also is a limited pool; increasing it normally means restarting the system.
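
A minimal user-level sketch of using such pre-reserved huge pages follows. It assumes the administrator has already reserved huge pages (for example via /proc/sys/vm/nr_hugepages) and mounted hugetlbfs somewhere; the /mnt/huge path, the file name, and the 2MB page size are assumptions for the example, not requirements of the interface.

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* assumed 2MB huge pages */

    int main(void)
    {
        /* Create a file on the (assumed) hugetlbfs mount point. */
        int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        /* Map one huge page backed by the reserved pool. */
        char *p = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return EXIT_FAILURE;
        }

        p[0] = 42;   /* the whole 2MB region is covered by a single TLB entry */

        munmap(p, HUGE_PAGE_SIZE);
        close(fd);
        unlink("/mnt/huge/example");
        return 0;
    }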