Huge pages

Huge pages are a feature of later Linux kernels that provides two important benefits:

1. Memory backed by huge pages is locked, so it cannot be paged to disk.
2. Far fewer TLB (translation lookaside buffer) entries are needed to cover the same amount of memory. The standard page size in Linux is 4 KB, whereas a huge page is either 2 MB or 4 MB in size, so each TLB entry maps far more memory. This makes managing virtual memory much more efficient, as the processor suffers far fewer TLB misses and spends less time walking page tables.

The issue for databases such as Oracle, especially those with a large SGA, is that the sheer number of page-table operations the processor is forced to perform can overwhelm the system. A 32 GB SGA requires over eight million 4 KB pages, whereas with 4 MB huge pages only 8,192 pages must be tracked. That is far less stress on the CPU.
This also explains why utilities such as sar can't post their output during periods of high load when huge pages aren't used: system CPU utilization goes through the roof.
Below is a test case for the technology. See Appendix A for the source code used in the test.
We first allocate our pages in the OS.

sudo sysctl -w vm.nr_hugepages=50

We then compile and run the test program, which first shows that all 50 pages are available to us.

[root@linux5 ~]# gcc -o hugealloc hugealloc.c
[root@linux5 ~]# ./hugealloc

Huge pages prior to starting program…

HugePages_Total:    50
HugePages_Free:     50
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
PageTables:       4584 kB

We then issue a shmget() system call to allocate 100MB (25 pages * 4MB per page). Notice the pages are merely reserved and still counted in the "free" pool. Also notice that you cannot simply add Free and Rsvd together, as the sum exceeds Total. Whenever you see this, remember that reserved pages remain in the free pool until you actually write into them. Put another way: Total minus Free is what is currently in use; adding Rsvd to that gives what is currently allocated (either in use or merely reserved). Any difference between that figure and Total means your application is not requesting all the memory for which you have configured huge pages.

created huge pages shared memory segment (shmid 0xa000f in ipcs -m)
 
Showing huge pages memory usage...
 
HugePages_Total:    50
HugePages_Free:     50
HugePages_Rsvd:     25
Hugepagesize:     4096 kB
PageTables:       4584 kB

We then issue a shmat() call, which attaches the segment to our process. Notice the pages are still only reserved, and still part of the "free" pool. Linux does this on the chance a process requests a large amount of memory but never actually uses it; the pages are allocated on demand, as shown in the next step.

Attached shared memory segment at address 0xb1800000...
 
Showing huge pages memory usage...
 
HugePages_Total:    50
HugePages_Free:     50
HugePages_Rsvd:     25
Hugepagesize:     4096 kB
PageTables:       4584 kB

We now write data into our memory pages, and the pages no longer show as reserved but as actually in use (the free pool has been reduced).

Wrote data into shared memory segment...
 
Showing huge pages memory usage...
 
HugePages_Total:    50
HugePages_Free:     25
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
PageTables:       4540 kB

We finally delete our shared memory segment, after which all 50 pages show as free and none as reserved.

Deleted shared memory segment...
 
Showing huge pages memory usage...
HugePages_Total:    50
HugePages_Free:     50
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
PageTables:       4540 kB
 
[root@linux5 ~]#

It should be noted that software can still function without huge pages. When strace is run against an Oracle instance at startup, you will see both the shmget() and shmat() calls, each passed the SHM_HUGETLB flag. If the shmget() fails, another shmget() call is issued without the SHM_HUGETLB flag. This allows the instance to start, although it may struggle under load for the reasons noted earlier.

Our source code below shows an example of this fallback.

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif

#define LENGTH (100UL*1024*1024)   /* 100 MB = 25 huge pages of 4 MB */

#define ADDR (void *)(0x0UL)       /* let the kernel choose the address */
#define SHMAT_FLAGS (0)

int main(void)
{
        int shmid;
        unsigned long i;
        char *shmaddr;

        printf("\n\nHuge pages prior to starting program...\n\n");
        system("/bin/grep -i huge /proc/meminfo");
        system("/bin/grep -i pagetable /proc/meminfo");
        printf("\n");

        /* Try huge pages first; fall back to normal pages if that fails. */
        if ((shmid = shmget(IPC_PRIVATE, LENGTH, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
          if ((shmid = shmget(IPC_PRIVATE, LENGTH, IPC_CREAT | SHM_R | SHM_W)) < 0) {
            perror("shmget");
            exit(1);
          }
          else {
            printf("created normal shared memory segment, as we couldn't use huge pages.\n");
            printf("shmid: 0x%x\n", shmid);
          }
        }
        else {
          printf("created huge pages shared memory segment (shmid 0x%x in ipcs -m)\n", shmid);
        }

        printf("\n\nShowing huge pages memory usage...\n\n");
        system("/bin/grep -i huge /proc/meminfo");
        system("/bin/grep -i pagetable /proc/meminfo");
        printf("\n");

        shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
        if (shmaddr == (char *)-1) {
                perror("Shared memory attach failure");
                shmctl(shmid, IPC_RMID, NULL);
                exit(2);
        }

        printf("\n\nAttached shared memory segment at address %p...\n", shmaddr);
        printf("\nShowing huge pages memory usage...\n\n");
        system("/bin/grep -i huge /proc/meminfo");
        system("/bin/grep -i pagetable /proc/meminfo");
        printf("\n");

        /* Writing into the pages faults them in, moving them from
         * reserved to in use. */
        for (i = 0; i < LENGTH; i++) {
          shmaddr[i] = (char)(i);
          /*
          if (!(i % (1024 * 1024)))
            printf(".");
          */
        }
        printf("\n");

        /*
        printf("Starting the Check...");
        for (i = 0; i < LENGTH; i++)
                if (shmaddr[i] != (char)i)
                        printf("\nIndex %lu mismatched\n", i);
        printf("Done.\n");
        */

        printf("\n\nWrote data into shared memory segment...\n");
        printf("\nShowing huge pages memory usage...\n\n");
        system("/bin/grep -i huge /proc/meminfo");
        system("/bin/grep -i pagetable /proc/meminfo");
        printf("\n");

        if (shmdt((const void *)shmaddr) != 0) {
                perror("Detach failure");
                shmctl(shmid, IPC_RMID, NULL);
                exit(3);
        }

        shmctl(shmid, IPC_RMID, NULL);

        printf("\n\nDeleted shared memory segment...\n");
        printf("\nShowing huge pages memory usage...\n");
        system("/bin/grep -i huge /proc/meminfo");
        system("/bin/grep -i pagetable /proc/meminfo");
        printf("\n");

        return 0;
}