Dynamic Memory Allocation in Computer Systems Programming

ece 454 n.w
1 / 72
Embed
Share

Explore the concepts of dynamic memory allocation, memory management mechanisms, and the importance of efficient memory allocation strategies in computer systems programming. Learn about simple allocators, data structures, policies, and advanced allocation techniques through practical examples and insights into system memory handling.

  • Dynamic Memory Allocation
  • Computer Systems
  • Memory Management
  • Programming
  • University

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. ECE 454 Computer Systems Programming Dynamic Memory (Part I: Simple Allocator) Ding Yuan ECE Dept., University of Toronto http://www.eecg.toronto.edu/~yuan

  2. Contents Intro. to dynamic memory management Simple allocators Data structures Mechanisms Policies Advanced allocators Doubly-linked free lists Segregated free lists Garbage collection 2025-03-22

  3. Why Dynamic Memory Allocation? Even today DRAM (main memory) is precious Want programs to request more memory when needed and give it back when no longer needed, to be re-used! Why learn about dynamic memory allocation? Very representative of allocation/deallocation methods Common/crucial aspect of OSs, databases, webservers, games, Programmer guru: don t use malloc, manage dynamic memory yourself! Think you know pointers? Oh, you ll learn pointers Gain a full understanding of systems under-the-hood Dynamic memory allocation is challenging/interesting As we will learn, good mem-alloc algs are quite involved A good mem-alloc library is essential: free the programmer

  4. Dynamic Memory Allocators Provide an abstraction of memory as a set of blocks Doles out free memory blocks to application Explicit: application allocates and frees space E.g., malloc and free in C, new and delete in C++ Implicit: application allocates, but does not free space E.g. garbage collection in Java, ML or Lisp

  5. Typical Process Memory Image memory invisible to user code kernel virtual memory stack %esp Memory mapped region for shared libraries Allocators request additional heap memory from the operating system using the sbrk function. the brk ptr run-time heap (via malloc) uninitialized data (.bss) initialized data (.data) program text (.text) 0

  6. Background: alignment 2025-03-22

  7. What is alignment? Demo Address must be multiple of K K typically is the multiple of WORD size 32-bit system: word is 4 bytes, malloc has 8 bytes alignment 64-bit system: word is 8 bytes, malloc has 16 bytes alignment 2025-03-22

  8. Why alignment? Let s assume there is no alignment requirement i.e., a data structure can have any address, not necessarily multiple of 8 or 16 E.g., have an integer (value 0) at address 0x923d3f How many cache blocks we need to read for this int? why? Assume each cache block can hold 64 bytes addr.(binary, last 8 bits) content addr.(hex) 0x923d3c 0011 1100 XXXX XXXX 0011 1101 0x923d3d XXXX XXXX 0011 1110 0x923d3e XXXX XXXX 0x923d3f 0x923d40 0x923d41 0x923d42 0x923d43 0011 1111 0100 0000 0100 0001 0100 0010 0100 0011 0000 0000 0000 0000 an integer var. 0000 0000 0000 0000 XXXX XXXX

  9. Why alignment? (cont.) 2 cache blocks! Under the cache design of modern processors, 64B blocks means the starting address of a block has 0 in lower 4 bits For better cache performance: do not cross the boundary addr.(binary, last 8 bits) content addr.(hex) 0x923d3c 0011 1100 XXXX XXXX 0011 1101 0x923d3d XXXX XXXX 0011 1110 0x923d3e XXXX XXXX 0x923d3f 0x923d40 0x923d41 0x923d42 0x923d43 0011 1111 0100 0000 0100 0001 0100 0010 0100 0011 0000 0000 0000 0000 an integer var. 0000 0000 0000 0000 XXXX XXXX

  10. Why alignment? (cont.) Similar to cache that reads at 64B size, CPU always reads at its WORD size on 32-bit machines, read 4 bytes from an addr. with lower 2 bits to be 0 (00, 01, 10, 11) If not aligned at WORD size, reading a simple data structure (e.g., short, int, pointer, etc.) can take two CPU reads addr.(binary, last 8 bits) content addr.(hex) 0x923d3c 0011 1100 XXXX XXXX 0011 1101 0x923d3d XXXX XXXX 0011 1110 0x923d3e XXXX XXXX 0x923d3f 0x923d40 0x923d41 0x923d42 0x923d43 0011 1111 0100 0000 0100 0001 0100 0010 0100 0011 0000 0000 0000 0000 an integer var. 0000 0000 0000 0000 XXXX XXXX

  11. How to align? Make sure data structures have aligned addresses Compiler Inserts gaps in structure to ensure correct alignment of fields Library and OS (e.g., malloc) Return aligned addresses only 2025-03-22

  12. Specific Cases of Alignment By Data Type: 1 byte (e.g., char) no restrictions on address 2 bytes (e.g., short) lowest 1 bit of address is 0(2) i.e., 2-byte aligned 4 bytes (e.g., int, float, char *, etc.) lowest 2 bits of address are 00(2) i.e., 4-byte aligned 8 bytes (e.g., double) lowest 3 bits of address are 000(2) i.e., 8-byte aligned

  13. Satisfying Alignment with Structures Offsets of elements within structure Must satisfy element s alignment requirement Overall Structure Placement Each structure has alignment requirement K Largest alignment of any element Initial address & structure length must be multiples of K!!!!!

  14. Example1 struct S1 { char c; int i[2]; } *p; 1B 4B x 2 K=4 9B total c i[0] i[1] p+0 p+4 p+8 p+12 Multiple of 4 Multiple of 4 Multiple of 4 12B total considering alignment

  15. Example2 struct S1 { char c; int i[2]; double v; } *p; 1B 4B x 2 8B 17B total K=8 c i[0] p+4 i[1] p+8 v 3 bytes 4 bytes p+0 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8 24B total considering alignment

  16. Array of Structures struct S3 { short i; int v; short j; } a[10]; Principle Allocated by repeating allocation for element type a[1].i a[1].v a[1].j a+12 a+16 a+20 a+24 a[0] a[1] a[2] a+0 a+12 a+24 a+36

  17. Saving Space Does the order of elements matter? struct S1 { int x; double v; int y; } *p; struct S2 { double v; int x; int y; } *p; x v y 4 bytes 4 bytes p+20 p+24 p+0 p+4 p+8 p+16 v x y p+16 p+12 p+0 p+8

  18. Basic Dynamic Memory Allocation

  19. Malloc Package #include <stdlib.h> void *malloc(size_t size) If successful: Returns a pointer to a memory block of at least size bytes (typically) aligned to 8-byte boundary on 32 bits machine, 16-byte on 64 bits machine. If size == 0, returns NULL If unsuccessful: returns NULL (0) and sets errno. Note: a well-written program will check for unsuccessful mallocs! void free(void *p) Returns the block pointed at by p to pool of available memory p must come from a previous call to malloc or realloc. void *realloc(void *p, size_t size) Changes size of block p and returns pointer to new block. Contents of new block unchanged up to min of old and new size.

  20. Malloc Example void foo(int n, int m) { int i, *p; /* allocate a block of n ints */ if ((p = (int *) malloc(n * sizeof(int))) == NULL) { perror("malloc"); exit(0); } for (i=0; i<n; i++) p[i] = i; /* add m bytes to end of p block */ if ((p = (int *) realloc(p, (n+m) * sizeof(int))) == NULL) { perror("realloc"); exit(0); } for (i=n; i < n+m; i++) p[i] = i; /* print new array */ for (i=0; i<n+m; i++) printf("%d\n", p[i]); free(p); /* return p to available memory pool */ }

  21. Assumptions Assumptions made in this lecture Memory is word addressable (each word can hold a pointer) Malloc returns word-aligned addresses (unless specified otherwise) In practice GNU malloc returns double-word aligned addr. Free word Allocated block (4 words) Free block (3 words) Allocated word

  22. Allocation Examples p1 = malloc(4 * sizeof (void *)) p2 = malloc(5 * sizeof (void *)) p3 = malloc(6 * sizeof (void *)) free(p2) p4 = malloc(2 * sizeof (void *))

  23. Constraints Applications: Can issue arbitrary sequence of allocation and free requests Free requests must correspond to an allocated block Allocators Must respond immediately to all allocation requests i.e., can t reorder or buffer requests Must allocate blocks from free memory Must align blocks so they satisfy all alignment requirements Can only manipulate and modify free memory Can t move the allocated blocks once they are allocated i.e., compaction is not allowed

  24. Goals of Good malloc/free Primary goals Good time-performance for malloc and free Ideally should take constant time (not always possible) Should certainly not take linear time in the number of blocks Good space utilization User allocated structures should be large fraction of the heap Want to minimize fragmentation One extreme example malloc (N): find the next available N free blocks free: do nothing Extremely GOOD time performance, BAD space utilization

  25. Performance Goals: Throughput Given some sequence of malloc and free requests: R0, R1, ..., Rk, ... , Rn-1 Want to maximize throughput and peak memory utilization. These goals are often conflicting Throughput: Number of completed requests per unit time Example: 5,000 malloc calls and 5,000 free calls in 10 seconds Throughput is 1,000 operations/second.

  26. Performance Goals: Peak Memory Utilization Given some sequence of malloc and free requests: R0, R1, ..., Rk, ... , Rn-1 Def: Aggregate payload Pk: malloc(p) results in a block with a payload of p bytes.. After request Rk has completed, the aggregate payloadPk is the sum of currently allocated payloads a free request will decrease the aggregate payload Def: Current heap size is denoted by Hk Assume that Hk is monotonically nondecreasing i.e., Hk only gets bigger (brk pointer can only get bigger) Def: Peak memory utilization: After k requests, peak memory utilization is: Uk = ( maxi<k Pi ) / Hk

  27. Internal Fragmentation Poor memory utilization caused by fragmentation. Comes in two forms: internal and external fragmentation Internal fragmentation (i.e., padding) padding for alignment purposes explicit policy decisions (e.g., not to free the block). Depends only on the pattern of previous requests, and thus is easy to measure. Note: in-use header space is not counted as I.F.

  28. External Fragmentation Occurs when there is enough aggregate heap memory, but no single free block is large enough p1 = malloc(4 * sizeof (void *)) p2 = malloc(5 * sizeof (void *)) p3 = malloc(6 * sizeof (void *)) free(p2) p4 = malloc(6 * sizeof (void *)) oops! External fragmentation depends on the pattern of future requests, and thus is difficult to measure. Note: ever defragmented your harddrive?

  29. Implementation Issues How do we know how much memory to free just given a pointer? How do we keep track of the free blocks? How do we pick a block to use for allocation -- many might fit? How do we reinsert freed block? p0 free(p0) p1 = malloc(1)

  30. Knowing How Much to Free Standard method Keep the length of a block in the word preceding the block. This word is often called the header field or header Requires an extra word for every allocated block p0 p0 = malloc(4) 5 Block size data free(p0)

  31. Keeping Track of Free Blocks Method 1: Implicit list using lengths -- links all blocks 4 6 2 5 Method 2: Explicit list among the free blocks using pointers within the free blocks 5 4 6 2 Method 3: Segregated free list Different free lists for different size classes

  32. Method 1: Implicit List Need to identify whether each block is free or allocated Can use extra bit Bit can be put in the same word as the size if block sizes are always multiples of two (mask out low order bit when reading size). Masking: size = sizeword & -2; // -2 == 0b1111 1110 1 word a = 1: allocated block a = 0: free block size a Format of allocated and free blocks size: block size payload payload: application data (allocated blocks only) optional padding

  33. Implicit List: Finding a Free Block First fit: Search list from beginning, choose first free block that fits Can take linear time in total number of blocks (allocated and free) In practice it can cause splinters at beginning of list Next fit: Like first-fit, but search list from location of end of previous search Research suggests that fragmentation is worse Best fit: Search the list, choose the free block with the closest size that fits Keeps fragments small --- usually helps fragmentation Will typically run slower than first-fit

  34. Implicit List: Allocating in Free Block Allocating within a free block - splitting Since allocated space might be smaller than free space, we might want to split the block 4 4 8 Split malloc(4 * sizeof (void *)) 4 4 5 3 p

  35. Implicit List: Freeing a Block Simplest implementation: Only need to clear allocated flag But can lead to false fragmentation 4 4 4 2 2 p free(p) 4 4 4 2 2 malloc(5 * sizeof (void *)) Oops! There is enough free space, but the allocator won t be able to find it

  36. Implicit List: Coalescing Join (coelesce) with next and/or previous block if they are free Coalescing with next block 4 4 4 2 2 p free(p) 4 4 6 2 But how do we coalesce with previous block?

  37. Implicit List: Bidirectional Coalescing Boundary tags [Knuth73] Replicate size/allocated word at bottom of free blocks Allows us to traverse the list backwards, but requires extra space Important and general technique! 1 word Header size a a = 1: allocated block a = 0: free block Format of allocated and free blocks payload and padding size: total block size payload: application data (allocated blocks only) Boundary tag (footer) size a 4 4 4 4 6 6 4 4

  38. Constant Time Coalescing Case 1 Case 2 Case 3 Case 4 allocated allocated free free block being freed allocated free allocated free

  39. Constant Time Coalescing (Case 1) m1 1 m1 n 1 1 n m2 1 1 m2 1

  40. Constant Time Coalescing (Case 1) m1 1 m1 1 m1 n 1 1 m1 n 1 0 Check if allocated n m2 1 1 n m2 0 1 m2 1 m2 1

  41. Constant Time Coalescing (Case 2) m1 1 m1 n 1 1 n m2 1 0 m2 0

  42. Constant Time Coalescing (Case 2) m1 1 m1 1 m1 n 1 1 m1 n+m2 1 0 Check if allocated n m2 1 0 m2 0 n+m2 0

  43. Constant Time Coalescing (Case 3) m1 0 m1 n 0 1 n m2 1 1 m2 1

  44. Constant Time Coalescing (Case 3) m1 0 n+m1 0 m1 n 0 1 Check if allocated n m2 1 1 n+m1 m2 0 1 m2 1 m2 1

  45. Constant Time Coalescing (Case 4) m1 0 m1 n 0 1 n m2 1 0 m2 0

  46. Constant Time Coalescing (Case 4) m1 0 n+m1+m2 0 m1 n 0 1 Check if allocated n m2 1 0 m2 0 n+m1+m2 0

  47. Summary of Key Allocator Policies Placement policy: First fit, next fit, best fit, etc. Coalescing policy: Immediate coalescing: coalesce adjacent blocks each time free is called Deferred coalescing: try to improve performance of free by deferring coalescing until needed. e.g., Coalesce as you scan the free list for malloc. Coalesce when the amount of external fragmentation reaches some threshold.

  48. Implicit Lists: Summary Implementation: very simple Allocate: linear time worst case Free: constant time worst case -- even with coalescing Memory usage: will depend on placement policy First fit, next fit or best fit Not used in practice for malloc/free because of linear time allocate. However, the concepts of splitting and boundary tag coalescing are general to all allocators.

  49. Keeping Track of Free Blocks Method 1: Implicit list using lengths -- links all blocks 4 6 2 5 Method 2: Explicit list among the free blocks using pointers within the free blocks 4 6 2 5 Method 3: Segregated free list Different free lists for different size classes

  50. Explicit Free Lists A B C Use data space for link pointers Free anyway Typically doubly linked Still need boundary tags for coalescing Successor links B A 4 4 4 4 6 6 4 4 4 4 C Predecessor links links can point anywhere, not necessarily to adjacent block

More Related Content