Hardware/Software Interface Insights


Gain valuable insights into computer organization and design, exploring the impact of language and algorithms on performance indicators, compiler optimizations, and the use of arrays versus pointers. Delve into examples showcasing the importance of efficient coding practices for optimal program execution.

  • Computer organization
  • Hardware design
  • Software interface
  • Algorithms
  • Programming


Presentation Transcript


  1. COMPUTER ORGANIZATION AND DESIGN: The Hardware/Software Interface. Chapter 2, Instructions: Language of the Computer

  2. Effect of Language and Algorithm

  3. Effect of Language and Algorithm
     [Figure: three bar charts over the configurations C/none, C/O1, C/O2, C/O3, Java/int, and Java/JIT: Bubblesort Relative Performance, Quicksort Relative Performance, and Quicksort vs. Bubblesort Speedup]

  4. Lessons Learnt
     • Instruction count and CPI are not good performance indicators in isolation
     • Compiler optimizations are sensitive to the algorithm
     • Java/JIT compiled code is significantly faster than JVM-interpreted code
         • Comparable to optimized C in some cases
     • Nothing can fix a dumb algorithm!
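
The last lesson can be made concrete by counting comparisons rather than timing. The following is a minimal sketch, not the benchmark behind the slide: the names `bubble_sort`, `counting_compare`, and `compare_sorts` are illustrative, and the C library's `qsort` stands in for quicksort even though the standard does not require it to be one.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison counter shared by both sorts (illustrative global). */
static unsigned long comparisons;

/* Plain bubble sort without early exit: always n*(n-1)/2 comparisons. */
void bubble_sort(int *a, size_t n) {
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t j = 0; j + 1 < n - pass; j++) {
            comparisons++;
            if (a[j] > a[j + 1]) {
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}

/* Comparator for qsort that also counts how often it is called. */
static int counting_compare(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    comparisons++;
    return (a > b) - (a < b);
}

/* Sorts the same pseudo-random data both ways and reports the comparison
   counts. Returns 1 if both sorted results agree, 0 otherwise. */
int compare_sorts(size_t n, unsigned long *bubble_count,
                  unsigned long *library_count) {
    int *a = malloc(n * sizeof *a);
    int *b = malloc(n * sizeof *b);
    if (!a || !b) { free(a); free(b); return 0; }

    unsigned seed = 12345u;
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1103515245u + 12345u;  /* small LCG, repeatable input */
        a[i] = b[i] = (int)((seed >> 16) & 0x7FFFu);
    }

    comparisons = 0;
    bubble_sort(a, n);
    *bubble_count = comparisons;

    comparisons = 0;
    qsort(b, n, sizeof *b, counting_compare);
    *library_count = comparisons;

    int same = 1;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i]) same = 0;
    free(a);
    free(b);
    return same;
}
```

For 1000 elements the bubble sort performs exactly 1000 × 999 / 2 comparisons regardless of compiler flags, which is why no optimization level closes the gap to an O(n log n) sort.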

  5. Arrays vs. Pointers
     • Array indexing involves
         • Multiplying index by element size
         • Adding to array base address
     • Pointers correspond directly to memory addresses
         • Can avoid indexing complexity
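
The two indexing steps can be written out by hand. This is a small sketch assuming a flat address space in which converting a pointer to `uintptr_t` yields its byte address; `element_address` and `matches_indexing` are illustrative names, not from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of element `index`, computed the way array indexing does it:
   base address + index * element size. */
uintptr_t element_address(const void *base, size_t index, size_t element_size) {
    return (uintptr_t)base + index * element_size;
}

/* Returns 1 if the manual computation matches &array[index]. */
int matches_indexing(const long long *array, size_t index) {
    return element_address(array, index, sizeof array[0])
           == (uintptr_t)&array[index];
}
```

A pointer that already holds the element's address skips the multiply and add, which is the "indexing complexity" the slide refers to.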

  6. Example: Clearing an Array

     In the LEGv8 code, X0 holds the array base address and X1 holds size. The code scales the index by 8, so it treats each element as an 8-byte doubleword.

     Array version:

         void clear1(int array[], int size) {
             int i;
             for (i = 0; i < size; i += 1)
                 array[i] = 0;
         }

                MOV  X9,XZR          // i = 0
         loop1: LSL  X10,X9,#3       // X10 = i * 8
                ADD  X11,X0,X10      // X11 = address of array[i]
                STUR XZR,[X11,#0]    // array[i] = 0
                ADDI X9,X9,#1        // i = i + 1
                CMP  X9,X1           // compare i to size
                B.LT loop1           // if (i < size) go to loop1

     Pointer version:

         void clear2(int *array, int size) {
             int *p;
             for (p = &array[0]; p < &array[size]; p = p + 1)
                 *p = 0;
         }

                MOV  X9,X0           // p = address of array[0]
                LSL  X10,X1,#3       // X10 = size * 8
                ADD  X11,X0,X10      // X11 = address of array[size]
         loop2: STUR XZR,[X9,#0]     // Memory[p] = 0
                ADDI X9,X9,#8        // p = p + 8
                CMP  X9,X11          // compare p to &array[size]
                B.LT loop2           // if (p < &array[size]) go to loop2

  7. Comparison of Array vs. Ptr
     • Multiply strength-reduced to shift (compiler optimization)
     • Array version requires shift to be inside loop
         • Part of index calculation for incremented i
         • c.f. incrementing pointer
     • Compiler can achieve same effect as manual use of pointers
         • Induction variable elimination (eliminating array address calculations within loops)
         • Better to make program clearer and safer
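
Induction variable elimination can be sketched at the source level. The two functions below are hypothetical illustrations of the before and after forms, assuming 8-byte `long long` elements to match the doubleword stores in the assembly. A real compiler performs this transformation on its intermediate code, not on C source.

```c
#include <stddef.h>

/* Before: the byte offset is recomputed from i on every iteration,
   mirroring the shift inside loop1. */
void clear_indexed(long long *array, size_t size) {
    for (size_t i = 0; i < size; i++) {
        size_t offset = i * sizeof(long long);
        *(long long *)((char *)array + offset) = 0;
    }
}

/* After: i is eliminated. The offset itself is the loop variable and
   advances by the element size, mirroring loop2. */
void clear_offset(long long *array, size_t size) {
    size_t end = size * sizeof(long long);  /* computed once, outside the loop */
    for (size_t offset = 0; offset < end; offset += sizeof(long long))
        *(long long *)((char *)array + offset) = 0;
}
```

Both functions leave the array in the same state, so the clearer indexed form costs nothing once the compiler applies the optimization.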

  8. ARM & MIPS Similarities
     • ARM: the most popular embedded core
     • Similar basic set of instructions to MIPS

                                 ARM              MIPS
       Date announced            1985             1985
       Instruction size          32 bits          32 bits
       Address space             32-bit flat      32-bit flat
       Data alignment            Aligned          Aligned
       Data addressing modes     9                3
       Registers                 15 × 32-bit      31 × 32-bit
       Input/output              Memory mapped    Memory mapped

  9. Instruction Encoding

  10. The Intel x86 ISA
     • Evolution with backward compatibility
     • 8080 (1974): 8-bit microprocessor
         • Accumulator, plus 3 index-register pairs
     • 8086 (1978): 16-bit extension to 8080
         • Complex instruction set (CISC)
     • 8087 (1980): floating-point coprocessor
         • Adds FP instructions and register stack
     • 80286 (1982): 24-bit addresses, MMU
         • Segmented memory mapping and protection
     • 80386 (1985): 32-bit extension (now IA-32)
         • Additional addressing modes and operations
         • Paged memory mapping as well as segments

  11. The Intel x86 ISA
     • Further evolution
     • i486 (1989): pipelined, on-chip caches and FPU
         • Compatible competitors included AMD and Cyrix
     • Pentium (1993): superscalar, 64-bit datapath
         • Later versions added MMX (Multi-Media eXtension) instructions
         • The infamous FDIV bug
     • Pentium Pro (1995), Pentium II (1997)
         • New microarchitecture (see Colwell, The Pentium Chronicles)
     • Pentium III (1999)
         • Added SSE (Streaming SIMD Extensions) and associated registers
     • Pentium 4 (2001)
         • New microarchitecture
         • Added SSE2 instructions

  12. The Intel x86 ISA
     • And further
     • AMD64 (2003): extended architecture to 64 bits
     • EM64T, Extended Memory 64 Technology (2004)
         • AMD64 adopted by Intel (with refinements)
         • Added SSE3 instructions
     • Intel Core (2006)
         • Added SSE4 instructions, virtual machine support
     • AMD64 (announced 2007): SSE5 instructions
         • Intel declined to follow, instead
     • Advanced Vector Extension (announced 2008)
         • Longer SSE registers, more instructions
     • If Intel didn't extend with compatibility, its competitors would!
         • Technical elegance ≠ market success

  13. Basic x86 Registers

  14. Basic x86 Addressing Modes
     • Two operands per instruction

       Source/dest operand    Second source operand
       Register               Register
       Register               Immediate
       Register               Memory
       Memory                 Register
       Memory                 Immediate

     • Memory addressing modes
         • Address in register
         • Address = Rbase + displacement
         • Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
         • Address = Rbase + 2^scale × Rindex + displacement
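
The most general addressing mode can be evaluated as plain arithmetic. This is a sketch of the formula only, assuming 32-bit registers and wraparound arithmetic; `effective_address` is an illustrative name and the function models no particular instruction.

```c
#include <stdint.h>

/* Effective address for the most general mode on the slide:
   Rbase + 2^scale * Rindex + displacement, with scale in 0..3.
   Unsigned arithmetic wraps modulo 2^32, like 32-bit address arithmetic. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement) {
    return base + (index << (scale & 3u)) + (uint32_t)displacement;
}
```

With base 0x1000, index 5, scale 3, and displacement 16, the result is 0x1000 + 40 + 16 = 0x1038. Setting index or displacement to zero gives the simpler modes in the list.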

  15. x86 Integer Operations
     • The 8086 provides support for both 8-bit (byte) and 16-bit (word) data types.
     • The 80386 adds 32-bit addresses and data (doublewords) in the x86.
     • The x86 integer operations can be divided into four major classes:
         1. Data movement instructions, including move, push, and pop.
         2. Arithmetic and logic instructions, including test, integer, and decimal arithmetic operations.
         3. Control flow, including conditional branches, unconditional branches, calls, and returns.
         4. String instructions, including string move and string compare.

  16. x86 Integer Operations
     [Figure: some typical x86 instructions and their functions]

  17. x86 Integer Operations
     • Conditional branches on the x86 are based on condition codes or flags, like ARMv7.
     [Figure: some of the integer x86 instructions]
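
How a branch can be decided from flags alone is sketched below. The struct and function names are hypothetical. The flag definitions follow the x86 convention for a compare (a subtraction whose result is discarded), where carry means the unsigned subtraction borrowed.

```c
#include <stdint.h>

/* Flags produced by comparing a with b, i.e. by computing a - b
   and discarding the result. */
struct flags {
    unsigned zero;      /* result was zero */
    unsigned sign;      /* top bit of the result */
    unsigned carry;     /* x86 convention: set on borrow (a < b, unsigned) */
    unsigned overflow;  /* signed overflow occurred */
};

struct flags compare32(uint32_t a, uint32_t b) {
    uint32_t result = a - b;
    struct flags f;
    f.zero = (result == 0);
    f.sign = result >> 31;
    f.carry = (a < b);
    f.overflow = ((a ^ b) & (a ^ result)) >> 31;
    return f;
}

/* Branch conditions read only the flags, never the original operands. */
int branch_if_equal(struct flags f)          { return f.zero; }
int branch_if_less_signed(struct flags f)    { return f.sign != f.overflow; }
int branch_if_below_unsigned(struct flags f) { return f.carry; }
```

The same bit pattern can compare differently under the signed and unsigned conditions: 0x80000000 is less than 1 as a signed value but not below 1 as an unsigned value.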

  18. x86 Instruction Encoding
     • Variable length encoding (1 to 15 bytes)
         • Postfix bytes specify addressing mode
         • Prefix bytes modify operation: operand length, repetition, locking, and so on
     • The opcode may include the addressing mode and the register
     • A postbyte labeled mod, reg, r/m contains the addressing mode information.
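
The postbyte's three fields can be extracted with shifts and masks. This is a minimal sketch using the standard field layout (mod in bits 7 to 6, reg in bits 5 to 3, r/m in bits 2 to 0). The type and function names are illustrative, and interpreting the fields is left out.

```c
#include <stdint.h>

/* The three fields of the mod/reg/r-m postbyte. */
struct modrm {
    unsigned mod;  /* bits 7..6: addressing mode selector */
    unsigned reg;  /* bits 5..3: register or opcode extension */
    unsigned rm;   /* bits 2..0: register or memory operand */
};

struct modrm decode_modrm(uint8_t byte) {
    struct modrm f;
    f.mod = (byte >> 6) & 0x3u;
    f.reg = (byte >> 3) & 0x7u;
    f.rm  = byte & 0x7u;
    return f;
}
```

For example, the byte 0xD8 is 11 011 000 in binary, so it decodes to mod = 3, reg = 3, r/m = 0.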

  19. Implementing IA-32
     • Complex instruction set makes implementation difficult
         • Hardware translates instructions to simpler microoperations
             • Simple instructions: 1-to-1
             • Complex instructions: 1-to-many
         • Microengine similar to RISC
         • Market share makes this economically viable
     • Comparable performance to RISC
         • Compilers avoid complex instructions

  20. Fallacies
     • Powerful instruction ⇒ higher performance
         • Fewer instructions required
         • But complex instructions are hard to implement
             • May slow down all instructions, including simple ones
         • Compilers are good at making fast code from simple instructions
     • Use assembly code for high performance
         • But modern compilers are better at dealing with modern processors
         • More lines of code ⇒ more errors and less productivity
         • The dangers of writing in assembly language are the protracted time spent coding and debugging, the loss in portability, and the difficulty of maintaining such code.

  21. Fallacies
     • Backward compatibility ⇒ instruction set doesn't change
         • But instruction sets do accrete more instructions over time
     [Figure: x86 instruction set]

  22. Pitfalls
     • Sequential word or doubleword addresses in machines with byte addressing do not differ by one
         • Increment by 4 (words) or 8 (doublewords), not by 1!
     • Keeping a pointer to an automatic variable after the procedure returns
         • e.g., passing a pointer back via an argument
         • Pointer becomes invalid when the stack is popped
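
Both pitfalls show up directly in C. The sketch below uses illustrative function names. The broken pattern appears only in a comment because calling it would be undefined behavior, and caller-owned storage is one common fix among several.

```c
#include <stddef.h>

/* Byte distance between consecutive int elements: sizeof(int), not 1. */
size_t stride_of_int(void) {
    int words[2] = {0, 0};
    return (size_t)((char *)&words[1] - (char *)&words[0]);
}

/* Same measurement for 8-byte elements, assuming long long is a doubleword. */
size_t stride_of_long_long(void) {
    long long doublewords[2] = {0, 0};
    return (size_t)((char *)&doublewords[1] - (char *)&doublewords[0]);
}

/* PITFALL, shown only as a comment: returning the address of an
   automatic variable. The storage is released when the function returns.

       int *broken(void) { int local = 42; return &local; }
*/

/* Safer pattern: the caller owns the storage and passes its address in. */
void store_answer(int *out) {
    *out = 42;
}
```

C pointer arithmetic hides the stride (`p + 1` advances by one element), whereas assembly code must add the element size in bytes explicitly.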

  23. Concluding Remarks
     • Design principles
         1. Simplicity favors regularity
         2. Smaller is faster (ARMv8 has 32 registers)
         3. Make the common case fast: PC-relative addressing for conditional branches and immediate addressing for larger constant operands
         4. Good design demands good compromises: a compromise between providing for larger addresses and constants in instructions and keeping all instructions the same length
     • Layers of software/hardware
         • Compiler, assembler, hardware
     • LEGv8: typical of RISC ISAs
         • c.f. x86

  24. Concluding Remarks
     • Additional ARMv8 features:
         • Flexible second operand
         • Additional addressing modes
         • Conditional instructions (e.g. CSET, CINC)
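
The effect of the two conditional instructions named above can be modeled in C. This is a sketch of their semantics only: `cset`, `cinc`, and `count_negative` are illustrative names, and the condition is passed as an ordinary integer instead of being read from the processor flags.

```c
#include <stdint.h>

/* CSET Xd, cond: Xd = 1 if the condition holds, otherwise 0. */
uint64_t cset(int condition) {
    return condition ? 1u : 0u;
}

/* CINC Xd, Xn, cond: Xd = Xn + 1 if the condition holds, otherwise Xn. */
uint64_t cinc(uint64_t xn, int condition) {
    return condition ? xn + 1 : xn;
}

/* Example use: count negative entries with a conditional increment
   in place of an if statement around the update. */
uint64_t count_negative(const int64_t *values, uint64_t n) {
    uint64_t count = 0;
    for (uint64_t i = 0; i < n; i++)
        count = cinc(count, values[i] < 0);
    return count;
}
```

Such instructions let a compiler replace short branches with straight-line code, though whether a given C compiler does so for this source is not guaranteed.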

  25. Concluding Remarks
     • Each category of ARMv8 instructions is associated with constructs that appear in programming languages
     [Figure: popularity of each class of instructions for SPEC CPU2006]
