



# COMP 22 Spring 2023

Rev 2-21-23

# **Computer Organization** (Architecture)

## **Lecture 1A: Intro**

Dr Jeff Drobman

website drjeffsoftware.com/classroom.html

email | jeffrey.drobman@csun.edu



## Index (vol. 1A)



- ❖ Numbers
- ❖ Logic
- ❖ VLSI/ASIC/FPGA
- ❖ Chip Design
  - ☐ Transistors
  - ☐ Fab
- ❖ CPU cores
- ❖ Pipelines



## Section



# Numbers



# Binary/Hex vs Decimal



# Is there any point where using binary, hexadecimal or octal instead of decimal is better or worse from a computational standpoint?



#### Jeff Drobman

Lecturer at California State University, Northridge (2016-present)

Binary is the radix of choice because digital logic has two states. Decimal is usually binary-encoded as BCD and offers no computational benefit.

Binary is a base-2 number system. Octal merely represents binary in 3-bit digits, and hex in 4-bit digits. It is like zooming in or out of the same data.
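The "zooming" idea can be sketched in a few lines of Python. This is only an illustration: the helper names `views` and `group_bits` and the sample value 2023 are made up for the example and do not come from the slides.

```python
def views(value: int, width: int = 12) -> dict[str, str]:
    """Return the same non-negative integer written in four radixes."""
    return {
        "decimal": str(value),
        "binary": format(value, f"0{width}b"),
        "octal": format(value, "o"),
        "hex": format(value, "X"),
    }


def group_bits(bits: str, size: int) -> list[str]:
    """Split a bit string into right-aligned groups of `size` bits."""
    padded_len = -(-len(bits) // size) * size  # round up to a multiple of size
    padded = bits.zfill(padded_len)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


v = views(2023)
print(v)                           # one value, four spellings
print(group_bits(v["binary"], 3))  # each 3-bit group is one octal digit
print(group_bits(v["binary"], 4))  # each 4-bit group is one hex digit
```

Reading each 3-bit group as a digit gives the octal form, and each 4-bit group gives the hex form; no arithmetic is needed, only regrouping.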



## **Ordinals**



COMP222

Powers of 2 <> 10: 10:3 (2^(10) ≈ 10^(3))

#### Technical ordinals

| Power of 10 | Prefix | Power of 2 |
|-------------|--------|------------|
| 10^(-24)    | yocto  |            |
| 10^(-21)    | zepto  |            |
| 10^(-18)    | atto   |            |
| 10^(-15)    | femto  |            |
| 10^(-12)    | pico   |            |
| 10^(-9)     | nano   |            |
| 10^(-6)     | micro  |            |
| 10^(-3)     | milli  |            |
| 10^(-2)     | centi  |            |
| 10^(-1)     | deci   |            |
| 10^(+1)     | deca   |            |
| 10^(+2)     | hecto  |            |
| 10^(+3)     | kilo   | 2^(10)     |
| 10^(+6)     | mega   | 2^(20)     |
| 10^(+9)     | giga   | 2^(30)     |
| 10^(+12)    | tera   | 2^(40)     |
| 10^(+15)    | peta   | 2^(50)     |
| 10^(+18)    | exa    | 2^(60)     |
| 10^(+21)    | zetta  | 2^(70)     |
| 10^(+24)    | yotta  | 2^(80)     |

#### <u>Gazillions</u>

| Power of 10    | Name              |
|----------------|-------------------|
| 10^(+6)        | million           |
| 10^(+9)        | billion           |
| 10^(+12)       | trillion          |
| 10^(+15)       | quadrillion       |
| 10^(+18)       | quintillion       |
| 10^(+21)       | sextillion        |
| 10^(+24)       | septillion        |
| 10^(+27)       | octillion         |
| 10^(+30)       | nonillion         |
| 10^(+33)       | decillion         |
| 10^(+36)       | undecillion       |
| 10^(+39)       | duodecillion      |
| 10^(+42)       | tredecillion      |
| 10^(+45)       | quattuordecillion |
| 10^(+48)       | quindecillion     |
| 10^(+51)       | sexdecillion      |
| 10^(+54)       | septendecillion   |
| 10^(+57)       | octodecillion     |
| 10^(+60)       | novemdecillion    |
| 10^(+63)       | vigintillion      |
| 10^(+100)      | googol            |
| 10^(+303)      | centillion        |
| 10^(10^(+100)) | googolplex        |

| Ordinal | Power of 2      | Power of 10      | Actual                  |
|---------|-----------------|------------------|-------------------------|
| 1K      | 2<sup>10</sup>  | 10<sup>3</sup>   | 1024                    |
| 1M      | 2<sup>20</sup>  | 10<sup>6</sup>   | 1,048,576               |
| 1G      | 2<sup>30</sup>  | 10<sup>9</sup>   | 1.074x10<sup>9</sup>    |
| 1T      | 2<sup>40</sup>  | 10<sup>12</sup>  | 1.0995x10<sup>12</sup>  |

| Name    | 2<sup>n</sup>   | M/G    | Actual                |
|---------|-----------------|--------|-----------------------|
| byte    | 2<sup>8</sup>   |        | 256                   |
| short   | 2<sup>16</sup>  | 64K    | 65,536                |
| integer | 2<sup>32</sup>  | 4B     | 4.3x10<sup>9</sup>    |
| long    | 2<sup>64</sup>  | 16 Q   | 1.84x10<sup>19</sup>  |
| IPv6    | 2<sup>128</sup> | 340 uD | 3.4x10<sup>38</sup>   |





## GiB/TiB (2<sup>30</sup>/2<sup>40</sup>)

| Decimal term | Abbreviation | Value            | Binary term | Abbreviation | Value           | % Larger |
|--------------|--------------|------------------|-------------|--------------|-----------------|----------|
| kilobyte     | KB           | 10<sup>3</sup>   | kibibyte    | KiB          | 2<sup>10</sup>  | 2%       |
| megabyte     | MB           | 10<sup>6</sup>   | mebibyte    | MiB          | 2<sup>20</sup>  | 5%       |
| gigabyte     | GB           | 10<sup>9</sup>   | gibibyte    | GiB          | 2<sup>30</sup>  | 7%       |
| terabyte     | TB           | 10<sup>12</sup>  | tebibyte    | TiB          | 2<sup>40</sup>  | 10%      |
| petabyte     | PB           | 10<sup>15</sup>  | pebibyte    | PiB          | 2<sup>50</sup>  | 13%      |
| exabyte      | EB           | 10<sup>18</sup>  | exbibyte    | EiB          | 2<sup>60</sup>  | 15%      |
| zettabyte    | ZB           | 10<sup>21</sup>  | zebibyte    | ZiB          | 2<sup>70</sup>  | 18%      |
| yottabyte    | YB           | 10<sup>24</sup>  | yobibyte    | YiB          | 2<sup>80</sup>  | 21%      |
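The "% Larger" column can be reproduced with a short calculation. The sketch below is illustrative only; the function name `percent_larger` and the `PREFIXES` list are made up for the example.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def percent_larger(n: int) -> float:
    """Percent by which 2**(10*n) exceeds 10**(3*n); n = 1 is kilo, 2 is mega, ..."""
    return (2 ** (10 * n) / 10 ** (3 * n) - 1) * 100


for n, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5}: 2^{10 * n} vs 10^{3 * n} -> {percent_larger(n):.1f}% larger")
```

The gap grows with each step because the 2.4% difference at the kilo level compounds: each further prefix multiplies the ratio by another 1.024.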



# Signed Numbers



| Binary   | Unsigned | Sign Magnitude | Excess-127 | Two's Complement |
|----------|----------|----------------|------------|------------------|
| 00000000 | 0        | 0              | -127       | 0                |
| 00000001 | 1        | 1              | -126       | 1                |
| :        | :        | :              | :          | :                |
| 01111110 | 126      | 126            | -1         | 126              |
| 01111111 | 127      | 127            | 0          | 127              |
| 10000000 | 128      | -0             | 1          | -128             |
| 10000001 | 129      | -1             | 2          | -127             |
| :        | :        | :              | :          | :                |
| 11111110 | 254      | -126           | 127        | -2               |
| 11111111 | 255      | -127           | 128        | -1               |
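Any row of the table above can be checked by decoding one 8-bit pattern under all four conventions. This is a minimal sketch; the function name `interpret` and the dictionary keys are made up for the example.

```python
def interpret(bits: str) -> dict[str, int]:
    """Decode an 8-bit pattern as unsigned, sign-magnitude, excess-127, two's complement."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    unsigned = int(bits, 2)
    negative = bool(unsigned & 0x80)   # bit 7 is the sign bit
    magnitude = unsigned & 0x7F        # low 7 bits
    return {
        "unsigned": unsigned,
        # Sign-magnitude has two zeros; 10000000 is "-0", which Python stores as 0.
        "sign_magnitude": -magnitude if negative else magnitude,
        # Excess-127 stores value + 127, so subtract the bias to decode.
        "excess_127": unsigned - 127,
        # Two's complement: patterns with bit 7 set represent unsigned - 256.
        "twos_complement": unsigned - 256 if negative else unsigned,
    }


print(interpret("10000001"))
```

Only two's complement lets the same adder hardware handle signed and unsigned values, which is why it is the convention used for integers in practice.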



## **ASCII Codes**



Binary, hexadecimal, and decimal equivalents for each character in "Hello World"

| Character | Binary   | Hexadecimal | Decimal |
|-----------|----------|-------------|---------|
| H         | 01001000 | 48          | 72      |
| e         | 01100101 | 65          | 101     |
| l         | 01101100 | 6C          | 108     |
| l         | 01101100 | 6C          | 108     |
| o         | 01101111 | 6F          | 111     |
| (space)   | 00100000 | 20          | 32      |
| W         | 01010111 | 57          | 87      |
| o         | 01101111 | 6F          | 111     |
| r         | 01110010 | 72          | 114     |
| l         | 01101100 | 6C          | 108     |
| d         | 01100100 | 64          | 100     |
| NUL       | 00000000 | 00          | 0       |
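The rows of this table can be regenerated from the string itself. The sketch below is illustrative; `ascii_rows` is a made-up helper name, and the trailing NUL is appended to mimic a C-style terminated string.

```python
def ascii_rows(text: str) -> list[tuple[str, str, str, int]]:
    """Return (character, binary, hex, decimal) for each character plus a NUL terminator."""
    rows = []
    for ch in text + "\0":
        code = ord(ch)
        label = "NUL" if code == 0 else ch
        rows.append((label, format(code, "08b"), format(code, "02X"), code))
    return rows


for label, binary, hexa, dec in ascii_rows("Hello World"):
    print(f"{label:<4} {binary} {hexa} {dec}")
```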



## Section







## Computer Architecture



## 4-Layer Stack Model





## Transistors to Chips: Levels



- Architecture Level (**SUB-levels**)
  - Macro-arch: ISA
  - **Computer Org**: Micro-architecture
- Below "see" level
  - Logic Function Level
    - LSI: ICU/FSM
    - MSI: ALU/Reg
    - SSI: Random Logic
  - Device/Xtor Physical Level
    - **Inverter/Gates**
    - **Digital: MOSFET**
    - Analog: R/C, PLL



## Logic Universal Set




Manga Guide

https://nostarch.com/download/MangaGuidetoMicroprocessors\_sample\_Chapter2.pdf









# Logic Gates: Polarity







That's it! It also means that we can use De Morgan's laws to show our circuits in different ways. Using this technique, it's easy to simplify schematics when necessary.



BOTH OF THESE ARE NAND GATES!



BOTH OF THESE ARE NOR GATES!
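The two claims can be verified by brute force over the four input combinations. This is a small sketch; all function names here are made up for the example.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    return not (a and b)


def or_of_inverted_inputs(a: bool, b: bool) -> bool:
    # An OR gate drawn with inversion bubbles on both inputs.
    return (not a) or (not b)


def nor(a: bool, b: bool) -> bool:
    return not (a or b)


def and_of_inverted_inputs(a: bool, b: bool) -> bool:
    # An AND gate drawn with inversion bubbles on both inputs.
    return (not a) and (not b)


def equivalent(f, g) -> bool:
    """True when f and g agree on every row of the 2-input truth table."""
    return all(f(a, b) == g(a, b) for a, b in product([False, True], repeat=2))


print(equivalent(nand, or_of_inverted_inputs))   # True
print(equivalent(nor, and_of_inverted_inputs))   # True
```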





## **NAND** Gates






Tom Crosley

Embedded systems programmer for 45 years





## **CMOS** Gates




## **MOSFET**











## **MUX**





The following 4-to-1 multiplexer is constructed from 3-state buffers and AND gates
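A behavioral sketch of that structure, assuming the AND gates decode the two select lines into four one-hot enables and each 3-state buffer drives the shared output only when enabled. The function name `mux4` and the signal names are made up for the example.

```python
def mux4(d: list[int], s1: int, s0: int) -> int:
    """4-to-1 multiplexer: select d[2*s1 + s0] using decoded enables and 3-state buffers."""
    # AND gates: exactly one enable is 1 for any (s1, s0).
    enables = [
        (1 - s1) & (1 - s0),
        (1 - s1) & s0,
        s1 & (1 - s0),
        s1 & s0,
    ]
    # 3-state buffers: a disabled buffer is high-impedance and contributes nothing.
    driven = [d[i] for i in range(4) if enables[i]]
    assert len(driven) == 1, "exactly one buffer may drive the shared output"
    return driven[0]


print(mux4([0, 1, 1, 0], s1=1, s0=0))  # selects d[2]
```

The one-hot decode is what makes the shared output wire safe: two enabled buffers driving opposite values would cause bus contention.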





## Decoder







#### **Truth Table**

| $A_1$ | $A_0$ | $D_3$ | $D_2$ | $D_1$ | $D_0$ |
|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0     | 1     |
| 0     | 1     | 0     | 0     | 1     | 0     |
| 1     | 0     | 0     | 1     | 0     | 0     |
| 1     | 1     | 1     | 0     | 0     | 0     |

#### **Minterm Equations**

$$D_0 = \overline{A_1} \cdot \overline{A_0}$$
 
$$D_1 = \overline{A_1} \cdot A_0$$
 
$$D_2 = A_1 \cdot \overline{A_0}$$
 
$$D_3 = A_1 \cdot A_0$$
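The four minterm equations translate directly into code. A minimal sketch, with `decode2to4` as a made-up function name:

```python
def decode2to4(a1: int, a0: int) -> tuple[int, int, int, int]:
    """2-to-4 decoder; returns (D3, D2, D1, D0) in the truth table's column order."""
    n1, n0 = 1 - a1, 1 - a0  # inverted inputs
    d0 = n1 & n0
    d1 = n1 & a0
    d2 = a1 & n0
    d3 = a1 & a0
    return d3, d2, d1, d0


for a1 in (0, 1):
    for a0 in (0, 1):
        print(a1, a0, decode2to4(a1, a0))
```

Exactly one output is 1 for each input pair, which is the one-hot property that makes decoders useful for selecting a register or memory row.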



## Mealy-Moore FSM





#### Moore Machine





# Register File on MARS







## MIPS on MARS



#### REGISTER TYPE INSTRUCTION





## Adders







## COMP222 Quora

## Multiplication





Jeff Drobman

Multiplication is usually done completely in hardware, via a 2D array of "XY(i) + C" multiplier modules, in which each row generates a partial product: the next signed digit of the multiplier times the multiplicand. Shifting is achieved by the hardware placement of each row. The array can also be pipelined, so that several multiplications are in flight at once, one per pipeline stage.

(See the 1971 **Am2505** 2x4-bit multiplier slice, and my personal MS thesis.)
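The row-by-row structure can be sketched in software. This is a simplification under stated assumptions: operands are unsigned and each row handles a single multiplier bit, whereas the text describes signed digits; the function name `array_multiply` is made up for the example and does not model the Am2505 itself.

```python
def array_multiply(x: int, y: int, n_bits: int = 8) -> int:
    """Unsigned n-bit multiply built as one "X*Y(i) + C" row per multiplier bit."""
    if not (0 <= x < 2 ** n_bits and 0 <= y < 2 ** n_bits):
        raise ValueError("operands must fit in n_bits")
    carry_in = 0  # C: the running sum passed down from the rows above
    for row in range(n_bits):
        y_bit = (y >> row) & 1
        # In hardware the shift is free: row i is simply wired i columns to the left.
        partial_product = (x * y_bit) << row
        carry_in = partial_product + carry_in
    return carry_in


print(array_multiply(13, 11))  # 143
```

Because each row depends only on the sum from the row above, registers can be placed between rows to pipeline the array.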



## Am2505 Multiplier




Bit-slice

1971-80



2x4-bit slices



8-bit x 8-bit multiply





## Division



Non-Restoring Div

## How do calculators calculate binary division?



Jeff Drobman, Lecturer at California State University, Northridge (2016-present)

The most common division algorithm used in the past was "non-restoring". But there are others, as listed in Wikipedia:

"Division algorithms fall into two main categories: slow division and fast division. Slow division algorithms produce one digit of the final quotient per iteration. Examples of slow division include restoring, non-performing restoring, non-restoring, and SRT division. Fast division methods start with a close approximation to the final quotient and produce twice as many digits of the final quotient on each iteration. Newton-Raphson and Goldschmidt algorithms fall into this category."

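A minimal sketch of non-restoring division for unsigned operands follows. It is one textbook formulation, not necessarily what any particular calculator uses; the function name `non_restoring_divide` and the `n_bits` parameter are made up for the example.

```python
def non_restoring_divide(dividend: int, divisor: int, n_bits: int = 8) -> tuple[int, int]:
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    if not 0 <= dividend < 2 ** n_bits:
        raise ValueError("dividend must fit in n_bits")
    mask = (1 << n_bits) - 1
    remainder = 0        # partial remainder, allowed to go negative
    quotient = dividend  # dividend bits shift out the top as quotient bits shift in
    for _ in range(n_bits):
        msb = (quotient >> (n_bits - 1)) & 1
        quotient = (quotient << 1) & mask
        # Shift the remainder left, bring in the next dividend bit, then
        # subtract the divisor if the remainder was non-negative, else add it.
        if remainder >= 0:
            remainder = 2 * remainder + msb - divisor
        else:
            remainder = 2 * remainder + msb + divisor
        if remainder >= 0:
            quotient |= 1  # quotient bit is 1 when the result is non-negative
    if remainder < 0:
        remainder += divisor  # single correction step at the very end
    return quotient, remainder


print(non_restoring_divide(13, 3))  # (4, 1)
```

Unlike restoring division, a negative partial remainder is never undone inside the loop; the next iteration compensates by adding instead of subtracting, so each step costs exactly one add or subtract.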


## Section



## VLSI vs ASIC vs FPGA






3 options for chips

## **❖VLSI**

- ☐ Fully Custom
  - Building blocks: <u>Designer</u> IP (all levels)
  - Tools: Licensed from EDA vendors

## **❖ASIC**

- ☐ Semi Custom
  - Building blocks: Manufacturer IP
  - Tools: Provided by Mfr (ASIC vendor)

## **❖FPGA**

- ☐ Programmable Custom (SRAM)
  - Building blocks: logic gates (NAND, NOR)
  - Tools: Lab Programmers, software



## FPGA vs PLD



## **❖FPGA**

- ☐ Field Programmable (SRAM based)
  - Building blocks: logic gates (NAND, NOR)
  - Tools: Lab Programmers, software

## **❖PLD**

- ☐ PLA
  - AMD Mach family (merged with PAL)
- ☐ PAL
  - MMI invented, bought by AMD
  - AMD spun off as Vantis (bought by Lattice)
- ☐ CPLD
  - Complex PLD



## Programmable Logic – FPGA



From Wikipedia, the free encyclopedia

**FPGAs** 

**FPGA** 

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence the term field-programmable. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that

HDL







#### DR JEFF SOFTWARE INDIE APP DEVELOPER © Jeff Drobman 2020-23

Quora

## **FPGA**

For example, this is a generic gaming console built using a single FPGA:





## CPU/GPU vs ASIC





## What is the difference between ASICs and GPUs/CPUs?



#### Jeff Drobman

Lecturer at California State University, Northridge (2016-present)

In general, ASICs use smaller functional building blocks than "cores", which are much larger, complete units. ASICs are "application specific" (tied to a single application) and are not *programmable*.

CPU and GPU cores, by contrast, are software *programmable*, and therefore *general* purpose (not application specific).



## Section



# Chip Design Transistors



## **Making Transistors**



Doping: P (+) & N (-)

# How does impurity mix with a material to form positive and negative semiconductor material?



#### **Jeff Drobman**

Lecturer at California State University, Northridge (2016–present)

P and N dopants (group III and group V elements, by valence) are chemically *diffused* as gases into the silicon substrate in ovens. Sometimes *ion implantation* is used instead.



## Silicon Semi



[Figure: periodic table of the elements, marking the group III column (holes) and the group V column (donor, N) on either side of silicon.]



Quora



**Chris Bevis**

Lives in Silicon Valley (1997-present) · Updated Sep 9

#### Why is silicon mostly used in tech companies than germanium?

The earliest semiconductor devices were made of germanium. This started to change when the first planar IC's were developed. It is easy to grow a stable, insulating oxide film on silicon but not on germanium, so silicon was the material of choice. One of the key breakthroughs in silicon processing was made by Andy Grove and a colleague: the Grove-Deal model of SiO2 growth kinetics.


These days, most advanced devices have channels made of an alloy of Si and Ge doped with boron (SiGeB). The percentage of Si and Ge is carefully controlled and varied over the height of the channel to produce strain which enhances the mobility of electrons or holes.



# **Bipolar Transistors**





**BJT PLANAR** Structure: simplified cross section of a planar NPN bipolar junction transistor



# Physical Level: MOSFET



Device/Xtor Physical Level

**Inverter/Gates** 

Digital: MOSFET



Structure of a MOSFET in the integrated circuit.

(see separate slide set *Transistors*)



#### **MOS Transistor**







#### **CMOS Transistors**




MOSFET





Last 4 steps



#### **IC Process & Interface**





TTL compatible — 5V → 3V

**5V** 

#### Bipolar

RTL→DTL→TTL → Schottky TTL→ LS TTL



The metal-oxide-semiconductor field-effect transistor (MOSFET), also known as the metal-oxide-silicon transistor, is a type of field-effect transistor that is fabricated by the controlled oxidation of a semiconductor, typically silicon. It has an insulated gate, whose voltage determines the conductive



#### Section



# Chip Design Fab



## Chip Specs



- ❖ Architectural
- ❖ Functional
- ❖ Mechanical
- ❖ Electrical (DC)
- ❖ Timing (AC)
- ❖ Thermal (theta JA, JC, CA)



#### Wafers: Yield





yellow shows bad dice

(a function of defect density)
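The slide does not name a yield model. As an assumption for illustration only, the sketch below uses the simple Poisson model, Y = exp(-A·D), one common first-order choice; the function name `poisson_yield` and the sample numbers are hypothetical, not data from the slide.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of good dice under the Poisson model Y = exp(-A * D) (an assumed model)."""
    if die_area_cm2 < 0 or defects_per_cm2 < 0:
        raise ValueError("area and defect density must be non-negative")
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Hypothetical numbers: same defect density, two die sizes.
for area in (0.5, 2.0):
    print(f"die area {area} cm^2 -> yield {poisson_yield(area, 0.1):.1%}")
```

Under this model a larger die is more likely to contain at least one defect, so yield falls as die area or defect density rises.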



# Chip Specializations





On-chip specialization or microarchitectural changes merely account for a fraction of the performance gains.



# Chip Fab



7 nm

**TSMC** 

What's a 5nm FinFET transistor look like?

Well... I didn't find a good pic for 5nm, but I found a pic for 7nm here:



We can assume 5nm is slightly smaller on all axes.

The first thing to note is that the gate length ($L_g$) is 16.5nm. The width of the gate will vary, but I am going to go out on a limb and guess it is no smaller than $3W_{\rm fin}$, or 18nm. And the overall structure is tall: 52nm plus another few nm



# Making FinFET Transistors



CMOS = P & N

#### Samsung's 5nm (HD)



#### Samsung's 5nm (UHD)



Above are CMOS gates with N and P transistors. At the 5nm node, a transistor measures roughly 130 x 90 nm: far, far, far away from the "5nm" name.



#### **New Processes**






➤ It is all about the **Gate** 



## Wafer Fabs Today



- 1968 ❖ Intel\*\*
- 1978 ❖ Micron\*\*
- 1980 ❖ Samsung
- 1987 ❖ TSMC\* (1<sup>st</sup> foundry)
- 2009 ❖ AMD, IBM → Global Foundries\*
- 2010 ❖ Chartered → Global Foundries\*
- ❖ SMIC (China)

\*Pure Foundry

\*\*Internal use only



#### **WW Fab Shares**






Source: LA Times/SIA 1/22/22



## Foundry Sizes 2020-1







#### **TSMC**





#### 1Q21 Revenue by Technology







#### **TSMC Customers**




|            | 2019  | 2020  | 2021  |
|------------|-------|-------|-------|
| Apple      | 24.0% | 24.2% | 25.4% |
| Hi-Silicon | 15.0% | 12.8% | 0.0%  |
| Qualcomm   | 6.1%  | 9.8%  | 7.6%  |
| NVIDIA     | 7.6%  | 7.7%  | 5.8%  |
| Broadcom   | 7.7%  | 7.6%  | 8.1%  |
| AMD        | 4.0%  | 7.3%  | 9.2%  |
| Intel      | 5.2%  | 6.0%  | 7.2%  |
| Mediatek   | 4.3%  | 5.9%  | 8.2%  |

Source: The Information Network (www.theinformationnet.com)



## Foundry Supply Chain







**Harvey King** 

born in Taiwan, grew up in US, family rooted in mainland China for centuries.





#### Fab Timescales







#### Semiconductor Fab Production Timescales

Required to increase fab utilization

[Figure: timescale chart with stage labels "Up Time", "Package", and "Distribution" and values of ~24, ~12, and ~6; units not recoverable.]



#### Global 72 New Fabs













#### Metal Interconnect



Quora

Aluminum → Copper → Cobalt

The transistors connect to each other using metal wires called interconnects. There are several layers of them (more than 8). Usually the space between them is filled with an insulator to provide structural rigidity, and to expel air, which can expand when heated and damage the chip. But IBM very kindly took the insulator out of one of their chips to show what the interconnect looks like:





#### **New Processes**








#### Chip Packaging







#### Package Technology Evolution

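| Years       | Packages              | Package types |
|-------------|-----------------------|---------------|
| 1960 – 1985 | CDIP, PDIP+           | > 50          |
| 1986 – 1995 | SOIC, PLCC, QFP+      | > 250         |
| 1996 – 2000 | BGA, QFN, SiP+        | > 1000        |
| 2001 – 2005 | Modules, Cards, Stack | > 1500        |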

























#### Chip Packaging



# Ball Grid Packaging and Chip Scale Packaging (1990s – 2000s)

As the demands of semiconductor speed continue to pick up, so does the need for better packaging. While QFN (quad-flat no-leads) and other Surface Mounted technologies clearly continue to proliferate, I want to introduce you to the beginning of a package design that we will have to know about in the future. This is the beginning of the solder balls – or broadly Ball Grid Array (BGA) packaging.



Those balls or bumps are called solder bumps/balls



#### Chip Packaging (MCM)








#### Section



## CPU Cores



#### **CPU Function**



# What is the most purely mathematical description you could give of the workings of a CPU?



#### Jeff Drobman

Lecturer at California State University, Northridge (2016-present)

CPU\_state = function(instruction, last\_state)

This is the definition of a finite-state machine (FSM): the next state depends only on the current input and the last state.
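The state-transition view can be made concrete with a toy machine. Everything in this sketch is hypothetical: the two-field `CPUState`, the three-instruction set (`LOADI`, `ADDI`, `JUMP`), and the helper names are invented for illustration and are not MIPS.

```python
from typing import NamedTuple


class CPUState(NamedTuple):
    pc: int   # program counter
    acc: int  # accumulator


def step(instruction: tuple[str, int], last_state: CPUState) -> CPUState:
    """CPU_state = function(instruction, last_state) for a toy accumulator machine."""
    op, operand = instruction
    if op == "LOADI":  # acc <- operand
        return CPUState(last_state.pc + 1, operand)
    if op == "ADDI":   # acc <- acc + operand
        return CPUState(last_state.pc + 1, last_state.acc + operand)
    if op == "JUMP":   # pc <- operand
        return CPUState(operand, last_state.acc)
    raise ValueError(f"unknown opcode: {op}")


def run(program: list[tuple[str, int]], max_steps: int = 1000) -> CPUState:
    """Apply step() repeatedly until the pc runs off the end of the program."""
    state = CPUState(pc=0, acc=0)
    for _ in range(max_steps):
        if not 0 <= state.pc < len(program):
            break
        state = step(program[state.pc], state)
    return state


print(run([("LOADI", 5), ("ADDI", 7)]))  # CPUState(pc=2, acc=12)
```

Note that `step` is a pure function: it reads nothing but its two arguments, which is exactly what the FSM definition requires.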



#### CPU = Data + Control







# MIPS/MARS ALU







# 4 Levels of CPU Architecture


COMP222



System= Multi-core SoC





COMP122

**❖ Org + ISA** *CPU Core* internals



COMP222 Micro **Pipeline** level





## 3 Levels of Integration








#### SoC = CPU + GPU




**CPU** cores



**GPU** cores

This SoC has four CPU cores (ARM Cortex) and 192 GPU cores (Kepler).



#### CPU Cores: P & E





Desktop (Server)





Performance

Efficiency (Power)



#### Section



# Pipelines



# MIPS RISC Pipeline



5 Stages

Each stage takes only 1/5 of the instruction cycle, so the clock can run 5x faster: **clock F => 5x**
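The effect on throughput can be estimated with a small timing model. This is an idealized sketch that assumes equal stage delays, no hazards or stalls, and no latch overhead; the function names and the 5 ns instruction cycle are made up for the example.

```python
def unpipelined_time(n_instructions: int, instr_cycle_ns: float) -> float:
    """Total time when each instruction runs start to finish before the next begins."""
    return n_instructions * instr_cycle_ns


def pipelined_time(n_instructions: int, instr_cycle_ns: float, stages: int = 5) -> float:
    """Total time on an ideal pipeline: fill it once, then finish one instruction per clock."""
    stage_ns = instr_cycle_ns / stages  # the clock period shrinks to one stage delay
    return (stages + n_instructions - 1) * stage_ns


def speedup(n_instructions: int, stages: int = 5) -> float:
    """Ideal speedup n*k / (k + n - 1), which approaches k for long instruction streams."""
    return n_instructions * stages / (stages + n_instructions - 1)


for n in (1, 10, 1000):
    print(n, unpipelined_time(n, 5.0), pipelined_time(n, 5.0), round(speedup(n), 2))
```

A single instruction still takes the full 5 stage times, so latency is unchanged; the 5x gain is in throughput, and only once the pipeline is full.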





MIPS Pipelined Org



