Lecture 2

From Binary to Programming Languages

Machine Code

  • Computer manufacturers make CPUs, or central processing units, which recognize certain patterns of bits. These patterns are therefore specific to each kind of CPU.

  • CPUs understand machine code. These are the zeroes and ones that tell the machine what to do. Machine code might look like this: 01111111 01000101 01001100 01000110 00000010 00000001 00000001 00000000.
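
As a rough illustration, the sample bit patterns above can be decoded with a few lines of Python. This is only a sketch, and the variable names are made up here. The observation that three of the bytes spell the letters found in the marker at the start of ELF executable files (a format used on Linux) is offered as context; the notes themselves do not say where the sample came from.

```python
# The eight bit patterns quoted above, one byte each.
bits = "01111111 01000101 01001100 01000110 00000010 00000001 00000001 00000000"

# int(pattern, 2) reads a string of zeroes and ones as a base-2 number.
values = [int(pattern, 2) for pattern in bits.split()]
print(values)

# The same bytes written in hexadecimal, two digits per byte.
hex_values = [f"{v:02x}" for v in values]
print(hex_values)

# The second, third, and fourth bytes are the ASCII codes for E, L, and F.
letters = bytes(values[1:4]).decode("ascii")
print(letters)
```

The CPU never sees letters or decimal numbers, only the bits; the decimal, hexadecimal, and ASCII views are just different ways for humans to read the same pattern.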

Assembly Code

  • It’s quite difficult for us to code in machine code, so assembly code was created.

  • Assembly code uses a more English-like syntax. Assembly code is an example of source code.

  • Source code is code with a more English-like syntax that can be translated to machine code.

  • Some sequences of characters in assembly code include these: movl, addq, popq, and callq, which we might be able to assign meaning to. For example, perhaps addq means to add or callq means to call a function. What values are we doing these operations on? Well, registers!

  • The smallest unit of useful memory is called a register. Registers are the smallest units that we can perform an operation on. They have names, which we can find in assembly code as well, such as %ecx, %eax, %rsp, and %rbp.

  • Languages with easier-to-understand syntax than assembly code were later created. Below is a program called hello.c that prints “hello, world” in the programming language C.

     #include <stdio.h>
    
     int main(void)
     {
         printf("hello, world\n");
     }
    

Compilers and Interpreters

  • With hello.c from earlier, we have to convert the program to the zeroes and ones the computer can understand.

  • To do this, we can use compilers: pieces of software that understand both source code and the patterns of zeroes and ones in machine code, and that can translate from one to the other.

    • To compile hello.c, we can use a program installed on our computers called cc, short for C compiler.

    • To use the compiler, we go to our terminal window and type at the prompt.
      • A terminal window is a keyboard-only interface for telling your computer what to do.
      • The prompt is represented by a dollar sign, $.
    • We type cc -o hello hello.c. This creates a new file called hello.

    • To run this program called hello, we type ./hello at the prompt where . represents the folder or directory that this file is in.

    • A sample of the terminal window might look like this:

      $ cc -o hello hello.c
      $ ./hello
      hello, world
      
  • Some languages skip the step of compilers and instead use interpreters. Interpreters take in source code and run the source code, line by line, from top to bottom and left to right.

  • Interpreters are themselves programs made of the zeroes and ones that the CPU understands. They are written to recognize the keywords and functions in the source code.

  • Python is an interpreted language. To say “hello, world” in Python, we write the following line in hello.py.

      print("hello, world")
    
    • To interpret this source code, at the terminal, we simply type python hello.py, where python is the name of the interpreter.

    • The program python, in this case, opens the file hello.py, reads it from top to bottom, recognizes the function print, and knows what to do: print “hello, world” on the screen and quit.

    • A sample of the terminal window might look like this:

        $ python hello.py
        hello, world
      
  • Comparing compilers and interpreters, we might note that interpreters skip the step of producing a compiled program before running it. This causes a performance penalty for interpreted languages, since the interpreter has to re-interpret the code each time the program runs.

  • To combat this issue, Python compiles source code to bytecode and can save the result in a file. When the same code is run again, Python can use the saved, pre-compiled version instead of translating the source again.

    • Bytecode looks something like this:

          0 LOAD_GLOBAL       0 (print)
          3 LOAD_CONST        1 ('hello, world')
          6 CALL_FUNCTION     1 (1 positional, 0 keyword pair)
          9 POP_TOP
          10 LOAD_CONST       0 (None)
          13 RETURN_VALUE
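
To look at bytecode directly, Python’s standard library includes a dis module that disassembles compiled code. The sketch below assumes the standard CPython interpreter; instruction names and offsets differ between Python versions, so the output may not match the listing above line for line.

```python
import dis

source = 'print("hello, world")'

# compile() turns source text into a code object that contains bytecode.
code = compile(source, "hello.py", "exec")

# dis.get_instructions yields one Instruction per bytecode operation.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

Whatever the version, one of the instructions should load the constant 'hello, world', mirroring the LOAD_CONST line in the listing.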
      

Virtual Machines

What if we want to run these programs on different computers, with different CPUs?

  • A virtual machine is software that mimics the behavior of an imaginary machine.

  • If each platform has this virtual machine installed, the exact same code can be run everywhere, instead of being compiled over and over again for different platforms.

Python

Input and Printing

  • To greet our human, we might write this in hello1.py:

      name = input("What is your name? ")
      print("hello, " + name)
    
    • In Python, input is a function to get user input.

    • This function takes in a string (this string prompts the human for an input) and returns a string.

    • After returning this string, we would like to store it somewhere for access in the future. We can store these values in variables.

    • To set a variable equal to a value, we use one single equal sign, often called the assignment operator.

    • When printing, we can use the + operator to concatenate two strings.

  • In the terminal, we would then have this, using “David” as input:

      $ python hello1.py
      What is your name? David
      hello, David
    
  • We can print in multiple ways.

    • The print function can take multiple arguments, and it separates arguments with spaces.

      • If we wrote print("hello, ", name), we would get two spaces between “hello,” and “David”, one in the string with hello, and another as the separator between the arguments.

      • To fix this, we can simply write print("hello,", name).

    • A string can also be formatted so that we literally write name inside it and Python substitutes the variable’s value. We must surround the variable with curly braces and prefix the string with f; this tells Python that the string should be formatted in a special way. These strings are often called format strings or f-strings.

      • We can write print(f"hello, {name}").
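
The three ways of printing described above can be compared side by side. This is a minimal sketch in which name is set directly instead of being read with input, so that it runs without a keyboard; the variable names greeting_plus and greeting_f are made up for illustration.

```python
name = "David"  # stand-in for the value input() would return

# 1. Concatenation: we supply the space ourselves.
greeting_plus = "hello, " + name

# 2. Multiple arguments: print inserts sep (a space by default) between them.
print("hello,", name)
print("hello,", name, sep=" ")  # the default separator, written out explicitly

# 3. f-string: the expression in braces is replaced by its value.
greeting_f = f"hello, {name}"

print(greeting_plus)
print(greeting_f)
```

All four calls to print produce the same line of output.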
  • Let’s write the following code in arithmetic.py:

      x = input("x: ")
      y = input("y: ")
      print(x + y)
    
    • Running this in the terminal, we get…

        $ python arithmetic.py
        x: 1
        y: 2
        12
      
    • Instead of 3, we get 12. Remember that the input function returns a string and the + operator concatenates strings; thus, we get the string “1” concatenated with “2”.

  • To fix this issue, we can convert the input value from a string to an int, or integer. The function to do that is simply int.

  • Our code can then be written as…

    x = int(input("x: "))
    y = int(input("y: "))
    print(x + y)
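
The difference between the two versions of arithmetic.py can be seen without typing any input. In this sketch, x and y are set to the strings that input would return; joined and total are names chosen here for illustration.

```python
x = "1"  # what input() would return if the user typed 1
y = "2"

joined = x + y           # + on two strings concatenates them
total = int(x) + int(y)  # + on two ints adds them

print(joined)
print(total)
```

Note that int raises a ValueError if the text is not a valid integer, for example int("cat").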
    

Conditionals

  • Let us instead write a program that compares two numbers.
  • In conditions.py, we might write…

      x = int(input("x: "))
      y = int(input("y: "))
    
      if x < y:
          print("x is less than y")
      elif x > y:
          print("x is greater than y")
      elif x == y:
          print("x equals y")
    
  • The Boolean expressions are x < y, x > y, and x == y.
    • To check for equality, we have to use ==, since = is already the assignment operator.
  • The colon at the end of each if and elif line says that the indented code that follows should run if the Boolean expression is true.

  • The indentation is required: it marks which print statement belongs to which condition, so each one is executed only if the Boolean expression above it evaluates to true.

  • The second elif, or “else if”, statement is unnecessary since if a number is not less than or greater than another number, it must be equal to that number. We can modify our program to get this…

      x = int(input("x: "))
      y = int(input("y: "))
    
      if x < y:
          print("x is less than y")
      elif x > y:
          print("x is greater than y")
      else:
          print("x equals y")
    
  • In Boolean expressions, we can also combine conditions with the keywords or and and.
  • We might write a program answer.py that does the following:

    c = input("Answer: ")
    
    if c == "Y" or c == "y":
        print("yes")
    elif c == "N" or c == "n":
        print("no")
    
  • In this program, if the user inputs “Y”, c == "Y" will evaluate to true, and the program will print “yes”. If the user inputs “y”, c == "y" will evaluate to true, and the program will also print “yes”.
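
The same check can be written in other ways. The sketch below wraps it in a hypothetical helper named classify (not part of answer.py) and shows two alternatives to chaining comparisons with or: testing membership with in, and normalizing the case with the string method lower before comparing.

```python
def classify(c):
    """Return "yes", "no", or None for any other answer (hypothetical helper)."""
    # The in operator checks whether c equals any item in the tuple.
    if c in ("Y", "y"):
        return "yes"
    # Alternatively, convert to lowercase first and compare once.
    elif c.lower() == "n":
        return "no"
    return None

for answer in ["Y", "y", "N", "n", "maybe"]:
    print(answer, "->", classify(answer))
```

Like the original program, this does nothing special for answers other than Y or N; here that case is made visible by returning None.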

Functions

  • We might want to define our own function, such as square, where calling it returns the square of an input.

  • In return.py, we might define our own function called square.

      def main():
          x = int(input("x: "))
          print(square(x))
    
      def square(n):
          return n * n
    
      if __name__ == "__main__":
          main()
    
  • Note that we can’t call the function square before defining the function square since the interpreter reads from top to bottom. To fix this, we can create a main function, and then call the main function at the end of the file.

  • When we call the main function, we conventionally guard the call with the line if __name__ == "__main__":, which ensures that main is executed only when the file is run directly, and not when it is imported by another file.

  • With the square function, we’ve abstracted away the multiplication, and now we can simply call square.
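
The ordering rule can be seen in a small sketch. Here main returns its result instead of reading input, so that the example runs on its own; that change is made only for illustration.

```python
def main():
    # square is looked up when this line runs, not when main is defined,
    # so it is fine that square appears later in the file.
    return square(3)

def square(n):
    return n * n

# By this point both functions exist, so the call succeeds.
result = main()
print(result)
```

Moving the call to main above the definition of square would instead raise a NameError, because square would not exist yet when main ran.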

Loops

While Loops

  • To write a program positive.py that will pester the human until the human inputs a positive integer, we might write the following:

      def main():
          i = get_positive_int("i: ")
          print(i)
    
      def get_positive_int(prompt):
          while True:
              n = int(input(prompt))
              if n > 0:
                  break
          return n
    
      if __name__ == "__main__":
          main()
    
    • In the function get_positive_int, while True gives us an infinite loop. Python will then execute the indented code again and again until it is told to stop.
    • Note that True and False are Boolean values.
    • The break keyword tells Python to exit the loop.
    • Once the loop has been broken, the function returns the value.
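
To trace the loop without a human at the keyboard, we can feed it scripted answers. In this sketch, get_positive_int takes a hypothetical read function in place of a prompt; scripted_read, responses, and attempts are likewise names invented here.

```python
def get_positive_int(read):
    # read is a stand-in for input: any function that returns a string.
    while True:
        n = int(read())
        if n > 0:
            break
    return n

# Scripted answers in place of a human typing.
responses = iter(["-3", "0", "5"])
attempts = []

def scripted_read():
    answer = next(responses)
    attempts.append(answer)
    return answer

result = get_positive_int(scripted_read)
print(attempts)
print(result)
```

The loop body runs three times: -3 and 0 fail the n > 0 test, so the loop repeats, and 5 triggers break, after which the function returns.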

For Loops

  • To write a program score.py, where the user inputs a number and that many hashes are printed, we might write the following:

    n = int(input("n: "))
    for i in range(n):
        print("#", end="")
    print()
    
    • range is a function built into Python that returns a range of values from 0 to n - 1 inclusive.

    • The print function automatically prints a new line. In other words, it moves the cursor to the next line after printing. To stop Python from printing each hash on a separate line, we specify end="" as another argument to print, which tells Python to end the lines with nothing.

    • The final print() moves the cursor to the next line.

  • In the terminal, if we input 10 as n, we might see the following:

      $ python score.py
      n: 10
      ##########
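
As a small sketch of what range produces, and of another way to get the same row, the example below builds the row as a string instead of printing one hash at a time. The names row and repeated are chosen here for illustration.

```python
n = 10

# range(n) produces 0, 1, ..., n - 1, so the loop body runs n times.
print(list(range(n)))

row = ""
for i in range(n):
    row += "#"

# Multiplying a string by an int repeats it, giving the same row in one step.
repeated = "#" * n

print(row)
print(repeated)
```

Both approaches print the same row of ten hashes.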
    

Mario

  • In Super Mario Bros., a two dimensional world is created! Here’s one setting:

    [Image: a row of four question mark blocks]

  • To print the series of question marks shown, we might write:

    for i in range(4):
        print("?", end="")
    print()
    
  • Here’s another setting with a 4x4 block.

    [Image: a 4x4 block]

  • To print the block shown, we’ll need to print hashes on both rows and columns. We must first iterate through the rows, and within each row, we then iterate through each column and print a hash.

    for row in range(4):
        for column in range(4):
            print("#", end="")
        print()
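
The nested loops generalize to any size. The sketch below wraps them in a hypothetical function named block that returns the rows as strings instead of printing them directly, which makes the result easy to inspect.

```python
def block(height, width):
    """Return the rows of a height-by-width block of hashes (hypothetical helper)."""
    rows = []
    for row in range(height):         # one pass per row
        line = ""
        for column in range(width):   # one hash per column in this row
            line += "#"
        rows.append(line)
    return rows

for line in block(4, 4):
    print(line)
```

The inner loop finishes an entire row before the outer loop moves on to the next one, just as the inner print("#", end="") calls finish before print() ends the line.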
    

Types

  • In Python, there are many data types.
  • bool: True/False
  • int: Integers (whole numbers)
  • str: Strings of text
  • float: Real numbers with a decimal point and digits after it
  • dict: A hash table that maps keys to values
  • list: An ordered sequence of any number of values
  • range: A range of values
  • set: A collection of values with no duplicates
  • tuple: A fixed group of values, such as x, y or latitude, longitude
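
Python’s built-in type function reports the type of any value. The sketch below builds one example of each type listed above; the particular values, including the coordinates in the tuple, are arbitrary and chosen only for illustration.

```python
examples = [
    True,                  # bool
    42,                    # int
    "hello",               # str
    3.14,                  # float
    {"x": 1, "y": 2},      # dict
    [1, 2, 3],             # list
    range(3),              # range
    {1, 2, 2, 3},          # set: the duplicate 2 is dropped
    (42.37, -71.11),       # tuple, e.g. latitude, longitude
]

# type(value).__name__ gives the type's name as a string.
type_names = [type(value).__name__ for value in examples]
for value, name in zip(examples, type_names):
    print(name, value)
```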

Libraries

  • In addition to the functions built into the core language, there are libraries and frameworks that provide additional features. These have to be imported manually to be used.

  • For example, in Python, if we want to generate pseudorandom numbers, we have to import a function randint from a library called random.

    • For example, to get a random integer between 1 and 10, inclusive, we can write this:

      from random import randint
      
      print(randint(1, 10))
      
    • We can also just write import random without importing the specific function. In this case, we’ll have to prefix the function with the library name using dot notation as shown below.

    • To create a game where the user guesses a random integer between 1 and 10, we can write this:

      import random
      
      n = random.randint(1, 10)
      
      guess = int(input("Guess: "))
      
      if guess == n:
          print("Correct")
      
      else:
          print("Incorrect")
      
  • Note that these numbers are pseudorandom because computers can’t pick a random number the way humans do; they have to use algorithms, which are deterministic processes.
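
The deterministic nature of these numbers can be seen by seeding the generator. random.seed is part of the same random library; the helper draw and the seed value 2 are arbitrary choices made for this sketch.

```python
import random

def draw(seed, count=5):
    """Seed the generator, then draw count integers from 1 to 10 (hypothetical helper)."""
    random.seed(seed)
    return [random.randint(1, 10) for _ in range(count)]

first = draw(2)
second = draw(2)

print(first)
print(second)
print(first == second)
```

Starting from the same seed, the algorithm produces the same sequence every time. When no seed is given, Python picks a starting point that varies from run to run, which is why the guessing game behaves differently each time.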

Memory

  • Inside a computer are hardware chips called RAM, or random-access memory. Each of these chips holds some finite number of bytes, which are used to represent the values in our programs.

  • For many kinds of values, floats among them, Python and most other languages decide in advance how many bits to use to represent a value.

  • Thus, if a value cannot be represented exactly in that many bits, the language stores an approximation of it instead.

Imprecision

  • Let’s take a look at a program called imprecision.py that divides two numbers and prints the quotient.

        x = int(input("x: "))
        y = int(input("y: "))
    
        z = x / y
    
        print(f"x / y = {z:.30f}")
    
    • The syntax :.30f signifies that we’re printing z as a float to 30 decimal places.
    • We get…

        $ python imprecision.py
        x: 1
        y: 10
        x / y = 0.100000000000000005551115123126
      
  • This value isn’t what we expect! One tenth has no exact finite representation in binary, and we don’t have enough bits to store the entire precise value, so the computer approximates the quotient. This is called floating-point imprecision.
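
To see the approximation more closely, we can ask for the exact value that was stored. This sketch uses the standard decimal and math modules and assumes the usual 64-bit floats; the names exact and total are made up here.

```python
from decimal import Decimal
import math

z = 1 / 10

# Decimal(z) shows the exact value of the float that was actually stored.
exact = Decimal(z)
print(exact)

# Small errors like this can surface in ordinary arithmetic.
total = 0.1 + 0.2
print(total == 0.3)

# math.isclose compares with a tolerance instead of demanding exact equality.
print(math.isclose(total, 0.3))
```

Because of this, comparing floats with == is fragile; comparing within a tolerance is usually the safer choice.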

Integer Overflow

  • A similar problem occurs with integers.

  • Consider a number that has been allocated three digits.
    • We start by counting.
    • Suppose we count until 999. We carry, and we get 1000.
    • However, the computer has only allocated three digits, so our 1000 gets mistaken for 000.
    • This is an example of integer overflow, where our large number has wrapped to a small number.
  • As December 31, 1999 approached, people began to get nervous: many programs stored the calendar year with only two digits. The year 1999 was stored as 99, so when the year 2000 arrived, it would be stored as 00, leading to confusion between the years 1900 and 2000. This became known as the Y2K problem.

  • In the past, Boeing 787 planes kept track of time in a counter that stored the number of hundredths of a second. Once that counter overflowed (which would occur on the 248th day of continuous operation), the plane would go into fail-safe mode and the power would shut off.
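
The wrap-around in these examples can be sketched with the remainder operator. Python’s own integers grow as needed and do not overflow, so the fixed number of digits is simulated here with %. The 248-day figure is checked under an assumption the notes do not state: that the counter was a signed 32-bit integer, which can count up to 2**31 - 1.

```python
DIGITS = 3
LIMIT = 10 ** DIGITS  # a three-digit counter can hold 000 through 999

counter = 999
wrapped = (counter + 1) % LIMIT  # keep only the last three digits
print(wrapped)

# Two-digit years behave the same way: 99 + 1 wraps around to 00.
next_year = (99 + 1) % 100
print(next_year)

# Assumption: a signed 32-bit counter of hundredths of a second.
ticks = 2 ** 31                   # ticks until the counter overflows
seconds = ticks / 100
days = seconds / (60 * 60 * 24)
print(days)
```

Under that assumption the counter runs out partway through the 249th day, after a little more than 248 full days, which is consistent with the figure above.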