Lecture 1
- Welcome!
- What is an Algorithm
- Analyzing Algorithms
- Big O Notation
- Omega Notation
- Searching
- Linear Search
- Binary Search
- Sorting
- Selection Sort
- Bubble Sort
- Insertion Sort
- Recursion
- Merge Sort
- Summing Up
Welcome!
- In our previous session, we learned about interpreting information: How computers represent data using binary, how text is encoded using ASCII and Unicode, how images and videos are represented, and how algorithms solve problems.
- This week, we are going to analyze algorithms, the step by step instructions for solving some problems.
- Among today’s problems are searching, sorting, and more.
- By the end of this lecture, you will understand how to evaluate and compare algorithms using formal notation, and you will have a vocabulary to communicate with technical members of your team about algorithm efficiency.
What is an Algorithm
- Algorithms are processes that are designed to complete a specific task. You may not have called it by this name in the past. Maybe you’ve referred to it as a recipe or a routine. That’s basically the idea for what an algorithm is.
- For example, you might have a morning routine or a morning algorithm. Even going through the process of solving a Wordle is an algorithm. Most folks who do so choose a starting word and always go off of that word. Then you go step by step, iterating through, trying to solve that puzzle. That is an algorithm.
- Algorithms complete specific tasks. They also have well-defined inputs and outputs.
- Consider the idea of a recipe. If you’re trying to make scrambled eggs for breakfast, you have all of your ingredients at the beginning and you’re trying to accomplish some objective at the end, namely a nice warm plate of scrambled eggs.
- In order to do so, you need to go through a sequence of steps that are clear, ordered, and unambiguous. You don't, for example, put your eggs in the pan immediately. First you crack the eggs into a bowl, then add seasonings, then turn the stove on, going through these steps in a particular order.
- If you do things out of order, you end up with a cold pan or eggshells in your breakfast, which is probably not ideal.
- Ideally, an algorithm does not go on forever. This does not preclude the opportunity to repeat something over and over. For example, if you have an algorithm to count out numbers, the algorithm might be: Say the number you’re starting at, add one to that number in your head, say the next number out loud, and you go back and forth through those two steps over and over.
- The algorithm itself has a finite number of steps, in this case two. But this algorithm will run forever. So algorithms can take an infinite amount of time to run, but they have a finite number of steps associated with them.
Analyzing Algorithms
- Now that we have this idea of what an algorithm is, a set of instructions for completing a task, let’s talk about how we analyze algorithms in a computer science context.
- There are basically two ways that we will do so. The first is based on how much time that algorithm takes to run. The other way is how much space or how much memory on your computer this algorithm takes up.
- Sometimes we’ll see that the trade-off can be very important. We may take up far more memory on our computer in order to save time doing something.
- There’s a catch here: We don’t use literal time as a metric because different computers have different processors. A computer from the 1990s was far slower than computers today, so the same algorithm running on both machines would take very different amounts of time.
- So instead of using time literally like seconds, minutes, hours, we instead consider how many steps an algorithm might take. In particular, we care about how many steps an algorithm might take as the data set gets bigger and bigger.
- We care about what an algorithm will tend to do as some huge data set is thrown at it to give us a sense of how it will generally perform. Different processors, different computers are all going to handle things slightly differently. And so we only care about the general idea, the general number of steps it might take.
- We also have to consider different situations. What if this data is in the worst possible shape and we need to do something to it? And what if this data is in the best possible shape?
- The run times, or how many steps it takes to deal with that data, may vary based on whether that data is in good shape or in bad shape. These are the worst case and best case scenarios.
- Whenever we’re talking about an algorithm, it is operating on some data, some inputs, and yielding some output. We use the variable \(n\) to represent the size of the data that we are working with.
Big O Notation
- Consider three hypothetical algorithms. The first runs in \(n^3\) time, meaning it takes \(n^3\) steps to run. The second runs in \(n^3 + n^2\) time. The third runs in \(n^3 - 8n^2 + 20n\) time.
- If we have a data set of size 1, the first algorithm takes 1 step, the second takes 2 steps, and the third takes 13 steps. So the last algorithm seems pretty bad, taking 13 times as many steps as the first.
- But as the data set gets bigger, things change. With a data set of size 10, the first takes 1,000 steps, the second takes 1,100 steps, but the third only takes 400 steps.
- With a data set of size 1,000, the first takes a billion steps, the second takes a billion and a million steps, and the third takes just shy of a billion steps (992 million).
- What we’re seeing is that as the data set gets bigger, the lower order terms stop mattering. It’s really the \(n^3\) term that dominates.
- When we say an algorithm runs in \(n^3\) time, what we usually mean is that’s generally the term that’s going to dominate how many steps it takes to do something. We don’t concern ourselves with lower order terms like \(n^2\), and we don’t concern ourselves with constant factors.
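The dominance of the \(n^3\) term can be checked directly. This is a minimal Python sketch, assuming the three hypothetical step counts above (with the third polynomial taken as \(n^3 - 8n^2 + 20n\), which reproduces the tallies of 13, 400, and roughly 992 million steps); the function name is illustrative.

```python
def steps(n):
    """Step counts of the three hypothetical algorithms for input size n."""
    return (n**3, n**3 + n**2, n**3 - 8*n**2 + 20*n)

print(steps(1))     # (1, 2, 13): the third looks worst for tiny inputs
print(steps(10))    # (1000, 1100, 400): the third is now cheapest
print(steps(1000))  # (1000000000, 1001000000, 992020000): all roughly n^3
```

As \(n\) grows, all three converge toward the \(n^3\) term, which is why we describe each of them as \(O(n^3)\).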
- If those times represented the worst case scenario runtimes, we may describe them as \(O(n^3)\). Big O means “on the order of.” We would describe that as an upper bound on how many steps that algorithm would take in the worst case.
- There are different categories of running time that we might consider, in order of increasing steps:
- Constant time, \(O(1)\): an algorithm that, irrespective of the size of the data, runs in one step or some constant number of steps. For example, an algorithm that always outputs the first element of a list, or an algorithm that adds two numbers together.
- Logarithmic time, \(O(\log n)\): an algorithm proportional to log base 2 of \(n\). We’re basically dividing a problem in half repeatedly.
- Linear time, \(O(n)\): an algorithm proportional to the size of the data set. One step for every one element, or three steps for every one element, but directly proportional to the size.
- Log linear time, \(O(n \log n)\): we’ll talk about this later with merge sort.
- Polynomial time, \(O(n^2)\): quadratic time is a special case.
- Exponential time, \(O(2^n)\): a constant is in the base and the size of the data set is in the exponent. This is rough territory. These algorithms take a lot of time to run and are generally to be avoided.
- Factorial time, \(O(n!)\): even worse. With a data set of size 10, it takes about 3.6 million steps. With size 1,000, it’s roughly \(4 \times 10^{2567}\).
- Unbounded: algorithms that may take infinite time in the worst case.
Omega Notation
- If the times we discussed represented the best case scenario, we would represent them with the symbol omega, as in \(\Omega(n^3)\).
- Omega represents the lower bound on how many steps it may take for that algorithm to run.
- We can classify an algorithm by both its upper bound (Big O) and its lower bound (Omega), depending on whether we’re discussing the worst case or best case scenarios.
Searching
- Searching for a value is one of the most fundamental things that we can do with a data set. This is true in computer science, but this is true for any business operation.
- If you have a client list you need to go through, or you have a product list of SKUs that you need to go through, you’re going to be searching through that data set quite a bit.
- There are a couple of ways to solve this problem with basic lists. They behave differently and they require slightly different situations to be true. One of them can be done on any list. One of them requires us to tweak that list a little bit in order to take advantage of an efficiency.
Linear Search
- Linear search is the more fundamental approach. We have some list of data and we have some target that we’re trying to find in that list.
- Here is the pseudocode for linear search:
  1. As long as we have not gone past the end of the list
  2. Check if the current item matches the target
  3. If so, stop (search was successful)
  4. Otherwise, go to the next item and return to step 1
  5. If we reach here, the search was unsuccessful
- Notice that this is a finite number of steps, but there is a looping procedure where we keep going back to the beginning until we run out of data to check.
- Imagine a list of 15 items. In computer science, we typically start our counting from zero. So there are 15 elements in the list numbered from index 0 up through index 14.
- We step through the list one element at a time. We check each element against our target. If we find it, the search is successful. If we reach the end without finding it, the search is unsuccessful.
- If we search for an element that isn’t in the list, we still go through the entire list. The algorithm was still correct. We exhaustively made sure that the element did not exist in the data set.
- In the worst case scenario, we have to look through the entire list to find either that the item is in the last spot, or that the item wasn’t in the list at all.
- In the best case scenario, the target is at the very beginning of the list. We only have to look at one element.
- Using our notation, we could say that linear search runs in \(O(n)\) (worst case), and \(\Omega(1)\) (best case, finding it on the first step).
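The pseudocode above can be sketched in Python. This is a minimal illustration, not a production implementation; the function name is a placeholder, and it returns the index of the target or -1 when the search is unsuccessful.

```python
def linear_search(items, target):
    """Step through the list one element at a time, checking each
    against the target. Worst case O(n), best case Omega(1)."""
    for index, item in enumerate(items):
        if item == target:
            return index  # search was successful
    return -1  # exhausted the whole list: search was unsuccessful
```

For example, `linear_search([4, 8, 15, 16], 15)` returns `2`, while searching for a missing value still walks the entire list before returning `-1`.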
Binary Search
- Binary search is different because it has a special condition: The list must be sorted.
- If the list is sorted, we can take advantage of an efficiency. Here is the pseudocode:
  1. Note the start and end positions of the list
  2. As long as start is less than or equal to end
  3. Find the midpoint of those two values
  4. Check if the midpoint matches the target
  5. If so, stop (search was successful)
  6. If target is less than the midpoint element
  7. Move the endpoint to search the left half
  8. Else (target is greater than the midpoint element)
  9. Move the start point to search the right half
  10. If we reach here, the search was unsuccessful
- Notice that we can throw away half of the list with each comparison because we know things are sorted.
- If you’re looking for something smaller than what you found in the middle, you don’t need to look at everything bigger. If you’re looking for something bigger, you don’t need to look at everything smaller.
- This is like tearing a phone book in half. You open to the middle, find that the name you’re looking for is on one side or the other, rip the phone book in half and throw away the portion that doesn’t have the name. You can keep tearing that part in half over and over.
- With a list of 15 elements, instead of looking at all 15 elements, we might only iterate through the loop 3 or 4 times. That feels faster, and it is faster.
- In the worst case scenario, we have to cut the list in half repeatedly, as many times as possible, down to a single element. This is \(O(\log n)\).
- In the best case scenario, the very first time we calculate a midpoint, we found what we’re looking for. This is \(\Omega(1)\).
- Binary search, with the caveat that it requires a sorted list, is more efficient than linear search.
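The halving procedure described above can be sketched in Python. This is a minimal illustration assuming a list already sorted in ascending order; the function name is a placeholder, and it returns the index of the target or -1 if the target is absent.

```python
def binary_search(sorted_items, target):
    """Repeatedly halve the search range of a sorted list.
    Worst case O(log n), best case Omega(1)."""
    start, end = 0, len(sorted_items) - 1
    while start <= end:
        mid = (start + end) // 2
        if sorted_items[mid] == target:
            return mid  # search was successful
        if target < sorted_items[mid]:
            end = mid - 1    # throw away the right half
        else:
            start = mid + 1  # throw away the left half
    return -1  # range is empty: search was unsuccessful
```

Each pass through the loop discards half of the remaining range, which is why a 15-element list needs only 3 or 4 iterations in the worst case.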
Sorting
- If we need a sorted list for binary search, how do we get one? Organizing data is very important for any business operation. We need to keep client lists maintained, SKUs maintained, databases in order.
- There are many ways to sort. There are far more ways to sort than there are to search through lists. We’re going to talk about several options, but they are non-exhaustive.
Selection Sort
- With selection sort, consider that we have an unsorted list. Our objective is to find the smallest item left in the list, then swap that item so it becomes the beginning of the list.
- If we walk through the entire list and find the smallest thing and put it at the front, we can now declare that first item sorted. Then we repeat the process with the remaining elements.
- Here is the pseudocode:
  1. Find the smallest item remaining in the unsorted portion
  2. Swap it with the first element of the unsorted portion
  3. Repeat until the entire list is sorted
- Notice that we went through the list multiple times, not just once from beginning to end.
- We went through \(n\) elements roughly \(n\) times (the unsorted portion got a little smaller as we went). So the worst case scenario is \(O(n^2)\).
- In the best case scenario, it’s exactly the same. We have no way to check if things are already in the right position. We still have to step through. So the lower bound is also \(\Omega(n^2)\): Both are \(n^2\) algorithms.
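The selection sort steps above can be sketched in Python. This is a minimal in-place version for illustration; the function name is a placeholder.

```python
def selection_sort(items):
    """Repeatedly find the smallest remaining item and swap it to the
    front of the unsorted portion. O(n^2) and Omega(n^2)."""
    n = len(items)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):  # scan the unsorted portion
            if items[j] < items[smallest]:
                smallest = j
        # swap the smallest item into position i; items[:i+1] is now sorted
        items[i], items[smallest] = items[smallest], items[i]
    return items
```

Note that the nested scan runs whether or not the list is already sorted, which is why the best case is no better than the worst case.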
Bubble Sort
- With bubble sort, we look at two elements at a time, and if they are out of order, we swap their positions. This effectively bubbles the bigger element further and further to the right.
- We also keep track of how many times we make a swap. This will be useful.
- Here is the pseudocode:
  1. Set swap counter to 0
  2. Look at each pair of adjacent elements
  3. If they are out of order, swap them and increment the counter
  4. If the swap counter is greater than 0, reset it and repeat
  5. If the swap counter is 0, the list is sorted
- Notice that if we didn’t have to change the position of any values in a pass, everything must be in order.
- In the example, each pass pushed the largest remaining element to the end. After a few passes, when we made no swaps, we could declare everything sorted without going through all \(n\) passes.
- In the worst case (a completely backwards list), we would have to keep bubbling just the first element over, then the next, and so on. This is \(O(n^2)\).
- In the best case (a completely sorted list), when we went through it one time, our swap counter would be zero, so we only need one pass. This is \(\Omega(n)\).
- Bubble sort is an improvement over selection sort in the best case.
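The swap-counter pseudocode above can be sketched in Python. This is a minimal in-place version for illustration; the function name is a placeholder.

```python
def bubble_sort(items):
    """Swap out-of-order adjacent pairs until a full pass makes no swaps.
    O(n^2) in the worst case, Omega(n) when already sorted."""
    while True:
        swaps = 0
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                # bubble the bigger element one position to the right
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
        if swaps == 0:  # a clean pass means the list is sorted
            return items
```

On an already sorted list, the first pass makes zero swaps and the loop exits immediately, which is where the \(\Omega(n)\) best case comes from.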
Insertion Sort
- With insertion sort, we declare the first element sorted. Then we look at the next element, and if it’s out of order, we shift the sorted elements to fit that one into place.
- Here is the pseudocode:
  1. Declare the first element sorted
  2. Look at the next unsorted element
  3. If it's larger than the largest sorted element, append it
  4. Otherwise, shift sorted elements to make room and insert it
  5. Repeat until all elements are sorted
- Notice that we made one pass through the list, but every time we looked at one element, we sometimes had to push many things out of the way.
- If we have \(n\) elements, we might need to push \(n-1\) of them out of the way to make room for just one. That’s the same math as bubble sort.
- Worst case scenario: The list is in reverse order, and we have to push up to \(n - 1\) elements out of the way. This is \(O(n^2)\).
- Best case scenario: Everything’s in the right order, and we just keep appending to the end. This is \(\Omega(n)\).
- Insertion sort has the same runtime classification as bubble sort, but the process is very different. This is an example of how two algorithms that appear quite different can have the same effect and run in the same amount of time.
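The insertion sort steps above can be sketched in Python. This is a minimal in-place version for illustration; the function name is a placeholder.

```python
def insertion_sort(items):
    """Grow a sorted prefix one element at a time, shifting sorted
    elements right to make room. O(n^2), Omega(n)."""
    for i in range(1, len(items)):  # items[:1] is trivially sorted
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]  # shift a sorted element right
            j -= 1
        items[j + 1] = current  # insert into the opened slot
    return items
```

When the list is already sorted, the inner while loop never runs and each element is effectively appended, giving the \(\Omega(n)\) best case.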
Recursion
- The algorithms we’ve talked about so far have done an iterative process: We go through the steps, start at the beginning, continue on, and often go back to the beginning again.
- In order to talk about the next sorting algorithm, we need to discuss recursion, which is also a very important problem-solving technique in computer science.
- In a recursive algorithm, we try to make the problem a tiny bit smaller and just solve a teeny portion of that problem. Then we call the algorithm again on a slightly modified version of the problem and wait for its response to answer the question.
- Consider the factorial function in math. The factorial of a number is the product of all integers from one up to and including that number. So factorial of 4 is 4 × 3 × 2 × 1 = 24.
- Here is the iterative approach:
  1. Set product to 1
  2. Set counter to 1
  3. While counter is less than or equal to n
  4. Multiply product by counter
  5. Increment counter
  6. Return product
- Notice that we loop through, multiplying as we go.
- Here is the recursive approach:
  1. If n is less than or equal to 1
  2. Return 1 (this is the base case)
  3. Otherwise
  4. Return n × factorial(n - 1)
- Notice that we call the same function on a smaller version of the problem.
- The base case is the simplest problem we can definitely solve. The factorial of 1 is 1.
- Otherwise, we return whatever we’re currently operating on multiplied by the factorial of the number one smaller. We’re trying to get towards that base case.
- It’s like a game of telephone. We call somebody up on a smaller problem and wait for their response. Factorial of 4 asks for factorial of 3, which asks for factorial of 2, which asks for factorial of 1 (which we know is 1). Then the answers flow back up: 1, then 2, then 6, then 24.
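The recursive pseudocode above translates almost directly into Python. This is a minimal sketch for illustration:

```python
def factorial(n):
    """Recursive factorial: shrink the problem by one each call."""
    if n <= 1:
        return 1  # base case: the simplest problem we can definitely solve
    return n * factorial(n - 1)  # ask a smaller version of the problem
```

Calling `factorial(4)` triggers the telephone chain described above: it waits on `factorial(3)`, which waits on `factorial(2)`, which waits on `factorial(1)`, and the answers 1, 2, 6, 24 flow back up.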
Merge Sort
- Merge sort requires us to think outside the box and come back to the trade-off between time and space. We’re going to use more memory to solve the problem as opposed to more time.
- Every algorithm we’ve talked about has been done “in place,” working on that same array, just exchanging values. In merge sort, we have to make duplicates and do work elsewhere in memory, then put the pieces back together.
- Here is the pseudocode for merge sort:
  1. Sort the left half of the list
  2. Sort the right half of the list
  3. Merge the two halves together
- Notice that this alone seems to outsource the problem. But we apply this algorithm recursively until we get to a list of size 1, which is trivially sorted.
- A list of size 1 is automatically sorted. It’s the only thing in it. Once we reach lists of size 1, we start merging the pieces together.
- When merging, we look at the first element of both sorted lists, compare them, put the smaller one into position, and repeat until both lists are exhausted.
- We keep breaking the problem down into smaller and smaller pieces until we get to pieces we know how to solve (lists of size 1). Then we reassemble. It’s like taking a set of blocks, tearing it all apart, and rebuilding it back together in a different order.
- In the worst case scenario, the list is in completely reverse order. The algorithm will never know this because the only thing it knows how to do is sort lists of size 1 and then put them back together. There are no shortcuts. This is \(O(n \log n)\).
- In the best case scenario, it’s exactly the same. The algorithm knows nothing about the list, so there are no tricks or efficiencies. This is also \(\Omega(n \log n)\).
- The upper bound (\(O(n \log n)\)) is better than the \(O(n^2)\) algorithms. But the lower bound (\(\Omega(n \log n)\)) is actually higher than the \(\Omega(n)\) we saw with bubble sort and insertion sort.
- Why is it \(n \log n\)? The list can be split in half \(\log n\) times, and each level of merging has to touch all \(n\) elements. So it’s \(n \log n\).
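The split-then-merge procedure above can be sketched in Python. This is a minimal version for illustration; unlike the in-place sorts, it builds new lists as it merges, trading memory for the \(O(n \log n)\) runtime. The function name is a placeholder.

```python
def merge_sort(items):
    """Recursively split the list down to size 1, then merge sorted
    halves back together. O(n log n) and Omega(n log n)."""
    if len(items) <= 1:
        return items  # a list of size 1 is trivially sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # sort the left half
    right = merge_sort(items[mid:])  # sort the right half
    # merge: repeatedly take the smaller front element of the two halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one half is exhausted;
    merged.extend(right[j:])  # append whatever remains of the other
    return merged
```

Note that the extra `merged` lists are the space-for-time trade-off mentioned above: the original list is never sorted in place.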
Summing Up
In this lesson, you learned about analyzing algorithms, the step by step instructions for solving problems. Specifically, you learned…
- What algorithms are and their key characteristics: well-defined inputs and outputs, clear and ordered steps, and a finite number of steps.
- How to analyze algorithms by considering time (number of steps) and space (memory usage).
- About Big O notation for describing the upper bound (worst case) runtime of algorithms.
- About Omega notation for describing the lower bound (best case) runtime of algorithms.
- About different runtime categories: constant, logarithmic, linear, log linear, polynomial, exponential, factorial, and unbounded.
- About linear search and how it runs in \(O(n)\) time and \(\Omega(1)\) time.
- About binary search, which requires a sorted list but runs in \(O(\log n)\) time.
- About selection sort, bubble sort, and insertion sort, which all run in \(O(n^2)\) time but have different best case behaviors.
- About recursion as a problem-solving technique where we solve smaller versions of the same problem.
- About merge sort, which uses extra memory but achieves \(O(n \log n)\) time.
See you next time!