Data Structures - Space and Time Complexity
Observing the time complexity of different algorithms
- Space and Time Complexity
- Constant O(1)
- Linear O(n)
- Quadratic O(n^2)
- Logarithmic O(log n)
- Exponential O(2^n)
- Hacks
Space and Time Complexity
Space complexity refers to the amount of memory used by an algorithm to complete its execution, as a function of the size of the input. The space complexity of an algorithm can be affected by various factors such as the size of the input data, the data structures used in the algorithm, the number and size of temporary variables, and the recursion depth. Time complexity refers to the amount of time required by an algorithm to run as the input size grows. It is usually measured in terms of the "Big O" notation, which describes the upper bound of an algorithm's time complexity.
Why do you think a programmer should care about space and time complexity?
- I think a programmer should care about space and time complexity because it is important to consider the resources a program consumes and the time it takes to run. Keeping complexity low makes code as fast as possible and keeps loading times to a minimum. Additionally, programs do not have unlimited memory to run in, so we have to work under storage constraints.
Take a look at our lassen volcano example from the data compression tech talk. The first code block is the original image. In the second code block, change the baseWidth to rescale the image.
- This takes more time when you increase the baseWidth and scale the image to a higher resolution. A larger baseWidth generates more pixels, so the image takes longer to process and display. It may not be necessary to scale to the maximum baseWidth, so keeping it smaller saves both time and resources.
from IPython.display import Image, display
from pathlib import Path

# prepares a series of images
def image_data(path=Path("../images/"), images=None):  # path of static images is defaulted
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:
        display(Image(filename=image['filename']))

if __name__ == "__main__":
    lassen_volcano = image_data(images=[{'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}])
    image_display(lassen_volcano)
from IPython.display import HTML, display
from pathlib import Path
from PIL import Image as pilImage
from io import BytesIO
import base64

# prepares a series of images
def image_data(path=Path("../images/"), images=None):  # path of static images is defaulted
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def scale_image(img):
    #baseWidth = 625
    #baseWidth = 1250
    baseWidth = 2500
    #baseWidth = 5000 # see the effect of doubling or halving the baseWidth
    #baseWidth = 10000
    #baseWidth = 20000
    #baseWidth = 40000
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

def image_management(image):  # path of static images is defaulted
    # Image open returns a PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    image['width'], image['height'] = img.size
    image['pixels'] = image['width'] * image['height']
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    image['scaled_width'], image['scaled_height'] = img.size
    image['scaled_pixels'] = image['scaled_width'] * image['scaled_height']
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])

if __name__ == "__main__":
    images = image_data(images=[{'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}])

    # Display meta data and scaled view for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'], " pixels: ", f"{image['pixels']:,}")
        print("Scaled size: ", image['scaled_size'], " pixels: ", f"{image['scaled_pixels']:,}")

        print("-- original image --")
        display(HTML(image['html']))
This code takes longer to run with a higher baseWidth. It is important to keep the baseWidth low enough for the code to run in a reasonable time.
- At the higher baseWidth values, the computer often stops responding because it is not powerful enough to process that many pixels.
Do you think this is a time complexity or space complexity or both problem?
- I think this is both a time and a space complexity problem. You do not want the image to be so large that it takes up too much memory for the program, and a larger baseWidth also means the scaled image takes more time to generate. So, it is important to keep both the time and the physical (storage) constraints of a program in mind when using the image scaler.
Constant O(1)
Time
An example of a constant time algorithm is accessing a specific element of a list or dictionary by its index or key. No matter how large the collection is, a single lookup takes the same amount of time, so the time complexity is O(1).
numbers = list(range(1000))
print(numbers)
print(numbers[263])
ncaa_bb_ranks = {1: "Alabama", 2: "Houston", 3: "Purdue", 4: "Kansas"}
# look up a value in a dictionary given a key
print(ncaa_bb_ranks[1])
Space
This function takes two number inputs and returns their sum. The function does not create any additional data structures or variables that are dependent on the input size, so its space complexity is constant, or O(1). Regardless of how large the input numbers are, the function will always require the same amount of memory to execute.
def sum(a, b):
    return a + b

print(sum(90, 88))
print(sum(.9, .88))
The amount of memory and the time it took to run both of these code segments was constant. This is good for a programmer because we always know how much time and memory the code will need. However, these are very limited pieces of code, and solving more complex problems will require algorithms whose costs grow with the input.
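To see the constant behavior empirically, here is a minimal sketch (not part of the original notebook) that uses Python's timeit module to time a dictionary lookup at a few hypothetical dictionary sizes; the time for a fixed number of lookups should stay roughly flat as the dictionary grows.
import timeit

# hypothetical sizes chosen just for illustration
for size in (1_000, 100_000, 1_000_000):
    d = {i: i for i in range(size)}
    # time 100,000 lookups of a key that exists in every dictionary
    elapsed = timeit.timeit(lambda: d[size // 2], number=100_000)
    print(f"size {size:>9,}: {elapsed:.4f} s for 100,000 lookups")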
Linear O(n)
Time
An example of a linear time algorithm is traversing a list or an array. When the size of the list or array increases, the time taken to traverse it also increases linearly with the size. Hence, the time complexity of this operation is O(n), where n is the size of the list or array being traversed.
for i in numbers:
    print(i)
Space
This function takes a list of elements arr as input and returns a new list with the elements in reverse order. The function creates a new list reversed_arr of the same size as arr to store the reversed elements. The size of reversed_arr depends on the size of the input arr, so the space complexity of this function is O(n). As the input size increases, the amount of memory required to execute the function also increases linearly.
def reverse_list(arr):
    n = len(arr)
    reversed_arr = [None] * n  # create a list of None values based on the length of arr
    for i in range(n):
        reversed_arr[n-i-1] = arr[i]  # copy from the start of arr into the end of reversed_arr
    return reversed_arr
print(numbers)
print(reverse_list(numbers))
These code segments are linear in both space and time: the memory used and the time taken grow in proportion to the length of the list. This makes it easy for programmers to estimate how long the code will run. Linear algorithms can sometimes be inefficient, but their behavior is predictable as the input grows.
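As a quick illustration (not in the original notebook), the sketch below times a simple O(n) traversal at a few hypothetical list sizes; doubling the size should roughly double the elapsed time.
import time

# hypothetical sizes chosen just for illustration
for size in (1_000_000, 2_000_000, 4_000_000):
    data = list(range(size))
    start = time.time()
    total = 0
    for value in data:  # O(n) traversal
        total += value
    print(f"n = {size:>9,}: {time.time() - start:.3f} s")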
Quadratic O(n^2)
Time
An example of a quadratic time algorithm is nested loops. When there are two nested loops that both iterate over the same collection, the time taken to complete the algorithm grows quadratically with the size of the collection. Hence, the time complexity of this operation is O(n^2), where n is the size of the collection being iterated over.
for i in numbers:
    for j in numbers:
        print(i, j)
Space
This function takes two matrices matrix1 and matrix2 as input and returns their product as a new matrix. The function creates a new matrix result with dimensions m by n to store the product of the input matrices. The size of result depends on the size of the input matrices, so the space complexity of this function is O(n^2). As the size of the input matrices increases, the amount of memory required to execute the function also increases quadratically.
[Example of Matrix Multiplication]
- Main take away is that a new matrix is created.
def multiply_matrices(matrix1, matrix2):
    m = len(matrix1)
    n = len(matrix2[0])
    # create the new m-by-n result matrix; a list comprehension is used so each row is a separate list
    # (the original [[0] * n] * m makes every row alias the same list, which gives wrong results)
    result = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k in range(len(matrix2)):
                result[i][j] += matrix1[i][k] * matrix2[k][j]
    return result

print(multiply_matrices([[1,2],[3,4]], [[3,4],[1,2]]))
These examples are quadratic: the nested loops take time proportional to n^2, and the matrix multiplication builds a new matrix whose size grows with n^2. Because the costs grow so much faster than the input, it is harder to predict how much storage and time larger inputs will take.
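Here is a small sketch (not from the original notebook) that counts the inner-loop iterations of a doubly nested loop for a few hypothetical sizes: doubling n roughly quadruples the count, which is the signature of O(n^2) growth.
# count the inner-loop iterations of a doubly nested loop
for n in (100, 200, 400):
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    print(f"n = {n:>3}: {count:,} inner iterations")  # 10,000 -> 40,000 -> 160,000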
Logarithmic O(log n)
Time
An example of a log time algorithm is binary search. Binary search is an algorithm that searches for a specific element in a sorted list by repeatedly dividing the search interval in half. As a result, the time taken to complete the search grows logarithmically with the size of the list. Hence, the time complexity of this operation is O(log n), where n is the size of the list being searched.
def binary_search(arr, low, high, target):
    while low <= high:
        mid = (low + high) // 2  # integer division
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # target not found

target = 263
result = binary_search(numbers, 0, len(numbers) - 1, target)
print(result)
Space
Binary search can also illustrate O(log n) space complexity, but only in its recursive form. The iterative version above does not create any new data structures that depend on the size of arr, so its space complexity is O(1). A recursive implementation instead uses the call stack to keep track of each halving of the search space; since the maximum depth of the recursive calls is O(log n), where n is the size of arr, its space complexity is O(log n). As the size of arr increases, the memory required by the recursive version grows logarithmically. A recursive sketch is shown below.
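Here is a minimal recursive sketch (not in the original notebook, reusing the numbers list defined above) of the same search; each call halves the interval, so the recursion depth, and therefore the stack space, is O(log n).
def binary_search_recursive(arr, low, high, target):
    # base case: an empty interval means the target is not present
    if low > high:
        return -1
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        # search the upper half; each call adds one stack frame, at most O(log n) deep
        return binary_search_recursive(arr, mid + 1, high, target)
    else:
        # search the lower half
        return binary_search_recursive(arr, low, mid - 1, target)

print(binary_search_recursive(numbers, 0, len(numbers) - 1, 263))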
Exponential O(2^n)
Time
An example of an O(2^n) algorithm is the recursive implementation of the Fibonacci sequence. The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, starting from 0 and 1. The recursive implementation of the Fibonacci sequence calculates each number by recursively calling itself with the two preceding numbers until it reaches the base case (i.e., the first or second number in the sequence). The algorithm takes O(2^n) time in the worst case because it has to calculate each number in the sequence by making two recursive calls.
[A visualization of calculating the fibonacci sequence]
- Fibonacci: the process of adding the two most recent numbers of a series to get the next number (e.g. 1, 1, 2, 3, 5, ...)
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

#print(fibonacci(5))
#print(fibonacci(10))
#print(fibonacci(20))
#print(fibonacci(30))
#print(fibonacci(40))
Space
This function takes a set s as input and generates all possible subsets of s. The function does this by recursively generating the subsets of the set without the first element, and then adding the first element to each of those subsets to generate the subsets that include the first element. The function creates a new list for each recursive call that stores the subsets, and each element in the list is a new list that represents a subset. The number of subsets that can be generated from a set of size n is 2^n, so the space complexity of this function is O(2^n). As the size of the input set increases, the amount of memory required to execute the function grows exponentially.
def generate_subsets(s):
    if not s:
        return [[]]
    subsets = generate_subsets(s[1:])
    return [[s[0]] + subset for subset in subsets] + subsets

print(generate_subsets([1,2,3]))
print(generate_subsets([1,2,3,4,5,6]))
# generating subsets of the 1000-element numbers list would require 2^1000 subsets,
# far more memory (and recursion depth) than any computer has, so this call is left commented out
#print(generate_subsets(numbers))
Using the time library, we are able to see the difference in time it takes to calculate the fibonacci function above.
- Based on what is known about the other time complexities, hypothesize the resulting elapsed time if the function is replaced.
- It takes more time with higher input values (see the replacement sketch after the timing code below).
import time
start_time = time.time()
print(fibonacci(34))
end_time = time.time()
total_time = end_time - start_time
print("Time taken:", total_time, "seconds")
start_time = time.time()
print(fibonacci(36))
end_time = time.time()
total_time = end_time - start_time
print("Time taken:", total_time, "seconds")
Hacks
- Record your findings when testing the time elapsed of the different algorithms.
- The algorithms with higher time or space complexity took noticeably longer to run
- They often made the computer lag, and sometimes VS Code would not respond
- Although we will go more in depth later, time complexity is a key concept that relates to the different sorting algorithms. Do some basic research on the different types of sorting algorithms and their time complexity.
- Binary search: repeatedly discards half of the (sorted) list being searched to find an item; space complexity is O(1), time is O(log n)
- Linear search: checking each element of the list one by one has a time complexity of O(n) and a space complexity of O(1)
- Bubble sort: repeatedly compares adjacent elements and swaps any that are out of order until the list is sorted; time complexity is O(n^2), space complexity is O(1) (a simple sketch is shown after this list)
- These are some common search and sorting algorithms we have experience with, and I look forward to learning about the time and space complexity of others
- Why is time and space complexity important when choosing an algorithm?
- Time and space complexity are important when choosing an algorithm because they tell you whether a program will run in a reasonable amount of time and without using up too much storage, which keeps resources conserved.
- Should you always use a constant time algorithm / Should you never use an exponential time algorithm? Explain?
- While constant time algorithms are predictable and always take the same amount of time, it is not always possible to solve a problem in constant time. For example, if you need to compare every item in a list with every other item, the algorithm is at least quadratic, and some problems, like generating every subset of a set, genuinely require exponential time. So although a lower complexity is usually better because it conserves time, sometimes a slower algorithm is the only one that solves the problem.
- What are some general patterns that you noticed to determine each algorithm's time and space complexity?
- Usually, the algorithms with higher time and space complexity define variables or data structures whose size depends on the input, and contain loops or recursive calls whose number of steps grows with the input. The constant time and space code segments use variables that do not depend on how long a list is; they track the same amount of data each time.
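Since bubble sort came up in the research above, here is a minimal sketch of it (not part of the original notebook); the nested loops make the O(n^2) time complexity easy to see, and the sort happens in place, so the extra space is O(1).
def bubble_sort(arr):
    n = len(arr)
    # outer loop runs n-1 passes; inner loop compares adjacent pairs -> O(n^2) comparisons
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                # swap in place, so no extra list is created -> O(1) space
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

print(bubble_sort([5, 2, 9, 1, 7, 3]))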
Complete the Time and Space Complexity analysis questions linked below.
Practice
- Answer Choice C: The time depends on the values of M and N, but they are independent of each other, so it iterates through each length. There are no new variables created
- Answer D: The code runs twice for each N, resulting in an N*N time complexity.
- Answer B: The code runs for each N and also runs for each logarithmic N.
- Answer B: If X runs at a lower time for large inputs, it will be better.
- Answer D: The code takes the value of i and divides it in half.
- Answer C: You want to take into consideration both the space and time complexity of an algorithm.
- Answer B: This is how a time complexity algorithm works
- Answer C: The algorithm loops k^n-1 times.
- Answer C: Runs the first algorithm n times. Runs the second algorithm n-1 times.
def dog_breed():
    print("Golden Retriever")

dog_breed()
# Same memory and time to run the program every time: constant O(1)
def add_numbs(x):
    # Makes a list which has an entry for each value in range of x
    numbs_list = [None] * x
    # Iterates x times
    for i in range(x):
        numbs_list[i] = i
    print(numbs_list)
    # Linear O(n): the list size and the loop count both grow with x

add_numbs(10)
def wordgame():
    print("Let us play a game")
    word1 = input("Enter a word")
    word2 = input("Enter another word")
    list1 = []
    list2 = []
    score = 0
    # build a list of the characters in each word
    for i in word1:
        list1.append(i)
    for j in word2:
        list2.append(j)
    print(list1)
    print(list2)
    # one point if the words are the same length
    if len(list1) == len(list2):
        score += 1
    # compare characters at the same index, iterating over the shorter list
    if len(list1) >= len(list2):
        for idx, let in enumerate(list2):
            if list1[idx] == list2[idx]:
                score += 1
    if len(list1) < len(list2):
        for idx, jet in enumerate(list1):
            if list1[idx] == list2[idx]:
                score += 1
    print("Your final score is ", score)

wordgame()
This code checks two user-entered words to see whether they have the same length and whether they have matching characters at the same index positions; the user earns one point if the lengths match and one point for each matching character. While it could be written to conserve a little more time and space, I wanted to play around with time and space complexity.
- Time: O(i + j). The code builds a list from each word and then compares the two lists against each other once, so the work grows with the combined length of the two words.
- Space: O(i + j). This is linear space because, although a few fixed items like the score variable are created, the sizes of list1 and list2 depend on the lengths of word1 and word2.
Overall, this was somewhat challenging to analyze, but it was good practice and helped show me why it is important to keep space and time complexity in mind so the code does not become too clunky.