LeetCode: Arrays and Hashing

Table Of Contents
Arrays & Hashing Intro:
- What is a Hashmap
- Hashing With A Function
- Choosing a Hash Function
- Handling Collisions
- Balancing Hash Table With Load Factor
- Array Application: In place Transformations
- HashMap Application: Representations
- HashMap Application: Grouping by Criteria
- HashMap Application: Memoization in Dynamic Programming
- HashMap Application: Backtracking with Caching for Pruning
- HashMap Application: Representing Relationships
- HashMap Application: Index with Data Key
- HashMap Application: Algorithm
Arrays & Hashing Intro:
LeetCode problems with solutions using hashmaps.
What is a Hashmap
A dictionary is just a direct mapping of keys->values:
Dictionary: ['a': 2, 'b': 2, 'c' : 3, ...]
A Hashmap is just a dictionary that maps keys->values using a hash function. Hashmaps are a common use case for hashing. We choose hashing to maximize randomness and minimize collisions between elements to keys.
Hashing With A Function
Hashing is simply a function. It takes in an input and spits out an output. Here is the function mod, which takes in an integer and spits out an integer:
- 1 mod (3) = 1
- 2 mod (3) = 2
- 3 mod (3) = 0
- 4 mod (3) = 1
Now this is could be our function for our hashmap, but it would lead to high collisions and low randomness as there are only 3 possible key results: 0, 1, and 2. So hashmaps are only efficient as the chosen hash function.
Choosing a Hash Function
With a good hash function: Insert, Lookup, and Delete take O(1). In this case, every element gets its own unique key and so a lookup using this function would be constant O(1).
With a bad hash function: Insert, Lookup, and Delete take O(n). Lets say our function is hash():
Handling Collisions
Even with a good hash function, we may run into collisions sometimes. In those cases, we handle collision using chaining with a linked list of elements.
Here, there are 5 keys or buckets we could hash to.
hash() -> [key mod (5)]
----------------------------
hash(10) = 10 % 5 = 0
hash(22) = 22 % 5 = 2
hash(31) = 31 % 5 = 1
hash(14) = 14 % 5 = 4
hash(17) = 17 % 5 = 2 (collision!)
Key (index) | Value (element) |
---|---|
0 | [10] |
1 | [31] |
2 | [22, 17] |
3 | [] |
4 | [14] |
Balancing Hash Table With Load Factor
Load factors are calculated to ensure a hash table maintains its O(1) time complexity.
Balancing occurs when a hash table has passed its set load factor.
Load Factor = n/m Where n = num of elements, m = num of keys
Thus, when certain fraction of the table size, say 75% full with 3 elements and 4 potential keys. The table must increase in size and rehash/reinsert every element.
The efficiency of a hash table decreases significantly without balancing, as this leads to increased collisions as the table fills up, which leads to time needed to resolve collisions, as we traverse through list to find the element in linear time O(n). Operations like search, insert, and delete, degrade from O(1) on average to O(n) in the worst case.
Usually, when the load factor is reached, the table size doubles to 2n. 0.75 or 75% full is a common load factor for a hash table.
Upon balancing, we need to rehash every element leading to n/2 additional space:
New table size - current number of elements
(double table size * load factor) - current n elements
(2 n * 0.75)
1.5 n = New Load Factor Element Count
New Load Factor Element Count - Current Element Count
1.5 n - n
0.5
n/2 = Additional space until next rebalance
Rehashing is expensive as we need to rehash every element with the new hash function into their new bucket, leading to O(n) for resizing and reinserting n elements.
Array Application: In place Transformations
We can perform transformations or reorderings on the array itself without using extra space.
Ex: Rotate an array to the right by k steps.
def rotate(nums: List[int], k: int) -> None:
n = len(nums)
k %= n # handle cases where k > n
# Opposite ends reverse subarray [start, end]
def reverse(start: int, end: int) -> None:
# Reverse until hit middle of subarray
while start < end:
nums[start], nums[end] = nums[end], nums[start]
start += 1
end -= 1
# Reverse entire array
reverse(0, n - 1)
# Reverse first k elements [0, k)
reverse(0, k - 1)
# Reverse remaining elements [k, n)
reverse(k, n - 1)
HashMap Application: Representations
We can represent objects or data based on specific criteria.
Ex: Representing a string by character frequency
def freqCount():
# object
s = "aabbcc"
# representation
freq = {}
# mapping
for char in s:
freq[char] = freq.get(char, 0) + 1
# freq = {'a': 2, 'b': 2, 'c': 2}
HashMap Application: Grouping by Criteria
We can group elements based on a defined criterion, such as sorting or categorization, using hashing to push values into the corresponding bucket.
Ex: Grouping string anagrams
def groupAnagrams(strs):
# groups
anagrams = {}
# for list
for word in strs:
# create key
key = "".join(sorted(word))
if key not in anagrams:
anagrams[key] = []
# hash key and put value into bucket
anagrams[key].append(word)
# anagrams = {'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat'], 'abt': ['bat']}
HashMap Application: Memoization in Dynamic Programming
We can store solutions to sub problems to avoid redundant calculations.
Ex: Fibonacci number computation with memoization
def fib(n, memo={}):
if n <= 1:
return n
if n not in memo:
memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
return memo[n]
HashMap Application: Backtracking with Caching for Pruning
We can cache previous paths or states to prune the search space effectively as we explore.
Ex: Subset Sum with caching:
def subsetSum(nums, target):
result = []
# Cache to avoid exploring previously visited paths
cache = {}
def dfs_backtrack(current, index, total):
# Check if already explored state
state = (tuple(current), total)
if state in cache:
return
# Mark as explored
cache[state] = True
# Early Prune -> no potential valid path is current total exceeds target
if total > target:
return
# Valid solution -> no need to explore further, backtrack
if total == target:
result.append(list(current))
return
# Explore -> branches from current state
for i in range(index, len(nums)):
# Build
current.append(nums[i])
# Explore
dfs_backtrack(current, i + 1, total + nums[i])
# Backtrack
current.pop()
dfs_backtrack([], 0, 0)
return result
HashMap Application: Representing Relationships
We can model relationships between entities.
Ex: Adjacency list for a graph
def graph():
# List of edges in the graph
edges = [(1, 2), (2, 3), (1, 3)]
# Adjacency list
graph = {}
# Build the adjacency list
for u, v in edges:
# If the node 'u' or 'v' is not yet in the graph, initialize it
if u not in graph:
graph[u] = []
if v not in graph:
graph[v] = []
# Add 'v' to the list of neighbors for 'u' and 'u' to 'v'
graph[u].append(v)
graph[v].append(u)
# graph = {1: [2, 3], 2: [1, 3], 3: [2, 1]}
HashMap Application: Index with Data Key
We can index into arrays based on data or data structures.
Ex: Store integer complements for quick lookup:
def twoSum(nums, target):
# Complement -> index
complement_map = {}
for i, num in enumerate(nums):
# Calculate complement
complement = target - num
# Lookup(complement)
if num in complement_map:
return [complement_map[num], i]
# Put(complement)
complement_map[complement] = i
return []
HashMap Application: Algorithm
There are cases where problem that seems to require a hashmap have an existing algorithm made for that problem.
Ex: Boyer Moore Voting Algorithm
def majorityElement(nums: List[int]) -> int:
# Find element that appears more then floor(n/2) times
# votes for candidate
count = 0
# current candidate
candidate = None
for num in nums:
# Reset candidate
if count == 0:
candidate = num
if num == candidate:
count += 1
else:
count -= 1
# Confirm the candidate ( optional if majority element is guaranteed)
return candidate
217. Contains Duplicate ::2:: - Easy
Topics: Array, Hash Table, Sorting
Intro
Given an integer array nums, return true if any value appears at least twice in the array, return false if every element is distinct.
Example Input | Output |
---|---|
nums = [1,2,3,1] | true |
nums = [1,2,3,4] | false |
nums = [1,1,1,3,3,4,3,2,4,2] | true |
Constraints:
1 ≤ nums.length ≤ 105
-109 ≤ nums[i] ≤ 109
Abstraction
Given a list of elements, check for duplicates.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Hashmap | O(n) | O(n) | Iterate over array O(n) + get, put in O(1) | Dictionary size relative to input O(n) |
Set() | O(n) | O(n) | Iterate over array O(n) + get, in in O(1) | Set size relative to input O(n) |
Solution 1: Hashmap - Hashmap/Representation
def containsDuplicate(self, nums: List[int]) -> bool:
# sc: hashmap relative to input O(n)
count = defaultdict(int)
# tc: iterate over list O(n)
for num in nums:
# tc: in operation constant O(1)
if count[num] >= 1:
return True
# tc: put operation constant O(1)
count[num] += 1
# overall: tc O(n)
# overall: tc O(n)
return False
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iteration | O(n) | O(n) | Iterate over array in linear O(n) + get, put in O(1) | Dictionary relative to input O(n) |
Overall | O(n) | O(n) | Iteration over input dominates, O(n) | Allocation for dictionary dominates, O(n) |
Solution 2: Set - Hashmap/Representation
def containsDuplicate(self, nums: List[int]) -> bool:
# sc: set relative to input O(n)
seen = set()
# tc: iterate over list O(n)
for n in nums:
# tc: in operation O(1)
if n in seen:
return True
# tc: add operation O(1)
seen.add(n)
# overall: tc O(n)
# overall: sc O(n)
return False
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iteration | O(n) | O(n) | Iteration over list in linear O(n) + in, add operation O(1) | Set size relative to input O(n) |
Overall | O(n) | O(n) | Iteration over list dominates, O(n) | Allocation relative to input dominates, O(n) |
242. Valid Anagram ::2:: - Easy
Topics: Hash Table, String, Sorting
Intro
Given two strings s and t, return true if t is an anagram of s, and false otherwise. An Anagram is a word or phrase formed by rearranging the letters of a different word using all original letters exactly once.
Example Input 'S' | Example Input 'T' | Output |
---|---|---|
"anagram" | "nagaram" | true |
"rat" | "car" | false |
Constraints:
1 ≤ s.length, t.length ≤ 5 * 104
s and t consist of lowercase English letters
Abstraction
Given two strings determine if they have the same character count.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Hashmap | O(n) | O(1) | Two iterations over each word, linear O(n) + lookup, put in constant time O(1) | Dictionary will be proportional to input, space O(n) |
Array of 26 | O(n) | O(1) | Single iteration over both words at same time O(n) + lookup, put in constant time O(1) | Array will be constant 26 elements, space O(1) |
Solution 1: Hashmap Double Pass - Hashmap/Representation
def isAnagram(self, s: str, t: str) -> bool:
# sc: hashmap of 26 elements O(1)
count = defaultdict(int)
# tc: two iteration over string length O(n) * 2 ~= O(n)
for x in s:
count[x] += 1
for x in t:
count[x] -= 1
# tc: iteration over char count O(n)
for value in count.values():
# tc: get() in constant O(1)
if value != 0:
return False
# overall: tc O(n)
# overall: sc O(1)
return True
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iteration | O(n) | O(n) | Iterate over two strings in linear O(n) + put, get in O(1) | Dictionary relative to input O(n) |
Verification | O(n) | O(1) | Iterating over char count in linear O(n) + get in O(1) | No additional memory O(1) |
Overall | O(n) | O(n) | Iteration over input dominates, O(n) | Allocation for dictionary relative to input dominates, O(n) |
Solution 2: Array of 26 Single Pass - Hashmap/Representation
def isAnagram(self, s: str, t: str) -> bool:
# Note:
# ord() converts unicode char to int representation
# and use int to index into array
# sc: array of 26 constant O(1)
count = [0] * 26
# Check: mismatch length
if len(s) != len(t):
return False
# tc: single iterate over both strings at same time O(n)
for i in range(len(s)):
count[ord(s[i]) - ord('a')] += 1
count[ord(t[i]) - ord('a')] -= 1
# tc: iterate over count array length 26 O(1)
for val in count:
if value != 0:
return False
# overall: tc O(n)
# overall: sc O(1)
return True
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iteration | O(n) | O(1) | One iteration over both strings in linear O(n) + put, get in O(1) | Array of constant 26 size O(1) |
Verification | O(n) | O(1) | Iterating over count array length 26 in constant O(1) | No additional memory O(1) |
Overall | O(n) | O(1) | Iteration over input dominates, O(n) | Allocation for array is constant O(1) |
1. Two Sum I Not Sorted ::1:: - Easy
Topics: Array, Hash Table
Intro
Given an array of integers nums and an integer target, return indices of two numbers such that they add up to target. You can assume each test case only has one solution and you may cant use the same element twice. Answer can be returned in any order.
nums[] | Target | Output |
---|---|---|
[2,7,11,15] | 9 | [0,1] |
[3,2,4] | 6 | [1,2] |
[3,3] | 6 | [0,1] |
Constraints:
Only one valid answer exists
2 ≤ nums.length ≤ 10^4
-10^9 ≤ target ≤ 10^9
Abstraction
We need to find two elements that add up to the target and return their indexes.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Hashmap of Complements | O(n) | O(n) | Iterate over array O(n) | Allocation relative to input O(n) |
Solution 1: Hashmap - Hashmap/Index with Data Key
def twoSum(self, nums: List[int], target: int) -> List[int]:
# Note:
# dictionary is complement -> index
# sc: dictionary size relative to input O(n)
tracking = {}
# tc: iterate over list O(n)
for i in range (len(nums)):
complement = target - nums[i]
# tc: in operation O(1)
if complement in tracking:
return [i, tracking[complement]]
# tc: put operation O(1)
tracking[nums[i]] = i
# overall: tc O(n)
# overall: sc O(n)
return []
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iteration | O(n) | O(n) | Iterate over list in linear O(n) + in, put O(n) | Dictionary relative to input O(n) |
Overall | O(n) | O(n) | Iterate over list dominates, O(n) | Allocation relative to input dominates, O(n) |
49. Group Anagrams ::2:: - Medium
Topics: Array, Hash Table, String, Sorting
Intro
Given an array of strings strs, group the anagrams together. You can return the answer in any order. An Anagram is a word formed by rearranging the letters of a different word using all the original letters exactly once.
Input | Output |
---|---|
["eat","tea","tan","ate","nat","bat"] | [["bat"],["nat","tan"],["ate","eat","tea"]] |
[""] | [[""]] |
["a"] | [["a"]] |
Constraints:
strs[i] consists of lowercase English letters
Abstraction
Group the matching anagrams together.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Array to Tuple Key | O(n * k) | O(n * k) | Iterate over list O(n) * iterate over string k length O(k) | Dictionary relative to input O(n * k) |
Array to String Key | O(n * k) | O(n * k) | Iterate over list O(n) * iterate over string k length O(k) | Dictionary relative to input O(n * k) |
Bug: Mutable object, list, cannot be hashed
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
anaGroup = {}
for word in strs:
charCount = [0] * 26
for char in word:
charCount[ord(char) - ord('a')] += 1
# BUG:
# Attempted to hash mutable list charCount
if charCount not in anaGroup:
anaGroup[charCount] = []
anaGroup[charCount].append(word)
return list(anaGroup.values())
Bug: Cannot use unsorted hashmap to tuple as key
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
groupAna = {}
for s in strs:
count = defaultdict(int)
for c in s:
count[c] += 1
tupKey = tuple(count)
# BUG:
# hashmaps are not sorted
# so tuple will vary (e.g., "cat" vs "tac")
if tupKey not in groupAna:
groupAna[tupKey] = []
groupAna[tupKey].append(s)
return list(groupAna.values())
Solution 1: Array to Tuple Key - Hashmap/Grouping by Criteria
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
# Note:
# Tuples immutable, allowing them to be hashed
# Array allows for constant ordered representation
# sc: store k chars for n words O(n * k)
anaGroup = {}
# tc: iterate over list O(n)
for word in strs:
# sc: array for 26 O(1)
charCount = [0] * 26
# tc: iterate over string k length O(k)
for char in word:
charCount[ord(char) - ord('a')] += 1
# sc: array of 26 to tuple O(1)
key = tuple(charCount)
# tc: in operation O(1)
if key not in anaGroup:
anaGroup[key] = []
# tc: put operation O(1)
anaGroup[key].append(word)
# overall: tc O(n * k)
# overall: sc O(n * k)
return list(anaGroup.values())
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iterate list | O(n) | O(1) | Iterate over list O(n) | No allocation for iteration O(1) |
Iterate string | O(k) | O(1) | Iterate over string k length O(k) | Array of 26 per iteration O(1) |
Store string in anagram group | O(1) | O(n * k) | Put operation constant O(1) | Dictionary stores n words of k length O(n * k) |
Overall | O(n * k) | O(n * k) | Iteration over list and strings dominate, O(n * k) | Allocation for all strings dominates, O(n * k) |
Solution 2: Array to String Key - Hashmap/Grouping by Criteria
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
# Note:
# Appending to list then joining is more efficient than repeatedly
# appending to a string, it avoids the overhead of creating new string
# objects on each append, overall: list in O(m), string in O(m^2)
# sc: stores n tuple keys O(n) and lists of original k strings O(k), O(n * k)
anaGroup = {}
# tc: iterate over list O(n)
for word in strs:
# sc: array of 26 constant O(1)
charCount = [0] * 26
# tc: iterate over string O(m)
for char in word:
charCount[ord(char) - ord('a')] += 1
# tc: array of 26 to list O(1)
key_parts = []
for count in charCount:
key_parts.append(str(count))
# delimiter
key_parts.append("#")
# tc: concat list of 26 to string O(1)
key = ''.join(key_parts)
# tc: in operation O(1)
if key not in anaGroup:
anaGroup[key] = []
# tc: put operation O(1)
anaGroup[key].append(word)
# overall: tc O(n * k)
# overall: sc O(n * k)
return list(anaGroup.values())
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iterate list | O(n * m) | O(1) | Iterate over list O(n) | No allocation for iteration O(1) |
Iterate string | O(k) | O(1) | Iterate string of k length O(k) | Array of 26 O(1) |
Overall | O(n * k) | O(n * k) | Iterating over list and strings dominate, O(n * k) | Allocation for all strings dominates, O(n * k) |
347. Top K Elements in List ::3:: - Medium
Topics: Array, Hash Table, Divide and Conquer, Sorting, Heap (Priority Queue), Bucket Sort, Counting, Quickselect
Intro
Given an integer array nums and an integer k, return the k most frequent element within the array. Test cases are generated such that the answer is always unique. You may return the output in any order
Input | k | Output |
---|---|---|
[1,2,2,3,3,3,3] | 2 | [2,3] |
[7,7] | 1 | [7] |
Constraints:
1 ≤ k ≤ number of distinct elements in nums
-1000 ≤ nums[i] ≤ 1000
Abstraction
To find the k most frequent elements, we must first create an occurrence counter for each element in the list. Now that we have the count, we just grab the top k highest occurring elements.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
MinHeap | O(n log k) | O(n) | Iterating over freq counts * heap operations dominate, O(n log k) | Allocation for freq count of n elements dominates, O(n) |
QuickSelect | Average: O(n) Worst: O(n2) | O(n) | Good pivot splits array in half on average O(n), Bad pivot each iteration, smallest or largest, causes O(n2) | Allocation for freq count of n elements dominates, O(n) |
BucketSort | O(n) | O(n) | Iterating over list dominates, O(n) | Allocating n buckets dominates, O(n) |
Bug: Inverted Binary Search if statement:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
def partitionSection(left, right, randIndex):
freqPartition = frequency[unique[randIndex]]
leftPartition = left
unique[randIndex], unique[right] = unique[right], unique[randIndex]
for i in range (left, right):
if frequency[unique[i]] > freqPartition:
unique[i], unique[leftPartition] = unique[leftPartition], unique[i]
leftPartition += 1
unique[right], unique[leftPartition] = unique[leftPartition], unique[right]
return leftPartition
def quickSelectBinarySearchHelper(left, right, finalMarker):
# ran out of space
if left == right:
return
randIndex = random.randint(left, right)
resIndex = partitionSection(left, right, randIndex)
if resIndex == finalMarker:
return
# BUG:
# this format is confusing
# we are searching for the finalMarker, so always have that on
# the left.
# The correct order should be:
# if the finalMarker is less than the result, then we shift right
# but because of the confused order, we are doing the reverse
# INSTEAD:
# elif finalMarker < resIndex
# quickSelectBinarySearchHelper(left, resIndex-1, finalMarker)
elif resIndex < finalMarker:
quickSelectBinarySearchHelper(left, resIndex-1, finalMarker)
else:
quickSelectBinarySearchHelper(resIndex+1, right, finalMarker)
frequency = defaultdict(int)
for n in nums:
frequency[n] += 1
unique = list(frequency.keys())
n = len(unique)
l, r = 0, n-1
finalIndex = k-1
quickSelectBinarySearchHelper(l, r, finalIndex)
return unique[:k]
Solution 1: MinHeap Track K Elements - HashMap/Algorithm
def topKFrequent(self, nums: List[int], k:int) -> List[int]:
# Note:
# Heap only guarantees that the root is the min or max, and
# provides no guarantee about the relative order of nodes.
# The minHeap root will hold the smallest element,
# we iterate and add elements,
# we only remove the root/smallest element when heap exceeds size k,
# ensuring heap holds the k largest frequencies seen so far.
# sc: freq count for list, m unique elements in worst case O(n)
freq = defaultdict(int)
# tc: iterate over list O(n)
for num in nums:
freq[num] += 1
# sc: minHeap holds smallest k elements O(k)
minHeap = []
# tc: iterate over freq count, m unique elements in worst case O(n)
for num, freq in freq.items():
# tc: push operation O(log n)
heapq.heappush(minHeap, (freq, num))
# if heap grows to size k + 1
if len(minHeap) > k:
# tc: pop smallest element O(log k)
heapq.heappop(minHeap)
# tc: grab k elements from minHeap, worst case grab n elements O(n)
result = [num for freq, num in minHeap]
# expanded for loop:
# for _, num in minHeap:
# result.append(num)
# overall: tc O(n log n)
# overall: sc O(n)
return result
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iterate List | O(n) | O(n) | Iterate over list O(n) | Allocation for n elements O(n) |
Iterate Freq Counts + MinHeap | O(n log k) | O(k) | Iterate over counts * Push/BubbleUp, Pop O(log k) for heap of size k | Allocation for heap of max k elements O(k) |
MinHeap Grab k | O(k) | O(k) | Iterate over minHeap of k size O(k) | Result list of size k O(k) |
Overall | O(n log k) | O(n) | Iterating over freq counts * heap operations dominate, O(n log k) | Allocation for freq count of n elements dominates, O(n) |
Solution 2: QuickSelect BinarySearch High to Low Sort - Hashmap/Algorithm
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
# Note:
# QuickSelect is a modified quicksort
# A finalPartitionMarker marks the correct position,
# at which point, elements to the left and right of finalPartitionMarker
# are guaranteed to be smaller/larger.
# QuickSelect avoids sorting the entire array and
# only isolates the minimum elements we need to complete task
# In place partition of subarray [left,right] relative to random pivot index value
def partitionSection(left, right, randPivotElemIndex):
# tc: subarray m elements, n in worst case O(n)
# Frequency count of random pivot element
pivotElemFreq = frequency[unique[randPivotElemIndex]]
# Move pivot to end
# Tuple unpacking in python allows to swap two variables
unique[randPivotElemIndex], unique[right] = unique[right], unique[randPivotElemIndex]
# Index to partition larger elements to the left boundary of subarray
partitionIndex = left
# tc: iterate over subarray, n in worst case O(n)
for i in range(left, right):
# Partition elements with (pivotElemFreq < freq), move to left -> 'greater'
if pivotElemFreq < frequency[unique[i]]:
# larger freq element on left, sorting [high... low] count
# -> allows use to grab top freq k elements [:k]
unique[i], unique[partitionIndex] = unique[partitionIndex], unique[i]
partitionIndex += 1
# set pivot to correct partitioned index,
# elements left and right of pivot follow [greater... pivot ... lesser] freq count
unique[partitionIndex], unique[right] = unique[right], unique[partitionIndex]
return partitionIndex
# tc: average recursion depth O(log m)
# sc: recursion stack for in place partitioning on average O(log m)
def quickSelectBinarySearchHelper(left, right, finalPartitionMarker):
# Base Case:
if left == right:
return
# Random pivot index,
# differs from QuickSort "median-of-three" approach
# random pivot focuses on finding the kth smallest or largest,
# rather than fully sorting the array,
# random pivot avoids degrading to worst case O(n^2)
randPivotElemIndex = random.randint(left, right)
# partition subarray
resultPivotElemIndex = partitionSection(left, right, randPivotElemIndex)
# Base Case: kth item in correct sorted [high... low] place
if finalPartitionMarker == resultPivotElemIndex:
# -> allows us to grab k top freq count by [:k]
return
# Binary Search Modification:
# Selects next subarray and partition pivot
# resultPivotElemIndex must recurse towards side where finalPartitionMark is on
# final is in left partition of result: [left, resultPivot-1]
elif finalPartitionMarker < resultPivotElemIndex:
quickSelectBinarySearchHelper(left, resultPivotElemIndex - 1, finalPartitionMarker)
# final is in right partition of result: [resultPivot+1, right]
else:
quickSelectBinarySearchHelper(resultPivotElemIndex + 1, right, finalPartitionMarker)
# tc: iterate over list O(n)
# sc: allocate for list, n unique elements worst case O(n)
frequency = defaultdict(int)
for num in nums:
frequency[num] += 1
# QuickSelect BinarySearch to place kth most frequent element
# in its partitioned location, to then splice [:k]
unique = list(frequency.keys())
n = len(unique)
finalPartition = k - 1
l, r = 0, n - 1
# For High to Low (descending) indexing:
# with list of 6 elements, grab largest 2 elements
# 2 - 1 = 1 (our 2nd largest)
# splice [:2] = [0, 1]
quickSelectBinarySearchHelper(l, r, finalPartition)
# overall: tc O
# overall: sc
return unique[:k]
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iterate List | O(n) | O(n) | Iterate over list O(n) | Allocation for n elements O(n) |
QuickSort Partition | Average: O(n) Worst: O(n2) | O(1) | Good pivot splits array in half on average O(n), Bad pivot each iteration, smallest or largest, causes O(n2) | Partition of freq count array occurs in place O(1) |
QuickSort Recursion | O(k) | O(k) | Iterate over buckets for k steps O(k) | Result list of size k O(k) |
Result Splice | O(k) | O(k) | Splicing array for first k elements | Result list of k elements |
Overall | Average: O(n) Worst: O(n2) | O(n) | Bad pivot partition dominates O(n2) | Allocation for freq count of n elements dominates, O(n) |
Solution 3: BucketSort by Count - Hashmap/Algorithm
def topKFrequent(self, nums: List[int], k:int) -> List[int]:
# tc: iterate over list of n integers O(n)
# sc: frequency count for unique integers O(m)
count = defaultdict(int)
for key in nums:
count[key] += 1
# numBuckets index = items with that frequency count
# numBuckets must account for max freq count case, where list is full of only 1 element
# numBuckets must account for empty list, (but still needs a bucket index 0), so we need
# len(num)+1,
# empty: 1 bucket: [0] representing empty group
# else: list of len 5, 6 buckets: [0, 1, 2, 3, 4, 5] each representing length group,
# with max freq count of 5
numBuckets = len(nums) + 1
# list of empty lists, numBucket amount of times
# tc: iterate over list length O(n)
# sc: create len(list)+1 buckets O(n)
freqBuckets = [[] for i in range(numBuckets)]
# tc: iterate over frequency list for m unique integer tuples (int, occurrences) O(m)
for int, occurrences in count.items():
freqBuckets[occurrences].append(int)
# sc: grabbing top k integers, n worst case O(k)
res = []
# tc: iterate over len(list)+1 buckets O(n)
for i in range(len(freqBuckets) - 1, 0, -1):
# tc: iterate over all elements in curr bucket O(m)
for num in freqBuckets[i]:
# tc: insert operation O(1)
res.append(num)
# tc: continue while less than k elements grabbed O(k)
if len(res) == k:
return res
# overall: tc O(n)
# overall: sc O(n)
return res
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Iterate list | O(n) | O(n) | Iteration over list O(n) | Allocation for n elements O(n) |
Bucket Creation | O(n) | O(n) | Iterate len(list)+1 steps O(n) | Allocate len(list)+1 buckets O(n) |
Bucket Population | O(n) | O(n) | Inserting n integers into buckets O(n) | Buckets store n integers O(m) |
Bucket Grab k | O(k) | O(k) | Iterate over buckets for k steps O(k) | Result list of size k O(k) |
Overall | O(n) | O(n) | Iterating over list dominates, O(n) | Allocating n buckets dominates, O(n) |
238. Product of Array Except Self ::2:: - Medium
Topics: Array, Prefix Sum
Intro
Given an integer array nums, return an array answer such that answer[i] is equal to the product of all the elements of nums except nums[i]. The product of any prefix or suffix of nums is guaranteed to fit in a 32-bit integer. You must write an algorithm that runs in O(n) time and without using the division operation.
Input | Output |
---|---|
[1,2,3,4] | [24,12,8,6] |
[-1,1,0,-3,3] | [0,0,9,0,0] |
Constraints:
Product of any prefix or suffix is guaranteed to fit into 32 bit integer
Abstraction
Given a list of nums, return a list of nums that is the product of the array excluding itself. For num n, the result should be the product of all the numbers to the left of it, times the product of all the numbers to the right of it.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Prefix & Postfix | O(n) | O(n) | Three iterations, prefix, postfix, result, O(3n) ~= O(n) | Allocation for 3 arrays O(3n) ~= O(n) |
Prefix & Postfix | O(n) | O(1) | Two iterations, prefix, postfix, O(n) | Allocation for 1 array O(n) |
Solution 1: Prefix, Postfix, Result Arrays - Array/In Place Transformations
def productExceptSelf(self, nums: List[int]) -> List[int]:
# sc: prefix, postfix, result, arrays O(n)
prefix = [1] * len(nums)
postfix = [1] * len(nums)
res = [1] * len(nums)
# Prefix
# tc: iterate list O(n)
for i in range(1, len(nums)):
# calculate prefix of i = (prefix of i - 1) * (nums[i - 1])
prefix[i] = prefix[i - 1] * nums[i - 1]
# Postfix
# postfix starts at 1 = the postfix of len(n) - 1
# start reverse iteration at: len(n) - 2
# tc: iterate list O(n)
for i in range(n - 2, -1, -1):
# calculate postfix of i = (postfix of i + 1) * (nums[i + 1])
postfix[i] = postfix[i + 1] * nums[i + 1]
# Result
# tc: iterate list O(n)
for i in range(n):
# calculate product except self for i = (prefix of i) * (postfix of i)
res[i] = prefix[i] * postfix[i]
# overall: tc O(n)
# overall: sc O(n)
return res
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Prefix | O(n) | O(n) | Iterate over list O(n) | Allocation for prefix array O(n) |
Postfix | O(n) | O(n) | Iterate over list O(n) | Allocation for postfix array O(n) |
Result | O(n) | O(n) | Iterate over list O(n) | Allocation for result array O(n) |
Overall | O(n) | O(n) | Iteration over list dominates, O(n) | Allocation for list dominates, O(n) |
Solution 2: Result Array - Array/In Place Transformations
def productExceptSelf(self, nums: List[int]) -> List[int]:
# sc: result array O(1)
res = [1] * len(nums)
# Prefix
# tc: iterate list O(n)
for i in range(1, len(nums)):
# (prefix of i) = (prefix of i - 1) * (num at i - 1)
res[i] = res[i - 1] * nums[i - 1]
# Running postfix through reverse iteration,
# postfix starts at 1 = the postfix of len(n) - 1
postfix = 1
# tc: iterate list in reverse O(n)
for i in range(len(nums) - 1, -1, -1):
# calculate product except self for i = (prefix of i) * (postfix of i)
res[i] *= postfix
# update postfix = (postfix of i) * (num at i)
postfix *= nums[i]
# overall: tc O(n)
# overall: sc O(1)
return res
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Prefix | O(n) | O(1) | Iteration over list O(n) | Allocation for result array O(1) |
Result | O(n) | O(1) | Iteration over list O(n) | Allocation for result array O(1) |
Overall | O(n) | O(1) | Iteration over list dominates, O(n) | Allocation for result array dominates, O(1) |
36. Valid Sudoku ::2:: - Medium
Topics: Array, Hash Table, Matrix
Intro
Determine if a 9 x 9 Sudoku board is valid. Only the filled cells need to be validated according to the following rules: Each row must contain the digits 1-9 without repetition. Each column must contain the digits 1-9 without repetition. Each of the nine 3 x 3 sub-boxes of the grid must contain the digits 1-9 without repetition.
Constraints:
Only the currently filled cells need to be validated, regardless is sudoku board is actually solvable or not.
Input | Output |
---|---|
a sudoku table is too big to put here lol | just look at -> sudoku board example |
Abstraction
Abstract board into sets of rows, cols, and boxes, and validate whether duplicate values exist in any set.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Set | O(n2) | O(n2) | Iterate over grid dominates, O(n2) | Allocation for grid cells O(n2) |
Array | O(n3) | O(n2) | Iterate over grid * linear array in operation dominates, O(n3) | Allocation for grid dominates, O(n2) |
Bug: Initialize DefaultDict of Sets Incorrectly
def isValidSudoku(self, board: List[List[str]]) -> bool:
# BUG:
# defaultdict argument must be callable or None
# --------
# Differences: [], list, set:
# []: instance of the list, not callable
# list: class creates a new list object when called, callable
# set: class creates a new set object when called, callable
# Instead:
# defaultdict(set)
rows = defaultdict([])
cols = defaultdict([])
boxs = defaultdict([])
for r in range(9):
for c in range(9):
tmp = board[r][c]
if tmp != ".":
boxKey = (r//3, c//3)
if (tmp in rows[r] or
tmp in cols[c] or
tmp in boxs[boxKey]):
return False
rows[r].add(tmp)
cols[c].add(tmp)
boxs[boxKey].add(tmp)
return True
Bug: 81 Tuples instead of 9 Tuples for Boxes
def isValidSudoku(self, board: List[List[str]]) -> bool:
rows = defaultdict(set)
cols = defaultdict(set)
boxs = defaultdict(set)
for r in range(9):
for c in range(9):
tmp = board[r][c]
if tmp != ".":
# BUG:
# will create 81 possible boxes (9 rows * 9 columns)
# Instead:
# boxKey = tuple((r//3, c//3))
# creates 9 boxes
# (9//3), (9//3) = (1-3, 1-3) = 9 possibilities
boxKey = tuple((r, c))
if (tmp in rows[r] or
tmp in cols[c] or
tmp in boxs[boxKey]):
return False
rows[r].add(tmp)
cols[c].add(tmp)
boxs[boxKey].add(tmp)
return True
Solution 1: DefaultDict Matrix - Hashmap/Representation
def isValidSudoku(self, board: List[List[str]]) -> bool:
# sc: dictionary for rows, columns, boxes O(n)
cols = defaultdict(set)
rows = defaultdict(set)
grids = defaultdict(set)
# tc: iterate over grid (r * c) O(n^2)
for r in range(9):
for c in range(9):
cell = board[r][c]
if cell != ".":
# index between 9 box sets by tuple key (r/3, c/3)
gridTuple = (r // 3, c // 3)
# tc: lookup operation O(1)
if (cell in rows[r] or
cell in cols[c] or
cell in grids[gridTuple] ):
return False
# tc: put operation O(1)
cols[c].add(cell)
rows[r].add(cell)
grids[gridTuple].add(cell)
# overall: tc O(n^2)
# overall: sc O(n)
return True
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Row, Col, Box Init | O(1) | O(n2) | No iteration O(1) | Dictionary allocation for grid cells O(n2) |
Iterate over grid | O(n2) | O(1) | Iterate over grid O(n2) | No allocation O(1) |
Operations | O(1) | O(1) | Lookup, Put operation O(1) | No allocation for operations O(1) |
Overall | O(n2) | O(n2) | Iterate over grid dominates, O(n2) | Allocation for grid cells O(n2) |
Solution 2: [[]] Matrix - Hashmap/Representation
def isValidSudoku(self, board: List[List[str]]) -> bool:
# sc: arrays for rows, columns, boxes O(n)
rows = [[], [], [], [], [], [], [], [], []]
col = [[], [], [], [], [], [], [], [], []]
grids = [[], [], [], [], [], [], [], [], []]
# tc: iterate over grid (r * c) O(n^2)
for r in range(9):
for c in range(9):
cell = board[r][c]
if cell != ".":
# indexing box sets by unique int calculation:
# ------------
# (j//3) = 0, 1, 2
# (i//3) * 3 = 0, 3, 6
# ------------
# 0 1 2
# 0 0 1 2
# 3 3 4 5
# 6 6 7 8
# ------------
# allows to 9 indexing options
boxKey = (j//3) + ((i//3) * 3)
# tc: lookup operation O(1)
if (cell in rows[r] or
cell in col[c] or
cell in grids[boxKey]):
return False
# sc: put operation O(1)
col[c].append(cell)
rows[r].append(cell)
grids[boxKey].append(cell)
# overall: tc O(n^2)
# overall: sc O(n)
return True
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Row, Col, Box Init | O(1) | O(n2) | No iteration O(1) | Dictionary allocation for grid cells O(n2) |
Iterate over grid | O(n2) | O(1) | Iterate over grid O(n2) | No allocation O(1) |
Operations | O(n) | O(1) | In array operation O(n), Put operation O(1) | No allocation for operations O(1) |
Overall | O(n3) | O(n2) | Iterate over grid * linear array in operation dominates, O(n3) | Allocation for grid dominates, O(n2) |
Note: Solution 2 is faster than Solution 1 for small sets, such as our 9x9 board, due to defaultdict overhead
128. Longest Consecutive Sequence ::3:: - Medium
Topics: Array, Hash Table, Union Find
Intro
Given an array of integers nums, return the length of the longest consecutive sequence of elements. A consecutive sequence is a sequence of elements in which each element is exactly 1 greater than the previous element You must wrtie an algorithm that runs in O(n) time.
Input | Output |
---|---|
[100,4,200,1,3,2] | 4 from [1, 2, 3, 4] |
[0,3,7,2,5,8,4,6,0,1] | 9 from [0, 1, 2, 3, 4, 5, 6, 7, 8] |
[1, 0, 1, 2] | 3 from [0, 1, 2] |
Constraints:
0 ≤ nums.length ≤ 105
-109 ≤ nums[i] ≤ 109
Abstraction
Give a list of integers, find the longest sequence of increasing integers. This sequence does not have to follow the left to right order of the array.
Space & Time Complexity
Solution | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Brute Force | ||||
Hashmap |
Brute Force
def longestConsecutive(nums: List[int]) -> int:
longest = 0
# time complexity: Iterate over list of n length O(n)
for num in nums:
currLen = 1
# time complexity: Iterate over potential n consecutive integers O(n)
# time complexity: Lookup in list takes O(n)
# time complexity: O(n^2)
while num + currLen in nums:
currLen += 1
# compare to global max
longest = max(longest, currLen)
# Overall time complexity: O(n^3) for worst-case where we scan sequences n times for n numbers
# Space complexity: O(1)
return longest
Aspect | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Iterate | O(n) | O(1) | Iterate over list of n length O(n) | No additional memory allocation for iteration O(1) |
Iterate consecutive sequence | O(n) | O(1) | Iterate over potential n consecutive integers O(n) | No additional memory allocation for iteration O(1) |
List lookup | O(n) | O(1) | Lookup in list takes O(n) | No additional memory allocation for list lookup O(1) |
Overall | O(n3) | O(1) | Iterating * Iterating * Lookup dominate leading to O(n3) | No additional memory allocation O(1) |
Bug: Empty list for Union tree leads to fail
def longestConsecutive(self, nums: List[int]) -> int:
# BUG:
# need to check for empty list case
# or else max(size.values()) will return error
# Instead:
# if len(nums) == 0
# return 0
parent = {}
size = {}
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(x, y):
root_x = find(x)
root_y = find(y)
if root_x != root_y:
size_x = size[root_x]
size_y = size[root_y]
if size_x < size_y:
parent[root_x] = root_y
size[root_y] += size[root_x]
else:
parent[root_y] = root_x
size[root_x] += size[root_y]
setNum = set(nums)
for n in setNum:
parent[n] = n
size[n] = 1
for n in setNum:
if (n+1) in parent:
union(n, n+1)
return max(size.values())
Solution 1: HashMap Continent Boundaries - HashMap/Representation
def longestConsecutive(nums: List[int]) -> int:
longest = 0
# sc: hashmap of list of n length O(n)
seqLen = defaultdict(int)
# tc: iterate list O(n)
# sc: ignore duplicates O(n)
numSet = set(nums)
for num in nums:
# Get lengths of neighbor sequences
leftContinentLenFromBoundary = seqLen[num - 1]
rightContinentLenFromBoundary = seqLen[num + 1]
# Calculate new bridged sequence length
bridgedLen = 1 + leftContinentLenFromBoundary + rightContinentLenFromBoundary
# Update new continent boundaries
seqLen[num - leftContinentLenFromBoundary] = bridgedLen
seqLen[num + rightContinentLenFromBoundary] = bridgedLen
# Update the global max
longest = max(longest, bridgedLen)
# overall: tc
# overall: sc
return longest
Aspect | Time Complexity | Space Complexity | Time Remark | Space Remark |
---|---|---|---|---|
Set | O(n) | O(n) | Converting list unique list O(n) | Allocation for unique set O(n) |
Iterate over set | O(n) | O(n) | Iterate set O(n) | Dictionary allocation for sequence length per element O(n) |
Operations | O(1) | O(1) | Get, Put operation O(1) | No allocation O(1) |
Overall | O(n) | O(n) | Iterate unique list dominates O(n) | Allocation for unique elements, sequence length dominates, O(n) |
Solution 2: Set Rummy Run - Hashmap/Representation
def longestConsecutive(self, nums: List[int]) -> int:
# Note:
# although rummy has an inner while loop,
# each number is part of at most one sequence
# and are not checked again in future sequences
# thus, the number of times the while loop runs across
# all iterations is n.
# so the overall tc is O(n), instead of O(n^2)
longest = 0
# sc: ignore duplicates O(n)
numSet = set(nums)
# tc: iterate list O(n)
for num in numSet:
# tc: lookup operation takes constant O(1)
# found start of a new run
if (num - 1) not in numSet:
newRunLen = 1
# tc: iterate over sequence/list O(n)
# rummy run until missing an element
while (num + newRunLen) in numSet:
newRunLen += 1
# compare with global max sequence
longest = max(longest, newRunLen)
# overall: tc O(n)
# overall: sc O(n)
return longest
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Set | O(n) | O(n) | Converting list unique list O(n) | Allocation for unique set O(n) |
Iterate over set | O(n) | O(1) | Iterate set O(n) * Sequence Start Check In Operation O(1) | No allocation O(1) |
Rummy Iterate | O(n) | O(1) | Iterate while valid sequence, n steps across all iterations O(n) | No allocation O(1) |
Overall | O(n) | O(n) | Iterate set dominates, O(n) | Set allocation dominates, O(n) |
Solution 3: Union Find Tree - HashMap/Algorithm
def longestConsecutive(self, nums: List[int]) -> int:
# Check Empty: avoid max(size.values()) return error
if len(nums) == 0:
return 0
# Union Find Initialization:
# sc: dictionary for list O(n)
# Stores parent of each number
parent = {}
# sc: dictionary for list O(n)
# Stores the size of connected components
size = {}
# tc: Amortized O(α(n)) per operation
# Find + path compression
def find(x):
# if the parent of x is not itself, we have not reached the representative
if parent[x] != x:
# Path compression
# set parent of curr, to parent of parent
parent[x] = find(parent[x])
# return representative of group
# either return self
# or return parent of parent
return parent[x]
# tc: Amortized O(α(n)) per operation
# Union operation with size optimization
def union(x, y):
# Get parent for trees x and y
rootX = find(x)
rootY = find(y)
# If trees do not share representative
if rootX != rootY:
# Attach smaller tree to larger tree,
# update larger tree size and
# smaller tree parent
xSize = size[rootX]
ySize = size[rootY]
if ySize < xSize:
parent[rootY] = rootX
size[rootX] += size[rootY]
else:
parent[rootX] = rootY
size[rootY] += size[rootX]
# tc: iterate list O(n)
# Initialize Union Find:
# 1. Set all parents to self
# 2. Set all sizes to 1
for num in nums:
# ignore duplicates
if num not in parent:
parent[num] = num
size[num] = 1
# tc: iterate list O(n)
# Join() consecutive numbers,
# stores sequences represented as trees
for num in nums:
if num + 1 in parent:
union(num, num + 1)
# return tree (sequence of numbers) with
# the longest size/length
# overall: tc O(n)
# overall: sc O(n)
return max(size.values())
Aspect | Time Complexity | Space Complexity | Time Remarks | Space Remarks |
---|---|---|---|---|
Union Tree Init | O(n) | O(n) | Iterate list O(n) | Dictionary allocation for parent and size O(n) |
Union | Amortized O(α(n)) | O(1) | Each union operation is nearly constant with path compression and size optimization α(n) | No allocation O(1) |
Find | Amortized O(α(n)) | O(1) | Find with path compression is nearly constant O(α(n)) | No allocation O(1) |
Iterate Join | O(n) | O(1) | Iterate list O(n) * Union which is constant, so just O(n) | No allocation O(1) |
Get Largest | O(n) | O(1) | Grab max from dictionary O(n) | No allocation O(1) |
Overall | O(n) | O(n) | Iterate * Union dominates, O(n) | Dictionary allocation for parent and size dominates, O(n) |