Jc-alt logo
jc
data structures and algorithms

LeetCode: Arrays and Hashing

LeetCode: Arrays and Hashing
43 min read
#data structures and algorithms
Table Of Contents

Arrays & Hashing Intro:

LeetCode problems with solutions using hashmaps.

What is a Hashmap

A dictionary is just a direct mapping of keys->values:

Dictionary: ['a': 2, 'b': 2, 'c' : 3, ...]

A Hashmap is just a dictionary that maps keys->values using a hash function. Hashmaps are a common use case for hashing. We choose hashing to maximize randomness and minimize collisions between elements to keys.

Hashing With A Function

Hashing is simply a function. It takes in an input and spits out an output. Here is the function mod, which takes in an integer and spits out an integer:

  • 1 mod (3) = 1
  • 2 mod (3) = 2
  • 3 mod (3) = 0
  • 4 mod (3) = 1

Now this is could be our function for our hashmap, but it would lead to high collisions and low randomness as there are only 3 possible key results: 0, 1, and 2. So hashmaps are only efficient as the chosen hash function.

Choosing a Hash Function

With a good hash function: Insert, Lookup, and Delete take O(1). In this case, every element gets its own unique key and so a lookup using this function would be constant O(1).

With a bad hash function: Insert, Lookup, and Delete take O(n). Lets say our function is hash():

Handling Collisions

Even with a good hash function, we may run into collisions sometimes. In those cases, we handle collision using chaining with a linked list of elements.

Here, there are 5 keys or buckets we could hash to.

    hash()  ->  [key mod (5)]
    ----------------------------
    hash(10) = 10 % 5 = 0
    hash(22) = 22 % 5 = 2
    hash(31) = 31 % 5 = 1
    hash(14) = 14 % 5 = 4
    hash(17) = 17 % 5 = 2 (collision!)
Key (index)Value (element)
0[10]
1[31]
2[22, 17]
3[]
4[14]

Balancing Hash Table With Load Factor

Load factors are calculated to ensure a hash table maintains its O(1) time complexity.

Balancing occurs when a hash table has passed its set load factor.

Load Factor = n/m Where n = num of elements, m = num of keys

Thus, when certain fraction of the table size, say 75% full with 3 elements and 4 potential keys. The table must increase in size and rehash/reinsert every element.

The efficiency of a hash table decreases significantly without balancing, as this leads to increased collisions as the table fills up, which leads to time needed to resolve collisions, as we traverse through list to find the element in linear time O(n). Operations like search, insert, and delete, degrade from O(1) on average to O(n) in the worst case.

Usually, when the load factor is reached, the table size doubles to 2n. 0.75 or 75% full is a common load factor for a hash table.

Upon balancing, we need to rehash every element leading to n/2 additional space:

    New table size - current number of elements 
    (double table size * load factor) - current n elements
    (2 n * 0.75) 
    1.5 n  = New Load Factor Element Count

    New Load Factor Element Count - Current Element Count
    1.5 n - n 
    0.5
    n/2 = Additional space until next rebalance

Rehashing is expensive as we need to rehash every element with the new hash function into their new bucket, leading to O(n) for resizing and reinserting n elements.

Array Application: In place Transformations

We can perform transformations or reorderings on the array itself without using extra space.

Ex: Rotate an array to the right by k steps.

    def rotate(nums: List[int], k: int) -> None:
        
        n = len(nums)
        k %= n  # handle cases where k > n
        
        # Opposite ends reverse subarray [start, end]
        def reverse(start: int, end: int) -> None:
            
            # Reverse until hit middle of subarray
            while start < end:
                nums[start], nums[end] = nums[end], nums[start]
                start += 1
                end -= 1

        # Reverse entire array
        reverse(0, n - 1)
        # Reverse first k elements [0, k)
        reverse(0, k - 1)
        # Reverse remaining elements [k, n)
        reverse(k, n - 1)

HashMap Application: Representations

We can represent objects or data based on specific criteria.

Ex: Representing a string by character frequency

    def freqCount():

        # object
        s = "aabbcc"

        # representation
        freq = {}

        # mapping
        for char in s:
            freq[char] = freq.get(char, 0) + 1

    # freq = {'a': 2, 'b': 2, 'c': 2}

HashMap Application: Grouping by Criteria

We can group elements based on a defined criterion, such as sorting or categorization, using hashing to push values into the corresponding bucket.

Ex: Grouping string anagrams

    def groupAnagrams(strs):
        
        # groups
        anagrams = {}

        # for list
        for word in strs:

            # create key
            key = "".join(sorted(word))
            if key not in anagrams:
                anagrams[key] = []
            
            # hash key and put value into bucket
            anagrams[key].append(word)

    # anagrams = {'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat'], 'abt': ['bat']}

HashMap Application: Memoization in Dynamic Programming

We can store solutions to sub problems to avoid redundant calculations.

Ex: Fibonacci number computation with memoization

    def fib(n, memo={}):
        if n <= 1:
            return n
        if n not in memo:
            memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
        return memo[n]

HashMap Application: Backtracking with Caching for Pruning

We can cache previous paths or states to prune the search space effectively as we explore.

Ex: Subset Sum with caching:

    def subsetSum(nums, target):

        result = []

        # Cache to avoid exploring previously visited paths
        cache = {}

        def dfs_backtrack(current, index, total):
            
            # Check if already explored state
            state = (tuple(current), total)
            if state in cache:
                return

            # Mark as explored
            cache[state] = True

            # Early Prune -> no potential valid path is current total exceeds target
            if total > target:
                return
            
            # Valid solution -> no need to explore further, backtrack
            if total == target:
                result.append(list(current))
                return

            # Explore -> branches from current state
            for i in range(index, len(nums)):
                
                # Build
                current.append(nums[i])
                # Explore
                dfs_backtrack(current, i + 1, total + nums[i])
                # Backtrack
                current.pop()

        dfs_backtrack([], 0, 0)
        return result

HashMap Application: Representing Relationships

We can model relationships between entities.

Ex: Adjacency list for a graph

    def graph():

        # List of edges in the graph
        edges = [(1, 2), (2, 3), (1, 3)]

        # Adjacency list
        graph = {}

        # Build the adjacency list
        for u, v in edges:

            # If the node 'u' or 'v' is not yet in the graph, initialize it
            if u not in graph:
                graph[u] = []
            if v not in graph:
                graph[v] = []

            # Add 'v' to the list of neighbors for 'u' and 'u' to 'v'
            graph[u].append(v)
            graph[v].append(u)

    # graph = {1: [2, 3], 2: [1, 3], 3: [2, 1]}

HashMap Application: Index with Data Key

We can index into arrays based on data or data structures.

Ex: Store integer complements for quick lookup:

    def twoSum(nums, target):

        # Complement -> index
        complement_map = {}

        for i, num in enumerate(nums):
            
            # Calculate complement
            complement = target - num
            
            # Lookup(complement)
            if num in complement_map:
                return [complement_map[num], i]
            
            # Put(complement)
            complement_map[complement] = i

        return []

HashMap Application: Algorithm

There are cases where problem that seems to require a hashmap have an existing algorithm made for that problem.

Ex: Boyer Moore Voting Algorithm

def majorityElement(nums: List[int]) -> int:
    
    # Find element that appears more then floor(n/2) times

    # votes for candidate
    count = 0

    # current candidate
    candidate = None

    for num in nums:
        
        # Reset candidate
        if count == 0:
            candidate = num

        if num == candidate:
            count += 1
        else:
            count -= 1

    # Confirm the candidate ( optional if majority element is guaranteed)
    return candidate

217. Contains Duplicate ::2:: - Easy

Topics: Array, Hash Table, Sorting

Intro

Given an integer array nums, return true if any value appears at least twice in the array, return false if every element is distinct.

Example InputOutput
nums = [1,2,3,1]true
nums = [1,2,3,4]false
nums = [1,1,1,3,3,4,3,2,4,2]true

Constraints:

1 ≤ nums.length ≤ 105

-109 ≤ nums[i] ≤ 109

Abstraction

Given a list of elements, check for duplicates.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
HashmapO(n)O(n)Iterate over array O(n) + get, put in O(1)Dictionary size relative to input O(n)
Set()O(n)O(n)Iterate over array O(n) + get, in in O(1)Set size relative to input O(n)

Solution 1: Hashmap - Hashmap/Representation

    def containsDuplicate(self, nums: List[int]) -> bool:

        # sc: hashmap relative to input O(n)
        count = defaultdict(int)

        # tc: iterate over list O(n)
        for num in nums:

            # tc: in operation constant O(1)
            if count[num] >= 1:
                return True
            # tc: put operation constant O(1)
            count[num] += 1

        # overall: tc  O(n) 
        # overall: tc O(n)
        return False
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
IterationO(n)O(n)Iterate over array in linear O(n) + get, put in O(1)Dictionary relative to input O(n)
OverallO(n)O(n)Iteration over input dominates, O(n)Allocation for dictionary dominates, O(n)

Solution 2: Set - Hashmap/Representation

    def containsDuplicate(self, nums: List[int]) -> bool:
        
        # sc: set relative to input O(n)
        seen = set()

        # tc: iterate over list O(n)
        for n in nums:

            # tc: in operation O(1)
            if n in seen:
                return True
            # tc: add operation O(1)
            seen.add(n)
            
        # overall: tc O(n)
        # overall: sc O(n)
        return False
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
IterationO(n)O(n)Iteration over list in linear O(n) + in, add operation O(1)Set size relative to input O(n)
OverallO(n)O(n)Iteration over list dominates, O(n)Allocation relative to input dominates, O(n)

242. Valid Anagram ::2:: - Easy

Topics: Hash Table, String, Sorting

Intro

Given two strings s and t, return true if t is an anagram of s, and false otherwise. An Anagram is a word or phrase formed by rearranging the letters of a different word using all original letters exactly once.

Example Input 'S'Example Input 'T'Output
"anagram""nagaram"true
"rat""car"false

Constraints:

1 ≤ s.length, t.length ≤ 5 * 104

s and t consist of lowercase English letters

Abstraction

Given two strings determine if they have the same character count.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
HashmapO(n)O(1)Two iterations over each word, linear O(n) + lookup, put in constant time O(1)Dictionary will be proportional to input, space O(n)
Array of 26O(n)O(1)Single iteration over both words at same time O(n) + lookup, put in constant time O(1)Array will be constant 26 elements, space O(1)

Solution 1: Hashmap Double Pass - Hashmap/Representation

    def isAnagram(self, s: str, t: str) -> bool:
        
        # sc: hashmap of 26 elements O(1) 
        count = defaultdict(int)

        # tc: two iteration over string length O(n) * 2 ~= O(n)
        for x in s:
           count[x] += 1
        for x in t:
           count[x] -= 1

        # tc: iteration over char count O(n)
        for value in count.values():

            # tc: get() in constant O(1)
            if value != 0:
                return False

        # overall: tc O(n)
        # overall: sc O(1)
        return True
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
IterationO(n)O(n)Iterate over two strings in linear O(n) + put, get in O(1)Dictionary relative to input O(n)
VerificationO(n)O(1)Iterating over char count in linear O(n) + get in O(1)No additional memory O(1)
OverallO(n)O(n)Iteration over input dominates, O(n)Allocation for dictionary relative to input dominates, O(n)

Solution 2: Array of 26 Single Pass - Hashmap/Representation

    def isAnagram(self, s: str, t: str) -> bool:

        # Note:
        # ord() converts unicode char to int representation
        # and use int to index into array

        # sc: array of 26 constant O(1)
        count = [0] * 26

        # Check: mismatch length
        if len(s) != len(t):
            return False

        # tc: single iterate over both strings at same time O(n)
        for i in range(len(s)):
            count[ord(s[i]) - ord('a')] += 1
            count[ord(t[i]) - ord('a')] -= 1

        # tc: iterate over count array length 26 O(1)
        for val in count:
            if value != 0:
                return False

        # overall: tc O(n)
        # overall: sc O(1)
        return True
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
IterationO(n)O(1)One iteration over both strings in linear O(n) + put, get in O(1)Array of constant 26 size O(1)
VerificationO(n)O(1)Iterating over count array length 26 in constant O(1)No additional memory O(1)
OverallO(n)O(1)Iteration over input dominates, O(n)Allocation for array is constant O(1)

1. Two Sum I Not Sorted ::1:: - Easy

Topics: Array, Hash Table

Intro

Given an array of integers nums and an integer target, return indices of two numbers such that they add up to target. You can assume each test case only has one solution and you may cant use the same element twice. Answer can be returned in any order.

nums[]TargetOutput
[2,7,11,15]9[0,1]
[3,2,4]6[1,2]
[3,3]6[0,1]

Constraints:

Only one valid answer exists

2 ≤ nums.length ≤ 10^4

-10^9 ≤ target ≤ 10^9

Abstraction

We need to find two elements that add up to the target and return their indexes.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
Hashmap of ComplementsO(n)O(n)Iterate over array O(n)Allocation relative to input O(n)

Solution 1: Hashmap - Hashmap/Index with Data Key

    def twoSum(self, nums: List[int], target: int) -> List[int]:
        
        # Note:
        # dictionary is complement -> index 

        # sc: dictionary size relative to input O(n)
        tracking = {}
 
        # tc: iterate over list O(n)
        for i in range (len(nums)):

            complement = target - nums[i]

            # tc: in operation O(1)
            if complement in tracking:
                return [i, tracking[complement]]
    
            # tc: put operation O(1)
            tracking[nums[i]] = i

        # overall: tc O(n) 
        # overall: sc O(n)
        return []
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
IterationO(n)O(n)Iterate over list in linear O(n) + in, put O(n)Dictionary relative to input O(n)
OverallO(n)O(n)Iterate over list dominates, O(n)Allocation relative to input dominates, O(n)

49. Group Anagrams ::2:: - Medium

Topics: Array, Hash Table, String, Sorting

Intro

Given an array of strings strs, group the anagrams together. You can return the answer in any order. An Anagram is a word formed by rearranging the letters of a different word using all the original letters exactly once.

InputOutput
["eat","tea","tan","ate","nat","bat"][["bat"],["nat","tan"],["ate","eat","tea"]]
[""][[""]]
["a"][["a"]]

Constraints:

strs[i] consists of lowercase English letters

Abstraction

Group the matching anagrams together.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
Array to Tuple KeyO(n * k)O(n * k)Iterate over list O(n) * iterate over string k length O(k)Dictionary relative to input O(n * k)
Array to String KeyO(n * k)O(n * k)Iterate over list O(n) * iterate over string k length O(k)Dictionary relative to input O(n * k)

Bug: Mutable object, list, cannot be hashed

    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        
        anaGroup = {}
        
        for word in strs:
            
            charCount = [0] * 26
            for char in word:
                charCount[ord(char) - ord('a')] += 1
            
            # BUG: 
            # Attempted to hash mutable list charCount
            if charCount not in anaGroup:
                anaGroup[charCount] = []
            
            anaGroup[charCount].append(word)
        
        return list(anaGroup.values())

Bug: Cannot use unsorted hashmap to tuple as key

    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        
        groupAna = {}

        for s in strs:
            count = defaultdict(int)

            for c in s:
                count[c] += 1

            tupKey = tuple(count)

            # BUG: 
            # hashmaps are not sorted
            # so tuple will vary (e.g., "cat" vs "tac")
            if tupKey not in groupAna:
                groupAna[tupKey] = []

            groupAna[tupKey].append(s)

        return list(groupAna.values())

Solution 1: Array to Tuple Key - Hashmap/Grouping by Criteria

    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        
        # Note: 
        # Tuples immutable, allowing them to be hashed
        # Array allows for constant ordered representation

        # sc: store k chars for n words O(n * k)
        anaGroup = {}

        # tc: iterate over list O(n) 
        for word in strs:
            
            # sc: array for 26 O(1)
            charCount = [0] * 26  
            
            # tc: iterate over string k length O(k)
            for char in word:
                charCount[ord(char) - ord('a')] += 1
            
            # sc: array of 26 to tuple O(1)
            key = tuple(charCount)  
            
            # tc: in operation O(1) 
            if key not in anaGroup:
                anaGroup[key] = []  

            # tc: put operation O(1)
            anaGroup[key].append(word)  


        # overall: tc O(n * k)
        # overall: sc O(n * k)
        return list(anaGroup.values())
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Iterate listO(n)O(1)Iterate over list O(n)No allocation for iteration O(1)
Iterate stringO(k)O(1)Iterate over string k length O(k)Array of 26 per iteration O(1)
Store string in anagram groupO(1)O(n * k)Put operation constant O(1)Dictionary stores n words of k length O(n * k)
OverallO(n * k)O(n * k)Iteration over list and strings dominate, O(n * k)Allocation for all strings dominates, O(n * k)

Solution 2: Array to String Key - Hashmap/Grouping by Criteria

    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:

        # Note: 
        # Appending to list then joining is more efficient than repeatedly 
        # appending to a string, it avoids the overhead of creating new string
        # objects on each append, overall: list in O(m), string in O(m^2)

        # sc: stores n tuple keys O(n) and lists of original k strings O(k), O(n * k)
        anaGroup = {}

        # tc: iterate over list O(n)
        for word in strs:

            # sc: array of 26 constant O(1)
            charCount = [0] * 26
            
            # tc: iterate over string O(m)
            for char in word:
                charCount[ord(char) - ord('a')] += 1
            
            # tc: array of 26 to list O(1)
            key_parts = []
            for count in charCount:
                key_parts.append(str(count))
                # delimiter
                key_parts.append("#")  

            # tc: concat list of 26 to string O(1)
            key = ''.join(key_parts)

            # tc: in operation O(1)
            if key not in anaGroup:
                anaGroup[key] = []

            # tc: put operation O(1)
            anaGroup[key].append(word)

        # overall: tc O(n * k)
        # overall: sc O(n * k)
        return list(anaGroup.values())
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Iterate listO(n * m)O(1)Iterate over list O(n)No allocation for iteration O(1)
Iterate stringO(k)O(1)Iterate string of k length O(k)Array of 26 O(1)
OverallO(n * k)O(n * k)Iterating over list and strings dominate, O(n * k)Allocation for all strings dominates, O(n * k)

347. Top K Elements in List ::3:: - Medium

Topics: Array, Hash Table, Divide and Conquer, Sorting, Heap (Priority Queue), Bucket Sort, Counting, Quickselect

Intro

Given an integer array nums and an integer k, return the k most frequent element within the array. Test cases are generated such that the answer is always unique. You may return the output in any order

InputkOutput
[1,2,2,3,3,3,3]2[2,3]
[7,7]1[7]

Constraints:

1 ≤ k ≤ number of distinct elements in nums

-1000 ≤ nums[i] ≤ 1000

Abstraction

To find the k most frequent elements, we must first create an occurrence counter for each element in the list. Now that we have the count, we just grab the top k highest occurring elements.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
MinHeapO(n log k)O(n)Iterating over freq counts * heap operations dominate, O(n log k)Allocation for freq count of n elements dominates, O(n)
QuickSelectAverage: O(n) Worst: O(n2)O(n)Good pivot splits array in half on average O(n), Bad pivot each iteration, smallest or largest, causes O(n2)Allocation for freq count of n elements dominates, O(n)
BucketSortO(n)O(n)Iterating over list dominates, O(n)Allocating n buckets dominates, O(n)

Bug: Inverted Binary Search if statement:

    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        
        def partitionSection(left, right, randIndex):
            freqPartition = frequency[unique[randIndex]]
            leftPartition = left
            unique[randIndex], unique[right] = unique[right], unique[randIndex]

            for i in range (left, right):
                if frequency[unique[i]] > freqPartition:
                    unique[i], unique[leftPartition] = unique[leftPartition], unique[i]
                    leftPartition += 1
            
            unique[right], unique[leftPartition] = unique[leftPartition], unique[right]

            return leftPartition

        def quickSelectBinarySearchHelper(left, right, finalMarker):
            
            # ran out of space
            if left == right:
                return

            randIndex = random.randint(left, right)
            resIndex = partitionSection(left, right, randIndex)

            if resIndex == finalMarker:
                return

            # BUG:
            # this format is confusing
            # we are searching for the finalMarker, so always have that on 
            # the left.
            # The correct order should be:
            # if the finalMarker is less than the result, then we shift right
            # but because of the confused order, we are doing the reverse
            # INSTEAD:
            # elif finalMarker < resIndex 
            #   quickSelectBinarySearchHelper(left, resIndex-1, finalMarker)
            elif resIndex < finalMarker:
                quickSelectBinarySearchHelper(left, resIndex-1, finalMarker)
            else:
                quickSelectBinarySearchHelper(resIndex+1, right, finalMarker)

        frequency = defaultdict(int)
        for n in nums:
            frequency[n] += 1
        
        unique = list(frequency.keys())
        n = len(unique)
        l, r = 0, n-1

        finalIndex = k-1
        quickSelectBinarySearchHelper(l, r, finalIndex)
        return unique[:k]

Solution 1: MinHeap Track K Elements - HashMap/Algorithm

    def topKFrequent(self, nums: List[int], k:int) -> List[int]:

        # Note: 
        # Heap only guarantees that the root is the min or max, and
        # provides no guarantee about the relative order of nodes. 
        # The minHeap root will hold the smallest element,
        # we iterate and add elements, 
        # we only remove the root/smallest element when heap exceeds size k, 
        # ensuring heap holds the k largest frequencies seen so far.

        # sc: freq count for list, m unique elements in worst case O(n)
        freq = defaultdict(int)

        # tc: iterate over list O(n)
        for num in nums: 
            freq[num] += 1

        # sc: minHeap holds smallest k elements O(k)
        minHeap = []

        # tc: iterate over freq count, m unique elements in worst case O(n)
        for num, freq in freq.items(): 

            # tc: push operation O(log n)
            heapq.heappush(minHeap, (freq, num)) 
            
            # if heap grows to size k + 1
            if len(minHeap) > k:

                # tc: pop smallest element O(log k)
                heapq.heappop(minHeap) 
        
        # tc: grab k elements from minHeap, worst case grab n elements O(n)
        result = [num for freq, num in minHeap]

        # expanded for loop:
        # for _, num in minHeap:
        #    result.append(num) 

        # overall: tc O(n log n)
        # overall: sc O(n)
        return result
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Iterate ListO(n)O(n)Iterate over list O(n)Allocation for n elements O(n)
Iterate Freq Counts + MinHeapO(n log k)O(k)Iterate over counts * Push/BubbleUp, Pop O(log k) for heap of size kAllocation for heap of max k elements O(k)
MinHeap Grab kO(k)O(k)Iterate over minHeap of k size O(k)Result list of size k O(k)
OverallO(n log k)O(n)Iterating over freq counts * heap operations dominate, O(n log k)Allocation for freq count of n elements dominates, O(n)

Solution 2: QuickSelect BinarySearch High to Low Sort - Hashmap/Algorithm

    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    
        # Note: 
        # QuickSelect is a modified quicksort
        # A finalPartitionMarker marks the correct position, 
        # at which point, elements to the left and right of finalPartitionMarker
        # are guaranteed to be smaller/larger.
        # QuickSelect avoids sorting the entire array and
        # only isolates the minimum elements we need to complete task

        # In place partition of subarray [left,right] relative to random pivot index value
        def partitionSection(left, right, randPivotElemIndex):
            
            # tc: subarray m elements, n in worst case O(n)
            
            # Frequency count of random pivot element
            pivotElemFreq = frequency[unique[randPivotElemIndex]]

            # Move pivot to end
            # Tuple unpacking in python allows to swap two variables
            unique[randPivotElemIndex], unique[right] = unique[right], unique[randPivotElemIndex]
            
            # Index to partition larger elements to the left boundary of subarray
            partitionIndex = left

            # tc: iterate over subarray, n in worst case O(n)
            for i in range(left, right):

                # Partition elements with (pivotElemFreq < freq), move to left -> 'greater'
                if pivotElemFreq < frequency[unique[i]]:  

                    # larger freq element on left, sorting [high... low] count 
                    # -> allows use to grab top freq k elements [:k]
                    unique[i], unique[partitionIndex] = unique[partitionIndex], unique[i]
                    partitionIndex += 1

            # set pivot to correct partitioned index,
            # elements left and right of pivot follow [greater... pivot ... lesser] freq count
            unique[partitionIndex], unique[right] = unique[right], unique[partitionIndex]
            return partitionIndex

        # tc: average recursion depth O(log m)
        # sc: recursion stack for in place partitioning on average O(log m)
        def quickSelectBinarySearchHelper(left, right, finalPartitionMarker):
            
            # Base Case: 
            if left == right:
                return

            # Random pivot index,
            # differs from QuickSort "median-of-three" approach
            # random pivot focuses on finding the kth smallest or largest,
            # rather than fully sorting the array, 
            # random pivot avoids degrading to worst case O(n^2)
            randPivotElemIndex = random.randint(left, right)

            # partition subarray
            resultPivotElemIndex = partitionSection(left, right, randPivotElemIndex)
            
            # Base Case: kth item in correct sorted [high... low] place
            if finalPartitionMarker == resultPivotElemIndex:
                # -> allows us to grab k top freq count by [:k]
                return 
            
            # Binary Search Modification:
            # Selects next subarray and partition pivot
            # resultPivotElemIndex must recurse towards side where finalPartitionMark is on

            # final is in left partition of result: [left, resultPivot-1]
            elif finalPartitionMarker < resultPivotElemIndex:
                quickSelectBinarySearchHelper(left, resultPivotElemIndex - 1, finalPartitionMarker)
            
            # final is in right partition of result: [resultPivot+1, right]
            else:
                quickSelectBinarySearchHelper(resultPivotElemIndex + 1, right, finalPartitionMarker)

        # tc: iterate over list O(n)
        # sc: allocate for list, n unique elements worst case O(n)
        frequency = defaultdict(int)
        for num in nums:
            frequency[num] += 1

        # QuickSelect BinarySearch to place kth most frequent element
        # in its partitioned location, to then splice [:k]
        unique = list(frequency.keys())
        n = len(unique)
        finalPartition = k - 1
        l, r = 0, n - 1

        # For High to Low (descending) indexing:
        # with list of 6 elements, grab largest 2 elements
        # 2 - 1 = 1 (our 2nd largest)
        # splice [:2] = [0, 1]
        quickSelectBinarySearchHelper(l, r, finalPartition)

        # overall: tc O
        # overall: sc 
        return unique[:k]
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Iterate ListO(n)O(n)Iterate over list O(n)Allocation for n elements O(n)
QuickSort PartitionAverage: O(n) Worst: O(n2)O(1)Good pivot splits array in half on average O(n), Bad pivot each iteration, smallest or largest, causes O(n2)Partition of freq count array occurs in place O(1)
QuickSort RecursionO(k)O(k)Iterate over buckets for k steps O(k)Result list of size k O(k)
Result SpliceO(k)O(k)Splicing array for first k elementsResult list of k elements
OverallAverage: O(n) Worst: O(n2)O(n)Bad pivot partition dominates O(n2)Allocation for freq count of n elements dominates, O(n)

Solution 3: BucketSort by Count - Hashmap/Algorithm

    def topKFrequent(self, nums: List[int], k:int) -> List[int]:
            
        # tc: iterate over list of n integers O(n)
        # sc: frequency count for unique integers O(m) 
        count = defaultdict(int)
        for key in nums:
            count[key] += 1

        # numBuckets index = items with that frequency count
        # numBuckets must account for max freq count case, where list is full of only 1 element
        # numBuckets must account for empty list, (but still needs a bucket index 0), so we need
        # len(num)+1,
        # empty: 1 bucket: [0] representing empty group
        # else: list of len 5, 6 buckets: [0, 1, 2, 3, 4, 5] each representing length group,
        # with max freq count of 5
        numBuckets = len(nums) + 1

        # list of empty lists, numBucket amount of times 
        # tc: iterate over list length O(n)
        # sc: create len(list)+1 buckets O(n)
        freqBuckets = [[] for i in range(numBuckets)]

        # tc: iterate over frequency list for m unique integer tuples (int, occurrences) O(m) 
        for int, occurrences in count.items():
            freqBuckets[occurrences].append(int)

        # sc: grabbing top k integers, n worst case O(k)
        res = []

        # tc: iterate over len(list)+1 buckets O(n)
        for i in range(len(freqBuckets) - 1, 0, -1):
            
            # tc: iterate over all elements in curr bucket O(m)
            for num in freqBuckets[i]:
                
                # tc: insert operation O(1)
                res.append(num)
                
                # tc: continue while less than k elements grabbed O(k)
                if len(res) == k:
                    return res

        # overall: tc O(n)
        # overall: sc O(n)
        return res
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Iterate listO(n)O(n)Iteration over list O(n)Allocation for n elements O(n)
Bucket CreationO(n)O(n)Iterate len(list)+1 steps O(n)Allocate len(list)+1 buckets O(n)
Bucket PopulationO(n)O(n)Inserting n integers into buckets O(n)Buckets store n integers O(m)
Bucket Grab kO(k)O(k)Iterate over buckets for k steps O(k)Result list of size k O(k)
OverallO(n)O(n)Iterating over list dominates, O(n)Allocating n buckets dominates, O(n)

238. Product of Array Except Self ::2:: - Medium

Topics: Array, Prefix Sum

Intro

Given an integer array nums, return an array answer such that answer[i] is equal to the product of all the elements of nums except nums[i]. The product of any prefix or suffix of nums is guaranteed to fit in a 32-bit integer. You must write an algorithm that runs in O(n) time and without using the division operation.

InputOutput
[1,2,3,4][24,12,8,6]
[-1,1,0,-3,3][0,0,9,0,0]

Constraints:

Product of any prefix or suffix is guaranteed to fit into 32 bit integer

Abstraction

Given a list of nums, return a list of nums that is the product of the array excluding itself. For num n, the result should be the product of all the numbers to the left of it, times the product of all the numbers to the right of it.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
Prefix & PostfixO(n)O(n)Three iterations, prefix, postfix, result, O(3n) ~= O(n)Allocation for 3 arrays O(3n) ~= O(n)
Prefix & PostfixO(n)O(1)Two iterations, prefix, postfix, O(n)Allocation for 1 array O(n)

Solution 1: Prefix, Postfix, Result Arrays - Array/In Place Transformations

    def productExceptSelf(self, nums: List[int]) -> List[int]:
        
        # sc: prefix, postfix, result, arrays O(n)
        prefix = [1] * len(nums)
        postfix = [1] * len(nums)
        res = [1] * len(nums)

        # Prefix
        # tc: iterate list O(n)
        for i in range(1, len(nums)):
            
            # calculate prefix of i = (prefix of i - 1) * (nums[i - 1])
            prefix[i] = prefix[i - 1] * nums[i - 1]

        # Postfix
        # postfix starts at 1 = the postfix of len(n) - 1
        # start reverse iteration at: len(n) - 2
        # tc: iterate list O(n)
        for i in range(n - 2, -1, -1):

            # calculate postfix of i = (postfix of i + 1) * (nums[i + 1])
            postfix[i] = postfix[i + 1] * nums[i + 1]

        # Result
        # tc: iterate list O(n)
        for i in range(n):

            # calculate product except self for i = (prefix of i) * (postfix of i)
            res[i] = prefix[i] * postfix[i]


        # overall: tc O(n)
        # overall: sc O(n)
        return res
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
PrefixO(n)O(n)Iterate over list O(n)Allocation for prefix array O(n)
PostfixO(n)O(n)Iterate over list O(n)Allocation for postfix array O(n)
ResultO(n)O(n)Iterate over list O(n)Allocation for result array O(n)
OverallO(n)O(n)Iteration over list dominates, O(n)Allocation for list dominates, O(n)

Solution 2: Result Array - Array/In Place Transformations

    def productExceptSelf(self, nums: List[int]) -> List[int]:
        
        # sc: result array O(1) 
        res = [1] * len(nums)

        # Prefix
        # tc: iterate list O(n)
        for i in range(1, len(nums)):

            # (prefix of i) = (prefix of i - 1) * (num at i - 1)
            res[i] = res[i - 1] * nums[i - 1]

        # Running postfix through reverse iteration,
        # postfix starts at 1 = the postfix of len(n) - 1
        postfix = 1

        # tc: iterate list in reverse O(n)
        for i in range(len(nums) - 1, -1, -1):
            
            # calculate product except self for i = (prefix of i) * (postfix of i)
            res[i] *= postfix
            
            # update postfix = (postfix of i) * (num at i) 
            postfix *= nums[i]

        # overall: tc O(n)
        # overall: sc O(1)
        return res
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
PrefixO(n)O(1)Iteration over list O(n)Allocation for result array O(1)
ResultO(n)O(1)Iteration over list O(n)Allocation for result array O(1)
OverallO(n)O(1)Iteration over list dominates, O(n)Allocation for result array dominates, O(1)

36. Valid Sudoku ::2:: - Medium

Topics: Array, Hash Table, Matrix

Intro

Determine if a 9 x 9 Sudoku board is valid. Only the filled cells need to be validated according to the following rules: Each row must contain the digits 1-9 without repetition. Each column must contain the digits 1-9 without repetition. Each of the nine 3 x 3 sub-boxes of the grid must contain the digits 1-9 without repetition.

Constraints:

Only the currently filled cells need to be validated, regardless is sudoku board is actually solvable or not.

InputOutput
a sudoku table is too big to put here loljust look at -> sudoku board example

Abstraction

Abstract board into sets of rows, cols, and boxes, and validate whether duplicate values exist in any set.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
SetO(n2)O(n2)Iterate over grid dominates, O(n2)Allocation for grid cells O(n2)
ArrayO(n3)O(n2)Iterate over grid * linear array in operation dominates, O(n3)Allocation for grid dominates, O(n2)

Bug: Initialize DefaultDict of Sets Incorrectly

    def isValidSudoku(self, board: List[List[str]]) -> bool:

        # BUG:
        # defaultdict argument must be callable or None
        # --------
        # Differences: [], list, set:
        # []: instance of the list, not callable
        # list: class creates a new list object when called, callable
        # set: class creates a new set object when called, callable

        # Instead:
        # defaultdict(set)
        
        rows = defaultdict([])
        cols = defaultdict([])
        boxs = defaultdict([])

        for r in range(9):
            for c in range(9):
                tmp = board[r][c]

                if tmp != ".":

                    boxKey = (r//3, c//3)

                    if (tmp in rows[r] or
                        tmp in cols[c] or
                        tmp in boxs[boxKey]):
                        return False

                    rows[r].add(tmp)
                    cols[c].add(tmp)
                    boxs[boxKey].add(tmp)
        return True

Bug: 81 Tuples instead of 9 Tuples for Boxes

    def isValidSudoku(self, board: List[List[str]]) -> bool:
        rows = defaultdict(set)
        cols = defaultdict(set)
        boxs = defaultdict(set)


        for r in range(9):
            for c in range(9):
                tmp = board[r][c]

                if tmp != ".":

                    # BUG:
                    # will create 81 possible boxes (9 rows * 9 columns)
                    # Instead:
                    # boxKey = tuple((r//3, c//3))
                    # creates 9 boxes 
                    # (9//3), (9//3) = (1-3, 1-3) = 9 possibilities
                    boxKey = tuple((r, c))

                    if (tmp in rows[r] or
                        tmp in cols[c] or
                        tmp in boxs[boxKey]):
                        return False

                    rows[r].add(tmp)
                    cols[c].add(tmp)
                    boxs[boxKey].add(tmp)

        return True

Solution 1: DefaultDict Matrix - Hashmap/Representation

    def isValidSudoku(self, board: List[List[str]]) -> bool:
        
        # sc: dictionary for rows, columns, boxes O(n)
        cols = defaultdict(set)
        rows = defaultdict(set)
        grids = defaultdict(set)

        # tc: iterate over grid (r * c) O(n^2)
        for r in range(9):
            for c in range(9):
                
                cell = board[r][c]
                if cell != ".":

                    # index between 9 box sets by tuple key (r/3, c/3)
                    gridTuple = (r // 3, c // 3)

                    # tc: lookup operation O(1)
                    if (cell in rows[r] or 
                        cell in cols[c] or 
                        cell in grids[gridTuple] ):
                        return False
                    
                    # tc: put operation O(1)
                    cols[c].add(cell)
                    rows[r].add(cell)
                    grids[gridTuple].add(cell)

        # overall: tc O(n^2)
        # overall: sc O(n)
        return True
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Row, Col, Box InitO(1)O(n2)No iteration O(1)Dictionary allocation for grid cells O(n2)
Iterate over gridO(n2)O(1)Iterate over grid O(n2)No allocation O(1)
OperationsO(1)O(1)Lookup, Put operation O(1)No allocation for operations O(1)
OverallO(n2)O(n2)Iterate over grid dominates, O(n2)Allocation for grid cells O(n2)

Solution 2: [[]] Matrix - Hashmap/Representation

    def isValidSudoku(self, board: List[List[str]]) -> bool:
        
        # sc: arrays for rows, columns, boxes O(n)
        rows = [[], [], [], [], [], [], [], [], []]
        col = [[], [], [], [], [], [], [], [], []]
        grids = [[], [], [], [], [], [], [], [], []]

        # tc: iterate over grid (r * c) O(n^2)
        for r in range(9):
            for c in range(9):
                
                cell = board[r][c]
                if cell != ".":  

                    # indexing box sets by unique int calculation:
                    # ------------
                    #  (j//3)     = 0, 1, 2
                    #  (i//3) * 3 = 0, 3, 6
                    # ------------
                    #    0 1 2
                    # 0  0 1 2
                    # 3  3 4 5
                    # 6  6 7 8
                    # ------------
                    # allows to 9 indexing options
                    boxKey = (j//3) + ((i//3) * 3)

                    # tc: lookup operation O(1)
                    if (cell in rows[r] or 
                        cell in col[c]  or 
                        cell in grids[boxKey]):
                        return False

                    # sc: put operation O(1)
                    col[c].append(cell)
                    rows[r].append(cell)
                    grids[boxKey].append(cell)
        
        # overall: tc O(n^2)
        # overall: sc  O(n)
        return True
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Row, Col, Box InitO(1)O(n2)No iteration O(1)Dictionary allocation for grid cells O(n2)
Iterate over gridO(n2)O(1)Iterate over grid O(n2)No allocation O(1)
OperationsO(n)O(1)In array operation O(n), Put operation O(1)No allocation for operations O(1)
OverallO(n3)O(n2)Iterate over grid * linear array in operation dominates, O(n3)Allocation for grid dominates, O(n2)

Note: Solution 2 is faster than Solution 1 for small sets, such as our 9x9 board, due to defaultdict overhead

128. Longest Consecutive Sequence ::3:: - Medium

Topics: Array, Hash Table, Union Find

Intro

Given an array of integers nums, return the length of the longest consecutive sequence of elements. A consecutive sequence is a sequence of elements in which each element is exactly 1 greater than the previous element You must wrtie an algorithm that runs in O(n) time.

InputOutput
[100,4,200,1,3,2]4 from [1, 2, 3, 4]
[0,3,7,2,5,8,4,6,0,1]9 from [0, 1, 2, 3, 4, 5, 6, 7, 8]
[1, 0, 1, 2]3 from [0, 1, 2]

Constraints:

0 ≤ nums.length ≤ 105

-109 ≤ nums[i] ≤ 109

Abstraction

Give a list of integers, find the longest sequence of increasing integers. This sequence does not have to follow the left to right order of the array.

Space & Time Complexity

SolutionTime ComplexitySpace ComplexityTime RemarkSpace Remark
Brute Force
Hashmap

Brute Force

    def longestConsecutive(nums: List[int]) -> int:
        
        longest = 0

        # time complexity: Iterate over list of n length O(n)
        for num in nums:

            currLen = 1
            
            # time complexity: Iterate over potential n consecutive integers O(n)
            # time complexity: Lookup in list takes O(n)
            # time complexity: O(n^2)
            while num + currLen in nums:
                currLen += 1

            # compare to global max
            longest = max(longest, currLen)

        # Overall time complexity: O(n^3) for worst-case where we scan sequences n times for n numbers
        # Space complexity: O(1)
        return longest
AspectTime ComplexitySpace ComplexityTime RemarkSpace Remark
IterateO(n)O(1)Iterate over list of n length O(n)No additional memory allocation for iteration O(1)
Iterate consecutive sequenceO(n)O(1)Iterate over potential n consecutive integers O(n)No additional memory allocation for iteration O(1)
List lookupO(n)O(1)Lookup in list takes O(n)No additional memory allocation for list lookup O(1)
OverallO(n3)O(1)Iterating * Iterating * Lookup dominate leading to O(n3)No additional memory allocation O(1)

Bug: Empty list for Union tree leads to fail

    def longestConsecutive(self, nums: List[int]) -> int:

        # BUG:
        # need to check for empty list case
        # or else max(size.values()) will return error

        # Instead:
        # if len(nums) == 0
        #   return 0

        parent = {}
        size = {}

        def find(x):

            if parent[x] != x:
                parent[x] = find(parent[x])

            return parent[x]


        def union(x, y):
            root_x = find(x)
            root_y = find(y)

            if root_x != root_y:

                size_x = size[root_x]
                size_y = size[root_y]

                if size_x < size_y:
                    parent[root_x] = root_y
                    size[root_y] += size[root_x]
                else:
                    parent[root_y] = root_x
                    size[root_x] += size[root_y]

        setNum = set(nums)
        for n in setNum:
            parent[n] = n
            size[n] = 1

        for n in setNum:
            if (n+1) in parent:
                union(n, n+1)

        return max(size.values())

Solution 1: HashMap Continent Boundaries - HashMap/Representation

    def longestConsecutive(nums: List[int]) -> int:
        
        longest = 0

        # sc: hashmap of list of n length O(n)
        seqLen = defaultdict(int)

        # tc: iterate list O(n)
        # sc: ignore duplicates O(n)
        numSet = set(nums)
        for num in nums:
            
            # Get lengths of neighbor sequences
            leftContinentLenFromBoundary = seqLen[num - 1]
            rightContinentLenFromBoundary = seqLen[num + 1]

            # Calculate new bridged sequence length
            bridgedLen = 1 + leftContinentLenFromBoundary + rightContinentLenFromBoundary

            # Update new continent boundaries
            seqLen[num - leftContinentLenFromBoundary] = bridgedLen
            seqLen[num + rightContinentLenFromBoundary] = bridgedLen

            # Update the global max
            longest = max(longest, bridgedLen)

        # overall: tc 
        # overall: sc
        return longest
AspectTime ComplexitySpace ComplexityTime RemarkSpace Remark
SetO(n)O(n)Converting list unique list O(n)Allocation for unique set O(n)
Iterate over setO(n)O(n)Iterate set O(n)Dictionary allocation for sequence length per element O(n)
OperationsO(1)O(1)Get, Put operation O(1)No allocation O(1)
OverallO(n)O(n)Iterate unique list dominates O(n)Allocation for unique elements, sequence length dominates, O(n)

Solution 2: Set Rummy Run - Hashmap/Representation

def longestConsecutive(self, nums: List[int]) -> int: 

    
    # Note: 
    # although rummy has an inner while loop,
    # each number is part of at most one sequence
    # and are not checked again in future sequences
    # thus, the number of times the while loop runs across 
    # all iterations is n.
    # so the overall tc is O(n), instead of O(n^2)


    longest = 0

    # sc: ignore duplicates O(n)
    numSet = set(nums)

    # tc: iterate list O(n)
    for num in numSet:

        # tc: lookup operation takes constant O(1)
        # found start of a new run
        if (num - 1) not in numSet: 

            newRunLen = 1

            # tc: iterate over sequence/list O(n)
            # rummy run until missing an element
            while (num + newRunLen) in numSet:
                newRunLen += 1 

            # compare with global max sequence
            longest = max(longest, newRunLen)

    # overall: tc O(n)
    # overall: sc O(n)
    return longest        
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
SetO(n)O(n)Converting list unique list O(n)Allocation for unique set O(n)
Iterate over setO(n)O(1)Iterate set O(n) * Sequence Start Check In Operation O(1)No allocation O(1)
Rummy IterateO(n)O(1)Iterate while valid sequence, n steps across all iterations O(n)No allocation O(1)
OverallO(n)O(n)Iterate set dominates, O(n)Set allocation dominates, O(n)

Solution 3: Union Find Tree - HashMap/Algorithm

    def longestConsecutive(self, nums: List[int]) -> int:
        
        # Check Empty: avoid max(size.values()) return error
        if len(nums) == 0:
            return 0

        # Union Find Initialization:
        
        # sc: dictionary for list O(n)
        # Stores parent of each number
        parent = {}  

        # sc: dictionary for list O(n)
        # Stores the size of connected components
        size = {}    

        # tc: Amortized O(α(n)) per operation
        # Find + path compression
        def find(x):

            # if the parent of x is not itself, we have not reached the representative  
            if parent[x] != x:

                # Path compression
                # set parent of curr, to parent of parent
                parent[x] = find(parent[x])

            # return representative of group
            # either return self
            # or return parent of parent
            return parent[x]

        # tc: Amortized O(α(n)) per operation
        # Union operation with size optimization
        def union(x, y):

            # Get parent for trees x and y
            rootX = find(x)
            rootY = find(y)

            # If trees do not share representative
            if rootX != rootY:

                # Attach smaller tree to larger tree,
                # update larger tree size and 
                # smaller tree parent

                xSize = size[rootX]
                ySize = size[rootY]

                if ySize < xSize:
                    parent[rootY] = rootX
                    size[rootX] += size[rootY]
                else:
                    parent[rootX] = rootY
                    size[rootY] += size[rootX]

        # tc: iterate list O(n)
        # Initialize Union Find: 
        #   1. Set all parents to self
        #   2. Set all sizes to 1
        for num in nums:

            # ignore duplicates
            if num not in parent:
                parent[num] = num
                size[num] = 1

        # tc: iterate list O(n)
        # Join() consecutive numbers,
        # stores sequences represented as trees
        for num in nums:
            if num + 1 in parent:
                union(num, num + 1)
        
        # return tree (sequence of numbers) with
        # the longest size/length

        # overall: tc O(n)
        # overall: sc O(n)
        return max(size.values())
AspectTime ComplexitySpace ComplexityTime RemarksSpace Remarks
Union Tree InitO(n)O(n)Iterate list O(n)Dictionary allocation for parent and size O(n)
UnionAmortized O(α(n))O(1)Each union operation is nearly constant with path compression and size optimization α(n)No allocation O(1)
FindAmortized O(α(n))O(1)Find with path compression is nearly constant O(α(n))No allocation O(1)
Iterate JoinO(n)O(1)Iterate list O(n) * Union which is constant, so just O(n)No allocation O(1)
Get LargestO(n)O(1)Grab max from dictionary O(n)No allocation O(1)
OverallO(n)O(n)Iterate * Union dominates, O(n)Dictionary allocation for parent and size dominates, O(n)