Teaching Kids Programming: Videos on Data Structures and Algorithms
You are given a list of unique strings genes where each element has the same length and contains characters “A”, “C”, “G” and/or “T”.
If strings a and b are the same string except for one character, then a and b are in the same mutation group.
If strings a and b are in a group and b and c are in a group, then a and c are in the same group.
Return the total number of mutation groups.Constraints
n ≤ 10,000
k ≤ 20 where k is the length of a string in genes
Example 1
Input
genes = [“ACGT”, “ACCT”, “AGGT”, “TTTT”, “TTTG”]
Output
2
Explanation
There are two mutation groups:
[“ACGT”, “ACCT”, “AGGT”]
[“TTTT”, “TTTG”]Hint:
How can we connect one word with another? Is there a better way to connect than simply comparing each pair? After connecting words with each other, return the groups they form.
Find Gene Mutation Groups via UnionFind Algorithm (Disjoint Set)
With Disjoint Set Data Structure, we can easily perform the Union Find Algorithm. We can union the current gene with its neighbours (mutation Group) and then we can get the number of connected components for a Graph.
Here is the class implementation for a Disjoint Set using the hash table. If we know the number of groups and the number of groups is relatively small, we can initialize the group array (static).
class UnionFind:
def __init__(self):
self.parents = {}
self.sizes = {}
def union(self, a, b):
a = self.find(a)
b = self.find(b)
if a == b:
return False
if self.sizes[a] <= self.sizes[b]:
a, b = b, a
self.parents[b] = a
self.sizes[a] += self.sizes[b]
return True
def find(self, x):
if x not in self.parents:
self.parents[x] = x
self.sizes[x] = 1
return x
if self.parents[x] != x:
self.parents[x] = self.find(self.parents[x])
return self.parents[x]
@property
def root_count(self):
return sum(x == p for x, p in self.parents.items())
The Disjoint Set has union, find and get count operation. We tend to merge the component with less sizes to larger ones. The union and find complexity is amortized O(1) constant.
Then the following is the Python code that uses Disjoint Set to find the number of connected components (Mutation Gene Groups) using UnionFind Algorithm:
class Solution:
def solve(self, genes):
uf = UnionFind()
g = set(genes)
for i in genes:
for j in range(len(i)):
for x in ('A', 'C', 'G', 'T'):
n = i[:j] + x + i[j+1:]
if n in g:
uf.union(i, n)
return uf.root_count
Find Gene Mutation Groups
- Teaching Kids Programming – Find Gene Mutation Groups via UnionFind Algorithm (Disjoint Set)
- Teaching Kids Programming – Find Gene Mutation Groups via Breadth First Search Algorithm
- Teaching Kids Programming – Find Gene Mutation Groups via Recursive Depth First Search Algorithm
–EOF (The Ultimate Computing & Technology Blog) —
Last Post: Teaching Kids Programming - Find Gene Mutation Groups via Breadth First Search Algorithm
Next Post: Teaching Kids Programming - Find First Recurring Character using Hash Table/Set