Algorithms, Blockchain and Cloud

Teaching Kids Programming – Find Gene Mutation Groups via UnionFind Algorithm (Disjoint Set)


Teaching Kids Programming: Videos on Data Structures and Algorithms

You are given a list of unique strings genes where each element has the same length and contains characters “A”, “C”, “G” and/or “T”.

If strings a and b are the same string except for one character, then a and b are in the same mutation group.
If strings a and b are in a group and b and c are in a group, then a and c are in the same group.
Return the total number of mutation groups.

Constraints
n ≤ 10,000
k ≤ 20 where k is the length of a string in genes
Example 1
Input
genes = [“ACGT”, “ACCT”, “AGGT”, “TTTT”, “TTTG”]
Output
2
Explanation
There are two mutation groups:
[“ACGT”, “ACCT”, “AGGT”]
[“TTTT”, “TTTG”]

Hint:
How can we connect one word with another? Is there a better way to connect than simply comparing each pair? After connecting words with each other, return the groups they form.

Find Gene Mutation Groups via UnionFind Algorithm (Disjoint Set)

With Disjoint Set Data Structure, we can easily perform the Union Find Algorithm. We can union the current gene with its neighbours (mutation Group) and then we can get the number of connected components for a Graph.

Here is the class implementation for a Disjoint Set using the hash table. If we know the number of groups and the number of groups is relatively small, we can initialize the group array (static).

class UnionFind:
    def __init__(self):
        self.parents = {}
        self.sizes = {}

    def union(self, a, b):
        a = self.find(a)
        b = self.find(b)
        if a == b:
            return False
        if self.sizes[a] <= self.sizes[b]:
            a, b = b, a
        self.parents[b] = a
        self.sizes[a] += self.sizes[b]
        return True

    def find(self, x):
        if x not in self.parents:
            self.parents[x] = x
            self.sizes[x] = 1
            return x
        if self.parents[x] != x:
            self.parents[x] = self.find(self.parents[x])
        return self.parents[x]

    @property
    def root_count(self):
        return sum(x == p for x, p in self.parents.items())

The Disjoint Set has union, find and get count operation. We tend to merge the component with less sizes to larger ones. The union and find complexity is amortized O(1) constant.

Then the following is the Python code that uses Disjoint Set to find the number of connected components (Mutation Gene Groups) using UnionFind Algorithm:

class Solution:
    def solve(self, genes):        
        uf = UnionFind()
        g = set(genes)
        for i in genes:
            for j in range(len(i)):
                for x in ('A', 'C', 'G', 'T'):
                    n = i[:j] + x + i[j+1:]
                    if n in g:
                        uf.union(i, n)
        return uf.root_count

Find Gene Mutation Groups

–EOF (The Ultimate Computing & Technology Blog) —

701 words
Last Post: Teaching Kids Programming - Find Gene Mutation Groups via Breadth First Search Algorithm
Next Post: Teaching Kids Programming - Find First Recurring Character using Hash Table/Set

The Permanent URL is: Teaching Kids Programming – Find Gene Mutation Groups via UnionFind Algorithm (Disjoint Set) (AMP Version)

Exit mobile version