[208. Implement Trie (Prefix Tree)] (알고리즘)

September 9, 2023 4 분 소요

Trie

문제

A trie (pronounced as “try”) or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.

Implement the Trie class:

Trie() Initializes the trie object.
void insert(String word) Inserts the string word into the trie.
boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.

Example 1:

Input [“Trie”, “insert”, “search”, “search”, “startsWith”, “insert”, “search”] [[], [“apple”], [“apple”], [“app”], [“app”], [“app”], [“app”]] Output [null, null, true, false, true, null, true]

Explanation Trie trie = new Trie(); trie.insert(“apple”); trie.search(“apple”); // return True trie.search(“app”); // return False trie.startsWith(“app”); // return True trie.insert(“app”); trie.search(“app”); // return True

Constraints:

1 <= word.length, prefix.length <= 2000
word and prefix consist only of lowercase English letters.
At most 3 * 104 calls in total will be made to insert, search, and startsWith.

스스로 고민

접근법

트라이 자료구조 만들기
새로 입력 받은 문자가 Trie에 있는 경우 true를 반환 그렇지 않으면 false를 반환
이전에 삽입된 문자열 중에 접두사를 가진 문자열이 있는 경우 true 반환 그렇지 않으면 false
- 해당 단어로 시작하거나 연관이 있는 문자열 정도로 이해하면 편하다.

의사코드

TrieNode를 만들어준다 링크드 리스트처럼 하나의 노드이다. 한 가지 특이한 점은 딕셔너리를 사용하며 딕셔셔너리 키 값에 노드를 여러개를 넣는다고 상상? 하면 편한다.
그리고 문자가 끝나는 위치를 알 수 있게 is_end_of_word 라는 인스턴스 변수를 만들어준다.
TrieNode를 사용할 Trie 클래스를 만들어준다.
여기 안에 단어를 삽입할 insert, search 그리고 해당 접미사 접두사로 시작하는지 검사할 수 있는 startsWith 매서드를 만들어준다.
코드가 너무 길어져서 구현코드 부분에서 따로 설명을 하는 게 좋을 것 같다.

구현 코드

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def startsWith(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True
    
trie = Trie()
trie.insert("apple")
print(trie.search("apple"))   # True
print(trie.search("app"))     # False
print(trie.startsWith("app")) # True
trie.insert("app")
print(trie.search("app"))     # True

insert 부분

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

root 로 사용할 TrieNode를 만들어준다.
반복문을 통해 단어를 쪼개준다.
예를 들어 apple이라는 단어가 들어왔을 경우 반복문으로 a , p , p , l , e 이렇게 문자로 각각 쪼개주고 self.root에 저장된 TrieNode를 node로 선언해준다.
TrieNode 에는 children 이라는 딕셔너리와 단어가 끝났는지 판별해주는 is_end_of_word 가 존재한다.
위의 코드를 이해하기 위해 apple이라는 단어가 들어가면 어떻게 될지 상상해보자.

trie = Trie()
trie.insert("apple")

반복문에서 쪼개진 apple 중 첫 번째 문자 a는 node.children 에 존재하는지 검사를 한다.
딕셔너리에는 아무것도 존재하지 않으므로 a 라는 키 값에 TrieNode 가 벨류 값으로 들어가진다.
그럼 처음에는 node 가 root 노드였지면 현재는 a 키 값으로 새롭게 열린 노드다.
두 번째 p 는 어떻게 될까? root 노드에는 a 가 있지만 딕셔너리 a 키 값으로 TriNode가 새롭게 열렸고 a 키 값으로 열린 TrieNode 안 children 에는 값이 아무것도 없다 (비어있는 딕셔너리)
그럼 위와 같이 키 p 값으로 새로운 TrieNode 가 새롭게 만들어진다.
세 번째 p도 이전 p 키 값으로 새롭게 만들어진 TrieNode 안에는 아무것도 없으니 또 새로운 TrieNode 가 만들어진다
a -> p -> p -> l - > e 이런 식으로 만들어진다.
e 는 반복문이 끝나는 지점으로 반복문이 다 끝난 후에 node.is_end_of_word = True 가 실행되면서 해당 단어가 끝난걸 알려준다.

search 부분

    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

위에서 만들어진 노드들을 쭉쭉 타고 들어가서 반복문이 끝나고 그 지점의 is_end_of_word 값을 리턴해준다.
apple의 경우 p에서 아직 끝나지 않은 상태인데 app 이라는 단어를 검색할 경우 p의 마지막부분에서 반복문이 끝나고 이 때 return으로 p의 is_end_of_word 가 반환된다.
apple을 저장할 때 p 의 값은 디폴트가 False였으니 False가 반환고 이를 통해 해당 단어가 없다고 판단을 할 수 있다.
아예 일치하지 않는 단어일 경우 Falsle를 반환한다.

startsWith 부분

    def startsWith(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

노드가 계속 열린다면 해당 단어는 True를 반환하고 중간에 끊기거나 없다면 False를 반환해서 접두사가 일치 하는지 알 수 있다.

시간 복잡도와 공간 복잡도

시간 복잡도:

insert 함수: 단어를 Trie에 삽입하는 데는 O(M)의 시간이 걸린다. 여기서 M은 삽입하려는 단어의 길이다.
search 함수: 단어를 Trie에서 찾는 데는 O(M)의 시간이 걸린다. 여기서 M은 찾고자 하는 단어의 길이다.
startsWith 함수: 주어진 접두사로 시작하는 단어가 Trie에 있는지 확인하는 데는 O(P)의 시간이 걸린다. 여기서 P는 접두사의 길이다.

공간 복잡도:

Trie 자료 구조는 모든 가능한 단어의 문자열을 저장하지 않고 각 문자 노드가 가리키는 자식 노드에 따라 공간을 사용한다. 따라서 공간 복잡도는 Trie에 저장된 총 문자 수에 비례한다.
만약 Trie에 N개의 단어를 삽입한다면 최악의 경우 모든 단어가 서로 다른 문자로 이루어져 있다고 가정하면, 전체 Trie의 공간 복잡도는 대략 O(N)입니다.

회고 과정

처음에 개념을 이해하는 게 조금 어려워서 시간이 많이 걸렸다.
문제를 좀 더 풀어봐야 확실히 이해할 수 있을 것 같다.

다른 사람 코드를 보고 그 기준으로 어떤 부분을 개선 할 수 있는지 최종 회고

class Trie:

    def __init__(self):
        self.m = {}

    def insert(self, word: str) -> None:
        cur  = self.m
        for c in word:
            if c not in cur:
                cur[c] = {}
            cur = cur[c]
        cur["#"] = ""

    def search(self, word: str) -> bool:
        cur = self.m
        for c in word:
            if c not in cur:
                return False
            cur  = cur[c]
        return "#" in cur

    def startsWith(self, prefix: str) -> bool:
        cur  = self.m
        for c in prefix:
            if c not in cur:
                return False
            cur = cur[c]
        return True

굳이 노드를 만들지 않고 오로지 딕셔너리로만 문제를 해결한 케이스다.
항상 쉽게쉽게 생각하려고 노력을 해야겠다.

Twitter Facebook LinkedIn

[208. Implement Trie (Prefix Tree)] (알고리즘)

문제

스스로 고민

접근법

의사코드

구현 코드

insert 부분

search 부분

startsWith 부분

시간 복잡도와 공간 복잡도

회고 과정

다른 사람 코드를 보고 그 기준으로 어떤 부분을 개선 할 수 있는지 최종 회고

공유하기

댓글남기기

참고

[Basic JavaScript] 객체에 대한 메모리 주소 with Redux 잘 사용하기

[Ninja Java] Q. Java의 final. finally, finallize의 차이는?

[Daily] DI / DIP

[Daily] 다형성 Java vs Python vs TypeScript 비교