STL likeなradix treeを実装した - NO!と言えるようになりたい

radix tree，PATRICIA tree，PATRICIA trie，基数木，パトリシア木，など色々な名前で呼ばれている木構造をC++で実装した．radix treeとは，IPのルーティングテーブルや転置インデックスなどに使われるデータ構造である．詳細はWikipediaでも見て欲しい．激しく車輪の再発明だが，なかなか良さげな実装が無かったのでしょうがない．しょぼい実装だが，折角作ったので公開しておくとともに，自分が忘れないために使い方をメモしておく．

ソースはここから入手出来る．http://github.com/ytakano/radix_tree

使い方

基本的な使い方はSTLのmapと同じとなる．以下はサンプルコード．

#include <string>
#include <iostream>

#include "radix_tree.hpp"

using namespace std;

radix_tree<string, int> tree;

void
insert()
{
        tree["apache"]    = 0;
        tree["afford"]    = 1;
        tree["available"] = 2;
        tree["affair"]    = 3;
        tree["avenger"]   = 4;
        tree["binary"]    = 5;
        tree["bind"]      = 6;
        tree["brother"]   = 7;
        tree["brace"]     = 8;
        tree["blind"]     = 9;
}

void
longest_match(string key)
{
        radix_tree<string, int>::iterator it;

        it = tree.longest_match(key);

        cout << "longest_match(\"" << key << "\"):" << endl;

        if (it != tree.end()) {
                cout << "    " << it->first << ", " << it->second << endl;
        } else {
                cout << "    failed" << endl;
        }
}

void
prefix_match(string key)
{
        vector<radix_tree<string, int>::iterator> vec;
        vector<radix_tree<string, int>::iterator>::iterator it;

        tree.prefix_match(key, vec);

        cout << "prefix_match(\"" << key << "\"):" << endl;

        for (it = vec.begin(); it != vec.end(); ++it) {
                cout << "    " << (*it)->first
                     << ", " << (*it)->second
                     << endl;
        }
}

void
greedy_match(string key)
{
        vector<radix_tree<string, int>::iterator> vec;
        vector<radix_tree<string, int>::iterator>::iterator it;

        tree.greedy_match(key, vec);

        cout << "greedy_match(\"" << key << "\"):" << endl;

        for (it = vec.begin(); it != vec.end(); ++it) {
                cout << "    " << (*it)->first
                     << ", " << (*it)->second
                     << endl;
        }
}

void
traverse()
{
        radix_tree<string, int>::iterator it;

        cout << "traverse:" << endl;
        for (it = tree.begin(); it != tree.end(); ++it) {
                cout << "    " << it->first << ", " << it->second << endl;
        }
}

int
main(int argc, char *argv)
{
        insert();

        longest_match("binder");
        longest_match("bracelet");
        longest_match("apple");

        prefix_match("aff");
        prefix_match("bi");
        prefix_match("a");

        greedy_match("avoid");
        greedy_match("bring");
        greedy_match("attack");

        traverse();

        return 0;
}

これを実行すると，以下のように出力される．

longest_match("binder"):
    bind, 6
longest_match("bracelet"):
    brace, 8
longest_match("apple"):
    failed
prefix_match("aff"):
    affair, 3
    afford, 1
prefix_match("bi"):
    binary, 5
    bind, 6
prefix_match("a"):
    affair, 3
    afford, 1
    apache, 0
    available, 2
    avenger, 4
greedy_match("avoid"):
    available, 2
    avenger, 4
greedy_match("bring"):
    brace, 8
    brother, 7
greedy_match("attack"):
    affair, 3
    afford, 1
    apache, 0
    available, 2
    avenger, 4
traverse:
    affair, 3
    afford, 1
    apache, 0
    available, 2
    avenger, 4
    binary, 5
    bind, 6
    blind, 9
    brace, 8
    brother, 7

解説

まず，radix_tree.hpp, radix_tree_node.hpp, radix_tree_it.hppをインクルード可能な場所に置き，インクルードする．

#include "radix_tree.hpp"

次に，radix_treeの実態を定義する．テンプレートの第一引数はkeyの型で，第二引数はvalueの型となる．

radix_tree<string, int> tree;

データのインサートは，STLのmapと同じように行える．

tree["apache"]    = 0;
tree["afford"]    = 1;

insert関数を使っても同じことが出来る．

tree.insert(std::pair<string, int>("key", 100));

データの削除はerase関数を使う．

tree.erase("key");

データの検索は，[]演算子を使うか，find関数を使うと行える．

std::cout << tree["key"] << std::endl;

radix_tree<string, int>::iterator it = tree.find("key");
std::cout << it->second << std::endl;

iteratorを使った木の走査も可能である．

radix_tree<string, int>::iterator it;

for (it = tree.begin(); it != tree.end(); ++it) {
    cout << "    " << it->first << ", " << it->second << endl;
}

サンプルプログラムの場合，データ走査を行っているtraverse関数を呼び出すと以下のように表示される．

traverse:
    affair, 3
    afford, 1
    apache, 0
    available, 2
    avenger, 4
    binary, 5
    bind, 6
    blind, 9
    brace, 8
    brother, 7

longest_match関数を使うと，keyを元に最長一致でデータを検索する．ルーティングテーブルのあれである．上のデータに対して，"binder"をkeyにしてlongest_match関数を呼び出してみる．

radix_tree<string, int>::iterator it;
it = tree.longest_match("binder");
cout << it->first << ", " << it->second;

得られる値は"bind"となる．

bind, 6

prefix_match関数を使うと，prefix searchが行える．これは正規表現の"key*"と同じである．上のデータに対して，"aff"をkeyにしてprefix_match関数を呼び出してみる．

vector<radix_tree<string, int>::iterator> vec;
vector<radix_tree<string, int>::iterator>::iterator it;

tree.prefix_match("aff", vec);

for (it = vec.begin(); it != vec.end(); ++it) {
    cout << "    " << (*it)->first
         << ", " << (*it)->second
         << endl;
}

得られる値は"affair"と"afford"となる．

    affair, 3
    afford, 1

greedy_match関数は少し特殊な検索となる．この関数は，与えられたkeyと前方が最も長く一致するデータの集合を返す．実際に例を見た方が理解できるかも知れない．上のデータに対して，"avoid"をkeyにしてgreedy_match関数を呼び出してみる．

vector<radix_tree<string, int>::iterator> vec;
vector<radix_tree<string, int>::iterator>::iterator it;

tree.greedy_match("avoid", vec);

for (it = vec.begin(); it != vec.end(); ++it) {
    cout << "    " << (*it)->first
         << ", " << (*it)->second
         << endl;
}

すると，得られるのは，"available"と"avenger"となる．

    available, 2
    avenger, 4

なぜなら，"avoid"の前方に最も長く一致するのが，"availabe"と"avenger"であるからである．他の"affair"などは前方の一文字しか一致しないため，結果には含まれない．実は，このgreedyな検索をやりたいがために車輪を再発明したようなものである．

今回は，keyの型をstringとしたが，他の型を使いたい場合は，以下の4つの関数を各自定義する必要がある．

// 部分列を返す関数
type_of_key radix_substr(const type_of_key &key, int begin, int num);
// 結合関数
type_of_key radix_join(const type_of_key &key1, const type_of_key &key2);
// 長さを返す関数
int         radix_length(const type_of_key &key);

string型の場合は，既に定義されているため，再定義する必要はない．ちなみに，string型の上記関数は以下の通りとなっている．

std::string
radix_substr(const std::string &str, int begin, int num)
{
        return str.substr(begin, num);
}

std::string
radix_join(const std::string &str1, const std::string &str2)
{
        return str1 + str2;
}

int
radix_length(const std::string &str) {
        return str.size();
}

また，keyとして用いるクラスは，[], ==, < 演算子をオーバーロードしている必要がある．

TODO

今のところ，イテレータはforwardイテレータのみである．本来ならbidirectionalイテレータにして，reverseイテレータも定義する必要があるが，今のところ困らないので作っていない．

iteratorを引数にしたerase関数なども作っていない．これらは後で追加する予定．