codingecho

日々の体験などを書いてます

Full-text search for Japanese with ngram full-text parser

TL;DR

Create an index with ngram full-text parser.

CREATE FULLTEXT INDEX idx_message_log ON message_log (message) WITH PARSER ngram;

ngram Full-Text Parser

We want to do a full-text search for searching our entire texts that exist about 150k rows. Its text is written in Japanese.

Our table like this;

create table message_log
(
    id int(11) unsigned not null,
    message varchar(255) default '' not null,
    primary key (id)
);

If message column would fill out in English or other space-separated languages, you can create a full-text index.

CREATE FULLTEXT INDEX idx_message_log on message_log.message (message);

However, we treat the message as Japanese. In this case, we cannot get any message, because Japanese is not the text space-separated words.

For instance, Japanese like this;

好きなメンバーとその理由を教えて下さい!

not space-separated;

好きな メンバー と その 理由 を 教えて 下さい!

We assume we have this record.

id message
1 好きなメンバーとその理由を教えて下さい!
2 好きな メンバー と その 理由 を 教えて 下さい!

We find messages with the full-text search function.

SELECT * FROM message_log WHERE MATCH (message) AGAINST ('メンバー');

And then get this result.

id message
2 好きな メンバー と その 理由 を 教えて 下さい!

We expect to can get all records. However, it does not include the text 好きなメンバーとその理由を教えて下さい!.

So we create an index with ngram full-text parser.

CREATE FULLTEXT INDEX idx_message_log ON message_log (message) WITH PARSER ngram;

Again, find messages.

SELECT * FROM message_log WHERE MATCH (message) AGAINST ('メンバー');

And then we get ID 1 and 2 that we expected.

id message
1 好きなメンバーとその理由を教えて下さい!
2 好きな メンバー と その 理由 を 教えて 下さい!