TL;DR
Create an index with ngram full-text parser.
CREATE FULLTEXT INDEX idx_message_log ON message_log (message) WITH PARSER ngram;
ngram Full-Text Parser
We want to do a full-text search for searching our entire texts that exist about 150k rows. Its text is written in Japanese.
Our table like this;
create table message_log
(
id int(11) unsigned not null,
message varchar(255) default '' not null,
primary key (id)
);
If message
column would fill out in English or other space-separated languages, you can create a full-text index.
CREATE FULLTEXT INDEX idx_message_log on message_log.message (message);
However, we treat the message as Japanese. In this case, we cannot get any message, because Japanese is not the text space-separated words.
For instance, Japanese like this;
好きなメンバーとその理由を教えて下さい!
not space-separated;
好きな メンバー と その 理由 を 教えて 下さい!
We assume we have this record.
id | message |
---|---|
1 | 好きなメンバーとその理由を教えて下さい! |
2 | 好きな メンバー と その 理由 を 教えて 下さい! |
We find messages with the full-text search function.
SELECT * FROM message_log WHERE MATCH (message) AGAINST ('メンバー');
And then get this result.
id | message |
---|---|
2 | 好きな メンバー と その 理由 を 教えて 下さい! |
We expect to can get all records. However, it does not include the text 好きなメンバーとその理由を教えて下さい!
.
So we create an index with ngram full-text parser.
CREATE FULLTEXT INDEX idx_message_log ON message_log (message) WITH PARSER ngram;
Again, find messages.
SELECT * FROM message_log WHERE MATCH (message) AGAINST ('メンバー');
And then we get ID 1 and 2 that we expected.
id | message |
---|---|
1 | 好きなメンバーとその理由を教えて下さい! |
2 | 好きな メンバー と その 理由 を 教えて 下さい! |