I participated in ISUCON for the first time.
I acquired some knowledge to find out bottleneck in HTTP server and MySQL. Our team used alp for HTTP response analysis and pt-query-digest for MySQL slow query analysis.
I try to introduce these tools that were useful in our situation or might also other teams(or situations).
How to find the bottleneck of web APIs on web application
We need to analyze access log of API generated with Nginx to decrease API's response time.
So, I used alp to analyze access log.
Download alp binary proper your architecture, usually amd64 if you use Linux, from here
$ wget https://github.com/tkuchiki/alp/releases/download/v0.3.1/alp_linux_amd64.zip
$ unzip alp_linux_amd64.zip
Change Nginx log formart to LTSV format
$ sudo vim /etc/nginx/nginx.conf
log_format ltsv "time:$time_local"
"\thost:$remote_addr"
"\tforwardedfor:$http_x_forwarded_for"
"\treq:$request"
"\tstatus:$status"
"\tmethod:$request_method"
"\turi:$request_uri"
"\tsize:$body_bytes_sent"
"\treferer:$http_referer"
"\tua:$http_user_agent"
"\treqtime:$request_time"
"\tcache:$upstream_http_x_cache"
"\truntime:$upstream_http_x_runtime"
"\tapptime:$upstream_response_time"
"\tvhost:$host";
access.log ltsv;
Analyze Nginx log
$ ./alp -f /var/log/nginx/access.log
The list analyzed is like this;
+-------+-------+-------+---------+-------+-------+-------+-------+--------+------------+------------+-------------+------------+--------+-----------------------------------------------------+
| COUNT | MIN | MAX | SUM | AVG | P1 | P50 | P99 | STDDEV | MIN(BODY) | MAX(BODY) | SUM(BODY) | AVG(BODY) | METHOD | URI |
+-------+-------+-------+---------+-------+-------+-------+-------+--------+------------+------------+-------------+------------+--------+-----------------------------------------------------+
| 12 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 318.000 | 318.000 | 3816.000 | 318.000 | GET | /favicon.ico |
| 1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 4162.000 | 4162.000 | 4162.000 | 4162.000 | GET | /js/chat.js |
| 6 | 0.770 | 1.932 | 7.167 | 1.194 | 0.770 | 0.988 | 1.628 | 0.432 | 327999.000 | 327999.000 | 1967994.000 | 327999.000 | GET | /icons/b1c8c5bc9b026507ae62e6bf62cf55f70f4ac3d8.png |
| 7 | 0.212 | 1.967 | 7.447 | 1.064 | 0.212 | 0.998 | 1.501 | 0.536 | 105173.000 | 105173.000 | 736211.000 | 105173.000 | GET | /icons/e58ed7194ed4a99d9d6ada9e3383983edcbd1edc.png |
| 301 | 0.099 | 3.752 | 473.578 | 1.573 | 0.549 | 1.507 | 2.921 | 0.407 | 0.000 | 3005.000 | 885578.000 | 2942.120 | GET | /fetch |
+-------+-------+-------+---------+-------+-------+-------+-------+--------+------------+------------+-------------+------------+--------+-----------------------------------------------------+
This table shows the API response times that are listed in ascending order of time.
We started to improve the response time on worst. This case, we should start at /fetch
and /icon
.
How to find bottleneck queries on MySQL
To find bottleneck of SQL queries, we used pt-query-digest
.
Install percona toolkit
See this document.
$ wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
$ sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
$ sudo apt-get update
$ sudo apt-cache search percona
$ sudo apt-get install percona-toolkit
$ sudo vim /etc/mysql/my.cnf
# Enable slow query log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query.log
Analyze slow query log
$ pt-query-digest /var/log/mysql/slow_query.log
The list analyzed is like this;
# 660ms user time, 30ms system time, 39.37M rss, 121.42M vsz
# Current date: Sat Oct 21 14:54:35 2017
# Hostname: app1053
# Files: /var/log/mysql/slow-query.log
# Overall: 1.50k total, 23 unique, 2.73 QPS, 0.01x concurrency ___________
# Time range: 2017-10-21T05:45:19 to 2017-10-21T05:54:29
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 3s 1us 608ms 2ms 5ms 20ms 103us
# Lock time 78ms 0 7ms 51us 131us 174us 36us
# Rows sent 1.96k 0 100 1.33 0.99 8.72 0
# Rows examine 2.91M 0 9.77k 1.98k 9.33k 3.74k 0
# Query size 90.88k 20 202 61.88 136.99 25.68 62.76
# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ============= ===== ====== ===== ==============
# 1 0x16A13978F3B7FCFA 1.7328 55.1% 6 0.2888 0.06 SELECT image
# 2 0x4866C96BA295C56E 0.7782 24.7% 301 0.0026 0.00 SELECT message
# 3 0xEBDB62606B914A34 0.4282 13.6% 90 0.0048 0.00 INSERT channel
# 4 0x91283C307570F439 0.0334 1.1% 4 0.0083 0.00 SELECT message
# 5 0x41883793B68D128E 0.0302 1.0% 100 0.0003 0.00 SELECT user
# MISC 0xMISC 0.1427 4.5% 1003 0.0001 0.0 <18 ITEMS>
# Query 1: 6 QPS, 1.73x concurrency, ID 0x16A13978F3B7FCFA at byte 190178
# Scores: V/M = 0.06
# Time range: 2017-10-21T05:46:17 to 2017-10-21T05:46:18
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 6
# Exec time 55 2s 203ms 608ms 289ms 580ms 136ms 242ms
# Lock time 1 840us 109us 193us 140us 185us 29us 152us
# Rows sent 5 103 10 23 17.17 22.53 4.06 17.65
# Rows examine 0 5.87k 1001 1001 1001 1001 0 1001
# Query size 0 474 79 79 79 79 0 79
# String:
# Databases isubata
# Hosts app1051
# Users isucon
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms ################################################################
# 1s
# 10s+
# Tables
# SHOW TABLE STATUS FROM `isubata` LIKE 'image'\G
# SHOW CREATE TABLE `isubata`.`image`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT * FROM image WHERE name = '78a9228a393eb2621f346fc6a5e099d5bc373f76.png'\G
# Query 2: 1.08 QPS, 0.00x concurrency, ID 0x4866C96BA295C56E at byte 177211
# Scores: V/M = 0.00
# Time range: 2017-10-21T05:46:16 to 2017-10-21T05:50:54
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 20 301
# Exec time 24 778ms 2ms 9ms 3ms 5ms 966us 2ms
# Lock time 15 12ms 30us 208us 39us 66us 15us 33us
# Rows sent 15 301 1 1 1 1 0 1
# Rows examine 98 2.87M 9.77k 9.77k 9.77k 9.77k 0 9.77k
# Query size 19 17.58k 58 60 59.81 59.77 0.85 59.77
# String:
# Databases isubata
# Hosts app1052 (298/99%), app1051 (3/0%)
# Users isucon
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms ################################################################
# 10ms
# 100ms
# 1s
# 10s+
# Tables
# SHOW TABLE STATUS FROM `isubata` LIKE 'message'\G
# SHOW CREATE TABLE `isubata`.`message`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT COUNT(*) as cnt FROM message WHERE channel_id = '733'\G
This table shows the queries time are listed in desacending order of time.
In this case, we should start to improve the queryt time that is SELECT * FROM image WHERE name = '78a9228a393eb2621f346fc6a5e099d5bc373f76.png'\G
. For example to improve it, use caching on Nginx.