Usage
=====

Pre-requisites
--------------

*   **hcprequestanalytics** has been installed as described in chapter
    :doc:`10_install`

*   Either the binary has been placed in the ``$PATH`` or the Python virtual
    environment has been activated, and **hcprequestanalytics** can be started
    successfully::

        $ hcprequestanalytics --version
        hcprequestanalytics: v.1.3.2 (2017-10-10/Sm)

*   HCP internal logs have been downloaded into an empty folder::

        $ ls -lh HCPLogs-*
        -rw-r--r--@ 1 tsimons  273924073   4.3M Sep 13 07:44 HCPLogs-hcp72.archivas.com-acc-20170913-0742.zip

*   Enough free space is available to uncompress the log package and all
    *http_gateway_request.log* files contained in it

Running **hcprequestanalytics**
-------------------------------

Running **hcprequestanalytics** is a two-step process:

1.  Create and load the database from an HCP log package::

        $ hcprequestanalytics load -d hcp72.db HCPLogs-hcp72.archivas.com-acc-20170913-0742.zip
        un-packing HCPLogs-hcp72.archivas.com-acc-20170913-0742.zip
            un-packing access logs for node 192.168.0.176
            un-packing access logs for node 192.168.0.177
            un-packing access logs for node 192.168.0.178
            un-packing access logs for node 192.168.0.179
        unpacking HCPLogs-hcp72.archivas.com-acc-20170913-0742.zip took 5.762 seconds
        reading node 176 - ./tmpdkllzu3y/.../20170812-0316/http_gateway_request.log.0 - 5,295 records
        reading node 176 - ./tmpdkllzu3y/.../20170813-0341/http_gateway_request.log.0 - 944 records
        [..] lots of further entries listed here [..]
        reading node 179 - ./tmpdkllzu3y/.../20170913-0328/http_gateway_request.log.0 - 1 records
        reading node 179 - ./tmpdkllzu3y/.../20170913-0743/http_gateway_request.log.0 - 0 records
        loading database with 590,734 records took 30.288 seconds

        $ ls -lh hcp72.db
        -rw-r--r--  1 tsimons  273924073   109M Oct 10 16:58 hcp72.db

    You can repeat the loading step with further log packages, in which case the
    existing database will be re-used. Of course, you'll want to load only logs
    from a single HCP into one database, as the results would be falsified
    otherwise!

    .. Warning::

        **hcprequestanalytics** doesn't check for duplicate records. That means
        that if you load the database with the same log package twice, the query
        results will be falsified as well.
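    If you'd like a quick sanity check of what has been loaded before running
    any queries, the database file can usually be inspected with standard
    tools. The following is a minimal sketch, assuming the file is a plain
    SQLite3 database; since the internal table layout isn't documented here,
    the sketch discovers the table names instead of assuming one::

        import sqlite3

        # open the database created by 'hcprequestanalytics load'
        con = sqlite3.connect('hcp72.db')

        # discover the table(s) the tool created ...
        tables = [row[0] for row in
                  con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
        print('tables:', tables)

        # ... and count the records loaded into each of them
        for table in tables:
            count = con.execute('SELECT count(*) FROM "{}"'.format(table)).fetchone()[0]
            print('{}: {:,} records'.format(table, count))

        con.close()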
2.  Run queries against the database

    .. Tip::

        **hcprequestanalytics analyze** starts as many subprocesses as there
        are CPUs available. Using that pool of subprocesses, it runs the
        queries in parallel. On a 4-CPU system, the overall runtime should go
        down to roughly a quarter; the limiting factors are described in the
        :doc:`80_goodtoknow` chapter.

    ::

        $ hcprequestanalytics analyze -d hcp72.db -p hcp72
        scheduling these queries for analytics using 8 parallel process(es):
            500_highest_throughput      : The 500 records with the highest throughput (Bytes/sec)
            500_largest                 : The records with the 500 largest requests
            500_worst_latency           : The records with the 500 worst latencies
            clientip                    : No. of records per client IP address
            clientip_httpcode           : No. of records per http code per client IP address
            clientip_request_httpcode   : No. of records per http code per request per client IP address
            count                       : No. of records, overall
            day                         : No. of records per day
            day_hour                    : No. of records per hour per day
            day_hour_req                : No. of records per request per hour per day
            day_req                     : No. of records per request per day
            day_req_httpcode            : No. of records per http code per request per day
            node                        : No. of records per node
            node_req                    : No. of records per request per node
            node_req_httpcode           : No. of records per http code per request per node
            percentile_req              : No. of records per request analysis, including percentiles for size and latency
            percentile_throughput_128kb : No. of records per request, with percentiles on throughput (Bytes/sec) for objects >= 128KB
            req                         : No. of records per request
            req_httpcode                : No. of records per http code per request
            req_httpcode_node           : No. of records per node per http code per request
        wait for queries finishing:
            count                       :  0.290 seconds
            500_worst_latency           :  0.761 seconds
            500_highest_throughput      :  1.298 seconds
            clientip                    :  1.436 seconds
            500_largest                 :  1.951 seconds
            clientip_httpcode           :  2.017 seconds
            day                         :  2.244 seconds
            clientip_request_httpcode   :  2.553 seconds
            day_hour                    :  3.269 seconds
            node                        :  1.522 seconds
            percentile_throughput_128kb :  0.665 seconds
            node_req                    :  2.444 seconds
            day_req                     :  3.385 seconds
            day_hour_req                :  3.972 seconds
            day_req_httpcode            :  3.439 seconds
            node_req_httpcode           :  2.643 seconds
            req                         :  1.400 seconds
            req_httpcode                :  1.483 seconds
            req_httpcode_node           :  1.284 seconds
            percentile_req              : 17.030 seconds
        analytics finished after 20.094 seconds

    .. Tip::

        You can run selected queries by adding them to the end of the command::

            $ hcprequestanalytics -d hcp72.db analyze -p hcp72 req count

        This will run just the *req* and the *count* query.

        It's also possible to select a group of queries by adding an asterisk::

            $ hcprequestanalytics -d hcp72.db analyze -p hcp72 'req*'

        This will run all queries beginning with *req*.

Either way, you now have an **xlsx** (Excel) file with the results of each
query::

    $ ls -lh *.xlsx
    -rw-r--r--  1 tsimons  273924073   178K Oct 10 17:02 hcp72-analyzed.xlsx

.. image:: _static/xlsx.png
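If you want to post-process the results programmatically rather than in Excel,
the workbook can be read with any xlsx-capable library. Below is a minimal
sketch using ``openpyxl`` (not shipped with **hcprequestanalytics**, so it may
need to be installed separately), assuming each query's result ends up in its
own worksheet::

    from openpyxl import load_workbook

    # open the workbook created by 'hcprequestanalytics analyze'
    wb = load_workbook('hcp72-analyzed.xlsx', read_only=True)

    # each query result is expected to live in its own worksheet
    print('worksheets:', wb.sheetnames)

    # print the header row and the first few data rows of the first worksheet
    ws = wb[wb.sheetnames[0]]
    for row in list(ws.iter_rows(values_only=True))[:5]:
        print(row)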
If you prefer comma-separated value (CSV) files, just add ``-c`` to the analyze
command::

    $ hcprequestanalytics analyze -d hcp72.db -p hcp72 -c
    scheduling these queries for analytics using 8 parallel process(es):
        500_highest_throughput      : The 500 records with the highest throughput (Bytes/sec)
        500_largest                 : The records with the 500 largest requests
        500_worst_latency           : The records with the 500 worst latencies
        clientip                    : No. of records per client IP address
        clientip_httpcode           : No. of records per http code per client IP address
        clientip_request_httpcode   : No. of records per http code per request per client IP address
        count                       : No. of records, overall
        day                         : No. of records per day
        day_hour                    : No. of records per hour per day
        day_hour_req                : No. of records per request per hour per day
        day_req                     : No. of records per request per day
        day_req_httpcode            : No. of records per http code per request per day
        node                        : No. of records per node
        node_req                    : No. of records per request per node
        node_req_httpcode           : No. of records per http code per request per node
        percentile_req              : No. of records per request analysis, including percentiles for size and latency
        percentile_throughput_128kb : No. of records per request, with percentiles on throughput (Bytes/sec) for objects >= 128KB
        req                         : No. of records per request
        req_httpcode                : No. of records per http code per request
        req_httpcode_node           : No. of records per node per http code per request
    wait for queries finishing:
        count                       :  0.323 seconds
        500_worst_latency           :  0.805 seconds
        clientip                    :  1.309 seconds
        500_highest_throughput      :  1.315 seconds
        day                         :  1.797 seconds
        clientip_httpcode           :  1.807 seconds
        500_largest                 :  2.188 seconds
        clientip_request_httpcode   :  2.616 seconds
        node                        :  1.440 seconds
        day_hour                    :  2.970 seconds
        percentile_throughput_128kb :  0.627 seconds
        node_req                    :  2.144 seconds
        day_req                     :  2.890 seconds
        day_hour_req                :  3.454 seconds
        day_req_httpcode            :  3.087 seconds
        req                         :  1.222 seconds
        node_req_httpcode           :  2.385 seconds
        req_httpcode                :  1.237 seconds
        req_httpcode_node           :  1.410 seconds
        percentile_req              : 17.067 seconds
    analytics finished after 19.720 seconds

You now have one **csv** file per query::

    $ ls -lh *.csv
    -rw-r--r--  1 tsimons  273924073    87K Oct 10 17:05 hcp72-500_highest_throughput.csv
    -rw-r--r--  1 tsimons  273924073    86K Oct 10 17:05 hcp72-500_largest.csv
    -rw-r--r--  1 tsimons  273924073    77K Oct 10 17:05 hcp72-500_worst_latency.csv
    -rw-r--r--  1 tsimons  273924073   462B Oct 10 17:05 hcp72-clientip.csv
    -rw-r--r--  1 tsimons  273924073   1.9K Oct 10 17:05 hcp72-clientip_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   3.0K Oct 10 17:05 hcp72-clientip_request_httpcode.csv
    -rw-r--r--  1 tsimons  273924073    18B Oct 10 17:05 hcp72-count.csv
    -rw-r--r--  1 tsimons  273924073   2.0K Oct 10 17:05 hcp72-day.csv
    -rw-r--r--  1 tsimons  273924073   7.8K Oct 10 17:05 hcp72-day_hour.csv
    -rw-r--r--  1 tsimons  273924073    18K Oct 10 17:05 hcp72-day_hour_req.csv
    -rw-r--r--  1 tsimons  273924073   6.1K Oct 10 17:05 hcp72-day_req.csv
    -rw-r--r--  1 tsimons  273924073   8.7K Oct 10 17:05 hcp72-day_req_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   359B Oct 10 17:05 hcp72-node.csv
    -rw-r--r--  1 tsimons  273924073   1.2K Oct 10 17:05 hcp72-node_req.csv
    -rw-r--r--  1 tsimons  273924073   3.5K Oct 10 17:05 hcp72-node_req_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   1.1K Oct 10 17:05 hcp72-percentile_req.csv
    -rw-r--r--  1 tsimons  273924073   506B Oct 10 17:05 hcp72-percentile_throughput_128kb.csv
    -rw-r--r--  1 tsimons  273924073   371B Oct 10 17:05 hcp72-req.csv
    -rw-r--r--  1 tsimons  273924073   1.0K Oct 10 17:05 hcp72-req_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   3.5K Oct 10 17:05 hcp72-req_httpcode_node.csv
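If you'd rather continue in Python than in a spreadsheet, the CSV files can be
read with the standard library. A minimal sketch, assuming a regular
comma-separated layout with a header row (check one of the files to verify the
exact format)::

    import csv

    # read one of the per-query result files created with '-c'
    with open('hcp72-req.csv', newline='') as f:
        rows = list(csv.reader(f))

    # the first row is assumed to hold the column headers
    header, data = rows[0], rows[1:]
    print(header)
    for row in data:
        print(row)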