

  • hcprequestanalytics has been installed as described in the Installation chapter

  • Either the binary has been placed in the $PATH or the Python virtual environment has been activated, and hcprequestanalytics can be started successfully:

    $ hcprequestanalytics --version
      hcprequestanalytics: v.1.3.2 (2017-10-10/Sm)
  • HCP internal logs have been downloaded into an empty folder:

    $ ls -lh HCPLogs-*
    -rw-r--r--@  1 tsimons  273924073   4.3M Sep 13 07:44
  • Enough free space is available to uncompress the log package and all http_gateway_request.log files contained in it
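
The free-space prerequisite can be checked before loading. The following is a minimal Python sketch and not part of hcprequestanalytics; the function name has_enough_space and the default expansion_factor are assumptions made for illustration, because the real ratio between a compressed log package and its unpacked contents depends on your logs.

```python
import os
import shutil


def has_enough_space(package_path, work_dir=".", expansion_factor=50):
    """Return True if work_dir has room for the unpacked log package.

    expansion_factor is a guessed ratio of unpacked size to packed size
    (an assumption, not a value documented by hcprequestanalytics).
    """
    needed = os.path.getsize(package_path) * expansion_factor
    free = shutil.disk_usage(work_dir).free
    return free >= needed
```

Pick a generous factor: besides the unpacked logs, the database itself needs space (in the example below, a 4.3M package results in a 109M database).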

Running hcprequestanalytics

Running hcprequestanalytics is a two-step process:

  1. Create and load the database from an HCP log package:

    $ hcprequestanalytics load -d hcp72.db
            un-packing access logs for node
            un-packing access logs for node
            un-packing access logs for node
            un-packing access logs for node
    unpacking took 5.762 seconds
            reading node 176 - ./tmpdkllzu3y/.../20170812-0316/http_gateway_request.log.0 - 5,295 records
            reading node 176 - ./tmpdkllzu3y/.../20170813-0341/http_gateway_request.log.0 - 944 records
                (many more entries listed here)
            reading node 179 - ./tmpdkllzu3y/.../20170913-0328/http_gateway_request.log.0 - 1 records
            reading node 179 - ./tmpdkllzu3y/.../20170913-0743/http_gateway_request.log.0 - 0 records
    loading database with 590,734 records took 30.288 seconds
    $ ls -lh hcp72.db
    -rw-r--r--  1 tsimons  273924073   109M Oct 10 16:58 hcp72.db

    You can repeat the load step for additional log packages; the existing database will then be reused.

    Load logs from only one HCP system into a given database; mixing logs from several systems would distort the results!


    hcprequestanalytics doesn’t check for duplicate records. If you load the same log package into the database twice, the query results will be distorted as well.
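
Because the tool does not detect duplicates, you may want to guard against loading the same package twice yourself. The sketch below is one possible approach and not a feature of hcprequestanalytics: it keeps a small JSON ledger of SHA-256 digests of the packages already loaded. The function names and the ledger file are hypothetical. It only recognizes byte-identical package files, not different packages that cover overlapping time ranges.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def register_package(package_path, ledger_path):
    """Record a log package in a JSON ledger (hypothetical helper file).

    Returns True if the package is new (safe to load) and False if a file
    with identical content has been registered before.
    """
    ledger = Path(ledger_path)
    seen = json.loads(ledger.read_text()) if ledger.exists() else {}
    digest = file_digest(package_path)
    if digest in seen:
        return False
    seen[digest] = Path(package_path).name
    ledger.write_text(json.dumps(seen, indent=2))
    return True
```

Call register_package() before each load and skip the load when it returns False.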

  2. Run queries against the database


    hcprequestanalytics analyze starts as many subprocesses as there are CPUs available and uses that pool to run queries in parallel. On a 4-CPU system, the overall runtime should drop to roughly a quarter; the limiting factors are described in the Good to know chapter.

    $ hcprequestanalytics analyze -d hcp72.db -p hcp72
    scheduling these queries for analytics using 8 parallel process(es):
            500_highest_throughput        : The 500 records with the highest throughput (Bytes/sec)
            500_largest                   : The records with the 500 largest requests
            500_worst_latency             : The records with the 500 worst latencies
            clientip                      : No. of records per client IP address
            clientip_httpcode             : No. of records per http code per client IP address
            clientip_request_httpcode     : No. of records per http code per request per client IP address
            count                         : No. of records, overall
            day                           : No. of records per day
            day_hour                      : No. of records per hour per day
            day_hour_req                  : No. of records per request per hour per day
            day_req                       : No. of records per request per day
            day_req_httpcode              : No. of records per http code per request per day
            node                          : No. of records per node
            node_req                      : No. of records per request per node
            node_req_httpcode             : No. of records per http code per request per node
            percentile_req                : No. of records per request analysis, including percentiles for size and latency
            percentile_throughput_128kb   : No. of records per request, with percentiles on throughput (Bytes/sec) for objects >= 128KB
            req                           : No. of records per request
            req_httpcode                  : No. of records per http code per request
            req_httpcode_node             : No. of records per node per http code per request
    wait for queries finishing:
            count                         : 0.290 seconds
            500_worst_latency             : 0.761 seconds
            500_highest_throughput        : 1.298 seconds
            clientip                      : 1.436 seconds
            500_largest                   : 1.951 seconds
            clientip_httpcode             : 2.017 seconds
            day                           : 2.244 seconds
            clientip_request_httpcode     : 2.553 seconds
            day_hour                      : 3.269 seconds
            node                          : 1.522 seconds
            percentile_throughput_128kb   : 0.665 seconds
            node_req                      : 2.444 seconds
            day_req                       : 3.385 seconds
            day_hour_req                  : 3.972 seconds
            day_req_httpcode              : 3.439 seconds
            node_req_httpcode             : 2.643 seconds
            req                           : 1.400 seconds
            req_httpcode                  : 1.483 seconds
            req_httpcode_node             : 1.284 seconds
            percentile_req                : 17.030 seconds
    analytics finished after 20.094 seconds
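
The timings above illustrate one limit of the parallel speed-up. Assuming each query runs inside a single subprocess, the total runtime cannot drop below the duration of the slowest query, no matter how many CPUs are available. The following sketch computes that floor from the durations printed above; parallel_lower_bound is a hypothetical helper, and the per-query durations were measured while other queries ran concurrently, so treat the result as a rough estimate only.

```python
def parallel_lower_bound(durations, workers):
    """Best-case wall time when each query runs in exactly one worker."""
    return max(max(durations), sum(durations) / workers)


# Per-query durations (seconds) copied from the sample output above.
durations = [
    0.290, 0.761, 1.298, 1.436, 1.951, 2.017, 2.244, 2.553, 3.269, 1.522,
    0.665, 2.444, 3.385, 3.972, 3.439, 2.643, 1.400, 1.483, 1.284, 17.030,
]

if __name__ == "__main__":
    print(f"sum of all queries : {sum(durations):.3f} s")
    print(f"ideal split over 8 : {sum(durations) / 8:.3f} s")
    print(f"lower bound        : {parallel_lower_bound(durations, 8):.3f} s")
```

In this run the reported durations add up to about 55 seconds, but percentile_req alone accounts for 17.030 seconds, so the measured 20.094 seconds is already close to the floor.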


    You can run selected queries by adding them to the end of the command:

    $ hcprequestanalytics -d hcp72.db analyze -p hcp72 req count

    This will run just the req and the count query.

    It’s also possible to select a group of queries by using an asterisk (*) as a wildcard; quote the pattern so that the shell doesn’t expand it:

    $ hcprequestanalytics -d hcp72.db analyze -p hcp72 'req*'

    This will run all queries beginning with req.
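
To preview which queries a pattern would select, the matching can be approximated in Python. This is a sketch only: it assumes shell-style matching as provided by the standard fnmatch module, which may differ from what hcprequestanalytics does internally, and select_queries is a hypothetical helper. The query names are those listed in the sample output above.

```python
from fnmatch import fnmatchcase

# Query names as listed in the sample output of the analyze command.
QUERIES = [
    "500_highest_throughput", "500_largest", "500_worst_latency",
    "clientip", "clientip_httpcode", "clientip_request_httpcode",
    "count", "day", "day_hour", "day_hour_req", "day_req",
    "day_req_httpcode", "node", "node_req", "node_req_httpcode",
    "percentile_req", "percentile_throughput_128kb",
    "req", "req_httpcode", "req_httpcode_node",
]


def select_queries(patterns, available=QUERIES):
    """Return the available query names matching any of the patterns."""
    return [name for name in available
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


if __name__ == "__main__":
    print(select_queries(["req*"]))
```

Under this assumption, 'req*' selects req, req_httpcode and req_httpcode_node, but not percentile_req, because the pattern has to match from the beginning of the name.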

    Either way, you now have an xlsx (Excel) file containing the results of each query:

    $ ls -lh *.xlsx
    -rw-r--r--  1 tsimons  273924073   178K Oct 10 17:02 hcp72-analyzed.xlsx

    If you prefer comma-separated values (CSV) files, just add -c to the analyze command:

    $ hcprequestanalytics analyze -d hcp72.db -p hcp72 -c
    scheduling these queries for analytics using 8 parallel process(es):
            500_highest_throughput        : The 500 records with the highest throughput (Bytes/sec)
            500_largest                   : The records with the 500 largest requests
            500_worst_latency             : The records with the 500 worst latencies
            clientip                      : No. of records per client IP address
            clientip_httpcode             : No. of records per http code per client IP address
            clientip_request_httpcode     : No. of records per http code per request per client IP address
            count                         : No. of records, overall
            day                           : No. of records per day
            day_hour                      : No. of records per hour per day
            day_hour_req                  : No. of records per request per hour per day
            day_req                       : No. of records per request per day
            day_req_httpcode              : No. of records per http code per request per day
            node                          : No. of records per node
            node_req                      : No. of records per request per node
            node_req_httpcode             : No. of records per http code per request per node
            percentile_req                : No. of records per request analysis, including percentiles for size and latency
            percentile_throughput_128kb   : No. of records per request, with percentiles on throughput (Bytes/sec) for objects >= 128KB
            req                           : No. of records per request
            req_httpcode                  : No. of records per http code per request
            req_httpcode_node             : No. of records per node per http code per request
    wait for queries finishing:
            count                         : 0.323 seconds
            500_worst_latency             : 0.805 seconds
            clientip                      : 1.309 seconds
            500_highest_throughput        : 1.315 seconds
            day                           : 1.797 seconds
            clientip_httpcode             : 1.807 seconds
            500_largest                   : 2.188 seconds
            clientip_request_httpcode     : 2.616 seconds
            node                          : 1.440 seconds
            day_hour                      : 2.970 seconds
            percentile_throughput_128kb   : 0.627 seconds
            node_req                      : 2.144 seconds
            day_req                       : 2.890 seconds
            day_hour_req                  : 3.454 seconds
            day_req_httpcode              : 3.087 seconds
            req                           : 1.222 seconds
            node_req_httpcode             : 2.385 seconds
            req_httpcode                  : 1.237 seconds
            req_httpcode_node             : 1.410 seconds
            percentile_req                : 17.067 seconds
    analytics finished after 19.720 seconds

    You now have one CSV file per query:

    $ ls -lh *.csv
    -rw-r--r--  1 tsimons  273924073    87K Oct 10 17:05 hcp72-500_highest_throughput.csv
    -rw-r--r--  1 tsimons  273924073    86K Oct 10 17:05 hcp72-500_largest.csv
    -rw-r--r--  1 tsimons  273924073    77K Oct 10 17:05 hcp72-500_worst_latency.csv
    -rw-r--r--  1 tsimons  273924073   462B Oct 10 17:05 hcp72-clientip.csv
    -rw-r--r--  1 tsimons  273924073   1.9K Oct 10 17:05 hcp72-clientip_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   3.0K Oct 10 17:05 hcp72-clientip_request_httpcode.csv
    -rw-r--r--  1 tsimons  273924073    18B Oct 10 17:05 hcp72-count.csv
    -rw-r--r--  1 tsimons  273924073   2.0K Oct 10 17:05 hcp72-day.csv
    -rw-r--r--  1 tsimons  273924073   7.8K Oct 10 17:05 hcp72-day_hour.csv
    -rw-r--r--  1 tsimons  273924073    18K Oct 10 17:05 hcp72-day_hour_req.csv
    -rw-r--r--  1 tsimons  273924073   6.1K Oct 10 17:05 hcp72-day_req.csv
    -rw-r--r--  1 tsimons  273924073   8.7K Oct 10 17:05 hcp72-day_req_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   359B Oct 10 17:05 hcp72-node.csv
    -rw-r--r--  1 tsimons  273924073   1.2K Oct 10 17:05 hcp72-node_req.csv
    -rw-r--r--  1 tsimons  273924073   3.5K Oct 10 17:05 hcp72-node_req_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   1.1K Oct 10 17:05 hcp72-percentile_req.csv
    -rw-r--r--  1 tsimons  273924073   506B Oct 10 17:05 hcp72-percentile_throughput_128kb.csv
    -rw-r--r--  1 tsimons  273924073   371B Oct 10 17:05 hcp72-req.csv
    -rw-r--r--  1 tsimons  273924073   1.0K Oct 10 17:05 hcp72-req_httpcode.csv
    -rw-r--r--  1 tsimons  273924073   3.5K Oct 10 17:05 hcp72-req_httpcode_node.csv
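
The CSV files can be processed further with any tool you like. As a minimal sketch, the following Python function collects all result files for a given prefix into a dictionary keyed by query name. It relies only on the file naming shown above (prefix, a dash, the query name, .csv) and assumes comma-delimited content; load_results is a hypothetical helper, and no column names are assumed.

```python
import csv
from pathlib import Path


def load_results(folder=".", prefix="hcp72"):
    """Read every '<prefix>-<query>.csv' file into a dict keyed by query name.

    Each value is the list of rows exactly as found in the file, including
    whatever header line it may contain.
    """
    results = {}
    for path in sorted(Path(folder).glob(f"{prefix}-*.csv")):
        query = path.stem[len(prefix) + 1:]
        with path.open(newline="") as handle:
            results[query] = list(csv.reader(handle))
    return results


if __name__ == "__main__":
    for query, rows in load_results().items():
        print(f"{query}: {len(rows)} line(s)")
```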