Introducing our AWS S3 Logs Parser PHP package

A client of ours wanted daily stats on downloads of media files hosted on AWS S3, as well as on the traffic those downloads generated. Unfortunately, AWS doesn’t offer a service or an API that returns this information, so we had to find another way.

Overview

A client of ours, a multinational pharmaceutical corporation with tens of thousands of employees around the world, contracted us to develop a platform for creating, maintaining, and consuming internal podcast channels and episodes. The goal of the project was to let all of the company’s employees access, subscribe to, and download podcasts with common public podcast apps while connected to the company’s internal network.

The project

The project is built on top of the Laravel PHP framework and uses many of the services provided by Amazon Web Services (AWS). The content library is managed from the web app, which is responsible only for providing users with the content’s source. All audio files used in the podcast episodes, as well as the podcasts’ XML feeds, are stored in Amazon S3 and served directly from dedicated S3 buckets (which are also accessible only via the company’s internal network).

The problem

The client wanted daily stats on episode downloads and information about the traffic generated by podcast subscribers.

When planning the platform’s availability, we decided that podcast consumption should not depend on the web application. We can’t afford to interrupt users’ streaming if the web server goes down or becomes temporarily unavailable for some reason. Such server issues should affect only the web application and the content administrators, not the thousands of employees who listen to the podcasts via their client apps.

Additionally, we wanted to keep the project’s architecture as clean as possible, so that we wouldn’t have to plan auto-scaling in the near future if consumption increased. To address both concerns, we decided to serve content directly from Amazon S3 instead of relying on a proxy script to serve the audio files.

However, after reviewing which stats AWS S3 provides, we found that AWS doesn’t have a service or an API that could return the information we needed. We had to find another way to solve this.

The solution

Amazon S3 offers an option called Server Access Logging, which provides detailed records of the requests made to a bucket. Each access log record describes a single request, including the requester, bucket name, request time, request action, response status and, if relevant, an error code. Logging is disabled by default. When it is enabled, logs are delivered to a target bucket in the same AWS Region as the source bucket. That sounds like all the information we need, just in raw format.
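To show what “raw format” means here, below is a minimal PHP sketch (not the package’s own code) that splits one access log record into named fields. It assumes the space-separated field order from the AWS server access log documentation, treats a bracketed timestamp or a quoted string as a single field, and ignores escaped quotes inside quoted fields. The function name parseS3LogLine() and the sample record are hypothetical.

```php
<?php

/**
 * Split one S3 server access log record into named fields.
 * Simplified sketch: assumes quoted fields contain no escaped quotes.
 */
function parseS3LogLine(string $line): ?array
{
    // A token is a [bracketed timestamp], a "quoted string" or a run of non-space characters
    if (!preg_match_all('/\[[^\]]*\]|"[^"]*"|\S+/', $line, $matches)) {
        return null;
    }
    // Strip the surrounding brackets and quotes
    $tokens = array_map(fn (string $t): string => trim($t, '[]"'), $matches[0]);

    // First 17 fields, in the order documented for S3 server access logs
    $names = [
        'bucket_owner', 'bucket', 'time', 'remote_ip', 'requester',
        'request_id', 'operation', 'key', 'request_uri', 'http_status',
        'error_code', 'bytes_sent', 'object_size', 'total_time',
        'turnaround_time', 'referer', 'user_agent',
    ];
    if (count($tokens) < count($names)) {
        return null; // incomplete record
    }
    $record = array_combine($names, array_slice($tokens, 0, count($names)));

    // "-" marks an empty field; bytes_sent is numeric otherwise
    $record['bytes_sent'] = $record['bytes_sent'] === '-' ? 0 : (int) $record['bytes_sent'];

    return $record;
}

// Hypothetical sample record for a GET request on test.png
$sampleLine = 'ownerid bn-test [31/Oct/2018:10:15:00 +0000] 192.0.2.3 - 3E57427F3EXAMPLE '
    . 'REST.GET.OBJECT test.png "GET /bn-test/test.png HTTP/1.1" 200 - 1024 1024 12 10 "-" "AppleCoreMedia/1.0" -';

$record = parseS3LogLine($sampleLine);
echo $record['key'] . ' ' . $record['bytes_sent'] . PHP_EOL; // prints "test.png 1024"
```

Any fields after the user agent (version ID, signature version, TLS details and so on) are simply ignored by this sketch.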

We built an open source PHP package that parses Amazon S3 access logs into a readable JSON format. It can be integrated into any PHP platform (Laravel, Symfony, Drupal, and so on) via Composer. S3 Logs Parser returns the total number of downloads and the number of transferred bytes for every file in a bucket, per day. You can collect this information daily with a cron job and store it in your local database to track how the bucket’s files are used. Setting up the parser is straightforward, so here is a step-by-step integration guide.
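Counting downloads and bandwidth per file and per day comes down to a simple aggregation over parsed log records. The following hypothetical sketch (aggregateDownloads() is our illustration, not the package’s API) assumes each record is an array with time, operation, key, http_status and bytes_sent fields, and counts every successful (2xx) REST.GET.OBJECT request as one download. The package may apply different rules, for example for partial range requests.

```php
<?php

/**
 * Aggregate parsed S3 access log records into per-day, per-file stats.
 * Hypothetical sketch: each record is an array with 'time', 'operation',
 * 'key', 'http_status' and 'bytes_sent' fields.
 */
function aggregateDownloads(array $records): array
{
    $stats = [];
    foreach ($records as $r) {
        $status = (int) $r['http_status'];
        // Only successful object reads count as downloads in this sketch
        if ($r['operation'] !== 'REST.GET.OBJECT' || $status < 200 || $status > 299) {
            continue;
        }
        // S3 log timestamps look like "31/Oct/2018:10:15:00 +0000"
        $time = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $r['time']);
        if ($time === false) {
            continue; // skip records with an unexpected timestamp
        }
        $day = $time->format('Y-m-d');
        $key = $r['key'];

        // Initialise the counters the first time a file is seen on a given day
        $stats[$day][$key] ??= ['downloads' => 0, 'bandwidth' => 0];
        $stats[$day][$key]['downloads']++;
        $stats[$day][$key]['bandwidth'] += (int) $r['bytes_sent'];
    }
    return $stats;
}

// Hypothetical parsed records: two downloads, one HEAD request, one failed GET
$records = [
    ['time' => '31/Oct/2018:10:15:00 +0000', 'operation' => 'REST.GET.OBJECT', 'key' => 'test.png', 'http_status' => '200', 'bytes_sent' => 1024],
    ['time' => '31/Oct/2018:11:00:00 +0000', 'operation' => 'REST.GET.OBJECT', 'key' => 'test.png', 'http_status' => '200', 'bytes_sent' => 1024],
    ['time' => '31/Oct/2018:11:05:00 +0000', 'operation' => 'REST.HEAD.OBJECT', 'key' => 'test.png', 'http_status' => '200', 'bytes_sent' => 0],
    ['time' => '31/Oct/2018:12:00:00 +0000', 'operation' => 'REST.GET.OBJECT', 'key' => 'test2.png', 'http_status' => '404', 'bytes_sent' => 300],
];
$stats = aggregateDownloads($records);
```

Days are taken from the log timestamps as recorded (UTC in this sample), so a daily cron job would typically aggregate the previous day’s logs.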

Note: If you don’t have a bucket yet, you first need to sign up for an Amazon AWS account and obtain your credentials. Once you have an active account with access to S3, you can create your bucket via the AWS Management Console. Finally, you need to enable server access logging so that S3 starts delivering access logs for your source bucket to a target bucket of your choice. Otherwise, the parser won’t be able to access the information you need.
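Logging can be enabled in the console or programmatically. As a hedged sketch, assuming the AWS SDK for PHP v3, the S3Client::putBucketLogging() call takes a request shaped like the one built below. The helper buildLoggingRequest(), the bucket names, the prefix and the region are placeholders. The target bucket must also grant the S3 logging service permission to write log objects.

```php
<?php

/**
 * Build the request array for S3Client::putBucketLogging() (AWS SDK for PHP v3),
 * which enables server access logging on $sourceBucket.
 * Hypothetical helper for illustration only.
 */
function buildLoggingRequest(string $sourceBucket, string $targetBucket, string $targetPrefix): array
{
    return [
        'Bucket' => $sourceBucket, // bucket whose requests should be logged
        'BucketLoggingStatus' => [
            'LoggingEnabled' => [
                'TargetBucket' => $targetBucket, // bucket that receives the log files
                'TargetPrefix' => $targetPrefix, // key prefix for delivered log objects
            ],
        ],
    ];
}

$request = buildLoggingRequest('podcast-media', 'podcast-media-logs', 'access-logs/');

// With the SDK installed (composer require aws/aws-sdk-php), the call would be:
// $s3 = new Aws\S3\S3Client(['version' => 'latest', 'region' => 'eu-west-1']);
// $s3->putBucketLogging($request);
```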

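<?php

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use S3LogsParser\S3LogsParser;

// AWS region of the bucket and the credentials used to read its access logs
$S3LogsParser = new S3LogsParser([
    'version' => 'latest',
    'region' => $awsBucketRegion,
    'access_key' => $awsAccessKey,
    'secret_key' => $awsSecretKey,
]);

// Collect per-file download and bandwidth stats for the given bucket, prefix and date
$output = $S3LogsParser->getStats($awsBucketName, $awsBucketPrefix, $date);

// Inspect the result: dd() is a Laravel helper, var_dump() works in any PHP project
var_dump($output);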

To initialize the parser, you need the bucket’s AWS Region and an access key and secret key, which you can obtain from AWS. You can then call getStats() with the bucket name, prefix, and date to collect the stats you need. The output should look like this:

{
    "success":true,
    "statistics":{
        "bucket":"bn-test",
        "prefix":"bp-2018-10-31",
        "data":{
            "test.png":{
                "downloads":4,
                "bandwidth":4096
            },
            "test2.png":{
                "downloads":2,
                "bandwidth":2048
            }
        }
    }
}
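To store these results in a local database, as suggested for the daily cron job, the output can be flattened into one row per file. This sketch assumes getStats() returns a JSON string shaped like the example above (if it returns an array, drop the json_decode() call). The function statsToRows() and the row column names are hypothetical, and $day is the date you passed to getStats().

```php
<?php

/**
 * Turn the parser's JSON output into flat rows, e.g. for a daily stats table.
 * Hypothetical helper: assumes the output is a JSON string like the example above.
 */
function statsToRows(string $json, string $day): array
{
    $result = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (empty($result['success'])) {
        return []; // nothing to store when the parser reports a failure
    }

    $rows = [];
    foreach ($result['statistics']['data'] as $file => $counters) {
        $rows[] = [
            'day' => $day,
            'bucket' => $result['statistics']['bucket'],
            'file' => $file,
            'downloads' => (int) $counters['downloads'],
            'bytes' => (int) $counters['bandwidth'], // transferred bytes
        ];
    }
    return $rows;
}

$json = '{"success":true,"statistics":{"bucket":"bn-test","prefix":"bp-2018-10-31",'
    . '"data":{"test.png":{"downloads":4,"bandwidth":4096},"test2.png":{"downloads":2,"bandwidth":2048}}}}';
$rows = statsToRows($json, '2018-10-31');
```

Each row can then be inserted with a prepared statement (for example via PDO or your framework’s query builder).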

If you run into any trouble, please don’t hesitate to open an issue on GitHub: https://github.com/mtrdesign/s3-logs-parser/issues