Ingest Mobile Threat Events from Zimperium zConsole into Amazon Kinesis Data Streams & Kinesis Data Analytics

Prologue: In uncertain times, we search for security, stability, and certainty. This could be economic security, financial security, information security, or even securing your next roll of toilet paper. I want to thank medical experts around the world who have worked tirelessly around the clock to combat the COVID-19 pandemic and help bring that feeling of security back. Their selfless actions are an inspiration to us all, and I am forever thankful for their sacrifice.

Thank you to the businesses who have stepped up and shown the best of themselves. Examples include: American Airlines and Hyatt offering complimentary vacations to medical experts, Samsung and Google offering free device repairs to medical experts, VMware partnering with the NHS to develop a contact tracing application, and Google and Apple partnering to develop APIs for contact tracing. Leaders outside the medical community have stepped up and shown the best of themselves as well. Jack Dorsey pledged $1 billion, more than a fifth of his net worth, to fight COVID-19. Marc Benioff chartered wide-body cargo flights full of PPE to the Bay Area. Thank you for your charity and your generosity. There is no better time than this to use the hashtag #TechForGood.

On to the good stuff...

Hanalei Bay, Kauai

One of the ways we can find certainty is in data. Data about the past and present tells us where we've been and where we are, and gives us an idea of where we need to go. Data can help us make the right decisions, deliver better outcomes, and reduce risk. In addition, the value of data changes over time, and so does where we store it. For instance, we might stream (ingest) data at extremely high volumes for real-time analysis, dashboard visualizations, or even contact tracing. This gives us actionable data to draw inferences from right now, right this second.

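Once the data is ingested with a distributed, high-performance stream processing framework, it can only be retained for so long before it has to be offloaded. Streaming platforms are built for fresh data and keep records only for a bounded retention period (Kinesis Data Streams, for example, retains records for 24 hours by default), so longer-term storage belongs in a database, data lake, or data warehouse.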

To expand on this with examples: a CEO or CISO might find value in real-time dashboard updates from data ingested with a stream processing framework such as Kafka, Kinesis, or NiFi, before the data is processed and offloaded to Cassandra, HBase, or even an OLTP relational database like PostgreSQL. A financial analyst working on a quarterly 10-Q or yearly 10-K might find value in an OLTP database. Finally, a demand planner might find value in deriving information from a data warehouse with OLAP queries.

Amazon Kinesis is comparable to Apache Kafka in many ways. Kafka is a pub/sub messaging broker, originally developed at LinkedIn to handle an enormous volume of messages on inexpensive hardware. To give an idea of the scale Kafka brings to organizations: as of 2019, LinkedIn was pushing over 7 trillion messages a day through Kafka.

To stream mobile threat events to Kinesis and scale to trillions of messages, you don't have to write your own Kinesis producer with the Kinesis Producer Library (KPL). Instead, you can leverage the integration Zimperium has built into zConsole. So, let's take a look at how to integrate zConsole with Kinesis Data Streams, Kinesis Data Analytics, and eventually Kinesis Data Firehose.
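zConsole is the producer here, so you never choose partition keys yourself, but once a stream has more than one shard it helps to know how records get spread across shards. Kinesis hashes each record's partition key with MD5 and routes the record to the shard whose hash key range contains the resulting 128-bit integer. The Python sketch below illustrates that rule. It assumes the hash key space is split evenly across shards, which matches a freshly created stream; after resharding, the real ranges come from the stream's shard list (for example, the ListShards API). The partition key "device-123" is purely hypothetical.

```python
from __future__ import annotations

import hashlib

HASH_KEY_SPACE = 2 ** 128  # partition keys map onto the range [0, 2**128 - 1]


def hash_key(partition_key: str) -> int:
    """MD5 the partition key and read the 16-byte digest as an unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Inclusive hash key ranges, assuming the space is split evenly across shards."""
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key is not covered by any shard range")


# "device-123" is a hypothetical partition key.
print(shard_for("device-123", even_shard_ranges(2)))
```

With the single shard used in this lab, every partition key maps to shard 0; the distribution only starts to matter once you add shards.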

We will start by logging in to our Zimperium zConsole.
  1. Click 'Manage'
  2. Click 'Integrations' followed by 'Threat Reporting', and then 'Add Integration'
  3. Click 'Amazon Kinesis'
  4. To fill out the following form fields, we need to log in to our AWS console and set up our Kinesis Data Stream
  5. In a web browser, go to https://aws.amazon.com/ and click 'Sign in to the Console'
  6. Kinesis is located under 'Analytics'. If you have worked with Kinesis recently, you can also find it under 'Recently Visited Services'
  7. Click 'Create data stream'

  8. Enter a name for the data stream. In this example, we will use zIPS, as it will be ingesting streaming threat data from zIPS. Take note of the number of open shards. For lab or demonstration purposes, 1 shard should be fine. For production, you will have to size the stream appropriately based on the volume of data (records and bytes) ingested per second and the number of consumers reading it, such as KCL workers. Use the 'Shard Estimator' for further guidance. Click 'Create data stream'


  9. Copy the Amazon Resource Name (ARN) value and paste it in a text editor for later use

  10. To create a user, click 'Services' in the upper left-hand corner of the AWS console, and then click 'IAM'

  11. Click 'Users'
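  12. Click 'Add user', enter a user name, and select programmatic access so the user receives an access key. Then click 'Add user to group', followed by 'Create group'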

  13. Name the group, specify the following policies, and click 'Create Group'

    Policies associated with the group

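  14. Finally, click 'Create User'

  15. Copy the 'Access key ID' and 'Secret access key'. The secret access key is only shown at this point, so store it securely or download the .csv file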

  16. Head back over to the Zimperium zConsole, and fill in either:
    AWS Region Name
    Stream Name
    AWS Access Key ID
    AWS Secret Access Key

    OR
    AWS Region Name
    Stream Name
    Assumed Role Name (Amazon Resource Name)

    My Kinesis environment is in the us-east-2 region, my Stream Name is zIPS, and I have pasted the Access Key ID and Secret Access Key in the example below. Click 'Go on'





  17. At this point, the data stream is set up, and threat data is streamed in real time to your Kinesis Data Stream. The following infographic from Amazon (source: https://aws.amazon.com/kinesis/data-streams/?nc=sn&loc=2&dn=2) does a great job of showing where we can send the data: Lambda functions, Apache Spark, an EC2 instance, or Kinesis Data Analytics. I modified the 'Threat Severity Filter Settings' to export 'Normal and Above' threats. With over 100 threats in my Kinesis Data Stream, I have enough event entries to set up Kinesis Data Analytics with Apache Flink or SQL.
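Step 8 leaves production sizing to the Shard Estimator. As a rough cross-check, the documented per-shard limits for a provisioned stream are 1 MiB/s or 1,000 records/s of writes, and 2 MiB/s of reads shared by all standard (non-enhanced fan-out) consumers. The sketch below computes the shard count each limit implies and takes the largest. The input figures are placeholders you would replace with your own zIPS event rates, and enhanced fan-out consumers are not modeled.

```python
import math

WRITE_KIB_PER_SHARD = 1024      # 1 MiB/s of writes per shard
WRITE_RECORDS_PER_SHARD = 1000  # 1,000 records/s of writes per shard
READ_KIB_PER_SHARD = 2048       # 2 MiB/s of reads per shard, shared by standard consumers


def estimate_shards(avg_record_kib: float, records_per_second: float, consumers: int) -> int:
    """Rough shard count for a provisioned stream; the tightest limit wins."""
    record_kib = math.ceil(avg_record_kib)  # round each record up to a whole KiB
    incoming_kib = record_kib * records_per_second
    outgoing_kib = incoming_kib * consumers  # every consumer application reads everything
    return max(
        1,
        math.ceil(incoming_kib / WRITE_KIB_PER_SHARD),
        math.ceil(records_per_second / WRITE_RECORDS_PER_SHARD),
        math.ceil(outgoing_kib / READ_KIB_PER_SHARD),
    )


# Hypothetical load: 1,500 threat events/s of about 2 KiB each, one consumer application.
print(estimate_shards(avg_record_kib=2, records_per_second=1500, consumers=1))  # prints 3
```

In that example, write bandwidth (about 2.9 MiB/s) is the binding limit, so it needs 3 shards even though the record rate alone would fit in 2.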
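Step 13 attaches policies to the group, but the exact policies appear only in the original screenshot. If you would rather use a least-privilege policy scoped to the single stream than broad managed policies, a policy document might look like the one generated below. The action list (kinesis:DescribeStream, kinesis:PutRecord, kinesis:PutRecords) is an assumption about what a producer such as zConsole needs, so confirm the required permissions with Zimperium before relying on it. The account ID in the ARN is a placeholder.

```python
import json


def kinesis_producer_policy(stream_arn: str) -> dict:
    """Build a hypothetical least-privilege IAM policy for writing to one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowThreatEventWrites",
                "Effect": "Allow",
                # Assumed producer actions; confirm against Zimperium's documentation.
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:PutRecord",
                    "kinesis:PutRecords",
                ],
                # Scope the grant to the stream ARN copied in step 9.
                "Resource": stream_arn,
            }
        ],
    }


# 123456789012 is a placeholder account ID.
policy = kinesis_producer_policy("arn:aws:kinesis:us-east-2:123456789012:stream/zIPS")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON tab of the IAM policy editor and attached to the group instead of a broader managed policy.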
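Step 9 saves the stream ARN, and step 16 asks for the region and stream name as separate fields. Both values are embedded in the ARN, which follows the arn:partition:service:region:account-id:resource format, so a small helper can pull them out. The function name and the sample account ID are hypothetical.

```python
from typing import NamedTuple


class StreamFields(NamedTuple):
    """The zConsole form values recoverable from a Kinesis stream ARN."""
    region: str
    stream_name: str


def fields_from_stream_arn(arn: str) -> StreamFields:
    # Expected shape: arn:aws:kinesis:<region>:<account-id>:stream/<stream-name>
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kinesis":
        raise ValueError(f"not a Kinesis ARN: {arn!r}")
    resource_type, _, stream_name = parts[5].partition("/")
    if resource_type != "stream" or not stream_name:
        raise ValueError(f"not a stream ARN: {arn!r}")
    return StreamFields(region=parts[3], stream_name=stream_name)


# Placeholder account ID; use the ARN you copied in step 9 instead.
print(fields_from_stream_arn("arn:aws:kinesis:us-east-2:123456789012:stream/zIPS"))
# StreamFields(region='us-east-2', stream_name='zIPS')
```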
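Step 17 lists Lambda as one place to send the data. Lambda receives Kinesis records in batches, with each record's payload base64-encoded under Records[].kinesis.data. The handler below is a minimal sketch that decodes each record and tallies events by severity. It assumes zConsole writes each threat event as a UTF-8 JSON object, and the "severity" field name is hypothetical; inspect a real event from your stream to find the actual field names.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Minimal Lambda handler for a Kinesis event source: tally threats by severity."""
    severities = Counter()
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: each zConsole record is a UTF-8 JSON object.
        threat = json.loads(payload)
        # "severity" is a hypothetical field name; check a real event first.
        severities[threat.get("severity", "unknown")] += 1
    print(json.dumps(severities))  # written to CloudWatch Logs when run in Lambda
    return dict(severities)
```

Keeping the handler this small makes it easy to swap the tally for whatever your team needs, such as forwarding high-severity events to a ticketing system.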


Kinesis Data Analytics Setup


  1. In Kinesis, click 'Data Analytics' on the left side of the screen

  2. Click 'Create application'
  3. Name the application and choose either SQL or Apache Flink. In this example, I will choose SQL.

  4. Now that you have an application created, you can connect the streaming data source you just created by clicking 'Connect streaming data'

  5. Specify the data source we just created.

  6. In 'Access Permissions', specify an existing IAM role, or have AWS create one for you automatically
  7. Click 'Discover schema' so the Data Analytics application can discover the data schema.
    Note: This is only possible if you have enough recently streamed threat data.


    Failure due to a lack of recently streamed threat data will look like the following:

  8. Once your schema is discovered, it will look as follows:

  9. Your application is ready, but not running. To run your application, click 'Go to SQL Editor'

  10. A prompt will appear asking you to start the application; you are then taken to the 'Real-time analytics' SQL editor, where you can run SQL queries against your streaming data. When viewing 'Real-time analytics', new results are displayed every 2-10 seconds. If you are in a production Zimperium environment with a fair number of devices (say, 10,000 or more), you will begin to see threat data show up. In a larger environment with more than 100,000-200,000 devices, you are likely to see mobile threat data show up consistently.
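Step 7 depends on recently streamed records because schema discovery works from a sample of the stream. Kinesis Data Analytics performs discovery itself, and the sketch below is not its algorithm; it is a toy Python illustration of the idea, inferring a column name and SQL type for each top-level key in sampled JSON records and returning nothing when there are no samples. The type mapping (BOOLEAN, BIGINT, DOUBLE, VARCHAR) is an assumption chosen for illustration, and nested objects are simply treated as strings.

```python
import json
from typing import Dict, List, Optional


def _sql_type(values: List[object]) -> str:
    """Pick an illustrative SQL type for one column from its sampled values."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so check it first and exclude it below.
    if present and all(isinstance(v, bool) for v in present):
        return "BOOLEAN"
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if present and len(numeric) == len(present):
        return "BIGINT" if all(isinstance(v, int) for v in numeric) else "DOUBLE"
    # Anything else (strings, nested objects, mixed types) becomes a string column.
    width = max((len(str(v)) for v in present), default=1)
    return f"VARCHAR({width})"


def discover_schema(raw_records: List[str]) -> Optional[Dict[str, str]]:
    """Infer {column: type} from sampled JSON records; None if there is nothing to sample."""
    if not raw_records:
        return None
    rows = [json.loads(raw) for raw in raw_records]
    columns = sorted({key for row in rows for key in row})
    return {column: _sql_type([row.get(column) for row in rows]) for column in columns}
```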





After creating your SQL query, you are done! You now have a Kinesis Data Stream and a Kinesis Data Analytics application set up.
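The SQL query itself is up to you. A common starting point for threat data is counting events per severity in fixed, non-overlapping (tumbling) windows; in Kinesis Data Analytics SQL you would typically group by a STEP over ROWTIME for this. To keep all examples in one language, the sketch below emulates that tumbling-window count in plain Python over (timestamp, severity) pairs. The 60-second window and the severity values are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int = 60
) -> Dict[int, Counter]:
    """Count events per severity in fixed, non-overlapping time windows.

    Each event is a (timestamp_in_seconds, severity) pair; the result maps the
    start of each window to a Counter of the severities seen in that window.
    """
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, severity in events:
        # Every timestamp belongs to exactly one window: [start, start + window_seconds).
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][severity] += 1
    return dict(windows)


events = [(0, "High"), (30, "Low"), (59.9, "High"), (60, "High"), (125, "Critical")]
for start, counts in sorted(tumbling_counts(events).items()):
    print(start, dict(counts))
```

Because tumbling windows never overlap, each event is counted exactly once, which keeps per-window totals easy to reconcile with the raw stream.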

You can offload streaming events to a data lake using a Kinesis Data Firehose delivery stream. Depending on your use case, destinations include Amazon S3 (Simple Storage Service, for objects), Amazon Elasticsearch Service (a distributed full-text search engine based on Lucene), Amazon Redshift (a relational data warehouse capable of scaling to petabytes), or Splunk (an operational intelligence tool).
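A Firehose delivery stream does not write every record individually; it buffers incoming data and delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first. The class below is a hypothetical model of that rule, not Firehose code. The 5 MiB and 300-second defaults mirror the commonly documented S3 defaults, but treat them as assumptions and check the current console values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeliveryBuffer:
    """Hypothetical model of 'size or interval, whichever comes first' buffering."""
    size_hint_bytes: int = 5 * 1024 * 1024  # assumed default buffer size (5 MiB)
    interval_seconds: float = 300.0         # assumed default buffer interval (300 s)
    records: List[bytes] = field(default_factory=list)
    buffered_bytes: int = 0
    opened_at: Optional[float] = None

    def _flush(self) -> List[bytes]:
        batch, self.records = self.records, []
        self.buffered_bytes, self.opened_at = 0, None
        return batch

    def tick(self, now: float) -> Optional[List[bytes]]:
        """Flush if the oldest buffered record has waited the full interval."""
        if self.opened_at is not None and now - self.opened_at >= self.interval_seconds:
            return self._flush()
        return None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer a record; return a batch if either threshold has been reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.size_hint_bytes:
            return self._flush()
        return self.tick(now)
```

A low-volume lab stream will almost always flush on the interval, which is why objects in S3 may appear a few minutes after threats show up in zConsole.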

As always, watch your cloud service bills. Colleagues at VMware often ask about the cost of the environments I run in Azure, Google Cloud Platform, and Amazon Web Services. To satisfy your curiosity, I'll share what a small Kinesis lab (Firehose, Data Stream, and Analytics) costs. The billing dashboard below was captured on May 20th. AWS does not currently include Kinesis in its Always Free tier.

The EC2 instances are not related to Zimperium or Kinesis. The EC2 costs are spot instances for other workloads that I haven't yet migrated to GCP with Velostrata.
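For a rough idea of how Kinesis Data Streams charges add up, a provisioned stream is billed per shard-hour plus per million PUT payload units, where each record is counted in 25 KB chunks. The sketch below multiplies those out for a 730-hour month. The prices are placeholders, not actual AWS rates (which vary by region and change over time), 25 KB is treated here as 25,600 bytes, and Firehose and Data Analytics charges are not included.

```python
import math

HOURS_PER_MONTH = 730       # common billing approximation (8,760 hours / 12)
PUT_UNIT_BYTES = 25 * 1024  # 25 KB payload chunk, treated here as 25,600 bytes


def monthly_stream_cost(
    shards: int,
    records_per_second: float,
    avg_record_bytes: int,
    shard_hour_price: float,
    price_per_million_put_units: float,
) -> float:
    """Rough monthly Kinesis Data Streams cost from placeholder prices."""
    shard_cost = shards * HOURS_PER_MONTH * shard_hour_price
    # Each record is billed in whole 25 KB chunks, with at least one chunk per record.
    units_per_record = max(1, math.ceil(avg_record_bytes / PUT_UNIT_BYTES))
    monthly_units = records_per_second * 3600 * HOURS_PER_MONTH * units_per_record
    put_cost = monthly_units / 1_000_000 * price_per_million_put_units
    return round(shard_cost + put_cost, 2)


# Placeholder prices only; look up current rates for your region.
print(monthly_stream_cost(1, 5, 2048, shard_hour_price=0.02, price_per_million_put_units=0.02))
```

For a one-shard lab, the shard-hour line item usually dominates, so deleting the stream when you are not using it is the simplest way to keep the bill small.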

Stay tuned for upcoming posts covering Kinesis Data Firehose. 

Mahalo,
Ryan Pringnitz
