Applied Data Science and Machine Learning for Cybersecurity


This interactive course will teach security professionals how to use data science techniques to quickly manipulate and analyze network and security data and uncover valuable insights from it. The course will cover the entire data science process: data preparation, feature engineering and selection, exploratory data analysis, data visualization, machine learning, model evaluation and optimization, and finally implementation at scale, all with a focus on security-related problems.

Participants will learn how to read in data in a variety of common formats, then write scripts to analyze and visualize that data. A non-exhaustive list of the topics covered includes:

  • Using machine learning to detect network attacks within your organization
  • Hunting anomalous indicators of compromise and reducing false positives
  • Quickly and efficiently parsing executables, log files, and pcap files, and extracting artifacts from them
  • Writing scripts to efficiently read and manipulate CSV, XML, and JSON files
  • Using the Pandas library to quickly manipulate tabular data
  • Preprocessing raw security data for machine learning and feature engineering
  • Building, applying and evaluating machine learning algorithms to identify potential threats
  • Automating the process of tuning and optimizing machine learning models
  • Using supervised learning algorithms such as Random Forests, Naive Bayes, K-Nearest Neighbors (K-NN) and Support Vector Machines (SVM) to classify malicious URLs and identify SQL Injection
  • Applying unsupervised learning algorithms such as K-Means Clustering to detect anomalous behavior
  • Rapidly and effectively visualizing data using Python
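
As a taste of the scripting topics listed above, here is a minimal sketch of reading CSV, JSON, and XML records into one common structure and summarizing them. It uses only the Python standard library. The sample records, the field names (src_ip, action), and the helper names load_events and denied_by_source are hypothetical, invented for illustration; they are not taken from the course materials.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample records; a real script would read these from files.
CSV_DATA = "src_ip,action\n10.0.0.5,deny\n10.0.0.7,allow\n"
JSON_DATA = '[{"src_ip": "10.0.0.5", "action": "deny"}]'
XML_DATA = '<events><event src_ip="10.0.0.9" action="deny"/></events>'


def load_events(csv_text, json_text, xml_text):
    """Normalize CSV, JSON, and XML records into one list of dicts."""
    # CSV: each row becomes a dict keyed by the header line.
    events = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    # JSON: the text is assumed to hold a list of objects.
    events.extend(json.loads(json_text))
    # XML: each <event> element carries its fields as attributes.
    root = ET.fromstring(xml_text)
    events.extend(dict(node.attrib) for node in root.iter("event"))
    return events


def denied_by_source(events):
    """Count denied events per source IP address."""
    return Counter(e["src_ip"] for e in events if e["action"] == "deny")


events = load_events(CSV_DATA, JSON_DATA, XML_DATA)
print(denied_by_source(events).most_common())
```

Once every format is reduced to the same list of dictionaries, the same counting or filtering logic applies regardless of where a record came from, and the list can be handed to a tabular library such as Pandas for further analysis.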

Finally, we will introduce students to cutting-edge Big Data tools, including Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and demonstrate how to apply them to extremely large datasets.

A real-time capture-the-flag (CTF) competition will run throughout the class to help you sharpen your data science skills.


Key Takeaways

By the end of the course, students will be able to:
  • Use the Python data science ecosystem to rapidly prepare, explore, and visualize cybersecurity data
  • Build and evaluate common machine learning models and apply these techniques to cybersecurity use cases
  • Develop unsupervised models to uncover anomalies and other patterns in their cybersecurity data


Who Should Take this Course

Anyone who wishes to incorporate automated data analysis, machine learning, and data science into their cybersecurity work, particularly those working in the following job roles:

  • Security Analysts
  • SOC Analysts
  • SOC Engineers
  • CND Analysts
  • Security Monitoring Personnel
  • System Administrators
  • Cyber Threat Investigators
  • Individuals working on a network hunt team


Audience Skill Level



Student Requirements

This is a hands-on course. To get the most out of the class and labs, students should be comfortable coding in Python and should understand common security and networking concepts.


What Students Should Bring

Students should bring a laptop with either:
  • VirtualBox (or VMware) installed, 6 GB of RAM, and 10 GB of storage
  • Anaconda and IPython installed

We strongly recommend using the virtual machine we will provide as it will give the best student experience.



What Students Will Be Provided With

A preconfigured virtual machine (VM) containing all the software needed for the class. The VM will also contain:
  • All course slides, notebooks, reference sheets and handouts/documentation
  • Skeleton code examples for in-class exercises

Students will also be provided with access to our website, which will have additional exercises.

  • $1000 USD


