Python is widely used by data analysts and scientists to explore, visualize, and analyze data. Because Python is open source, most, if not all, of its data science capabilities are built on a few important packages that add new data structures and visualization tools. These packages include numpy, pandas, matplotlib, plotly, and scikit-learn, to name a few. Python is one of the most widely used general-purpose programming languages among data scientists.
Tech Kits
Beginner
Pandas
Length: 30-60 minutes
Description: Pandas is a Python package used by data analysts and other professionals to analyze and explore datasets. It provides two core data structures: the Series and the DataFrame. Pandas has functions to manipulate these structures, load and convert datasets into them, and compute statistical summaries from them.
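A minimal sketch of the two pandas data structures and the three kinds of tasks mentioned above (loading, manipulating, and summarizing). The CSV text, column names, and values are hypothetical sample data made up for illustration; a real project would more likely call pd.read_csv on a file path.

```python
import io

import pandas as pd

# A Series is a one-dimensional labeled array.
scores = pd.Series([88, 92, 79], index=["Ana", "Ben", "Cara"], name="score")

# Hypothetical dataset; io.StringIO lets read_csv treat the string like a file.
csv_text = """name,team,score
Ana,red,88
Ben,blue,92
Cara,red,79
Dev,blue,85
"""

# A DataFrame is a two-dimensional table with labeled columns.
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: filter rows, then add a derived boolean column.
red_team = df[df["team"] == "red"]
df["passed"] = df["score"] >= 80

# Statistical insight: summary statistics and per-group averages.
summary = df["score"].describe()
team_means = df.groupby("team")["score"].mean()

print(team_means)
```

Boolean filtering and groupby cover a large share of everyday pandas work; describe() is a quick way to check count, mean, spread, and quartiles before digging deeper.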
Intermediate
Matplotlib and Plotly
Length: 30-60 minutes
Description: Plotly and Matplotlib are both data visualization libraries in Python. Matplotlib is very powerful: almost every function offers extensive customization options for visualizing pandas data. Plotly, meanwhile, produces more polished-looking charts and supports interactive visualizations similar to those in Tableau. Both are important tools for visualizing data and, through visualization, analyzing it.
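A minimal sketch of plotting a pandas DataFrame with Matplotlib and then customizing the result. The sales figures and the output filename are hypothetical; the "Agg" backend is chosen only so the script can run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales figures used only for illustration.
sales = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"], "units": [120, 135, 128, 150]}
)

fig, ax = plt.subplots(figsize=(6, 4))

# pandas draws through Matplotlib and returns the Axes for further customization.
sales.plot(x="month", y="units", kind="bar", ax=ax, legend=False, color="steelblue")

# Each call below adjusts one aspect of the figure.
ax.set_title("Units sold per month")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
ax.grid(axis="y", linestyle="--", alpha=0.5)

fig.tight_layout()
fig.savefig("units_per_month.png")
```

For an interactive version of the same chart, Plotly Express offers a comparable one-liner, `plotly.express.bar(sales, x="month", y="units")`, which returns a figure that supports hovering and zooming.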
Advanced
Exploratory Data Analysis
Length: 30-60 minutes
Description: Exploratory data analysis combines everything you practiced in the previous two tech kits: cleaning, analyzing, and visualizing data in order to reach a conclusion. The questions asked along the way are up to whoever is performing the analysis. This makes it the best way to practice the kind of data analysis done in the real world, where data analysts must make these decisions independently to reach a goal.
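A minimal sketch of the clean, analyze, conclude loop described above, using pandas. The weather data and the guiding question ("Is rain associated with cooler temperatures?") are hypothetical and chosen only to show the workflow; with three data points, the resulting correlation says nothing meaningful about real weather. A visualization step, such as a scatter plot with Matplotlib, would normally sit between analysis and conclusion and is left out here for brevity.

```python
import io

import pandas as pd

# Hypothetical raw data with common problems: a duplicate row and a missing value.
raw_csv = """city,temp_c,rain_mm
Austin,31,0
Austin,31,0
Boston,22,5
Denver,,2
Seattle,18,12
"""
df = pd.read_csv(io.StringIO(raw_csv))

# 1. Clean: remove exact duplicate rows and rows with no temperature reading.
clean = df.drop_duplicates().dropna(subset=["temp_c"])

# 2. Analyze: Pearson correlation between temperature and rainfall.
corr = clean["temp_c"].corr(clean["rain_mm"])

# 3. Conclude: report what the cleaned data suggests.
direction = "negative" if corr < 0 else "non-negative"
print(f"{len(clean)} cities after cleaning; temp/rain correlation = {corr:.2f} ({direction})")
```

The questions and cleaning rules here are arbitrary choices; in a real analysis, deciding which rows to drop and which relationships to test is exactly the judgment the description above refers to.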
Projects
Resources
Visual Studio Code
Type: Application
Description: VSCode is a free, open-source code editor with built-in programming features that help users write and edit code.
Python 3.5
Type: Programming Language
Description: Python is an interpreted, high-level programming language for general-purpose programming. Version 3.5 is one of many Python 3 releases, and newer versions continue to be published.
Google Colab
Type: Development Environment
Description: Google Colab is a browser-based development environment that runs code on Google's cloud computing resources.
Plotly Library
Type: Python library
Description: Plotly is a Python data visualization library that produces visually appealing, interactive charts similar to those in Tableau.
Pandas Library
Type: Python library
Description: Pandas is a Python package used by data analysts and other professionals to analyze and explore datasets.
Matplotlib Library
Type: Python library
Description: Matplotlib is a very powerful data visualization library in Python; almost every function offers extensive customization options for visualizing pandas data.