Basic Data Visualization in Python, Pt. 1

Let's set up a basic python program.
Posted 29 February 2020 at 12:36 AM
By Joseph Mellor

We're going to break this tutorial into several articles: one to set up a python program, a few to cover the features of python we'll use, and one to actually write the program. This article will focus on our general goal for the program and setting you up for python starting from scratch.

An Unusual Pattern

Zipf's Law predicts that the nth most common item in a data set shows up 1/n times as often as the most common item. For example, the second most common item should show up half as often as the most common item and the twentieth most common item should show up one twentieth as often as the most common item. This empirical law was originally derived from word counts, but it seems to apply to many other things like cities and their populations (the second most populous city has half as many people as the most populous city, etc.). In this tutorial, we're going to try to verify this law for a large body of text (specifically, Moby Dick because it's public domain and uses mostly ASCII characters).
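As a rough illustration of what the law predicts, here is a minimal Python sketch. The function name zipf_expected_counts and the top count of 1000 are made up for this example; they are not measurements from Moby Dick.

```python
def zipf_expected_counts(top_count, num_items):
    """Return the counts Zipf's Law predicts for ranks 1..num_items.

    top_count is how many times the most common item appears.
    """
    return [top_count / rank for rank in range(1, num_items + 1)]


# Hypothetical data: pretend the most common word appears 1000 times.
expected = zipf_expected_counts(1000, 20)
print(expected[1])   # rank 2, half as often as rank 1: 500.0
print(expected[19])  # rank 20, one twentieth as often as rank 1: 50.0
```

Verifying the law then amounts to comparing real counts against a list like this one.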

We're going to focus mainly on the python in this tutorial, so we won't go into Zipf's Law any more than we have to, but if you want to know more, you can check out the Wikipedia article on Zipf's Law or the more entertaining Vsauce video "The Zipf Mystery". We're also going to be using a modified, continuous version of the Zipf Distribution in this article, known as the Pareto Distribution (a.k.a. the 80-20 Rule).

Getting Started with Python

You will need:

  1. A text editor.
  2. A python interpreter, which you can easily get using the Windows Subsystem for Linux or any POSIX (Linux, MacOS, etc.) terminal.

You can also get started with python using repl.it, though I wouldn't recommend it as a permanent solution. You don't need to create an account, but without one you might not be able to access your programs later. With repl.it you need neither a text editor nor a python interpreter, but you will need both in the future, so you might as well get them set up now.

How to Read the Terminals

This tutorial focuses on the terminal for many reasons. Most notably, installing what you need to write programs in almost any language with any libraries only requires that you know the name of what you need, and using multiple programming languages together in one environment is trivial. We are therefore going to type a lot of things into the terminal.

I will display the full terminal after you've typed in everything, including output. You should only type things that come after the dollar sign and on the same line. For example, if you were on Mac and typed mkdir -p example, hit the Return key, then typed cd example and hit the Return key again, the terminal would look like

comp:~ user$ mkdir -p example
comp:~ user$ cd example
comp:example user$

If it turns out to be a lot of output, then I'll say something like A bunch of text output discussing ... and I'll indent it. For example, if you're on Linux or the Windows Subsystem for Linux, I'll do something like

user@comp:~$ sudo apt update
[sudo] password for user:

    A bunch of text discussing update information

user@comp:~$
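The reading rule above (only type what follows the dollar sign on the same line) can be sketched in Python. This is only an illustration: the function name commands_to_type is made up, and it assumes every prompt ends with a dollar sign followed by a space and that output lines never contain that pair of characters, which holds for the transcripts in this tutorial.

```python
def commands_to_type(transcript):
    """Return the commands in a terminal transcript, in order.

    Assumes each prompt ends with "$ " and that output lines do not
    contain "$ ".
    """
    commands = []
    for line in transcript.splitlines():
        _prompt, dollar, command = line.partition("$ ")
        # Lines without "$ " are output or an empty prompt, so skip them.
        if dollar and command.strip():
            commands.append(command.strip())
    return commands


example = """comp:~ user$ mkdir -p example
comp:~ user$ cd example
comp:example user$"""
print(commands_to_type(example))  # ['mkdir -p example', 'cd example']
```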

Text Editors

First, you are going to need a text editor. You will not be able to write python code without a text editor or an IDE that includes a text editor. Put simply, a text editor will take the characters you type and put them directly into a file. It will not add anything else but the characters. If you type the letters "abcd" into a file in a text editor and save it, the file will contain just those four characters (maybe some metadata about the file itself, too). Microsoft Word (Windows), Google Docs (Browsers), LibreOffice (Linux), Pages (Mac), etc. are not text editors, as they save additional data about the text, such as the font, color, which text is bold, which text is italicized, whether you have formulas, etc. I will list several text editors along with a brief description. If you do not see a program on this list, it is probably not a text editor.

If you are totally new to programming, I would recommend using a GUI text editor so you don't have to learn two things at once. If you have Eclipse, go ahead and use it. If you don't like Eclipse, use Visual Studio Code.
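If you're curious, the "abcd" claim from the text editor description is easy to check with a few lines of Python. This is a small sketch rather than part of the program we're building; the file name example.txt is made up and the file lives in a temporary folder that is deleted afterwards.

```python
import os
import tempfile

# Write the four characters "abcd" the way a text editor would save them.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("abcd")

    # Measure the file and read it back as raw bytes.
    size_in_bytes = os.path.getsize(path)
    with open(path, "rb") as f:
        raw_bytes = f.read()

print(size_in_bytes)  # 4
print(raw_bytes)      # b'abcd'
```

A word processor saving the same four letters would produce a much larger file, because it stores formatting data alongside the text.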

Installing Visual Studio Code

On MacOS, type

comp:~ user$ brew update

    A bunch of text discussing update information

comp:~ user$ brew install --cask visual-studio-code

    A bunch of text discussing installation information

On Linux, follow these instructions on installing Visual Studio Code on Linux.

On Windows, follow these instructions on installing Visual Studio Code on Windows.

Python Interpreter

If you're on Windows, get the Windows Subsystem for Linux. Setting it up only requires you to download the Ubuntu app from the Windows store, turn on a setting (shown in the linked article), restart your computer, and enter a username and a password into the Ubuntu app. Remember this username and password, as you will need them later. After this point, the rest of the tutorial will focus on using a Linux, MacOS, or Windows Subsystem for Linux terminal. While the python code is independent of how you run it, there are certain operations that you need to do for every setup you have to make sure it runs properly, and I currently want to focus on the terminal.

Independent Python Interpreter


While there is an independent python interpreter for Windows, I would strongly recommend that you use the Windows Subsystem for Linux because as soon as you want to do anything outside of python, you're stuck, especially if you want the programs to interact in any way.

If you're on Ubuntu, Debian, Linux Mint (or any derived distro), or the Windows Subsystem for Linux, type the following into the terminal.

user@comp:~$ sudo apt update

    A bunch of text discussing update information

user@comp:~$ sudo apt install -y python3 python3-pip

    A bunch of text discussing installation information

The -y just answers "yes" to the prompt that comes up. If you're on another Linux distro, replace sudo apt install with whatever package manager you use for your distro.

If you're on MacOS, use the command below. Homebrew's python formula installs Python 3 together with pip, so there is no separate pip package to install.

comp:~ user$ brew install python

Once you've successfully gotten to this point, you will not have to do any part of this process again as long as you don't uninstall anything. You will have to do everything after this point for each project, but it's not nearly as much.
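Once you know how to run a python script (covered at the end of this article), a short sketch like the following can confirm that the installation worked. The helper name is_python3 is made up for this example; shutil.which and sys.version_info come from python's standard library.

```python
import shutil
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3."""
    return version_info[0] == 3


# Where the terminal would find each command, or None if it is missing.
print("python3:", shutil.which("python3"))
print("pip3:", shutil.which("pip3"))
print("Running Python 3:", is_python3())
```

If either of the first two lines prints None, the matching package probably did not install correctly.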

Making the Directory

If you're on Windows, open up Ubuntu and type

user@comp:~$ ln -s /mnt/c/Users/[user] win-home
user@comp:~$ mkdir -p win-home/dev/py_data_vis
user@comp:~$ cd win-home/dev/py_data_vis
user@comp:~/win-home/dev/py_data_vis$

where [user] is your username on your Windows computer.

If you're on Linux, open up the terminal and type

user@comp:~$ mkdir -p dev/py_data_vis
user@comp:~$ cd dev/py_data_vis
user@comp:~/dev/py_data_vis$

If you're on Mac, open up the terminal and type

comp:~ user$ mkdir -p dev/py_data_vis
comp:~ user$ cd dev/py_data_vis
comp:py_data_vis user$

From this point onwards, you should remain in this directory and there should be no difference between operating systems. We have created an actual directory (a.k.a. a folder) on your system that you can access normally, through the file explorer. For Windows, this folder will be C:\Users\[user]\dev\py_data_vis.
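For comparison, here is roughly what mkdir -p does, written as a Python sketch. The function name make_project_dir is made up, and the example builds the folders inside a temporary scratch folder so it does not touch your real home folder.

```python
import tempfile
from pathlib import Path


def make_project_dir(base):
    """Create base/dev/py_data_vis, much like `mkdir -p dev/py_data_vis`."""
    project = Path(base) / "dev" / "py_data_vis"
    # parents=True creates the missing dev folder as well; exist_ok=True
    # makes it safe to run again when the folders already exist.
    project.mkdir(parents=True, exist_ok=True)
    return project


# Use a throwaway folder for the demonstration.
with tempfile.TemporaryDirectory() as scratch:
    created = make_project_dir(scratch)
    print(created.is_dir())  # True
    # Running it a second time is harmless, just like mkdir -p.
    made_twice = make_project_dir(scratch) == created
```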

Creating Our New Python Script

In your text editor of choice, create a new file called word_counter.py and put it in the py_data_vis directory. This directory is ~/dev/py_data_vis on Linux and Mac and it is C:\Users\[user]\dev\py_data_vis on Windows. If you're using a graphical text editor, click File in the top left corner and click New File (there may be shortcuts like Ctrl+N, but this method works for almost all graphical text editors). If it prompts you to give the file a name and a place to save it, set the name to word_counter.py and save it in the py_data_vis directory. If you're on Linux or Mac and the save dialog starts out in a directory called / with folders such as bin, dev, and etc, move to /home/[user]/dev/py_data_vis on Linux or /Users/[user]/dev/py_data_vis on Mac, where [user] is your username.

Visual Studio Code will prompt you for the name and where to save the file only after you save the file. I would suggest saving immediately and following the instructions in the previous paragraph.

If you're using a terminal text editor, you can type the command [editor] word_counter.py, where [editor] is vim, emacs, or nano.

On Linux, your terminal should look like

user@comp:~$ sudo apt update

    A bunch of text discussing update information

user@comp:~$ sudo apt install -y python3 python3-pip

    A bunch of text discussing installation information

user@comp:~$ mkdir -p dev/py_data_vis
user@comp:~$ cd dev/py_data_vis
user@comp:~/dev/py_data_vis$

On Mac, your terminal should look like

comp:~ user$ brew update

    A bunch of text discussing update information

comp:~ user$ brew install --cask visual-studio-code

    A bunch of text discussing installation information

comp:~ user$ brew install python

    A bunch of text discussing installation information

comp:~ user$ mkdir -p dev/py_data_vis
comp:~ user$ cd dev/py_data_vis
comp:py_data_vis user$

On Windows, your terminal should look like

user@comp:~$ sudo apt update

    A bunch of text discussing update information

user@comp:~$ sudo apt install -y python3 python3-pip

    A bunch of text discussing installation information

user@comp:~$ ln -s /mnt/c/Users/[user] win-home
user@comp:~$ mkdir -p win-home/dev/py_data_vis
user@comp:~$ cd win-home/dev/py_data_vis
user@comp:~/win-home/dev/py_data_vis$

Once you have the file opened and your current directory in the terminal is the py_data_vis directory, there will be no differences in this tutorial between operating systems and I will refer exclusively to either the terminal or the text editor.

Making an Executable Python Script

We'll have to tell the computer two things to make word_counter.py execute properly:

  1. We have permission to execute the file.
  2. Use the python3 interpreter to execute the file.

Changing Permissions

In your terminal, type

user@comp:~/dev/py_data_vis$ ls -l word_counter.py
-rw-r--r-- 1 user group     23 Mar  3 14:52 word_counter.py
user@comp:~/dev/py_data_vis$ sudo chmod +x word_counter.py
[sudo] password for user:
user@comp:~/dev/py_data_vis$ ls -l word_counter.py
-rwxr-xr-x 1 user group     23 Mar  3 14:52 word_counter.py

ls -l word_counter.py will list the read/write/execute permissions for word_counter.py along with the date of the last change to the file. The leftmost - means that the file is just a regular file (as opposed to a directory or other things). After that, each group of three characters refers to a different class of users: the file's owner, the file's group, and everyone else. r refers to read permissions, w refers to write permissions, and x refers to execute permissions. You can therefore read the output of ls -l word_counter.py as

A normal file can be read and modified by user, read by users in the group group, and read by everyone else. It has 23 bytes. It was last modified on March 3 at 14:52 (i.e. 2:52 PM) and the name of the file is word_counter.py.

You'll see that there's a bit of a problem since we want to execute the program but no one has permission to execute the program. We can change this with the chmod command. The transcript above puts sudo (short for "superuser do") in front of the command, which runs it as an admin (a.k.a. root on Linux and MacOS). Because you own word_counter.py, plain chmod +x word_counter.py normally works as well, and then no password prompt appears. We add the option +x to add execute permissions for everyone (you can do it for individual users but it won't matter here). Lastly, we want to change the permissions for the file word_counter.py, so we'll need to add that into the command, meaning we type sudo chmod +x word_counter.py.

You'll see a prompt asking for your password. On the Windows Subsystem for Linux, type in the password that you first typed into the terminal. On Linux or Mac, type in the same password you use to log into your computer. Do not be worried if nothing shows up when you type; it's a security measure to prevent people from seeing any information about your password. Hit Enter/Return to confirm your password.

After running ls -l word_counter.py again, you'll see that the script now has a bunch of x's, meaning anyone can execute it.
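The permission strings above can also be reproduced in Python with the standard stat module. This is a simplified sketch of what chmod +x does (the real command also takes your system's umask into account), and the helper name with_execute is made up for this example.

```python
import stat

# Execute bits for the owner, the group, and everyone else.
EXECUTE_FOR_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def with_execute(mode):
    """Return mode with execute permission added for all three classes."""
    return mode | EXECUTE_FOR_ALL


before = stat.S_IFREG | 0o644  # a regular file with rw-r--r--
after = with_execute(before)
print(stat.filemode(before))  # -rw-r--r--
print(stat.filemode(after))   # -rwxr-xr-x
```

To apply a mode to a real file from Python, you would pass it to os.chmod along with the file's path.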

Telling the Computer to Use python3 to Execute word_counter.py

In word_counter.py in your text editor, type

#!/usr/bin/env python3

This line of code is a comment in python since comments start with a # in python. When you execute a script from the terminal, though, the operating system reads the first line and looks for a #! (known as a shebang). Here, the shebang tells it to run the file with the python3 interpreter, which /usr/bin/env locates wherever it is installed on your system.
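To make the idea concrete, here is a simplified Python sketch of how a shebang line can be read. It is a separate illustration and does not belong in word_counter.py. The function name parse_shebang is made up, and splitting on spaces is a simplification: real systems differ in how they treat extra arguments after the interpreter.

```python
def parse_shebang(first_line):
    """Return the command named by a shebang line, or None if absent.

    Simplified: splits on whitespace, which is enough for
    "#!/usr/bin/env python3".
    """
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].split()


print(parse_shebang("#!/usr/bin/env python3"))  # ['/usr/bin/env', 'python3']
print(parse_shebang('print("Hello, World!")'))  # None
```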

Obligatory Hello, World!

Since your first program that does anything must print Hello, World! by tradition, we're going to modify word_counter.py to print out Hello, World!, but then we're going to remove it immediately afterwards since we don't need it. In python, we just need to type print("Hello, World!") in word_counter.py, so our file should now look like

#!/usr/bin/env python3

print("Hello, World!")

Save this file and run it from the terminal using ./word_counter.py, which should look like

user@comp:~/dev/py_data_vis$ ./word_counter.py
Hello, World!
user@comp:~/dev/py_data_vis$

Then, remove the Hello, World! line and save the file again.

Summary

Now that we have an executable python script, we could start writing some python code. First, I would like to devote an article to covering the features of python we'll use in this program since these features are independent of this specific program and you can use them in any python program.

A picture of Joseph Mellor, the author.

Joseph Mellor is a Senior at TU majoring in Physics, Computer Science, and Math. He is also the chief editor of the website and the author of the tumd markdown compiler. If you want to see more of his work, check out his personal website.
Credit to Allison Pennybaker for the picture.