We're going to break this tutorial into several articles: one to set up a python program, a few to cover the features of python we'll use, and one to actually write the program. This article will focus on our general goal for the program and setting you up for python starting from scratch.
Zipf's Law predicts that the nth most common item in a data set shows up 1/n times as often the most common item. For example, the second most common item should show up half as often as the most common item and the twentieth most common item should show up one twentieth as often as the most common element. This empirical law was originally derived from word counts, but it seems to apply to many other things like cities and their populations (the second most populous city has half as many people as the most populous people, etc.). In this tutorial, we're going to try to verify this law for a large body of text (specifically, Moby Dick because it's public domain and uses mostly ASCII characters).
We're going to focus mainly on the python in this tutorial, so we won't go into Zipf's Law any more than we have to, but if you want to know more, you can check out the Wikipedia article on Zipf's Law or the more entertaining Vsauce video "The Zipf Mystery". We're also going to be using a modified, continuous version of the Zipf Distribution in this article, known as the Pareto Distribution (a.k.a. the 80-20 Rule).
You will need:
You can also get started with python (though I wouldn't recommend it as a permanent solution) using repl.it. You don't need to create an account, but not doing so means you might not be able to access your programs later. Doing so means you would neither need a text editor nor a python interpreter, but you will need both in the future, so you might as well get them set up now.
Since this tutorial will focus on using the terminal for many reasons (most notably installing what you need to write programs in almost any language with any libraries just requires that you know the name of what you need and that using multiple programming languages together in one environment is trivial using the terminal), we are going to type stuff in the terminal.
I will display the full terminal after you've typed in everything, including
output. You should only type things that come after the dollar sign and on the
same line. For example, if you were on Mac and you should type the phrase mkdir
-p example
and hit the Return key, then type the phrase cd example
and hit
the Return key, the terminal would look like
comp:~ user$ mkdir -p example comp:~ user$ cd example comp:example user$
If it turns out to be a lot of output, then I'll say something like A bunch of
text output discussing ...
and I'll indent it. For example, if you're on Linux
or the Windows Subsystem for Linux, I'll do something like
user@comp:~$ sudo apt update [sudo] password for user: A bunch of text discussing update information user@comp:~$
First, you are going to need a text editor. You will not be able to write python code without a text editor or an IDE that includes a text editor. Put simply, a text editor will take the characters you type and put them directly into a file. It will not add anything else but the characters. If you type the letters "abcd" into a file in a text editor and save it, the file will contain just those four characters (maybe some metadata about the file itself, too). Microsoft Word (Windows), Google Docs (Browsers), Libre Office (Linux), Pages (Mac), etc. are not text editors, as they save additional data about the text, such as the font, color, which text is bold, which text is italicized, whether you have formulas, etc. I will list several text editors along with a brief description. If you do not see a program on this list, it is probably not a text editor.
If you are totally new to programming, I would recommend using a GUI text editor so you don't have to learn two things at once. If you have Eclipse, go ahead and use it. If you don't like Eclipse, use Visual Studio Code.
On MacOS, type
comp:~ user$ brew update A bunch of text discussing update information comp:~ user$ brew cask install visual-studio-code A bunch of text discussing installation information
On Linux, follow these instructions on installing Visual Studio Code on Linux.
On Windows, follow these instructions on installing Visual Studio Code on Windows.
If you're on Windows, get the Windows Subsystem for Linux, which only requires you to download the
Ubuntu app from the Windows store, turn on a setting (shown in the linked
article), restart your computer, and enter a username and a password into the
Ubuntu app and you're all set. Remember this username and password, as you
will need it later. After this point, the rest of the tutorial will focus on
using a Linux, MacOS, or Windows Subsystem for Linux terminal. While the
python
code is independent of how you run it, there are certain operations
that you need to do for every setup you have to make sure it runs properly, and
I currently want to focus on the terminal.
While there is an independent python interpreter for Windows, I would strongly
recommend that you use the Windows Subsystem for Linux because as soon as you want to do anything
outside of python
, you're stuck, especially if you want the programs to
interact in any way.
If you're on Ubuntu, Debian, Linux Mint (or any derived distro), or the Windows Subsystem for Linux, type the following into the terminal.
user@comp:~$ sudo apt update A bunch of text discussing update information user@comp:~$ sudo apt install -y python python3 python-pip python3-pip A bunch of text discussing installation information
The -y
just answers "yes" to the prompt that comes up. If you're on another
Linux distro, replace sudo apt install
with whatever package manager you use
for your distro.
If you're on MacOS, use the command
comp:~ user$ brew install python python3 python-pip python3-pip
Once you've sucessfully gotten to this point, you will not have to do any part of this process again as long as you don't uninstall anything. You will have to do everything after this point for each project, but it's not nearly as much.
If you're on Windows, open up Ubuntu and type
user@comp:~$ ln -s /mnt/c/Users/[user] win-home user@comp:~$ mkdir -p win-home/dev/py_data_vis user@comp:~$ cd win-home/dev/py_data_vis user@comp:~/win-home/dev/py_data_vis$
where [user]
is your username on your Windows computer.
If you're on Linux, open up the terminal and type
user@comp:~$ mkdir -p dev/py_data_vis user@comp:~$ cd dev/py_data_vis user@comp:~/dev/py_data_vis$
If you're on Mac, open up the terminal and type
comp:~ user$ mkdir -p dev/py_data_vis comp:~ user$ cd dev/py_data_vis comp:py_data_vis user$
From this point onwards, you should remain in this directory and there should be
no difference between operating systems. We have created an actual directory
(a.k.a. a folder) on your system that you can access normally, through the file
explorer. For Windows, this folder will be C:\Users\[user]\dev\py_data_vis
.
In your text editor of choice, create a new file called word_counter.py
and
put it in the py_data_vis
directory. This directory is ~/dev/py_data_vis
on
Linux and Mac and it is C:\Users\[user]\dev\py_data_vis
on Windows. If you're
using a graphical text editor, click File in the top left corner and
click New File (there may be shortcuts like Ctrl+N, but this method works
for almost all graphical text editors). If it prompts you to give the file a
name and a place to save it, set the name to word_counter.py
and save it in
the py_data_vis
directory. If you're on Linux or Mac and you start out in a
directory called /
with folders like bin
, boot
, dev
, etc
, and home
,
move to /home/[user]/dev/py_data_vis
, where [user]
is your username.
Visual Studio Code will prompt you for the name and where to save the file only after you save the file. I would suggest saving immediately and following the instructions in the previous paragraph.
If you're using a terminal text editor, you can type the command [editor]
word_counter.py
, where [editor]
is vim
, emacs
, or nano
.
On Linux, your terminal should look like
user@comp:~$ sudo apt update A bunch of text discussing update information user@comp:~$ sudo apt install -y python python3 python-pip python3-pip A bunch of text discussing installation information user@comp:~$ mkdir -p dev/py_data_vis user@comp:~$ cd dev/py_data_vis user@comp:~/dev/py_data_vis$
On Mac, your terminal should look like
comp:~ user$ brew update A bunch of text discussing update information comp:~ user$ brew cask install visual-studio-code A bunch of text discussing installation information comp:~ user$ brew install python python3 python-pip python3-pip A bunch of text discussing installation information comp:~ user$ mkdir -p dev/py_data_vis comp:~ user$ cd dev/py_data_vis comp:py_data_vis user$
On Windows, your terminal should look like
user@comp:~$ sudo apt update A bunch of text discussing installation information user@comp:~$ sudo apt install -y python python3 python-pip python3-pip A bunch of text discussing installation information user@comp:~$ ln -s /mnt/c/Users/[user] win-home user@comp:~$ mkdir -p win-home/dev/py_data_vis user@comp:~$ cd win-home/dev/py_data_vis user@comp:~/win-home/dev/py_data_vis$
Once you have the file opened and your current directory in the terminal is the
py_data_vis
directory, there will be no differences in this tutorial between
operating systems and I will refer exclusively to either the terminal or the
text editor.
We'll have to tell the computer two things to make word_counter.py
execute
properly:
python3
interpreter to execute the file.
In your terminal, type
user@comp:~/dev/py_data_vis$ ls -l word_counter.py -rw-r--r-- 1 user group 23 Mar 3 14:52 word_counter.py user@comp:~/dev/py_data_vis$ sudo chmod +x word_counter.py [sudo] password for user: user@comp:~/dev/py_data_vis$ ls -l word_counter.py -rwxr-xr-x 1 user group 23 Mar 3 14:52 word_counter.py
ls -l word_counter.py
will list the read/write/execute
permissions for word_counter.py
along with the date of the last change to the
file. The leftmost -
means that the file is just a regular file (as opposed to
a directory or other things). After that, every three characters refers to
different classes of users: current user, current group, and everyone else. r
refers to read permissions, w
refers to write permissions, and
x
refers to execute permissions. You can therefore read the output of
ls -l word_counter.py
as
A normal file can be read and modified byuser
, read by users in the groupgroup
, and read by everyone else. It has23
bytes. It was last modified onMarch 3
at14:52
(i.e. 2:52 PM) and the name of the file isword_counter.py
.
You'll see that there's a bit of a problem since we want to execute the program
but no one has permission to execute the program. We can change this with the
chmod
command. Since we'll have to run it as an admin (a.k.a. root on Linux
and MacOS), we'll put sudo
in front of the command, which stands for "s
uper
u
ser do
". We add the option +x
to add
ex
ecute permissions for everyone
(you can do it for individual users but it won't matter here). Lastly, we want
to change the permissions for the file word_counter.py
, so we'll need to add
that into the command, meaning we have to type sudo chmod +x word_counter.py
.
You'll see the password prompt pop up for your password. On the Windows Susbsytem for Linux, type in the password that you first typed into the terminal. On Linux or Mac, type in the same password you use to log into your computer. Do not be worried if nothing shows up when you type — it's a security measure to prevent people from seeing any information about your password. Hit Enter/Return to confirm your password.
After running ls -l word_counter.py
again, you'll see that the script now has
a bunch of x
's, meaning anyone can execute it.
python3
to Execute word_counter.py
In word_counter.py
in your text editor, type
1 | #!/usr/bin/env python3
|
This line of code is a comment in python
since comments start with a
#
in python
, but scripts executed from the terminal will read the first line
and look for a #!
(known as a shebang), which tells the terminal to use
the python3
interpreter when executing this file.
Hello, World!
Since your first program that does anything must print Hello, World!
by tradition,
we're going to modify word_counter.py
to print out Hello, World!
, but then we're
going to remove it immediately afterwards since we don't need it. In python
,
we just need to type print("Hello, World")
in word_counter.py
, so our file
should now look like
1 2 3 | #!/usr/bin/env python3 print("Hello, World!") |
Save this file and run it from the terminal using ./word_counter.py
, which
should look like
user@comp:~/dev/py_data_vis$ ./word_counter.py Hello, World! user@comp:~/dev/py_data_vis$
Then, remove the Hello, World!
line and save the file again.
Now that we have an executable python
script, we could start writing some
python
code. First, I would like to devote an article to covering the features of python
we'll use in this program
since these features are independent of this specific program and you can use
them in any python
program.