Basic Data Visualization in Python, Pt. 5
We're going to cover how to bring in features not built into python.
Posted 31 March 2020 at 9:33 AM
This is the fifth article in the series Basic Data
Visualization in Python. In this article, we're going to go beyond the
features built into the python language. If you haven't read the previous article yet, I would suggest reading
it.
Topics Covered
- import and Modules
- re and Regular Expressions
- Installing Libraries with pip3
It's kind of a weird article because we're combining program entry points with
the more general topic of importing modules, but they should flow into each
other.
import and Modules
You might notice that up until this point, we've been writing everything we need
in one file. As our programs grow in size and complexity, we'll often want to
break them apart into multiple files, both to make life easier for the
programmers who'll work on the project and to make the code easier to reuse.
For example, say you're writing a large project and you end up writing some
graphing software in your code. Once you move on to another project, you notice
that you're going to need the same graphing software from last time. If you had
a separate file or directory/folder in which you put everything related to the
graphing software, you would just need to copy the graphing software files into
your directory (or better yet, put them in some sort of shared directory). If,
instead, you had put everything into one file, you would have to scour the file
for all the relevant functions and then copy them into the file for the second
program. It's not going to be fun for you.
All programming languages have ways to combat this problem. In C/C++, you use
the #include directive, declarations, and linking (see the C series I've written
for more information). In Java, you have one public class per file and you have
to use import statements to use them. In python, you also use an import
statement, but it works differently from the import statement in Java.
Put simply, the import statement in python will run a python file from top to
bottom, just as if you had executed it the way we normally do, and add what it
defines to the blob that is your program.
To see this process in action, create another file in the same directory as
example.py (which should be py_data_vis if you were following this tutorial).
Let's call this new file impex.py, short for "import example". In impex.py, type
the following code:
| print("This was printed from impex.py!")
|
Notice that we didn't put #!/usr/bin/env python3 in impex.py. Once we've started
the python interpreter, which we'll do by running example.py, anything
example.py imports will be run with the python interpreter, anything imported by
anything imported by example.py will also be run with the python interpreter,
etc. If we wanted to run impex.py directly, we would have to add
#!/usr/bin/env python3 as the first line and then make it executable with
chmod +x impex.py from the terminal. See the first article in this series for
more detailed information.
In example.py, type the following code:
| #!/usr/bin/env python3
import impex
print("This was printed from example.py!")
|
WARNING: impex.py and example.py must be in the same directory or else this
program will not work.
Anyway, if you run example.py, the exact sequence of events will be
- example.py starts running
- example.py imports impex.py
- impex.py runs from top to bottom
- impex.py finishes running
- example.py starts up again after the import statement
- example.py runs until it reaches the end of the file
which will give you the output
comp:py_data_vis user$ ./example.py
This was printed from impex.py!
This was printed from example.py!
If, instead, you flip the order of the import statement and the print call in
example.py, you'll see that they print in the opposite order, assuming you don't
modify impex.py. The following code
| #!/usr/bin/env python3
print("This was printed from example.py!")
import impex
|
will output
user@comp:~/dev/py_data_vis$ ./example.py
This was printed from example.py!
This was printed from impex.py!
Lastly, try running the following code and see what happens:
| #!/usr/bin/env python3
import impex
import impex
print("This was printed from example.py!")
|
user@comp:~/dev/py_data_vis$ ./example.py
This was printed from impex.py!
This was printed from example.py!
Once a file has been imported, it will not run again for the rest of the
program, no matter how many times you import it. It might not make sense yet
because we haven't actually gone over the main use of importing. Once we go over
it, it should make sense.
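If you'd like to poke at this from a single file, here's a minimal sketch. It writes a throwaway module into a temporary directory (the name cached_demo is made up for this example), imports it twice, and then looks it up in sys.modules, which is the dictionary where the interpreter remembers the modules it has already loaded.

```python
import os
import sys
import tempfile

# Create a throwaway module in a temporary directory.
# The name "cached_demo" is made up for this sketch.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "cached_demo.py"), "w") as module_file:
    module_file.write('print("This was printed from cached_demo.py!")\n')

# Let the interpreter look for modules in that directory too.
sys.path.insert(0, module_dir)

import cached_demo  # first import: the file runs, so the message is printed
import cached_demo  # second import: already loaded, so nothing is printed

# Loaded modules are remembered in the sys.modules dictionary.
print("cached_demo" in sys.modules)               # True
print(sys.modules["cached_demo"] is cached_demo)  # True
```

The message from cached_demo.py shows up once, even though there are two import statements.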
Why Use import
You're probably thinking to yourself that all we've really done is split our
file into two files which execute one after another, and you might be asking
yourself why we would even have an import statement. In general, though, we
don't call functions like print in files we want to import; we just define
functions, classes, and other things. A more realistic example of importing
would have impex.py look like
| def area_of_circle(r):
    return 3.1415926 * r ** 2

def volume_of_cylinder(radius, height):
    return height * area_of_circle(radius)

# A bunch of other functions
|
and example.py look like
| #!/usr/bin/env python3
import impex
print(impex.volume_of_cylinder(10, 2))
|
We defined multiple functions in impex.py, then used those functions with the
syntax impex.func, where func is the function we want to access. Knowing that
this is how we intend to use import statements, it should make sense that we
only want to run a file once. It doesn't make sense to rerun the file to create
new functions, objects, and classes just because some other file also wants to
access them.
import Shortcuts
Take the last example, in which we imported a function from another file. Let's
also say that we only want that one function. We can use the syntax
from _____ import _____, where the first blank is the file and the second blank
is the thing we want to import from it. For example, we want to import
volume_of_cylinder from impex.py, so we would type
| #!/usr/bin/env python3
from impex import volume_of_cylinder
print(volume_of_cylinder(10, 2))
|
which would produce the same output. volume_of_cylinder is a bit long and we're
only calculating one volume, so we might actually want to shorten it if
possible. We can use our old friend as to rename the function, like so
| #!/usr/bin/env python3
from impex import volume_of_cylinder as vc
print(vc(10, 2))
|
Note that if we tried using impex.volume_of_cylinder, impex.area_of_circle,
impex.vc, or volume_of_cylinder in the last example instead of vc, we would get
an error because we've told the python interpreter only to recognize vc. We
could have added the previous import statements import impex and
from impex import volume_of_cylinder to be able to use everything except
impex.vc.
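Here's a small sketch of that rule which doesn't need impex.py. It's my own stand-in example: it uses the standard math module and the alias root instead of impex and vc, but the behavior should be the same.

```python
from math import sqrt as root  # binds only the name "root" in this file

print(root(16))  # 4.0

missing = []

try:
    sqrt(16)  # the function's original name was never bound here
except NameError:
    missing.append("sqrt")

try:
    math.sqrt(16)  # neither was the module's name
except NameError:
    missing.append("math")

print(missing)  # ['sqrt', 'math']
```

Only the name after as exists in the importing file; the original function name and the module name both raise a NameError.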
Now, let's change it up and say we want both volume_of_cylinder and
area_of_circle. In that case, we can combine the from impex import ...
statements into one, like so
| #!/usr/bin/env python3
from impex import volume_of_cylinder as vc, area_of_circle as ac
print(vc(10, 2) + ac(4))
|
We might get naming collisions if we're not careful, and by making the names too
short, we may not see the problem in adding a volume to an area. You still have
to pick good names when importing functions.
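To make the collision problem concrete, here's a minimal sketch using two standard modules that both happen to define a function called sqrt. The aliases real_sqrt and complex_sqrt are names I picked for the example.

```python
from math import sqrt
from cmath import sqrt  # silently replaces the sqrt we just imported from math

print(sqrt(4))  # (2+0j), because this is now cmath's version

# Giving each import its own descriptive name avoids the collision.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(real_sqrt(4))      # 2.0
print(complex_sqrt(-4))  # 2j
```

Python doesn't warn you when a later import reuses a name, so the second import simply wins.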
Lastly, let's say we want to keep everything under the impex name, but we might
want to shorten the name impex. In that case, we can use
| #!/usr/bin/env python3
import impex as im
print(im.volume_of_cylinder(10, 2) + im.area_of_circle(4))
|
Use whatever works best for your project and keeps the meaning clear.
What's a Module?
A module is another name for a python file. That's it.
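If you want to see that for yourself, a module object can tell you its name and where it came from. This is only a quick sketch using the standard re module; the path printed on the last line will be different on every system.

```python
import inspect
import re

print(inspect.ismodule(re))  # True
print(re.__name__)           # re
print(re.__file__)           # the file this module was loaded from
```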
re and Regular Expressions
Regular Expressions are a powerful programming tool for searching text.
For example, let's say you want to search a large file for all the email
addresses. In English, an email address is a username (one or more letters,
digits, underscores, periods, plus signs, or dashes), followed by an @, followed
by a website name (one or more letters, digits, or dashes), followed by a .,
followed by what is apparently known as a top-level domain like com or org (one
or more letters, digits, dashes, or periods). To search the file, you could use
the following regular expression:
r"[-a-zA-Z0-9_.+]+@[-a-zA-Z0-9]+\.[-a-zA-Z0-9.]+",
which you can read as
| [      Any one of the characters between the square brackets
    -    The - character
    a-z  The lowercase ASCII letters
    A-Z  The uppercase ASCII letters
    0-9  Any of the digits from 0 to 9
    _    The _ character
    .    The . character
    +    The + character
  ]
  +      One or more of whatever came before the +.
         In this case, one or more of any of the
         characters in the preceding square brackets
  @      The @ character
  [      Any one of the characters between the square brackets
    -
    a-z
    A-Z
    0-9
  ]
  +      One or more of whatever came before the +.
  \.     The . character
  [
    -
    a-z
    A-Z
    0-9
    .
  ]
  +
|
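Here's a quick sketch of that regular expression in use, assuming we search a string instead of a file. The sample text and the addresses in it are made up, and re.findall is just one of several functions in re that could do this job.

```python
import re

EMAIL_PATTERN = r"[-a-zA-Z0-9_.+]+@[-a-zA-Z0-9]+\.[-a-zA-Z0-9.]+"

# Made-up sample text; the addresses are placeholders.
text = "Write to alice@example.com or bob.smith+news@mail.example.org for details"

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(EMAIL_PATTERN, text)
print(matches)  # ['alice@example.com', 'bob.smith+news@mail.example.org']
```

The pattern is deliberately simple. For instance, because the last set of square brackets allows periods, an address at the very end of a sentence would pick up the sentence's final period too.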
Regular expressions can get kind of complicated, but in our case, we're just
going to replace a bunch of characters with spaces so we can split on more than
just whitespace, which I brought up in the last article.
Anyway, to use regexes in python, we have to import the module re, which comes
with python and lives somewhere deep in your system files. In example.py, type
the following and see what happens.
| #!/usr/bin/env python3
import re

string = '''I heard him say, "Hey, John! Come over here.", then John said, "Why?", then he said "I saw 7 chickens."'''

print("Without Replacing Irrelevant Characters")
print("------------------------------------------")
print(string)
print('\n'.join(map(str, string.split())))
print()
print("With Replacing Irrelevant Characters")
print("------------------------------------------")
# Replace everything that isn't a letter, digit, or whitespace with a space
cleaned_string = re.sub(r"[^a-zA-Z0-9\s]", " ", string)
print(cleaned_string)
print('\n'.join(map(str, cleaned_string.split())))
|
I understand that the code is a little weird, but the output should be the
original string, then each word found using string.split() on its own line,
then the string with all the irrelevant characters replaced with spaces, then
each word found using cleaned_string.split() on its own line.
To do that, I used three things we haven't covered yet. The str.join function
works in almost the exact opposite way as split: it joins a list of strings
into one string separated by some delimiter (the newline, \n, in this case). The
map function applies the function provided as its first argument to each of the
elements of the list provided as its second argument and returns an iterator
over the results. The str function converts its argument into a string. None of
it is relevant to the current project of writing a program to verify Zipf's Law
for some sample text, but it's the easiest way to print out each word on its own
line. It also serves as a reminder that python has a lot of features and we
can't cover everything.
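In case those three functions are still fuzzy, here's a tiny sketch of them working together on a made-up list of numbers, where the call to str actually has something to convert.

```python
numbers = [3, 1, 4]

# map applies str to each element; list() collects the results so we can see them.
as_strings = list(map(str, numbers))
print(as_strings)  # ['3', '1', '4']

# join glues the strings together with a newline between each pair.
joined = "\n".join(as_strings)
print(joined)

# split undoes the join, since a newline counts as whitespace.
print(joined.split())  # ['3', '1', '4']
```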
comp:py_data_vis user$ ./example.py
Without Replacing Irrelevant Characters
------------------------------------------
I heard him say, "Hey, John! Come over here.", then John said, "Why?", then he said "I saw 7 chickens."
I
heard
him
say,
"Hey,
John!
Come
over
here.",
then
John
said,
"Why?",
then
he
said
"I
saw
7
chickens."
With Replacing Irrelevant Characters
------------------------------------------
I heard him say   Hey  John  Come over here    then John said   Why    then he said  I saw 7 chickens
I
heard
him
say
Hey
John
Come
over
here
then
John
said
Why
then
he
said
I
saw
7
chickens
Notice that re.sub substituted a space (the second argument to re.sub) for every
character (square brackets []) that wasn't (a caret as the first character in
the square brackets, [^]) a letter (a-zA-Z), a digit (0-9), or whitespace (\s)
in the string string (the third argument to re.sub). You should also notice that
all the words in cleaned_string only contain letters or digits, but some of the
words in string contain punctuation. For the basic word counter we're going to
implement, we'll need words without punctuation so that we count "I and I as the
same word.
Although regular expressions are extremely useful, we will only need them for
the purpose of removing all punctuation.
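To see why that matters for counting, here's a rough sketch that counts a couple of words before and after cleaning the same string from above. It uses list.count only for illustration; it isn't the word counter we'll eventually write.

```python
import re

string = '''I heard him say, "Hey, John! Come over here.", then John said, "Why?", then he said "I saw 7 chickens."'''

raw_words = string.split()
clean_words = re.sub(r"[^a-zA-Z0-9\s]", " ", string).split()

# Before cleaning, "I (with the quote) and John! don't match I and John.
print(raw_words.count("I"))       # 1
print(raw_words.count("John"))    # 1

# After cleaning, both occurrences of each word are counted.
print(clean_words.count("I"))     # 2
print(clean_words.count("John"))  # 2
```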
Installing Libraries with pip3
To end this article, let's discuss how you can get more libraries using pip3.
First, make sure you have it installed (see the first article, where we typed
commands involving the word "install" into the terminal). If you don't and
you're on Windows or Linux, type the command sudo apt install -y python3-pip
into the terminal. If you're on Mac, pip3 normally comes along with Homebrew's
python, so the command brew install python3 should be enough. You should already
have python in the terminal, otherwise you wouldn't be able to run any of these
scripts.
Anyway, using pip3 is easy. If you're on Linux, Windows, or Mac, use the command
pip3 install [libraries], where libraries is a list of the libraries you want to
install, separated by spaces. We're going to need the libraries matplotlib and
scipy, so we should type the command
user@comp:~/dev/py_data_vis$ pip3 install matplotlib scipy
A bunch of text relating to installing matplotlib and scipy
Just for fun, here's what it would look like in the Mac terminal:
comp:py_data_vis user$ pip3 install matplotlib scipy
A bunch of text relating to installing matplotlib and scipy
Then, all you have to do to use these libraries is import them like you did with
re. To be clear, these libraries have submodules, which you can access using .,
like so
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit as fit
As you can imagine, these are the function and the module we'll need.
scipy.optimize.curve_fit will allow us to fit a curve to some data and
matplotlib.pyplot will allow us to plot the data.
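If you're not sure whether the install worked, here's a small sketch that checks whether the interpreter can find each library without actually importing it. It relies on importlib.util.find_spec from the standard library; the dictionary name installed is just something I made up for the example.

```python
import importlib.util

installed = {}
for name in ("matplotlib", "scipy"):
    # find_spec returns None when a top-level module can't be found.
    spec = importlib.util.find_spec(name)
    installed[name] = spec is not None

for name, found in installed.items():
    if found:
        print(name, "is available to this interpreter")
    else:
        print(name, "is missing; try: pip3 install", name)
```

If a library shows up as missing right after you installed it, pip3 and python3 may belong to different python installations.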
Note on Using pip
On some distros (notably any distro with the pacman package manager, like Arch
or Manjaro), it's better to install these libraries using the system-wide
package manager instead. For example, to install matplotlib and scipy, I would
instead type the command
user@comp:~/dev/py_data_vis$ sudo pacman -S python-scipy python-matplotlib
I won't go into more detail for any specific distro or operating system; I just
want you to know that some distros do it differently.
What's Next
I think we should have enough to actually write word_counter.py, so we're going
to write it in the next article, Basic Data Visualization, Pt. 6. After that,
I'm going to try to recommend a few useful libraries and send you on your way.