Basic Data Visualization in Python, Pt. 4

We're going to cover a few features built into Python.
Posted 30 March 2020 at 6:27 PM
By Joseph Mellor

This is the fourth article in the series Basic Data Visualization in Python. In this article, we're going to cover several features built into the Python language. If you haven't read the previous article yet, I suggest reading it first.

Topics Covered

I'm going to cover with and as together, since with generally needs an as, and we won't cover the other big use of as.

The open Statement

open, put simply, allows us to open files. It takes two arguments:

  1. The name of the file you want to open (either with a relative or an absolute path).
  2. The mode, or what you want to do with the file (read, write, append).

Relative and Absolute Paths

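An absolute path is the easiest to explain, but it's usually not used as often. An absolute path uses a universally defined location as its base. On Mac and Linux, absolute paths start with /, and on Windows they most commonly start with C:\. You'll also see paths that start with ~ on Mac and Linux, which the shell expands to your home directory. Keep in mind that ~ is a shell shortcut: Python's open won't expand it for you, so inside a Python program use the full path or os.path.expanduser.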

A relative path uses the current location as a base. To move up a directory, it uses ../ on Mac and Linux and ..\ on Windows (but not the Windows Subsystem for Linux, which uses ../). Other than that, relative and absolute paths work the same way. If you had a file in ~/dev/py_data_vis and you were currently in the directory ~/dev/py_data_vis, you could access it using either ~/dev/py_data_vis/filename or just filename. If you were in ~/dev/ and you wanted to access the same file, you could use py_data_vis/filename or ~/dev/py_data_vis/filename. Lastly, if you were in py_data_vis and you wanted to access a file in dev, you could use ../filename.
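As a rough sketch of how these paths behave inside Python (rather than in the shell), the standard pathlib and os modules can show the difference. The directory names are just the ones used in this series, "filename" is a placeholder, and nothing here needs those directories to actually exist.

```python
#!/usr/bin/env python3
import os
from pathlib import Path

# "~" is a shell shortcut, so inside Python we expand it ourselves.
project_dir = Path("~/dev/py_data_vis").expanduser()

# A relative path has no fixed starting point of its own.
relative_file = Path("sample-text.txt")

# Joining a relative path onto an absolute directory gives an absolute path.
absolute_file = project_dir / relative_file

print(relative_file.is_absolute())  # False
print(absolute_file.is_absolute())  # True

# ".." means "one directory up", so this points at a file in dev.
# normpath just tidies the text of the path; it does not touch the disk.
up_one = Path(os.path.normpath(project_dir / ".." / "filename"))
print(up_one.parent == project_dir.parent)  # True
```

When you pass a relative path to open, Python resolves it against the directory you ran the program from, which is why the examples in this article can just say "sample-text.txt".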

The mode

A mode is a short string that tells open how to process the file. There are several modes, all of which are explained by this StackOverflow answer. I'll do a quick summary:

  1. "r": Open the file just for reading and not modifying. If the file does not exist, it will throw an error.
  2. "w": Open the file just for writing. If the file does not exist, the file will be created. Lastly, "w" will overwrite the file if it already exists.
  3. "a": Open the file just for appending. Unlike "w", "a" will not overwrite the file and instead append to the end of the file. If the file does not exist, it will be created.

You can add a "b" after any of these to use binary mode, something I've never personally used. More importantly, you can add a "+" at the end, which adds the ability to both read and write the file, with everything else remaining the same. For example, "r+" will not overwrite the file (and will still throw an error if the file does not exist), but "w+" will. "a+" will open the file for both appending and reading.
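If you want to see the three basic modes side by side, here's a small sketch. It works in a throwaway directory so it can't damage anything, the file name modes-demo.txt is made up for the example, and it uses the with statement, which is covered later in this article.

```python
#!/usr/bin/env python3
import os
import tempfile

# Work in a temporary directory that is deleted when the block ends.
with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "modes-demo.txt")

    # "w" creates the file (or empties it if it already exists).
    with open(path, "w") as writer:
        writer.write("first\n")

    # "a" keeps what is already there and adds to the end.
    with open(path, "a") as appender:
        appender.write("second\n")

    # "r" only reads.
    with open(path, "r") as reader:
        after_append = reader.read()

    # Opening with "w" again throws away the old contents.
    with open(path, "w") as writer:
        writer.write("third\n")

    with open(path, "r") as reader:
        after_overwrite = reader.read()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'third\n'
```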

Actually Using open

Download sample-text.txt, the file we'll be analyzing. It's the text of the novel Moby Dick from Project Gutenberg, but I removed some of the Gutenberg stuff because it could interfere with the data. Then, move it into py_data_vis. On Linux, use:

user@comp:~/dev/py_data_vis$ mv ~/Downloads/sample-text.txt .

That command uses mv to move ~/Downloads/sample-text.txt to the current directory, ~/dev/py_data_vis.

On Mac, use:

comp:py_data_vis user$ mv ~/Downloads/sample-text.txt .

On Windows (in the Windows Subsystem for Linux), use:

user@comp:~/win-home/dev/py_data_vis$ mv ~/win-home/Downloads/sample-text.txt .

If you don't have a link called win-home in ~ on the Windows Subsystem for Linux, use the following commands.

user@comp:/mnt/c/Users/[user]/dev/py_data_vis$ ln -s /mnt/c/Users/[user] ~/win-home
user@comp:/mnt/c/Users/[user]/dev/py_data_vis$ cd ~/win-home/dev/py_data_vis
user@comp:~/win-home/dev/py_data_vis$ mv ~/win-home/Downloads/sample-text.txt .

Anyway, now that we have a file in py_data_vis to mess with, let's create a basic program to read from the file and tell us if the word the shows up anywhere in the file. Open up py_data_vis/example.py and type the following.

#!/usr/bin/env python3

reader = open("sample-text.txt", "r")
have_seen_the = False
for line in reader:
    if "the" in line:
        have_seen_the = True
        break

if have_seen_the:
    print('The word "the" is in the file "sample-text.txt".')
else:
    print('The word "the" is NOT in the file "sample-text.txt".')

All good, right?

The with Statement

There's actually a large mistake in that code. It won't harm your computer in this case, but if you coded like this all the time, you could leak resources: Python may not realize that you're done reading the file and keep it open, and the operating system only lets a program hold so many open files at once. Every open needs a corresponding close call, which here would be reader.close() after the for loop. Even adding the close has its own problems (if an error occurs before the close runs, the file stays open), so we're going to turn to the with statement, which will always clean up the file after we're done with it. If we were to rewrite the previous code using a with statement, it would look like

#!/usr/bin/env python3

have_seen_the = False
with open("sample-text.txt", "r") as reader:
    for line in reader:
        if "the" in line:
            have_seen_the = True
            break

# reader has been closed

if have_seen_the:
    print('The word "the" is in the file "sample-text.txt".')
else:
    print('The word "the" is NOT in the file "sample-text.txt".')

In general, you want to open files using a with open(..., ...) as variable: statement. Once you leave the indented block, the with statement will close the file.
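To see what the with statement is saving us from, here's a rough sketch of the same cleanup done by hand with try and finally, next to the with version. It writes a small throwaway file first so it runs without sample-text.txt; the file name with-demo.txt and its contents are made up.

```python
#!/usr/bin/env python3
import os
import tempfile

# Create a small throwaway file so the example runs anywhere.
temp_dir = tempfile.mkdtemp()
path = os.path.join(temp_dir, "with-demo.txt")
writer = open(path, "w")
writer.write("Call me Ishmael.\n")
writer.close()

# Closing by hand: the finally part runs even if reading raises an error.
reader = open(path, "r")
try:
    manual_lines = reader.readlines()
finally:
    reader.close()
closed_by_hand = reader.closed

# The with statement does the same cleanup for us.
with open(path, "r") as reader:
    with_lines = reader.readlines()
closed_by_with = reader.closed

print(closed_by_hand, closed_by_with)  # True True
print(manual_lines == with_lines)      # True

# Remove the throwaway file and directory.
os.remove(path)
os.rmdir(temp_dir)
```

Both versions close the file, but the with version is shorter and much harder to get wrong.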

The with ... as Statement

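In a few specific circumstances, you can use as to create an alias for the result of whatever comes before it. In our with statement, the variable after as refers to the file object returned by open. The with statement is the only place we'll use as in this article, although Python also allows it in a couple of other places, such as import statements.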
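As a small illustration, here's a sketch with two as aliases in a single with statement, one for a file we read and one for a file we write. The file names source.txt and copy.txt are invented for the example, and everything happens in a temporary directory.

```python
#!/usr/bin/env python3
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    source_path = os.path.join(temp_dir, "source.txt")
    copy_path = os.path.join(temp_dir, "copy.txt")

    # Make a file to copy from.
    with open(source_path, "w") as writer:
        writer.write("the whale\n")

    # Each "as" names whatever the open call before it returned.
    with open(source_path, "r") as source, open(copy_path, "w") as copy:
        for line in source:
            copy.write(line.upper())

    # Both files were closed when the block above ended.
    both_closed = source.closed and copy.closed

    with open(copy_path, "r") as reader:
        copied_text = reader.read()

print(both_closed)        # True
print(repr(copied_text))  # 'THE WHALE\n'
```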

The pass Statement

pass does literally nothing. It's used as a placeholder for code you haven't written yet to make sure the code you do have still works without crashing. As an example, we're going to need to open a file later, but we won't have established what we need to do with it yet, so we're going to add a pass statement so our code still runs. It isn't strictly necessary, as you can always just wait to write the code, but it might be useful if you want to debug a program. Here's an example usage:

with open("sample-text.txt", "r") as reader:
    pass

The above code will open sample-text.txt for reading, do nothing, and then clean up. If we didn't have the pass statement, we would get a syntax error, because the body of a with statement can't be empty.
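pass works the same way anywhere Python expects an indented block. Here's a minimal sketch with a function and a loop we haven't filled in yet; the name count_words is just a placeholder I made up.

```python
#!/usr/bin/env python3

def count_words(text):
    # We haven't decided how to count words yet, so do nothing for now.
    pass

for number in range(3):
    # Placeholder loop body.
    pass

# A function that only contains pass returns None.
result = count_words("some text")
print(result)  # None
```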

The split Function

The split function will take a string and split it into a list based on its first argument. If it has no arguments, it will split based on whitespace. You can also specify the maximum number of splits you want to perform with a second argument. Check out the full specification for the Python split function if you have more questions. Also, you might want to reread the previous article.

For example, type the following in example.py:

#!/usr/bin/env python3

string = "The quick brown fox jumped over the lazy dog."
list_of_words = string.split()
print(list_of_words)

Why is the Order of split Wrong?


Unlike all the functions we've seen up to this point, we call this function using the syntax string.split() instead of split(string), where string is the text we want to split. Under the hood, it's essentially str.split(string), where str is the string class in Python, str.split is the split method in the str class, and its first argument is string. I'll say it again for clarity: when you see object.method(args), it is equivalent to class.method(object, args), where object is an instance of class. Until you actually make a class yourself, you may not understand exactly what is happening, but it's not a big deal.
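If you'd like to convince yourself that the two ways of calling split really are the same, here's a quick sketch you can run:

```python
#!/usr/bin/env python3

string = "The quick brown fox"

# The usual way: call the method on the string itself.
method_style = string.split()

# The "under the hood" way: call it on the class and pass the string in.
class_style = str.split(string)

print(method_style == class_style)  # True

# Extra arguments just come after the string.
print(str.split(string, " ", 1) == string.split(" ", 1))  # True
```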

I know the name list_of_words might give away what string.split() will return (which is how you name variables well), but try to guess what this will output.

user@comp:~/win-home/dev/py_data_vis$ ./example.py
['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog.']
user@comp:~/win-home/dev/py_data_vis$

Yeah, it's a list of words in the string.
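The separator and split-count arguments mentioned earlier can be sketched like this. The example strings are ones I picked for illustration.

```python
#!/usr/bin/env python3

# Split on a specific separator instead of whitespace.
date = "2020-03-30"
date_parts = date.split("-")
print(date_parts)  # ['2020', '03', '30']

# The second argument limits how many splits are performed.
sentence = "The quick brown fox"
first_and_rest = sentence.split(" ", 1)
print(first_and_rest)  # ['The', 'quick brown fox']

# With no arguments, runs of whitespace count as one separator.
messy = "  lots   of   spaces  "
print(messy.split())  # ['lots', 'of', 'spaces']

# With an explicit " ", every single space splits, leaving empty strings.
print("" in messy.split(" "))  # True
```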

More Complex Splitting

Notice that 'dog.' includes the period because split used whitespace to separate the words. If we want to split on several different characters at once (say, whitespace and punctuation) rather than one specific sequence of characters, we normally replace all the characters we want to split on with spaces and then just use split without an argument. We're going to need to do this later, so keep this fact in mind.
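Here's a minimal sketch of that replace-then-split trick. The list of punctuation characters is my own assumption; pick whichever characters matter for your text.

```python
#!/usr/bin/env python3

text = "Call me Ishmael. Some years ago--never mind how long precisely."

# Turn every character we want to split on into a space.
for character in ".,;:!?-":
    text = text.replace(character, " ")

# Now a plain split handles all of them at once.
words = text.split()
print(words)
# ['Call', 'me', 'Ishmael', 'Some', 'years', 'ago', 'never', 'mind',
#  'how', 'long', 'precisely']
```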

lambda Functions

lambda functions are short, one-line functions, usually used to implement simple tasks or mathematical functions. They have the syntax lambda arguments: expression, as in this example:

#!/usr/bin/env python3

length_func = lambda a, b, c: (a ** 2 + b ** 2 + c ** 2) ** (1 / 2)
print(length_func(3, 4, 12))

which will print out

user@comp:~/win-home/dev/py_data_vis$ ./example.py
13.0
user@comp:~/win-home/dev/py_data_vis$

Once you define the lambda function and store it in a variable, you can then use that variable like a function. You can also do the same thing with functions since everything in python is an object, but it doesn't come up as often.

#!/usr/bin/env python3

def length(a, b, c):
    return (a ** 2 + b ** 2 + c ** 2) ** (1 / 2)

length_func = length
print(length_func(3, 4, 12))

The above code and the lambda code will print out the same exact result. Just think of lambda functions as unnamed, one line functions and you'll be fine.
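Where lambda functions really earn their keep is when you hand one straight to another function without naming it first. Here's a small sketch; apply_twice is a helper I made up for the example, not something built into Python.

```python
#!/usr/bin/env python3

def apply_twice(func, value):
    # Call func on value, then call func again on the result.
    return func(func(value))

# Pass a lambda directly, without storing it in a variable.
tripled_twice = apply_twice(lambda x: x * 3, 2)
print(tripled_twice)  # 18

# Passing a lambda stored in a variable works the same way.
square = lambda x: x ** 2
squared_twice = apply_twice(square, 3)
print(squared_twice)  # 81
```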

The sorted Function

The sorted function takes in a list and returns a sorted list in ascending order. Here's an example:

#!/usr/bin/env python3

unsorted_list = [ 15, 25, 62, 1, 25, 85, 41, 95 ]
sorted_list = sorted(unsorted_list)
print(sorted_list)

I'm not going to ask you for the output because it's too obvious: it's a list sorted in ascending order.

comp:py_data_vis user$ ./example.py
[1, 15, 25, 25, 41, 62, 85, 95]
comp:py_data_vis user$

Now, let's say you wanted to sort the list in descending order. You could sort the list, then reverse it, but you can also tell sorted to sort the list in descending order directly, using the reverse keyword argument.

#!/usr/bin/env python3

unsorted_list = [ 15, 25, 62, 1, 25, 85, 41, 95 ]
sorted_list = sorted(unsorted_list, reverse=True)
print(sorted_list)

Say instead of numbers, you had a list of tuples and you wanted to sort them by the second element. You could then use the key keyword argument.

#!/usr/bin/env python3

unsorted_list = [ (15, "apple"), (25, "bottom"), (62, "jeans"), (1, "boots"),
(25, "with"), (85, "the"), (41, "fur"), (95, "and") ]
sorted_list = sorted(unsorted_list, key = lambda x: x[1])
print(sorted_list)
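To check your understanding, here's a sketch that uses the same list, sorts it by the second element, and then combines key with descending order to sort by the first element from largest to smallest.

```python
#!/usr/bin/env python3

unsorted_list = [(15, "apple"), (25, "bottom"), (62, "jeans"), (1, "boots"),
                 (25, "with"), (85, "the"), (41, "fur"), (95, "and")]

# Sort alphabetically by the word (the second element of each tuple).
by_word = sorted(unsorted_list, key=lambda x: x[1])
print(by_word[0])   # (95, 'and')
print(by_word[-1])  # (25, 'with')

# Sort by the number (the first element), largest first.
by_number_desc = sorted(unsorted_list, key=lambda x: x[0], reverse=True)
print(by_number_desc[0])   # (95, 'and')
print(by_number_desc[-1])  # (1, 'boots')

# sorted returns a new list, so the original is unchanged.
print(unsorted_list[0])  # (15, 'apple')
```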

If you wanted to sort the list in place, you could use the sort() method, which works just like split in that you use the syntax list_to_sort.sort(). It changes the original list instead of returning a new one, but other than that, it's exactly the same and you can use key = ... and reverse = .... If you want to know more, look up "sort list python ..." where the "..." are other conditions you want to put on it.
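One thing that trips people up with sort() is that it doesn't hand back the sorted list. Here's a quick sketch of the difference, using a shorter list of numbers:

```python
#!/usr/bin/env python3

numbers = [15, 25, 62, 1]

# sort() rearranges the list itself and returns None.
result = numbers.sort()
print(result)   # None
print(numbers)  # [1, 15, 25, 62]

# The same keyword arguments work as with sorted.
numbers.sort(reverse=True)
print(numbers)  # [62, 25, 15, 1]
```

So write numbers.sort() on its own line rather than numbers = numbers.sort(), which would replace your list with None.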

What's Next

While there are a lot of functions and other constructs built into Python, it would kind of suck if those functions and constructs were all you could use. Luckily, Python has a huge repository of libraries that will allow us to use functionality that we ourselves haven't written. In the next article, Basic Data Visualization, Pt. 5, we're going to talk about program structure, a few specific libraries, and how to install libraries from the command line using pip3.

A picture of Joseph Mellor, the author.

Joseph Mellor is a Senior at TU majoring in Physics, Computer Science, and Math. He is also the chief editor of the website and the author of the tumd markdown compiler. If you want to see more of his work, check out his personal website.
Credit to Allison Pennybaker for the picture.