Basic Data Visualization in Python, Pt. 2

The basic ways of organizing and manipulating data in python.
Posted 5 March 2020 at 8:33 AM
By Joseph Mellor

This is the second article in the series Basic Data Visualization in Python. In this article, we're going to cover the basics of data manipulation in python using the data structures built into the language. If you haven't read the first article yet, I would suggest reading it so you can set up python for this project.

Topics Covered

Since How python Handles Statements is a short but necessary topic, I'm including it within this article instead of putting it in its own article.

How python Handles Statements

python will read your statements from top to bottom in the file.

Unlike languages like C, C++, and Java, statements in python end when the line ends. Furthermore, instead of using curly braces, python uses indentation and a colon to indicate which statements are grouped together.

Statements in C, C++, and Java:

statement;

control_statement (data) {
    statement;
    statement;
    ...
}

Statements in python:

statement

compound_statement (data):
    statement
    statement
    ...

We'll go into more detail on the compound_statement syntax in the next article.
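As a concrete instance of the pattern above, here is a minimal sketch using an if statement, one of the compound statements python provides. The variable names and the value 72 are made up for illustration.

```python
# A simple statement ends when its line ends.
temperature = 72

# A compound statement ends its first line with a colon. The statements
# grouped under it are indented by the same amount.
if temperature > 70:
    message = "warm"
    print(message)

# This line is not indented, so it is outside the if statement.
print("done")
```

The indentation is doing the job that the curly braces do in C, C++, and Java.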

Experimenting

I encourage you to use the python interpreter to experiment with any of the concepts that I introduce in this article. You can start it up using the command python3 in the terminal (Terminal/Terminal Emulator on Linux and Mac, or the Ubuntu App if you're using the Windows Subsystem for Linux; read the linked article for how to set up the Windows Subsystem for Linux), like so:

user@comp:~/dev/py_data_vis$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The three lines that show up after you hit Enter/Return are just version information about which python you're using. As long as you see a >>>, you're probably fine. You can then type any expression and see the result pop up. For example, if you were on macOS and you wanted to see the result of 3 + 2, you would type:

comp:py_data_vis user$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on mac
Type "help", "copyright", "credits" or "license" for more information.
>>> 3 + 2
5

If you want to see the value of a variable, you can just type the name of the variable like so:

user@comp:~/dev/py_data_vis$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3 + 2
>>> a
5

Once you're finished experimenting, exit using Ctrl + D to return to the terminal.

Objects

Unlike languages like C, C++, and Java, there are no explicitly declared types. Instead (if you'll excuse the analogy), each variable is like a large box that can hold anything of any type inside it, where the box is completely separate from its contents. Objects can have types, but the variables themselves do not. When you use a variable in your code, the python interpreter will take whatever object is stored in the variable and do what the code says to do.

You can assign variables using the syntax [variable] = [Object], like so:

a = 7
b = "Test"
c = 1.3
a = "Test 2"
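To see that the type belongs to the object and not to the variable, you can ask for the type of whatever a variable currently holds using the built-in type function. This is only a small sketch; the names first_type and second_type are mine.

```python
a = 7
first_type = type(a).__name__    # the box currently holds an int
a = "Test 2"
second_type = type(a).__name__   # the same box now holds a str

print(first_type)   # int
print(second_type)  # str
```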

If • represents an arbitrary operation (such as +, -, *, etc., which we'll discuss more in the next section), then a •= b is equivalent to a = a • b. For example,

comp:py_data_vis user$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on mac
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> a += 6
>>> a
11

since a += 6 would expand out to a = a + 6, then a = 5 + 6, then a = 11.
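The same shorthand works for the other arithmetic operators. A short sketch, with each step written out in the comments:

```python
a = 5
a += 6    # a = a + 6   -> 11
a -= 1    # a = a - 1   -> 10
a *= 3    # a = a * 3   -> 30
a //= 4   # a = a // 4  -> 7
a **= 2   # a = a ** 2  -> 49
print(a)  # 49
```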

Numbers

As in most programming languages, there are two types of numbers: integers and floating point numbers. Integers are the numbers {0, 1, 2, 3, ...} and their negatives, and they convert directly into Two's Complement Integers. Unlike other languages, they don't have a fixed width, meaning you don't have to worry about integer overflow. Floating point numbers are built into the hardware, so they generally follow the IEEE-754 floating point precision format, specifically the Double-Precision Floating-Point Format. Floating point numbers are numbers written in scientific notation (like 1.23400 × 10^3), but in base two. You have ~16 decimal digits of precision and you can represent numbers from ~10^-308 to ~10^308.

If you would not describe something using scientific notation, you probably should use an integer. To elaborate, if a value is discrete, like the number of elements in a list, the number of people in a store, or the amount of money you have (you can't have half a cent and you wouldn't say "I have $1.23400 × 10^3 in my bank account"), use an integer. If a value is continuous, like the amount of water you have in a bottle, the average height of all the people in a store, or the exchange rates between currencies, use a floating point number.
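You can check both claims about numbers yourself. The sketch below assumes your machine uses IEEE-754 doubles, which is the case on essentially every desktop platform. The sys.float_info values come from the standard library; the variable names are mine.

```python
import sys

# Integers grow as needed, so this does not overflow.
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# Floating point numbers are stored in base two, so some decimal
# fractions can only be approximated.
total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False

# The largest and smallest positive normal floats.
print(sys.float_info.max)  # 1.7976931348623157e+308
print(sys.float_info.min)  # 2.2250738585072014e-308
```

The rounding in 0.1 + 0.2 is one more reason to count money in whole cents with integers.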

You can make an integer by declaring a number without a decimal point and a floating point number by declaring a number with a decimal point, like so:

integer = 1
floating_point = 1.

As you would expect, you can add, subtract, multiply, divide, and use parentheses to group numbers, but you can also take the remainder of a division, raise them to powers, and perform integer division (rounds down to the nearest integer, so negative results round away from zero), like so:

addition_example = 1 + 2                # 1 + 2         ->   3
subtraction_example = 1 - 2             # 1 - 2         ->  -1
multiplication_example = 2 * 3          # 2 * 3         ->   6
division_example = 1 / 2                # 1 / 2         ->   0.5
remainder_example = 20 % 7              # 20 % 7        ->   6
square_example = 5 ** 2                 # 5 ** 2        ->  25
cube_example = 5 ** 3                   # 5 ** 3        -> 125
square_root_example = 25 ** (1 / 2)     # 25 ** (1 / 2) ->   5.0
integer_division_example = 1 // 2       # 1 // 2        ->   0
parentheses_example = (1 + 2) / 2       # (1 + 2) / 2   ->   1.5
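One detail worth trying in the interpreter is how // and % behave with negative numbers. In python, // always rounds down (toward negative infinity), and % gives a remainder with the same sign as the divisor. A small sketch with example values of my choosing:

```python
print(7 // 2)    # 3
print(-7 // 2)   # -4, rounded down, not toward zero
print(7 % 2)     # 1
print(-7 % 2)    # 1, same sign as the divisor 2
print(7.0 // 2)  # 3.0, integer division of a float gives a float

# The two operators always fit together like this:
a, b = -7, 2
print((a // b) * b + (a % b) == a)  # True
```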

If you want, you can experiment with these operations after this article to learn more. Try them out in the python interpreter (instructions are found in the section Experimenting above).

Lists

Lists in python are modifiable ordered collections of objects of modifiable length. You can create a list using [e0, e1, e2,...], and you access elements from that list using a[i], where a is the list and i is how many elements past the first element you want to go (so a[0] is the first element). You can treat a[i] just like a variable. For example,

>>> a = [ 1, 2, 3 ]
>>> a[0]
1
>>> a[1]
2
>>> a[0] = 7
>>> a[0]
7
>>> a
[7, 2, 3]

As you can see, we created a list with elements {1, 2, 3}, printed out the first two, and changed the first element using a[0] = 7. We can also go backwards through the list using a[i], with i being negative. The last element is a[-1], the second to last element is a[-2], etc. Lastly, we can take sections of a list (known as list slicing) using a[start:stop:step], where start is the first element we want, stop is the element after the last element we want, and step is how far we move between elements (a negative step goes backwards through the list). If you leave one of them out, python fills in a default. The easiest way to see this in action is to see a bunch of examples, so try to guess what the following statements will print out:

>>> a = [ 1, 2, 3, 4, 5, 6 ]
>>> a[-1]
>>> a[0:3]
>>> a[2:-1]
>>> a[2:]
>>> a[::2]
>>> a[1::2]
>>> a[::-1]
>>> a[1::-1]
>>> a[-1:0:-1]
>>> a[:0:-1]
>>> a[-1:0:-2]

You can see what they actually print out by typing them into the python interpreter in the terminal, but you should write out your guesses first so you can see exactly what you think these statements will do. In doing so, you'll be able to see where you're making your mistakes in understanding this topic and then fix them. If you can't access a terminal but you want to continue, the answers are below.

>>> a = [ 1, 2, 3, 4, 5, 6 ]
>>> a[-1]
6
>>> a[0:3]
[1, 2, 3]
>>> a[2:-1]
[3, 4, 5]
>>> a[2:]
[3, 4, 5, 6]
>>> a[::2]
[1, 3, 5]
>>> a[1::2]
[2, 4, 6]
>>> a[::-1]
[6, 5, 4, 3, 2, 1]
>>> a[1::-1]
[2, 1]
>>> a[-1:0:-1]
[6, 5, 4, 3, 2]
>>> a[:0:-1]
[6, 5, 4, 3, 2]
>>> a[-1:0:-2]
[6, 4, 2]
>>>

You can concatenate lists using list_a + list_b. For example:

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> a + b
[1, 2, 3, 4, 5, 6]
>>>

Lists can also have duplicated elements, like so:

>>> a = [1, 1, 2]
>>> a[0]
1
>>> a[1]
1
>>>

Since lists are objects, you can make a list of lists:

>>> a = [ [1, 2], [3, 4] ]
>>> a[0]
[1, 2]
>>> a[0][1]
2
>>> a[1]
[3, 4]
>>> a[1][0]
3
>>>

Lastly, since each element of a list is a variable, lists can hold different types of objects (though doing so likely means you should rethink your approach).

>>> [1, 1.5, 'This is a string.']
[1, 1.5, 'This is a string.']
>>>
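The examples above change elements in place but never change the length of a list. Here is a short sketch of the "modifiable length" part, using the built-in list methods append and pop, the del statement, and the len function, none of which are covered elsewhere in this article.

```python
a = [1, 2, 3]
a.append(4)     # add one element to the end  -> [1, 2, 3, 4]
a += [5, 6]     # concatenate in place        -> [1, 2, 3, 4, 5, 6]
last = a.pop()  # remove and return the last element (6)
del a[0]        # remove the first element    -> [2, 3, 4, 5]

print(a)        # [2, 3, 4, 5]
print(last)     # 6
print(len(a))   # 4
```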

Strings

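Strings, as in most programming languages, represent text in python. Unlike other languages, there are four ways to declare a string in python:

• Single quotes: 'example'
• Double quotes: "example"
• Three single quotes: '''example'''
• Three double quotes: """example"""

After you declare a string using one of these four ways, there is no difference between how you use them.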

Try typing the following phrases into the terminal to see what I mean:

>>> 'example'
>>> "example"
>>> '''example'''
>>> """example"""

>>> 'There's an error here.'
>>> "There's no error here."
>>> "He said, "This will cause an error.""
>>> 'He said, "This will not cause an error."'

>>> 'He said, "There's an error here."'
>>> "He said, "There's an error here.""
>>> """He said, "There's no error here.""""
>>> '''He said, "There's no error here."'''

>>> 'This statement
...
... will have an error.'
>>> "This statement
...
... will have an error."
>>> """This statement
...
... will not have an error."""
>>> '''This statement
...
... will not have an error.'''
>>>

The empty lines are just to divide the exercises into small chunks and the ... will show up when you type a statement that goes onto the next line. For example, you should try (stop if you have an error) to type 'This statement, hit Enter/Return twice, then type will have an error.'. Try to predict what will happen when you type these statements and then explain the results.

As before, answers are below if you can't type it into your terminal.

First four:

>>> 'example'
'example'
>>> "example"
'example'
>>> '''example'''
'example'
>>> """example"""
'example'

As you can see, all four produce the same output.

Next four:

>>> 'There's an error here.'
  File "<stdin>", line 1
    'There's an error here.'
           ^
SyntaxError: invalid syntax
>>> "There's no error here."
"There's no error here."
>>> "He said, "This will cause an error.""
  File "<stdin>", line 1
    "He said, "This will cause an error.""
                  ^
SyntaxError: invalid syntax
>>> 'He said, "This will not cause an error."'
'He said, "This will not cause an error."'

The first string throws an error because it reaches a single quote, meaning it's reached the end of the string (since the string started with a single quote) and it doesn't know what to do with everything after the second single quote. It's reading the text as ('There')s, where the parentheses represent where strings start and end and it stops reading the line when it hits an error.

The third string throws an error because it reaches a double quote, meaning it's reached the end of the string (since the string started with a double quote) and it doesn't know what to do with everything after the second double quote. Like with the first string, it will read the text as ("He said, ")This and stops reading the line once it hits an error.

Next four:

>>> 'He said, "There's an error here."'
  File "<stdin>", line 1
    'He said, "There's an error here."'
                     ^
SyntaxError: invalid syntax
>>> "He said, "There's an error here.""
  File "<stdin>", line 1
    "He said, "There's an error here.""
                   ^
SyntaxError: invalid syntax
>>> """He said, "There's an error here.""""
  File "<stdin>", line 1
    """He said, "There's an error here.""""
                                          ^
SyntaxError: EOL while scanning string literal
>>> '''He said, "There's no error here."'''
'He said, "There\'s no error here."'

There's an error in the first three because each string ends earlier than intended, leaving text after the closing quote that the interpreter can't parse. The first two are explained in the last answer, but the third one happens because """" is split up like """ " instead of " """. You'll also notice that the last one prints out the single quote in the string as \', where the backslash tells the interpreter that the quote mark should not be interpreted as ending the string.

Last four:

>>> 'This statement
  File "<stdin>", line 1
    'This statement
                  ^
SyntaxError: EOL while scanning string literal
>>> "This statement
  File "<stdin>", line 1
    "This statement
                  ^
SyntaxError: EOL while scanning string literal
>>> """This statement
...
... will not have an error."""
'This statement\n\nwill not have an error.'
>>> '''This statement
...
... will not have an error.'''
'This statement\n\nwill not have an error.'
>>>

You can't go onto the next line when using single quote and double quote strings, but you can with three single quotes or three double quotes.

After typing in the last two statements, you'll notice that the outputs stay on one line but have two \n's where you hit Enter/Return twice. The \n is a special character that means go onto the next line. It will not be displayed as \n if you use the print function or put it in a file. Instead, everything after the \n moves down a line. There are other special characters like \t for tab, but we won't go into them here.
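To see the difference between how the interpreter displays a string and how print displays it, you can try something like the sketch below. It uses the built-in repr function to get the same representation the interpreter shows when you type a variable's name.

```python
text = 'This statement\n\nwill not have an error.'

# What the interpreter echoes: one line, with the \n's visible.
print(repr(text))

# What print shows: each \n moves the following text down a line.
print(text)

# The string contains two newline characters, so it spans three lines.
newline_count = text.count('\n')
line_count = len(text.splitlines())
print(newline_count, line_count)  # 2 3
```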

Once you've declared a string, you can access it just like a list, with slicing, accessing individual elements, etc. You cannot, however, modify a string in place.

>>> a = "Test"
>>> a[0]
'T'
>>> a[0:2]
'Te'
>>> a[0] = 't'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

You can, however, concatenate strings using + like you would a list:

>>> 'a' + 'b'
'ab'
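Putting slicing and concatenation together gives you the usual workaround for not being able to modify a string: build a new string from pieces of the old one. A minimal sketch:

```python
a = "Test"

# a[0] = 't' would raise a TypeError, so build a new string instead.
b = 't' + a[1:]

print(a)  # Test  (the original string is unchanged)
print(b)  # test
```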

Tuples

Tuples are immutable, ordered collections of data. They are denoted with parentheses around data:

>>> a = ('Joseph', 20)
>>> a[0]
'Joseph'
>>> a[1]
20
>>> a[1] + 1
21
>>> a[0] = 'Patrick'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Tuples can also be concatenated like strings:

>>> ('Joseph', [1, 2, 3], (1, 2)) + ('Jacob', 1, [1, 2])
('Joseph', [1, 2, 3], (1, 2), 'Jacob', 1, [1, 2])

Dictionaries

Dictionaries are known by many names (associative arrays, hash tables, unordered maps, etc.), but they all do the same thing: match a key to a value. For example, let's say we want to associate a person with their age. We could use the person's name as a key and their age as a value, like so:

>>> people = { 'Joseph' : 20, 'Jacob' : 21, 'Patrick' : 20 }
>>>

We can access elements of people similar to how we accessed elements of a list, except that instead of using an index, we use the key, like so:

>>> people['Joseph']
20
>>> people['Patrick'] = 21
>>> people
{'Joseph': 20, 'Jacob': 21, 'Patrick': 21}
>>>

As you can see above, we set the value corresponding to 'Patrick' to be 21, which we would need to do when he reaches his 21st birthday. To add elements to a dictionary, we can use the same syntax we used to change the value for the key 'Patrick', namely dictionary[key] = value. For example,

>>> people['Jake'] = 46
>>> people
{'Joseph': 20, 'Jacob': 21, 'Patrick': 21, 'Jake': 46}
>>>

Notice that the elements are in the order in which we added them and not any sorted order.
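A few more dictionary operations are worth knowing about, even though this article doesn't rely on them. The sketch below uses the in operator, the get method, and the del statement. The name 'Maria' is a made-up key that is not in the dictionary. Keeping insertion order is guaranteed by the language starting with python 3.7.

```python
people = {'Joseph': 20, 'Jacob': 21, 'Patrick': 20}

# Check whether a key exists before using it.
has_jacob = 'Jacob' in people  # True
has_maria = 'Maria' in people  # False

# people['Maria'] would raise a KeyError; get returns None instead.
maria_age = people.get('Maria')

# Remove a key and its value.
del people['Patrick']

# The remaining keys are still in the order they were added.
names = list(people)
print(names)  # ['Joseph', 'Jacob']
```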

Exercises

Just to make sure we're on the same page, I'm going to ask a few questions about the properties and uses of various objects in python.

  1. How would you store the text I don't really know what I want to do.?

    Any of the following would work:

    >>> string = "I don't really know what I want to do."
    >>> string = """I don't really know what I want to do."""
    >>> string = '''I don't really know what I want to do.'''
    >>> string = 'I don\'t really know what I want to do.'
    

    Note that the last one escapes the single quote in don't so it can use single quotes to enclose the string.

  2. How would you store the song lyrics exactly as shown below:
    Do you remember when
    Things seemed so eternal?
    Heroes were so real
    Their magic frozen in time
    >>> lyrics = """Do you remember when
    ... Things seemed so eternal?
    ... Heroes were so real
    ... Their magic frozen in time"""
    >>> lyrics = '''Do you remember when
    ... Things seemed so eternal?
    ... Heroes were so real
    ... Their magic frozen in time'''
    >>> lyrics = 'Do you remember when\nThings seemed so eternal?\nHeroes were so real\nTheir magic frozen in time'
    >>> lyrics = "Do you remember when\nThings seemed so eternal?\nHeroes were so real\nTheir magic frozen in time"
    

    The first two use triple quotes to include the newlines, while the last two use the \n escape character.

  3. List all the objects mentioned in this article and whether you can change them after you create them.
    • Lists: Yes
    • Strings: No
    • Tuples: No
    • Dictionaries: Yes

    Numbers are primitive to the point that there is no difference between changing them after you create them and replacing them with new ones.

  4. Assume that you have a program that will read temperature over the course of an hour over equally spaced intervals and then graph the results. What object should you use to store this information and what object should you use to represent temperature?

    First, you should be using floating point numbers to represent temperature since temperature is continuous. You should use a list to store all the information since you aren't working with text (which precludes strings), you need it to be ordered (which precludes dictionaries), and you need it to be mutable (which precludes tuples).

  5. If the temperature were not read over equally spaced intervals, but instead read at random times, what object should you use to store each individual data point (time and temperature)? Assume that the time recorded is in milliseconds after starting the measurement.

    Either two separate lists, with one containing time of measurement and the other containing temperature, or a list of tuples, with each tuple containing a time and a temperature. It's trivial to convert from one to the other, so you should choose the one that works best with your interface.

  6. What object would be best to store data for an event, assuming the event is defined by an event name, date, start time, end time, and place? Don't worry about the specific types of name, date, etc.

    Technically, you can use a list, dictionary, or a tuple, using an implicit ordering for the list and the tuple so that the first element is something like the name, the second element is the date, etc.

    >>> event = [ 'Event Name', date, start, end, place ]
    >>> event = ( 'Event Name', date, start, end, place )
    >>> event = { 'Name' : 'Event Name',
    ... 'Date' : date,
    ... 'Start Time' : start,
    ... 'End Time' : end,
    ... 'Place' : place
    ... }
    >>>
    

    Personally, I would probably use a class or an extension of a tuple known as a named tuple, either of which allows me to get the name using event.name, but we haven't covered them. With tuples, you have a nice regularity of your data that will allow you to easily map it to a database and back. Plus, a tuple uses the least memory of the three. With dictionaries, you can update the structure of your events (say by adding something like a notes section) without having to update all the events. Plus, it makes it easier to find relevant data by topic. I wouldn't use a list as you end up getting the worst of both worlds.

  7. You are writing a program for a school that keeps track of students by a unique id number, their name, their grades, what year they'll graduate, and their majors. What would you use to store this information such that you can update and read information about a student given an id?

    If you are only going to read and modify data about a student given an id, then you should use a dictionary, with the key being the id and the value being a tuple (preferably a named tuple) with all the other information. It's important to note that as soon as you want something more complicated, such as printing out all the students with a 3.0 GPA or above, you'll probably want to switch to a database.

  8. What object should you use to store words and the number of times each word shows up in some text?

    A dictionary with the key for the word and the count as its value.
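For anyone curious, here is a rough sketch of what the answers to questions 5, 7, and 8 could look like in code. All of the data, the field names, and the id 1001 are made up for illustration. It also uses a few things we haven't covered yet: zip, namedtuple from the standard library's collections module, and a for loop (a compound statement from the next article).

```python
from collections import namedtuple

# Question 5: two separate lists, or one list of (time, temperature) tuples.
times_ms = [0, 140, 390]
temperatures = [20.5, 20.7, 21.0]
points = list(zip(times_ms, temperatures))  # [(0, 20.5), (140, 20.7), (390, 21.0)]
times_again, temps_again = zip(*points)     # and back again (as tuples)

# Question 7: a dictionary from id to a named tuple with the other data.
Student = namedtuple('Student', ['name', 'grades', 'graduation_year', 'majors'])
students = {1001: Student('Joseph', [3.7, 4.0], 2020, ['Physics'])}
print(students[1001].name)  # Joseph

# Named tuples are immutable, so updating a student means storing a new one.
students[1001] = students[1001]._replace(graduation_year=2021)

# Question 8: a dictionary from each word to its count.
text = "the cat saw the other cat"
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
```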

What's Next

In the next article, Basic Data Visualization, Pt. 3, we're going to cover Compound Statements, specifically the ones that focus on Control Flow.

A picture of Joseph Mellor, the author.

Joseph Mellor is a Senior at TU majoring in Physics, Computer Science, and Math. He is also the chief editor of the website and the author of the tumd markdown compiler. If you want to see more of his work, check out his personal website.
Credit to Allison Pennybaker for the picture.