>>> (9 ** 2 + 40 ** 2) ** (1 / 2) 41.0 >>>
This is the second article in the series Basic Data Visualization in Python. In this article, we're going to cover the basics of data manipulation in python using the data structures built into the language. If you haven't read the first article yet, I would suggest reading it so you can set up python for this project.
python
Handles Statements
Since How python
Handles Statements is a short but necessary topic, I'm
including it within this article instead of putting it in its own article.
python
Handles Statementspython
will read your statements from top to bottom in the file.
Unlike languages like C
, C++
, and Java
, statements in python
end when
the line ends. Furthermore, instead of using curly braces, python
uses
indentation and a colon to indicate which statements are grouped together.
Statements in C
, C++
, and Java
:
statement; control_statement (data) { statement; statement; ... }
Statements in python
:
statement compound_statement (data): statement statement ...
We'll go into more detail for the compound_statement
syntax later in the
next article.
I encourage you to use the python
interpreter to experiment with any of these
concepts that I'll introduce in this article. You can start it up using the
command python3
in the terminal (Terminal/Terminal Emulator on Linux and Mac
and the Ubuntu App if you're using the Windows Subsystem for Linux. Read the linked article for how
to set up the Windows Subsystem for Linux) like so:
user@comp:~/dev/py_data_vis$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The next three lines after you type Enter/Return are just version information
about which python you're using. As long as you see a >>>
,
you're probably fine. You can then type any expression and see the result pop
up. For example, if you were on MacOS and you wanted to see the result of 3 +
2
, you would type
comp:py_data_vis user$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on mac
Type "help", "copyright", "credits" or "license" for more information.
>>> 3 + 2
5
If you want to see the value of a variable, you can just type the name of the variable like so:
user@comp:~/dev/py_data_vis$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3 + 2
>>> a
5
Once you're finished experimenting, exit using Ctrl + D
to return to the
terminal.
Unlike languages like C
, C++
, and Java
, there are no explicitly declared
types. Instead (if you'll excuse the analogy), each variable is like a large box
that can hold anything of any type inside it, where the box is completely
separate from its contents. Objects can have types, but the variables themselves
do not. When you use a variable in your code, the python
interpreter will take
whatever object is stored in the variable and do what the code says to do.
You can assign variables using the syntax [variable] = [Object]
, like so:
a = 7 b = "Test" c = 1.3 a = "Test 2"
If •
represents an arbitrary operation (such as +
, -
, *
, etc.,
which we'll discuss more in the next section), then a •= b
is equivalent
to a = a • b
. For example,
comp:py_data_vis user$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 4.8.2] on mac
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> a += 6
>>> a
11
since a += 6
would expand out to a = a + 6
⇒ a = 5 + 6
⇒ a =
11
.
As in most programming languages, there are two types of numbers: integers and floating point numbers. Integers are the the numbers {0, 1, 2, 3, ...} and their negatives and they convert directly into Two's Complement Integers. Unlike other languages, they don't have a fixed width, meaning you don't have to worry about integer overflow. Floating point numbers are built into the hardware, so they generally follow the IEEE-754 floating point precision format, specifically the Double-Precision Floating-Point Format. Floating point numbers are numbers written in scientific notation (1.23400 × 103, but in base two. You have ~16 decimal digits of precision and you can represent numbers from ~10-308 to 10308.
If you would not describe something using scientific notation, you probably should use an integer. To elaborate, if a value is discrete, like the number of elements in a list, the number of people in a store, or the amount of money you have (you can't have half a cent and you wouldn't say "I have $1.23400 × 103 in my bank account), use an integer. If a value is continuous, like the amount of water you have in a bottle, the average height of all the people in a store, or the exchange rates between currencies, use a floating point number.
You can make an integer by declaring a number without a decimal point and a floating point number by declaring a number with a decimal point, like so:
integer = 1 floating_point = 1.
As you would expect, you can add, subtract, multiply, divide, and use parentheses to group numbers, but you can also take the remainder of a division, raise them to powers, and perform integer division (rounds to the integer closest to zero), like so
addition_example = 1 + 2 # 1 + 2 -> 3 subtraction_example = 1 - 2 # 1 - 2 -> -1 multiplication_example = 2 * 3 # 2 * 3 -> 6 division_example = 1 / 2 # 1 / 2 -> 0.5 remainder_example = 20 % 7 # 20 % 7 -> 6 square_example = 5 ** 2 # 5 ** 2 -> 25 cube_example = 5 ** 3 # 5 ** 3 -> 125 square_root_example = 25 ** (1 / 2) # 25 ** (1 / 2) -> 5.0 integer_division_example = 1 // 2 # 1 // 2 -> 0 parentheses_example = (1 + 2) / 2 # (1 + 2) / 2 -> 1.5
If you want, you can experiment with this after this article to learn more. Try
using the python
interpreter (instructions are found in the section Experimenting above) to answer the following
questions (mouse over the rectangles below to see the solutions):
python
code.
>>> (9 ** 2 + 40 ** 2) ** (1 / 2) 41.0 >>>
>>> toys_per_school = 200 // 14 >>> toys_left_over = 200 % 14 >>> toys_per_school 14 >>> toys_left_over 4 >>>
Note that we used integer division (//
) because we wanted an integer number of
toys and we wanted to round to zero.
10.5 / 1.25
, 10.5 // 1.25
, and 10.5 % 1.25
, and
explain the reasons for their outputs.
>>> 10.5 // 1.25 8.0 >>> 10.5 % 1.25 0.5 >>> 10.5 / 1.25 8.4
10.5 / 1.25
is exactly what you would get from just dividing the numbers,
10.5 // 1.25
rounds the result of 10.5 / 1.25
towards zero, and 10.5 %
1.25
is 10.5 - (10.5 // 1.25) * 1.25
, or the remainder when you divide 10.5
by an integer multiple of 1.25
.
Lists in python are modifiable ordered collections of objects of
modifiable length. You can create a list using [e0, e1, e2,...]
, and you
access elements from that list using a[i]
, where a
is the list and i
is
the number of elements after the first element. You can treat a[i]
just like a
variable. For example,
>>> a = [ 1, 2, 3 ] >>> a[0] 1 >>> a[1] 2 >>> a[0] = 7 >>> a[0] 7 >>> a [ 7, 2, 3 ]
As you can see, we created a list with elements {1, 2, 3}, printed out the
first two, and changed the first element using a[0] = 7
. We can also go backwards through the list using
a[i]
, with i
being negative. The last element is -1
, the second to
last element is -2
, etc. Lastly, we can take sections of a list (known as
list slicing) using a[start:stop:step]
, where start
is the first
element we want, stop
is the element after the last element we want, and
step
is how many elements we want to go over. The easiest way to see this in
action is to see a bunch of examples, so try to guess what the following
statements will print out:
>>> a = [ 1, 2, 3, 4, 5, 6 ] >>> a[-1] >>> a[0:3] >>> a[2:-1] >>> a[2:] >>> a[::2] >>> a[1::2] >>> a[::-1] >>> a[1::-1] >>> a[-1:0:-1] >>> a[:0:-1] >>> a[-1:0:-2]
You can see what they actually print out by typing them into the python
interpreter in the terminal, but you should write out your guesses first so
you can see exactly what you think these statements will do. In doing so,
you'll be able to seewhere you're making your mistakes in understanding this
topic and then fix them. If you can't access a terminal but you want to
continue, the answers are below. Mouse over them to see them.
>>> a = [ 1, 2, 3, 4, 5, 6 ] >>> a[-1] 6 >>> a[0:3] [1, 2, 3] >>> a[2:-1] [3, 4, 5] >>> a[2:] [3, 4, 5, 6] >>> a[::2] [1, 3, 5] >>> a[1::2] [2, 4, 6] >>> a[::-1] [6, 5, 4, 3, 2, 1] >>> a[1::-1] [2, 1] >>> a[-1:0:-1] [6, 5, 4, 3, 2] >>> a[:0:-1] [6, 5, 4, 3, 2] >>> a[-1:0:-2] [6, 4, 2] >>>
You can concatenate lists using list_a + list_b
. For example:
>>> a = [1, 2, 3] >>> b = [4, 5, 6] >>> a + b [1, 2, 3, 4, 5, 6] >>>
Lists can also have duplicated elements, like so
>>> a = [1, 1, 2] >>> a[0] 1 >>> a[1] 1 >>>
Since lists are objects, you can make a list of lists:
>>> a = [ [1, 2], [3, 4] ] >>> a[0] [1, 2] >>> a[0][1] 2 >>> a[1] [3, 4] >>> a[1][0] 3 >>>
Lastly, since each element of a list is a variable, lists can hold different types of objects (though doing so likely means you should rethink your approach).
>>> [1, 1.5, 'This is a string.'] [1, 1.5, 'This is a string.'] >>>
Strings, as in most programming languages, represent text in python
. Unlike
other languages, there are four ways to declare a string in python
:
'Example'
): Allow you to use double quotes inside a
string without escaping them.
"Example"
): Allow you to use single quotes inside a
string without escaping them.
'''Example'''
): Allow you to use single quotes,
double quotes, and newlines (what happens when you hit Enter/Return on your
computer) without escaping them.
"""Example"""
): Allow you to use single quotes,
double quotes, and newlines (what happens when you hit Enter/Return on your
computer) without escaping them.
After you declare a string using one of these four ways, there is no difference between how you use them.
Try typing the following phrases into the terminal to see what I mean:
>>> 'example' >>> "example" >>> '''example''' >>> """example""" >>> 'There's an error here.' >>> "There's no error here." >>> "He said, "This will cause an error."" >>> 'He said, "This will not cause an error."' >>> 'He said, "There's an error here."' >>> "He said, "There's an error here."" >>> """He said, "There's no error here."""" >>> '''He said, "There's no error here."''' >>> 'This statement ... ... will have an error.' >>> "This statement ... ... will have an error." >>> """This statement ... ... will not have an error.""" >>> '''This statement ... ... will not have an error.''' >>>
The empty lines are just to divide the exercises into small chunks and the ...
will show up when you type a statement that goes onto the next line. For
example, you should try (stop if you have an error) to type 'This statement
,
hit Enter/Return twice, then type will have an error.'
. Try to predict what
will happen when you type these statements and then explain the results.
As before, answers are below if you can't type it into your terminal.
First four:
>>> 'example' 'example' >>> "example" 'example' >>> '''example''' 'example' >>> """example""" 'example'
As you can see, all four produce the same output.
Next four:
>>> 'There's an error here.' File "<stdin>", line 1 'There's an error here.' ^ SyntaxError: invalid syntax >>> "There's no error here." "There's no error here." >>> "He said, "This will cause an error."" File "<stdin>", line 1 "He said, "This will cause an error."" ^ SyntaxError: invalid syntax >>> 'He said, "This will not cause an error."' 'He said, "This will not cause an error."'
The first string throws an error because it reaches a single quote, meaning it's
reached the end of the string (since the string started with a single quote) and
it doesn't know what to do with everything after the second single quote. It's
reading the text as ('There')s
, where the parentheses represent where strings
start and end and it stops reading the line when it hits an error.
The third string throws an error because it reaches a double quote, meaning it's
reached the end of the string (since the string started with a double quote) and
it doesn't know what to do with everything after the second double quote. Like
with the first string, it will read the text as ("He said, ")This
and stops
reading the line once it hits an error.
Next four:
>>> 'He said, "There's an error here."' File "<stdin>", line 1 'He said, "There's an error here."' ^ SyntaxError: invalid syntax >>> "He said, "There's an error here."" File "<stdin>", line 1 "He said, "There's an error here."" ^ SyntaxError: invalid syntax >>> """He said, "There's an error here."""" File "<stdin>", line 1 """He said, "There's an error here."""" ^ SyntaxError: EOL while scanning string literal >>> '''He said, "There's no error here."''' 'He said, "There\'s no error here."'
There's an error in the first three because the strings are terminated on the
text before the next space immediately after the quote marks. The first two are
explained in the last answer, but the third one happens because """"
is split
up like """ "
instead of " """
. You'll also notice that the last one prints
out the quote mark in the string with a \'
before it, which tells the
interpreter that the quote mark should not be interpreted as ending the string.
Last four:
>>> 'This statement File "<stdin>", line 1 'This statement ^ SyntaxError: EOL while scanning string literal >>> "This statement File "<stdin>", line 1 "This statement ^ SyntaxError: EOL while scanning string literal >>> """This statement ... ... will not have an error.""" 'This statement\n\nwill not have an error.' >>> '''This statement ... ... will not have an error.''' 'This statement\n\nwill not have an error.' >>>
You can't go onto the next line when using single quote and double quote strings, but you can with three single quotes or three double quotes.
After typing in the last two statements, you'll notice that the outputs seem to
stay on the next line but they have two \n
's where you hit Enter/Return twice.
The \n
is a special character that means go onto the next line and will not be
displayed if you were to use the print
function or you put one in a file.
Instead, everything after the \n
would move down a line. There are other
special characters like \t
for tab, but we won't go into them here.
Once you've declared a string, you can access it just like a list, with slicing, accessing individual elements, etc. You cannot, however, modify a string in place.
>>> a = "Test" >>> a[0] 'T' >>> a[0:2] 'Te' >>> a[0] = 't' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment
You can, however concatenate strings using +
like you would a list:
>>> 'a' + 'b' 'ab'
Tuples are immutable, ordered collections of data. They are denoted with parentheses around data:
>>> a = ('Joseph', 20) >>> a[0] 'Joseph' >>> a[1] 20 >>> a[1] + 1 21 >>> a[0] = 'Patrick' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'tuple' object does not support item assignment
Tuples can also be concatenated like strings:
>>> ('Joseph', [1, 2, 3], (1, 2)) + ('Jacob', 1, [1, 2]) ('Joseph', [1, 2, 3], (1, 2), 'Jacob', 1, [1, 2])
Dictionaries are known by many names (associative arrays, hash tables, unordered maps, etc.), but they match a key to a value. For example, let's say we want to associate a person with his/her age. We could use the person's name as a key and their age as a value, like so:
>>> people = { 'Joseph' : 20, 'Jacob' : 21, 'Patrick' : 20 } >>>
We can access elements of people
similar to how we accessed elements of a
list, except that instead of using an index, we use the key, like so:
>>> people['Joseph'] 20 >>> people['Patrick'] = 21 >>> people {'Joseph': 20, 'Jacob': 21, 'Patrick': 21} >>>
As you can see above, we set the value corresponding to 'Patrick'
to be 21
,
which we would need to do when he reaches his 21th birthday. To add
elements to a dictionary, we can use the same syntax we used to change the value
for the key 'Patrick'
, namely dictionary[key] = value
. For example,
>>> people['Jake'] = 46 >>> people {'Joseph': 20, 'Jacob': 21, 'Patrick': 21, 'Jake': 46} >>>
Notice that the elements are in the order in which we added them and not any sorted order.
Just to make sure we're on the same page, I'm going to ask a few questions about the properties and uses of various objects in python.
I don't really know what I want to do.
?
Any of the following would work:
>>> string = "I don't really know what I want to do." >>> string = """I don't really know what I want to do.""" >>> string = '''I don't really know what I want to do.''' >>> string = 'I don\'t really know what I want to do.'
Note that the last one escapes the single quote in don't
so it can use single
quotes to enclose the string.
Do you remember when
Things seemed so eternal?
Heroes were so real
Their magic frozen in time
>>> lyrics = """Do you remember when ... Things seemed so eternal? ... Heroes were so real ... Their magic frozen in time""" >>> lyrics = '''Do you remember when ... Things seemed so eternal? ... Heroes were so real ... Their magic frozen in time''' >>> lyrics = 'Do you remember when\nThings seemed so eternal?\nHeroes were so real\nTheir magic frozen in time' >>> lyrics = "Do you remember when\nThings seemed so eternal?\nHeroes were so real\nTheir magic frozen in time"
The first two use triple quotes to include the newlines, while the last two use
the \n
escape character.
Numbers are primitive to the point that there is no difference between changing them after you create them and replacing them with new ones.
First, you should be using floating point numbers to represent temperature since temperature is continuous. You should use a list to store all the information since you aren't working with text (which precludes strings) you need it to be ordered (which precludes dictionaries), and you need it to be mutable (which precludes tuples).
Either two separate lists, with one containing time of measurement and the other containing temperature, or a list of tuples, with each tuple containing a time and a temperature. It's trivial to convert from one to the other, so you should choose the one that works best with your interface.
Technically, you can use a list, dictionary, or a tuple, using an implicit ordering for the list and the tuple so that the first element is something like the name, the second element is the date, etc.
>>> event = [ 'Event Name', date, start, end, place ] >>> event = ( 'Event Name', date, start, end, place ) >>> event = { 'Name' : 'Event Name', ... 'Date' : date, ... 'Start Time' : start, ... 'End Time' : end, ... 'Place' : place ... } ... >>>
To me, I would probably use an extension of a tuple known as a named
tuple or a class, which allows me to get the name using event.name
,
but we haven't covered them. With tuples, you have a nice regularity of your
data that will allow you to easily map it to a database and back. Plus, it uses
the least amount of memory to store data. With dictionaries, you can update the
structure of your events (say by adding something like a notes section) without
having to update all the events. Plus, it makes it easier to find relevant data
by topic. I wouldn't use a list as you end up getting the worst of both worlds.
If you are only going to read and modify data about a student given an id, then you should use a dictionary, with the key being the id and the value being a tuple (preferably a named tuple) with all the other information. It's important to note that as soon as you want something more complicated, such as printing out all the students with a 3.0 GPA or above, you'll probably want to switch to a database.
A dictionary with the key for the word and the count as its value.
In the next article, Basic Data Visualization, Pt. 3, we're going to cover Compound Statements, specifically the ones that focus on Control Flow.