Python reference and notes
My first contact with python was seeing python 3.0.1 announcement in the Croatian IT magazine "Bug", however I started with Python 2.7 and worked a lot with 2.4 at the time. It wasn't until 2015 that I completely stopped using Python < 3. ![Python 3.0.1 announcement in the BUG magazine, April 2009](https://tom.kucar.org/files/img/python_origins.png) Python 3.0 was released in 2008 and Python 2 got to live to version 2.7.18, which was then finally discontinued in 2020. Python 2 was also deprecated in Debian 11. `print "Long live py2!"` # Learning python List of resources relevant to a learner. Updated Nov 2024 **Intro** Following resources are all freely available to read online. - [Official documentation](https://docs.python.org) (start with the Tutorial) - [Hitch hikers guide to python](https://docs.python-guide.org/) - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) 2nd ed - [Byte of python](https://python.swaroopch.com/) - [Python for Everybody](https://www.py4e.com/) (also available on [trinket](https://books.trinket.io/pfe/index.html)) - [Think Python](https://allendowney.github.io/ThinkPython/) (introduction to Python for people who have never programmed before) - [CS50 python](https://cs50.harvard.edu/python/2022/) **Advanced** - **Python Distilled** (2021) - essential core of the language, with code examples to illuminate how Python works and how to structure programs [3.6] - **Learning Python 6th Edition** (2025) - comprehensive, in-depth introduction to the core Python language ~ 1400pg, [3.12] - **Fluent python 2nd ed** (2022) - hands on guide to writing idiomatic Python 3 [3.10] - **Effective python 3rd ed** (2024) - “Pythonic” way of programming, write code that is effective, elegant and maintainable [3.13] - **Serious Python** (2018) - covers range of advanced topics (multithreading, designing APIs, dealing with databases, Python internals...) - **Python in a Nutshell** 4th ed (2023) - **Programming python 4th** ed (2011) - hard, a full-scale applications programming tutorial 1630 pages [3.2] (outdated) - **Practices of the Python Pro** (2019) - design and write professional-quality software. Examples & exercises ~ 250pg **Other** - [Python Morsels](https://www.pythonmorsels.com/) (online exercises) - **How to make mistakes in python** (outdated, but good concepts) - **Python Testing with pytest** (dated, are there newer alternatives?) - **Obey the Testing Goat!** (outdated, but good concepts) - **SQLAlchemy 2 In Practice** (2023) - Flaskr TDD (2023) - Intro to Flask, Test-Driven Development, and JavaScript [3.12] - **Flask web development 2nd ed** (2018) is much shorter, intermediate level, and more focused on flask on which it goes into detail compared to: - [The Flask mega tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world) (2024) which is bigger and includes many topics (such as full text search, ajax, javascript, localization and internationalization) and is intended for beginners. # Types and Operations ```python # Hello world program print("Hello, world!") ``` ## Comments ```python # Single line comments start with a number symbol. """ Multiline strings can be written using three "s, and are often used as documentation. """ ``` ## Main data types ```python boolean = True / False integer = 10 float = 10.01 string = "123abc" list = [ value1, value2, … ] dictionary = { key1:value1, key2:value2, …} ``` ## Operators ```python + addition - subtraction * multiplication / division ** exponent % modulus // floor division # Result of integer division truncated down both for positive and negative. 5 // 3 # => 1 5.0 // 3.0 # => 1.0 # works on floats too -5 // 3 # => -2 -5.0 // 3.0 # => -2.0 # The result of division is always a float 10.0 / 3 # => 3.3333333333333335 (9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0) # => (3.0, 3.0, 3, 3.0) ``` Bitwise Operations ```python # As a rule of thumb, if you find yourself wanting to flip bits in # Python, you should think about which language you’re really coding. x = 1 # 1 decimal is 0001 in bits x << 2 # Shift left 2 bits: 0100 => 4 x | 2 # Bitwise OR (either bit=1): 0011 => 3 x & 1 # Bitwise AND (both bits=1): 0001 => 1 # bit strings and hex literals, Learning python, 5th ed, page 154 ``` ## Comparison & booleans: ```python == equal != different > higher < lower >= higher or equal <= lower or equal # Comparisons can be chained! 1 < 2 < 3 # => True 2 < 3 < 2 # => False and logical AND or logical OR not logical NOT # (is vs. ==) is checks if two variables refer to the same object, but == checks # if the objects pointed to have the same values. a = [1, 2, 3, 4] # Point a at a new list, [1, 2, 3, 4] b = a # Point b at what a is pointing to b is a # => True, a and b refer to the same object b == a # => True, a's and b's objects are equal b = [1, 2, 3, 4] # Point b at a new list, [1, 2, 3, 4] b is a # => False, a and b do not refer to the same object b == a # => True, a's and b's objects are equal # None is an object None # => None # Don't use the equality "==" symbol to compare objects to None # Use "is" instead. This checks for equality of object identity. "etc" is None # => False None is None # => True # None, 0, and empty strings/lists/dicts/tuples all evaluate to False. # All other values are True bool(0) # => False bool("") # => False bool([]) # => False bool({}) # => False bool(()) # => False ``` ## Variables ```python # There are no declarations, only assignments. # Convention is to use lower_case_with_underscores some_var = 5 some_var # => 5 # Accessing a previously unassigned variable is an exception. # See Control Flow to learn more about exception handling. some_unknown_var # Raises a NameError # if can be used as an expression # Equivalent of C's '?:' ternary operator "yahoo!" if 3 > 2 else 2 # => "yahoo!" ``` ## Special characters ``` # comment \ multiline commands joining \n Newline \\ Backslash (\) \' Single quote (') \" Double quote (") \a ASCII Bell (BEL) \b ASCII Backspace (BS) \f ASCII Formfeed (FF) \n ASCII Linefeed (LF) \r ASCII Carriage Return (CR) \t ASCII Horizontal Tab (TAB) \v ASCII Vertical Tab (VT) \ooo Character with octal value ooo (1,3) \xhh Character with hex value hh ``` ## Console input/output ```python print([object, ...][, sep=' '][, end='\n'][, file=sys.stdout][, flush=False]) print("hi") # => hi print("there are", 365, "days in year.") # => there are 365 days in year. # By default the print function also prints out a newline at the end. # Use the optional argument end to change the end string. print("Hello, World", end="!") # => Hello, World! print("hmm", end=' ') # output space instead of newline lst = [1,2,3] print(*lst, sep='|') # print list elements seperated by | # => 1|2|3 # Simple way to get input data from console, returns data as string # Note: In earlier versions of Python, input() method was named as raw_input() name = input("enter your name: ") num = int(input("enter a number: ")) print('-' * 80) # 80 dashes #print to file print('Hello World', file=open('file.txt', 'w')) ``` ## Strings ```python 'hm' # single quoted "hmmm" # double quoted '''multi line''' # multi line string r'\temp\spam' # raw string (no escapes) b'sp\xc4m' # byte string u'sp\u00c4m' # Unicode string len('hello') # string length => 5 msg = 'hello' if msg < 'n': print('a-m') # strings are compared character at a time... else: print ('n-z') # ...in lexicographic order. => a-m 'e' in msg # returns boolean => True ## Sequence Operations 'hello' + 'world' # concentrate strings => helloworld 'Tom'*2 # multiply string by integer => TomTom msg[-1] # The last item in msg => 'o' msg[len(S)-1] # Negative indexing, the hard way => 'o' msg[:] # All of S as a top-level copy (0:len(S)) => hello # A string can be treated like a list of characters "test"[2] # => s string = "This is a string" string[0] # => 'T' string[0:4] # => 'This' temp = ['s','p','a','m'] ",".join(temp) # => 's,p,a,m' spam = "spam" spam[1:3], spam[1:], spam[:-1] # => ('pa', 'pam', 'spa') spam = spam + 'Burger' # To change a string, make a new one => SpamBurger S = 'abcdefghijklmnop' S[1:10:2] # => 'bdfhj' S[::2] # => 'acegikmo' # reverse a sequence "hello"[::−1] # => 'olleh' ## Immutability # Strings are immutable. string[0] = 'z' # Immutable objects cannot be changed ...error text omitted... TypeError: 'str' object does not support item assignment string = 'z' + S[1:] string # => 'zpam' # Changing text in place using list ## S = 'shrubbery' L = list(S) # Expand to a list: [...] L # => ['s', 'h', 'r', 'u', 'b', 'b', 'e', 'r', 'y'] L[1] = 'c' # Change it in place ''.join(L) # Join with empty delimiter => 'scrubbery' # Changing text in place using bytearray ## B = bytearray(b'spam') # A bytes/list hybrid (ahead) B.extend(b'eggs') # 'b' needed in 3.X, not 2.X B # B[i] = ord(c) works here too bytearray(b'spameggs') B.decode() # Translate to normal string => 'spameggs' ## Type-Specific Methods string.find('is') # Find the offset of a substring => 2 string.replace(" ", "-") # => 'This-is-a-string' string # => 'This is a string' line = 'aaa,bbb,ccccc,dd' # Split on a delimiter into a list of substrings line.split(',') # => ['aaa', 'bbb', 'ccccc', 'dd'] S = 'spam' # Upper- and lowercase conversions S.upper() # => 'SPAM' # Content tests: isalpha, isdigit, etc. S.isalpha() # => True line = 'aaa,bbb,ccccc,dd\n' # Remove whitespace characters on the right side line.rstrip() # => 'aaa,bbb,ccccc,dd' # Combine two operations line.rstrip().split(',') # => ['aaa', 'bbb', 'ccccc', 'dd'] # Formating ## "hello {}".format('world') # => hello world # You can repeat the formatting arguments to save some typing. "{0} be nimble, {0} be quick, {0} jump over the {1}".format("Jack", "candle stick") # => "Jack be nimble, Jack be quick, Jack jump over the candle stick" # You can use keywords if you don't want to count. "{name} wants to eat {food}".format(name="Bob", food="lasagna") # => "Bob wants to eat lasagna" # Separators, decimal digits '{:,.2f}'.format(296999.2567) # => '296,999.26' # Digits, padding, signs '%.2f | %+05d' % (3.14159, −42) # => '3.14 | −0042' # Dictionary based formating reply = """ Greetings... Hello %(name)s! Your age is %(age)s """ values = {'name': 'Bob', 'age': 40} print(reply % values) # Greetings... # Hello Bob! # Your age is 40 import sys 'My {1[kind]} runs {0.platform}'.format(sys, {'kind': 'pc'}) # => 'My pc runs linux' '{.platform:>10} = {[kind]:<10}'.format(sys, dict(kind='laptop')) # => ' win32 = laptop ' data = dict(platform=sys.platform, kind='laptop') 'My {kind:<8} runs {platform:>8}'.format(**data) # => 'My laptop runs linux' '{0:,d}'.format(999999999999) # => '999,999,999,999' '{:,.2f}'.format(296999.2567) # => '296,999.26' from formats import commas, money '%s' % money(296999.2567) # => '$296,999.26' [commas(x) for x in (9999999, 8888888)] # => ['9,999,999', '8,888,888'] ## Getting help dir(string) # => ['__add__', '__class__', ... 'upper', 'zfill'] help(sting.replace) #=> Help on built-in function replace:\n replace(...)\n ... ``` ## Lists ```python ## Sequence Operations li = [] # Lists store sequences other_li = [4, 5, 6] # You can start with a prefilled list # Access a list like you would any array li[0] # Look at the last element li[-1] # Looking out of bounds is an IndexError li[4] # Raises an IndexError # You can look at ranges with slice syntax. # The start index is included, the end index is not # (It's a closed/open range for you mathy types.) li = [2,3,4,5] li[1:3] # => [3, 4] li[2:] # Omit the beginning and return the list li[:3] # Omit the end and return the list li[:-1] # return all but the last element li[::2] # Select every second entry li[::-1] # Return a reversed copy of the list # Use any combination of these to make advanced slices li[start:end:step] # Make a one layer deep copy using slices li2 = li[:] # (li2 is li) will result in false. li[i:j] = otherlist # replace ith to jth element with otherlist L = ['spam', 'Spam', 'SPAM!'] L[0:2] = ['eat', 'more'] L # => ['eat', 'more', 'SPAM!'] ## Nesting M = [[1, 2, 3], # A 3 × 3 matrix, as nested lists [4, 5, 6], # Code can span lines if bracketed [7, 8, 9]] M # => [[1, 2, 3], [4, 5, 6], [7, 8, 9]] M[1] # Get row 2 # => [4, 5, 6] M[1][2] # Get row 2, then get item 3 within the row # => 6 ## Type Specific Operations # Add stuff to the end of a list with append li = [] li.append(1) # li is now [1] li.append(2) # li is now [1, 2] li.append(3) # li is now [1, 2, 3] # Remove from the end with pop li.pop() # => 3 and li is now [1, 2] del li[1] # remove 1 del li[i:j] # delete range # Remove first occurrence of a value li.remove(2) # li is now [1, 3] li.remove(2) # Raises a ValueError as 2 is not in the list # Insert an element at a specific index li.insert(1, 2) # li is now [1, 2, 3] again # Get the index of the first item found matching the argument li.index(2) # => 1 li.index(4) # Raises a ValueError as 4 is not in the list # You can add lists # Note: values for li and for other_li are not modified. li + other_li # => [1, 2, 3, 4, 5, 6] # Concatenate lists with "extend()" li.extend(other_li) # Now li is [1, 2, 3, 4, 5, 6] len(li) # Examine the length with "len()" set(li) # remove duplicate elements from a list li.sort() # sort list items li.clear() # removes all items from the list li.reverse() li.count(item) sum(li) zip(list1, list2) # zip is a container, returns iterator sorted(li) # returns sorted list but doesn't change list itself ",".join(li) # join list elements with , # Check for existence in a list with "in" 1 in li # => True ``` ## Dictionaries ```python # Dictionaries store mappings from keys to values empty_dict = {} # Here is a prefilled dictionary filled_dict = {"one": 1, "two": 2, "three": 3} bob1 = dict(name='Bob', job='dev', age=40) # Keywords bob1 # => {'age': 40, 'name': 'Bob', 'job': 'dev'} bob2 = dict(zip(['name', 'job', 'age'], ['Bob', 'dev', 40])) # Zipping bob2 # => {'job': 'dev', 'name': 'Bob', 'age': 40} # Adding to a dictionary filled_dict.update({"four":4}) # => {"one": 1, "two": 2, "three": 3, "four": 4} filled_dict["four"] = 4 # another way to add to dict filled_dict # => {'two': 2, 'three': 3, 'four': 4, 'one': 1} # From Python 3.5 you can also use the additional unpacking options {'a': 1, **{'b': 2}} # => {'a': 1, 'b': 2} {'a': 1, **{'a': 2}} # => {'a': 2} # Note keys for dictionaries have to be immutable types. This is to ensure that # the key can be converted to a constant hash value for quick look-ups. # Immutable types include ints, floats, strings, tuples. invalid_dict = {[1,2,3]: "123"} # => Raises a TypeError: unhashable type: 'list' valid_dict = {(1,2,3):[1,2,3]} # Values can be of any type, however. # Look up values with [] # works for keys, throws KeyError for values filled_dict["one"] # => 1 # Use "get()" method to avoid the KeyError filled_dict.get("one") # => 1 filled_dict.get("four") # => None # The get method supports a default argument when the value is missing filled_dict.get("one", 4) # => 1 filled_dict.get("four", 4) # => 4 # Get all keys as an iterable with "keys()". We need to wrap the call in list() # to turn it into a list. We'll talk about those later. Note - Dictionary key # ordering is not guaranteed. Your results might not match this exactly. list(filled_dict.keys()) # => ["three", "two", "one"] # Get all values as an iterable with "values()". Once again we need to wrap it # in list() to get it out of the iterable. Note - Same as above regarding key ordering. list(filled_dict.values()) # => [3, 2, 1] # Check for existence of keys in a dictionary with "in" "one" in filled_dict # => True 1 in filled_dict # => False # "setdefault()" inserts into a dictionary only if the given key isn't present filled_dict.setdefault("five", 5) # filled_dict["five"] is set to 5 filled_dict.setdefault("five", 6) # filled_dict["five"] is still 5 # Remove keys from a dictionary with del del filled_dict["one"] # Removes the key "one" from filled dict # pop() removes the item associated to the key and returns its value filled_dict.pop("four") # => 4 # sorting D = {'a': 1, 'b': 2, 'c': 3} KS = list(D.keys()) # Unordered keys list KS # => ['c', 'a', 'b'] KS.sort() # sorted keys list KS # => ['a', 'b', 'c'] for key in sorted(D): print(key, '=>', D[key]) # a => 1 # b => 2 # c => 3 dict.items() # returns a list of pairs (key,value) dict.clear() # removes all keys-values from the dictionary dict.copy() # returns a copy of the dictionary ``` ## Tuples ```python # Tuples are like lists but are immutable. tup = (1, 2, 3) tup[0] # => 1 tup[0] = 3 # Raises a TypeError # Note that a tuple of length one has to have a comma after the last element but # tuples of other lengths, even zero, do not. type((1)) # => type((1,)) # => type(()) # => # Tuples also have type-specific callable methods tup.index(2) # => 2; 2 appears at offset 1 tup.count(1) # => 1; 1 appears only once # You can do most of the list operations on tuples too len(tup) # => 3 tup + (4, 5, 6) # => (1, 2, 3, 4, 5, 6) tup[:2] # => (1, 2) 2 in tup # => True # Like lists and dictionaries, tuples support mixed types and nesting, # but they don’t grow and shrink because they are immutable T = 'spam', 3.0, [11, 22, 33] T[1] # => 3 T[2][1] # => 22 T.append(4) # => AttributeError: 'tuple' object has no attribute 'append' # sorting T = ('cc', 'aa', 'dd', 'bb') tmp = list(T) tmp.sort() T = tuple(tmp) R # => ('aa', 'bb', 'cc', 'dd') sorted(T) # Or use the sorted built-in, and save two steps # => ['aa', 'bb', 'cc', 'dd'] # You can unpack tuples (or lists) into variables a, b, c = (1, 2, 3) # a is now 1, b is now 2 and c is now 3 # You can also do extended unpacking a, *b, c = (1, 2, 3, 4) # a is now 1, b is now [2, 3] and c is now 4 # Tuples are created by default if you leave out the parentheses d, e, f = 4, 5, 6 # Now look how easy it is to swap two values e, d = d, e # d is now 5 and e is now 4 # named tuples - tuple/class/dictionary hybrid from collections import namedtuple # Import extension type Rec = namedtuple('Rec', ['name', 'age', 'jobs']) # Make a generated class bob = Rec('Bob', age=40.5, jobs=['dev', 'mgr']) # A named-tuple record bob # => Rec(name='Bob', age=40.5, jobs=['dev', 'mgr']) bob[0], bob.jobs # Access by position and attribute respectively # => ('Bob', ['dev', 'mgr']) O = bob._asdict() # Dictionary-like form O['name'], O['jobs'] # Access by key too => ('Bob', ['dev', 'mgr']) 0 # => OrderedDict([('name', 'Bob'), ('age', 40.5), ('jobs', ['dev', 'mgr'])]) ``` ## Files ```python # Writhing to file f = open('data.txt', 'w') # Make a new file in output mode ('w' is write) f.write('Hello\n') # => 6 Return number of items written in Python 3.X f.write('world\n') f.close() #lClose to flush output buffers to disk # Reading file f = open('data.txt') # 'r' (read) is the default processing mode text = f.read() # Read entire file into a string text # => 'Hello\nworld\n' print(text) # print interprets control characters # Hello # world text.split() # File content is always a string # => ['Hello', 'world'] # File Context Managers for line in open('data.txt'): print(line) # safer way to deal with files with open(r'C:\code\data.txt') as myfile: # safer way to deal with files for line in myfile: ...use line here... # CVS import csv rdr = csv.reader(open('csvdata.txt')) for row in rdr: print(row) # ... # ['a', 'bbb', 'cc', 'dddd'] # ['11', '22', '33', '44'] # Pickle: storing native python objects D = {'a': 1, 'b': 2} F = open('datafile.pkl', 'wb') import pickle pickle.dump(D, F) # Pickle any object to file F.close() F = open('datafile.pkl', 'rb') E = pickle.load(F) # Load any object from file E # => {'a': 1, 'b': 2} # Storing objects in JSON name = dict(first='Bob', last='Smith') rec = dict(name=name, job=['dev', 'mgr'], age=40.5) rec # => {'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5} import json json.dump(rec, fp=open('testjson.txt', 'w'), indent=4) print(open('testjson.txt').read()) #{ # "job": [ # "dev", . . . P = json.load(open('testjson.txt')) P # => {'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5} # Storing Packed Binary Data: struct # packing F = open('data.bin', 'wb') # Open binary output file import struct data = struct.pack('>i4sh', 7, b'spam', 8) # Make packed binary data data # => b'\x00\x00\x00\x07spam\x00\x08' F.write(data) # Write byte string F.close() # unpacking F = open('data.bin', 'rb'); data = F.read() # Get packed binary data values = struct.unpack('>i4sh', data) values # => (7, b'spam', 8) ``` ## Sets ```python empty_set = set() # Sets store...sets # Initialize a set with a bunch of values. Yeah, it looks a bit like a dict. Sorry. some_set = {1, 1, 2, 2, 3, 4} # some_set is now {1, 2, 3, 4} # Similar to keys of a dictionary, elements of a set have to be immutable. invalid_set = {[1], 1} # => Raises a TypeError: unhashable type: 'list' valid_set = {(1,), 1} # Add one more item to the set filled_set = some_set filled_set.add(5) # filled_set is now {1, 2, 3, 4, 5} # Do set intersection with & other_set = {3, 4, 5, 6} filled_set & other_set # => {3, 4, 5} {1, 2, 3}.intersection((1, 3, 5)) # => {1,3} # Do set union with | filled_set | other_set # => {1, 2, 3, 4, 5, 6} {1, 2, 3}.union([3, 4]) # => {1, 2, 3, 4} # Do set difference with - {1, 2, 3, 4} - {2, 3, 5} # => {1, 4} # Do set symmetric difference with ^ {1, 2, 3, 4} ^ {2, 3, 5} # => {1, 4, 5} # Check if set on the left is a superset of set on the right {1, 2} >= {1, 2, 3} # => False # Check if set on the left is a subset of set on the right {1, 2} <= {1, 2, 3} # => True {1, 2, 3}.issubset(range(-5, 5)) # => True # Check for existence in a set with in 2 in filled_set # => True 10 in filled_set # => False # useful example engineers = {'bob', 'sue', 'ann', 'vic'} managers = {'tom', 'sue'} 'bob' in engineers # Is bob an engineer? =>True engineers & managers # Who is both engineer and manager? => {'sue'} managers - engineers # Managers who are not engineers => {'tom'} ``` # Statements and syntax ## flow control ```python # Let's just make a variable some_var = 5 # Here is an if statement. Indentation is significant in Python! # Convention is to use four spaces, not tabs. # This prints "some_var is smaller than 10" if some_var > 10: print("some_var is totally bigger than 10.") elif some_var < 10: # This elif clause is optional. print("some_var is smaller than 10.") else: # This is optional too. print("some_var is indeed 10.") ``` ## Iteration ```python for item in ["a", "b", "c"]: # item becoms a, b, c for i in range(4): # 0 to 3 for i in range(4, 8): # 4 to 7 for i in range(0, 8, 2): # 0,2,4,6 for key, val in dict.items(): # gets key,val from dict for c in "hacker": print(c, end=' ') # => h a c k e r for c in 'spam': print(c.upper(), end="") # => SPAM x = 0 # While loops go until a condition is no longer met. while x < 4: print(x) x += 1 # Shorthand for x = x + 1 ``` ## Iterables ```python # Python offers a fundamental abstraction called the Iterable. # An iterable is an object that can be treated as a sequence. # The object returned by the range function, is an iterable. filled_dict = {"one": 1, "two": 2, "three": 3} our_iterable = filled_dict.keys() print(our_iterable) # => dict_keys(['one', 'two', 'three']). This is an object that implements our Iterable interface. # We can loop over it. for i in our_iterable: print(i) # Prints one, two, three # However we cannot address elements by index. our_iterable[1] # Raises a TypeError # An iterable is an object that knows how to create an iterator. our_iterator = iter(our_iterable) # Our iterator is an object that can remember the state as we traverse through it. # We get the next object with "next()". next(our_iterator) # => "one" # It maintains state as we iterate. next(our_iterator) # => "two" next(our_iterator) # => "three" # After the iterator has returned all of its data, it raises a StopIteration exception next(our_iterator) # Raises StopIteration # You can grab all the elements of an iterator by calling list() on it. list(filled_dict.keys()) # => Returns ["one", "two", "three"] ``` # Functions, generators, modules and packages ## Functions ```python # Use "def" to create new functions def add(x, y): print("x is {} and y is {}".format(x, y)) return x + y # Return values with a return statement # Calling functions with parameters add(5, 6) # => prints out "x is 5 and y is 6" and returns 11 # Another way to call functions is with keyword arguments add(y=6, x=5) # Keyword arguments can arrive in any order. # You can define functions that take a variable number of # positional arguments def varargs(*args): return args varargs(1, 2, 3) # => (1, 2, 3) # You can define functions that take a variable number of # keyword arguments, as well def keyword_args(**kwargs): return kwargs keyword_args(big="foot", loch="ness") # => {"big": "foot", "loch": "ness"} # You can do both at once, if you like def all_the_args(*args, **kwargs): print(args) print(kwargs) all_the_args(1, 2, a=3, b=4) # => (1, 2) \n {"a": 3, "b": 4} # When calling functions, you can do the opposite of args/kwargs! # Use * to expand tuples and use ** to expand kwargs. args = (1, 2, 3, 4) kwargs = {"a": 3, "b": 4} all_the_args(*args) # equivalent to all_the_args(1, 2, 3, 4) all_the_args(**kwargs) # equivalent to all_the_args(a=3, b=4) all_the_args(*args, **kwargs) # equivalent to all_the_args(1, 2, 3, 4, a=3, b=4) # Returning multiple values (with tuple assignments) def swap(x, y): return y, x # Return multiple values as a tuple without the parenthesis. # (Note: parenthesis have been excluded but can be included) x = 1 y = 2 x, y = swap(x, y) # => x = 2, y = 1 # (x, y) = swap(x,y) # Again parenthesis have been excluded but can be included. #you can also swap values as following x, y = y, x # Function Scope x = 5 def set_x(num): # Local var x not the same as global variable x x = num # => 43 print(x) # => 43 def set_global_x(num): global x print(x) # => 5 x = num # global var x is now set to 6 print(x) # => 6 set_x(43) set_global_x(6) # Python has first class functions def create_adder(x): def adder(y): return x + y return adder add_10 = create_adder(10) add_10(3) # => 13 ``` ## Anonymous functions ```python (lambda x: x > 2)(3) # => True (lambda x, y: x ** 2 + y ** 2)(2, 1) # => 5 # There are built-in higher order functions list(map(add_10, [1, 2, 3])) # => [11, 12, 13] list(map(max, [1, 2, 3], [4, 2, 1])) # => [4, 2, 3] list(filter(lambda x: x > 5, [3, 4, 5, 6, 7])) # => [6, 7] # We can use list comprehensions for nice maps and filters # List comprehension stores the output as a list which can itself be a nested list [add_10(i) for i in [1, 2, 3]] # => [11, 12, 13] [x for x in [3, 4, 5, 6, 7] if x > 5] # => [6, 7] # You can construct set and dict comprehensions as well. {x for x in 'abcddeef' if x not in 'abc'} # => {'d', 'e', 'f'} {x: x**2 for x in range(5)} # => {0: 0, 1: 1, 2: 4, 3: 9, 4: 16} ``` ## Comprehensions ```python # Comprehensions are constructs that allow sequences to be built from other # sequences. Python 2.0 introduced list comprehensions and Python 3.0 comes with # dictionary and set comprehensions. # Say we need to obtain a list of all the integers in a sequence and then square them: a_list = [1, '4', 9, 'a', 0, 4] squared_ints = [ e**2 for e in a_list if type(e) == types.IntType ] print(squared_ints) # [ 1, 81, 0, 16 ] # Much the same results can be achieved using the built in functions, # map, filter and the anonymous lambda function. # Note that function calls in Python are expensive. #The filter function applies a predicate to a sequence: filter(lambda e: type(e) == types.IntType, a_list) #Map modifies each member of a sequence: map(lambda e: e**2, a_list) # The two can be combined: map(lambda e: e**2, filter(lambda e: type(e) == types.IntType, a_list)) ``` ## Generators ```python ``` ## Decorators ```python ``` ## Modules ```python # Python modules are just ordinary Python files. You # can write your own, and import them. The name of the # module is the same as the name of the file. # You can import modules import math print(math.sqrt(16)) # => 4.0 # You can get specific functions from a module from math import ceil, floor print(ceil(3.7)) # => 4.0 print(floor(3.7)) # => 3.0 # You can import all functions from a module. not recommended from math import * # You can shorten module names import math as m math.sqrt(16) == m.sqrt(16) # => True # You can find out which functions and attributes # are defined in a module. import math dir(math) # If you have a Python script named math.py in the same # folder as your current script, the file math.py will # be loaded instead of the built-in Python module. # This happens because the local folder has priority # over Python's built-in libraries. # commonly used builtin modules math.floor(d) #floor to next lower integer math.trunc(d) #drop decimal digits import random random.random() #random floating-point number between 0 and 1 random.randint(1, 10) #random int between 1 and 10 random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life']) # shuffle items randomly suits = ['hearts', 'clubs', 'diamonds', 'spades'] random.shuffle(suits) suits # => ['spades', 'hearts', 'diamonds', 'clubs'] ``` ## Casting ```python str(x) #converts x to string list(x) #converts x to a list int(x) #converts x to a integer number float(x) #converts x to a float number ``` ## Other built-in functions ```python ord(S) #converts from one-character string to integer character code chr(I) #converts from integer code back to char (ascii only) min(L) #returns the minimum value in L max(L) #returns the maximum value in L sum(L) #returns the sum of the values in L abs(n) #returns the absolute value of n pow(n1,n) #returns the exponentiation (power) of n1 on n round(n1,n) #returns the n1 number rounded to n digits range(n1,n2,n) #numbers from n1 to n2 in steps of n type(x) #returns the type of x (string, float…) help(s) #prints help about x ``` ## Regex ```python import re re.match(r'^[aeiou]', str) re.sub(r'^[aeiou]', '?', str) re.sub(r'(xyz)', r'\1', str) expr = re.compile(r'^...$') expr.match(...) expr.sub(...) ``` # Classes and inheritance ## Classes ```python # We use the "class" statement to create a class class Human: # A class attribute. It is shared by all instances of this class species = "H. sapiens" # Basic initializer, this is called when this class is instantiated. # Note that the double leading and trailing underscores denote objects # or attributes that are used by Python but that live in user-controlled # namespaces. Methods(or objects or attributes) like: __init__, __str__, # __repr__ etc. are called special methods (or sometimes called dunder methods) # You should not invent such names on your own. def __init__(self, name): # Assign the argument to the instance's name attribute self.name = name # Initialize property self._age = 0 # An instance method. All methods take "self" as the first argument def say(self, msg): print("{name}: {message}".format(name=self.name, message=msg)) # Another instance method def sing(self): return 'microphone check... one two... one two...' # A class method is shared among all instances # They are called with the calling class as the first argument @classmethod def get_species(cls): return cls.species # A static method is called without a class or instance reference @staticmethod def grunt(): return "*grunt*" # A property is just like a getter. # It turns the method age() into an read-only attribute of the same name. # There's no need to write trivial getters and setters in Python, though. @property def age(self): return self._age # This allows the property to be set @age.setter def age(self, age): self._age = age # This allows the property to be deleted @age.deleter def age(self): del self._age # When a Python interpreter reads a source file it executes all its code. # This __name__ check makes sure this code block is only executed when this # module is the main program. if __name__ == '__main__': # Instantiate a class i = Human(name="Ian") i.say("hi") # "Ian: hi" j = Human("Joel") j.say("hello") # "Joel: hello" # i and j are instances of type Human, or in other words: they are Human objects # Call our class method i.say(i.get_species()) # "Ian: H. sapiens" # Change the shared attribute Human.species = "H. neanderthalensis" i.say(i.get_species()) # => "Ian: H. neanderthalensis" j.say(j.get_species()) # => "Joel: H. neanderthalensis" # Call the static method print(Human.grunt()) # => "*grunt*" # Cannot call static method with instance of object # because i.grunt() will automatically put "self" (the object i) as an argument print(i.grunt()) # => TypeError: grunt() takes 0 positional arguments but 1 was given # Update the property for this instance i.age = 42 # Get the property i.say(i.age) # => "Ian: 42" j.say(j.age) # => "Joel: 0" # Delete the property del i.age # i.age # => this would raise an AttributeError ``` ## Inheritance ```python # Inheritance allows new child classes to be defined that inherit methods and # variables from their parent class. # Using the Human class defined above as the base or parent class, we can # define a child class, Superhero, which inherits the class variables like # "species", "name", and "age", as well as methods, like "sing" and "grunt" # from the Human class, but can also have its own unique properties. # To take advantage of modularization by file you could place the classes above in their own files, # say, human.py # To import functions from other files use the following format # from "filename-without-extension" import "function-or-class" from human import Human # Specify the parent class(es) as parameters to the class definition class Superhero(Human): # If the child class should inherit all of the parent's definitions without # any modifications, you can just use the "pass" keyword (and nothing else) # but in this case it is commented out to allow for a unique child class: # pass # Child classes can override their parents' attributes species = 'Superhuman' # Children automatically inherit their parent class's constructor including # its arguments, but can also define additional arguments or definitions # and override its methods such as the class constructor. # This constructor inherits the "name" argument from the "Human" class and # adds the "superpower" and "movie" arguments: def __init__(self, name, movie=False, superpowers=["super strength", "bulletproofing"]): # add additional class attributes: self.fictional = True self.movie = movie self.superpowers = superpowers # The "super" function lets you access the parent class's methods # that are overridden by the child, in this case, the __init__ method. # This calls the parent class constructor: super().__init__(name) # override the sing method def sing(self): return 'Dun, dun, DUN!' # add an additional instance method def boast(self): for power in self.superpowers: print("I wield the power of {pow}!".format(pow=power)) if __name__ == '__main__': sup = Superhero(name="Tick") # Instance type checks if isinstance(sup, Human): print('I am human') if type(sup) is Superhero: print('I am a superhero') # Get the Method Resolution search Order used by both getattr() and super() # This attribute is dynamic and can be updated print(Superhero.__mro__) # => (, # => , ) # Calls parent method but uses its own class attribute print(sup.get_species()) # => Superhuman # Calls overridden method print(sup.sing()) # => Dun, dun, DUN! # Calls method from Human sup.say('Spoon') # => Tick: Spoon # Call method that exists only in Superhero sup.boast() # => I wield the power of super strength! # => I wield the power of bulletproofing! # Inherited class attribute sup.age = 31 print(sup.age) # => 31 # Attribute that only exists within Superhero print('Am I Oscar eligible? ' + str(sup.movie)) ``` ## Multiple inheritance ```python # Another class definition # bat.py class Bat: species = 'Baty' def __init__(self, can_fly=True): self.fly = can_fly # This class also has a say method def say(self, msg): msg = '... ... ...' return msg # And its own method as well def sonar(self): return '))) ... (((' if __name__ == '__main__': b = Bat() print(b.say('hello')) print(b.fly) # And yet another class definition that inherits from Superhero and Bat # superhero.py from superhero import Superhero from bat import Bat # Define Batman as a child that inherits from both Superhero and Bat class Batman(Superhero, Bat): def __init__(self, *args, **kwargs): # Typically to inherit attributes you have to call super: # super(Batman, self).__init__(*args, **kwargs) # However we are dealing with multiple inheritance here, and super() # only works with the next base class in the MRO list. # So instead we explicitly call __init__ for all ancestors. # The use of *args and **kwargs allows for a clean way to pass arguments, # with each parent "peeling a layer of the onion". Superhero.__init__(self, 'anonymous', movie=True, superpowers=['Wealthy'], *args, **kwargs) Bat.__init__(self, *args, can_fly=False, **kwargs) # override the value for the name attribute self.name = 'Sad Affleck' def sing(self): return 'nan nan nan nan nan batman!' if __name__ == '__main__': sup = Batman() # Get the Method Resolution search Order used by both getattr() and super(). # This attribute is dynamic and can be updated print(Batman.__mro__) # => (, # => , # => , # => , ) # Calls parent method but uses its own class attribute print(sup.get_species()) # => Superhuman # Calls overridden method print(sup.sing()) # => nan nan nan nan nan batman! # Calls method from Human, because inheritance order matters sup.say('I agree') # => Sad Affleck: I agree # Call method that exists only in 2nd ancestor print(sup.sonar()) # => ))) ... ((( # Inherited class attribute sup.age = 100 print(sup.age) # => 100 # Inherited attribute from 2nd ancestor whose default value was overridden. print('Can I fly? ' + str(sup.fly)) # => Can I fly? False ``` # Error handling ```python # Handle exceptions with a try/except block try: # Use "raise" to raise an error raise IndexError("This is an index error") except IndexError as e: # Pass is just a no-op. Usually you would do recovery here. pass except (TypeError, NameError): # Multiple exceptions can be handled together, if required. pass else: # Optional clause to the try/except block. Must follow all except blocks # Runs only if the code in try raises no exceptions print("All good!") finally: # Execute under all circumstances print("We can clean up resources here") # Instead of try/finally to cleanup resources you can use a with statement with open("myfile.txt") as f: for line in f: print(line) ``` # Common mistakes ## Silly things - Write `return` immediately after defining function. - Misspellings often dont generate errors. Combat them by using pylint and writing tests regularly. - Beware of mixing up def and class ## Style - Hungarian notation is blnNO. - Listen to PEP8. - Use lambdas sparingly. - Complex comprehension should go into a function. ## Structure ###### Pathological If/Elif Blocks (anti-pattern) If we really do need to manage many special cases, we can employ the Strategy pattern: ```python def strategy1(): ... def strategy2(): ... strategies = { 'condition1': strategy1, 'condition2': strategy2, ... } def do_awesome_stuff(): which_one = ... strategy = strategies[which_one] strategy() ... ``` We start by extracting the contents of our if / elif / else structure into separate functions with identical interfaces. Then we can create a dictionary to map conditions to those strategy functions. The dic‐ tionary key doesn’t have to be a string. It can be anything hashable, so tuples and frozensets can be quite effective if we need richer con‐ ditions. Finally, our original function determines which key to use, plucks the appropriate strategy function from our dictionary, and invokes it. Our original function is now much, much simpler to understand, as are each of the strategies, and writing tests for each of the now- isolated strategies is straightforward. However, figuring out what value to use for that dictionary key can sometimes be complicated. If it takes 200 lines to determine what key to use, is this really much of a victory? If that’s the case, consider externalizing it entirely, and let the strategy be chosen by the caller, who may in fact know better than we do about whatever those factors are. The strategy is invoked as a call‐back: ```python def do_awesome_stuff(strategy): ... strategy() ... result = do_awesome_stuff(strategy1) ``` From there it’s not too far of a jump into dependency injection, where our code is provided with what it needs, rather than having to be smart enough to ask for it on its own: ```python class Foo(object): def __init__(self, strategy): self.strategy = strategy def do_awesome_stuff(self): ... self.strategy() ... foo = Foo(strategy2) foo.do_awesome_stuff() ``` ###### Unnecessary Getters and Setters Make most attributes public, and use properties to protect any special snowflakes that need extra care and feeding. Don't do: ```python class InviteEvent(object): ... def getEventNumber(self): return self._intEventNumber def setEventNumber(self, x): self._intEventNumber = int(x) ... event.setEventNumber(10) print event.getEventNumber() ``` Do this instead: ```python class InviteEvent(object): ... @property def event_number(self): return self._event_number @event_number.setter def _set_event_number(self, x): self._event_number = int(x) @event_number.deleter def _delete_event_number(self): self._event_number = None event.event_number = 10 print event.event_number ``` ###### Getting Wrapped Up in Decorators A decorator is a function (or, more generally, a callable) that returns a function, which replaces the function being decorated. Did you want to test the original function in isolation? Too bad— that function is effectively gone. Your test has no choice but to exercise the final, multilayered Frankenstein function. Make the decorated method as simple and devoid of logic as possible, pushing all of its smarts down into a deeper layer of abstraction that can be tested in isolation. ###### Breaking the law of Demeter The Law of Demeter (also known as the principle of least knowledge) tells us that our code should only interact with the things that it knows about, and not reach deeply into nested attributes, across friends of friends, and into strangers. Don't do this: ```python gvars.objSession.objCustomer.objMemberStatus.isPAID() if gvars.dctEnv['session'].getCustomer().isSignedIn(): current_url = self.objSession._getCurrentURL() #underscore! return event._objGuestList.getGuestList()[0].getEventSequence() ``` ###### Overusing Private Attributes Since single-underscore names aren’t mangled (unlike double underscore names), they can be more conveniently used if you absolutely must break the Law of Demeter. When “outside” code wants access to the internals of a class, those “internal” attributes probably shouldn’t be private at all; rather, this is a clear signal that those attributes should be public. The code is telling us that it’s time to refactor! ###### God Objects and God Methods When a class or method has accumulated too much knowledge or too many responsibilities, its role in the system becomes practically godlike: it has become all-encompassing, all- seeing, and all-doing, and many other entities will end up being tightly coupled to it in order to get anything done. Unix command-line kung fu to identify potential sources of godlike trouble: ```bash $ find . -name "*.py" -exec wc -l {} \; | sort -r $ grep "^class " bigmodule.py | wc -l $ grep "\sdef " bigmodule.py | wc -l ``` If you find these kinds of abominations in your code, it’s a sign that it’s time to take a deep breath and refactor them. Favor small functions and small classes that have as few responsibilities as possible, and strive to do as little work as possible in the `__init__` so that your classes are easy to instantiate, with no weird side effects, and your tests can be easy and lightweight. You want to break up these wanna-be gods before they get out of hand. ###### Global State Avoid global state as much as humanly possible. Resist its siren lure, reject its convenience, refuse the temptation, no matter how much your colleagues think they want it. In the long run, they’ll thank you for not having to maintain such monstrosities. ## Surprises ###### Importing Everything Don't. Listen to pep8 and avoid wildcard imports. ###### Overbroadly Silencing Exceptions Diaper Pattern **Never** catch all errors: ```python try: do_something() except: pass ``` In most cases, the best choice is to catch a more specific exception. Something like this: ```python try: do_something() # Catch some very specific exception - KeyError, ValueError, etc. except ValueError: pass ``` If some code path simply must broadly catch all exceptions - for example, the top-level loop for some long-running persistent process - then each caught exception must write the full stack trace to a log or file, along with a timestamp. ```python import logging def get_number(): return int('foo') try: x = get_number() except Exception as ex: logging.exception('Caught an error') ``` ###### Reinventing the Wheel - Look at the standard library and PyPI to see if someone has already solved your problem. If whatever problem it is isn’t your core domain, your home-grown implementation is probably going to be worse. - If you decide to replace one solution in favor of another, see it through; update the codebase to your new standard so that you don’t have a dozen ways to do essentially the same thing. - Establish and enforce standards so that surprises—both during development and at runtime—are minimized. ###### Mutable Keyword Argument Defaults Python doesn’t give us a new list every time the function gets called; it creates that empty list when the function is defined during the import of its module. And so the following is incredibly dangerous and hides a bug: ```python def set_reminders(self, event, reminders=[]): ``` When the default value really does need to be mutable, we set it to None in the function definition, and then immediately give it a reasonable default inside the function itself: ```python def set_reminders(self, event, reminders=None): reminders = reminders or [] # or, if there are valid falsey inputs # that we'd like to preserve: reminders = [] if reminders is None else reminders ... ``` This way we’re guaranteed to start with a fresh instance each time the function is called, and data won’t leak between invocations. ###### Overeager code - Don’t do expensive things at import. - Don’t couple to resources that might not be available at import. - Don’t put anything into an `__init__.py` that could jeopardize the import. - Don't do too much when an object is instantiated - Beware of circular imports. ###### Poisoning Persistent State Any time you’re messing with the contents of a module, or of a class definition, or of anything else that persists outside the scope of a function call, you have an opportunity to shoot yourself in the foot. Proceed with caution when you find yourself writing code like this, and make good use of logging to verify that your assumptions hold. ###### Assuming Logging Is Unnecessary - “This code is too simple to need logging.” - “The service I’m integrating with will always work.” - “I’ll add logging later.” No. Logging, now. **Log at Boundaries** That can be when entering or leaving a method, when branching ( if / elif/ else ) or looping ( for, while ), when there might be errors ( try / except / finally ), or before and after calling some external service. The type of boundary will guide your choice of log level; for example, debug is best in branching and looping situations, where info makes more sense when entering or leaving larger blocks. **Log Actions, Motives, and Results** Don’t just log what you’re doing, but why, and what happened. This can include actions you’re about to take, decisions made and the information used to make them, errors and exceptions, and things like the URL of a service you’re calling, the data you’re sending to it, and what it returned. **Log Mindfully** - Unless you have a fancy aggregation tool, like Splunk or Loggly, a single application (or website) should share a single log file. - Choose an appropriate level when logging messages. Take a moment to really think about whether a message is for debugging, general information, a caution, or an error. - Be mindful of unsanitized user input, of users’ person‐ ally identifiable information, and especially health and payment data. ###### Assuming tests are unnecessary As long as our code is syntactically reasonable, Python will cheer‐ fully do its best to execute it, even if that means we’ve forgotten to return a value, gotten our types mismatched, mixed up a sign in some tricky math, used the wrong variable or misspelled a variable name, or committed any number of other common programmer errors. When we have unit tests, we learn about our errors up front, as we make them, rather than during integration—or worse, production—where it’s much more expensive to resolve them. As an added bonus, when your code can be easily tested, it is more likely to be better structured and thus cleaner and more maintainable. So go make friends with unittest , Pytest, or Nose, and explore what the Mock library can do to help you isolate components from one another. Get comfortable with testing, practice it until it becomes like a reflex. Utilize logging. Testing now will help prevent weird surprises later. ## Gotchas ###### Common Gotchas For the most part, Python aims to be a clean and consistent language that avoids surprises. However, there are a few cases that can be confusing to newcomers. Some of these cases are intentional but can be potentially surprising. Some could arguably be considered language warts. In general, what follows is a collection of potentially tricky behavior that might seem strange at first glance, but is generally sensible once you're aware of the underlying cause for the surprise. ###### Mutable Default Arguments Seemingly the *most* common surprise new Python programmers encounter is Python's treatment of mutable default arguments in function definitions. **What You Wrote** ```python def append_to(element, to=[]): to.append(element) return to ``` **What You Might Have Expected to Happen** ```python my_list = append_to(12) print(my_list) my_other_list = append_to(42) print(my_other_list) ``` A new list is created each time the function is called if a second argument isn't provided, so that the output is: ``` [12] [42] ``` **What Does Happen** ``` [12] [12, 42] ``` A new list is created *once* when the function is defined, and the same list is used in each successive call. Python's default arguments are evaluated *once* when the function is defined, not each time the function is called (like it is in say, Ruby). This means that if you use a mutable default argument and mutate it, you *will* and have mutated that object for all future calls to the function as well. **What You Should Do Instead** Create a new object each time the function is called, by using a default arg to signal that no argument was provided (`None` is often a good choice). ```python def append_to(element, to=None): if to is None: to = [] to.append(element) return to ``` Do not forget, you are passing a *list* object as the second argument. **When the Gotcha Isn't a Gotcha** Sometimes you can specifically "exploit" (read: use as intended) this behavior to maintain state between calls of a function. This is often done when writing a caching function. ###### Late Binding Closures Another common source of confusion is the way Python binds its variables in closures (or in the surrounding global scope). **What You Wrote** ```python def create_multipliers(): return [lambda x : i * x for i in range(5)] ``` **What You Might Have Expected to Happen** ```python for multiplier in create_multipliers(): print(multiplier(2)) ``` A list containing five functions that each have their own closed-over `i` variable that multiplies their argument, producing: ``` 0 2 4 6 8 ``` **What Does Happen** ``` 8 8 8 8 8 ``` Five functions are created; instead all of them just multiply `x` by 4. Python’s closures are _late binding_. This means that the values of variables used in closures are looked up at the time the inner function is called. Here, whenever any of the returned functions are called, the value of i is looked up in the surrounding scope at call time. By then, the loop has completed and i is left with its final value of 4. What’s particularly nasty about this gotcha is the seemingly prevalent misinformation that this has something to do with lambdas in Python. Functions created with a `lambda` expression are in no way special, and in fact the same exact behavior is exhibited by just using an ordinary `def`: ```python def create_multipliers(): multipliers = [] for i in range(5): def multiplier(x): return i * x multipliers.append(multiplier) return multipliers ``` **What You Should Do Instead** The most general solution is arguably a bit of a hack. Due to Python’s aforementioned behavior concerning evaluating default arguments to functions (see Mutable Default Arguments), you can create a closure that binds immediately to its arguments by using a default arg like so: ```python def create_multipliers(): return [lambda x, i=i : i * x for i in range(5)] ``` Alternatively, you can use the functools.partial function: ```python from functools import partial from operator import mul def create_multipliers(): return [partial(mul, i) for i in range(5)] ``` **When the Gotcha Isn’t a Gotcha** Sometimes you want your closures to behave this way. Late binding is good in lots of situations. Looping to create unique functions is unfortunately a case where they can cause hiccups. # Competitive programming Competitive programming is a mind sport usually held over the Internet or a local network, involving participants trying to program according to provided specifications. Contestants are referred to as sport programmers. A programming competition generally involves the host presenting a set of logical or mathematical problems, also known as puzzles, to the contestants, and contestants are required to write computer programs capable of solving each problem. Judging is based mostly upon number of problems solved and time spent for writing successful solutions, but may also include other factors (quality of output produced, execution time, program size, etc.) You can find a lot of resources [here](https://github.com/lnishan/awesome-competitive-programming). **Why bother?** Being good at competitive programming means you: - have good problem solving skills - can operate under time pressure - have a good understanding of algorithms and data structures If you are bad at competitive programming, it most likely means you don't truly understand algorithms/data structures, and you have a weak problem solving foundation. It's harsh, but it's most likely true. The good thing is that you can improve your problem solving skills depending on the time you spend. Does this affect you? If you want to work for companies such as FB/Google/Microsoft or any big company, you will need to be at competitive coding to some degree since all the interviews are based on these types of questions. If you want to excel in your job, get promoted quickly/go into management, then obviously other qualities and skills, especially networking, are 10x as important. But at the same time you will need to jump ship periodically to advance quickly in your career, which will require staying sharp and constant interviewing. Overall, if you're an ambitious person that wants to excel in SV, competitive coding skills are paramount for interviewing. Of course, there are other ways to "make it": start-ups -> getting acquired/IPO/ becoming a VC. So you don't need to go to FAANG. For those things, skills like people selection/business prioritization are 10x more important. If you don't give a shit about status/money, then you don't need to care about competitive programming because lower-tier companies won't care too much about that. The bottom line is that it's not hard to learn OOP concepts/structure/ abstraction or fucking whatever other coding guidelines there are. It's just a bunch of best practices. If I had to say it in one line, it would be that if someone is good at competitive programming, they can get good at designing software. The opposite is not necessarily true, because problem solving isn't something you learn in a day. So there will be more demand for people that are good at problem solving, who will be valued higher. Sites for practice - [HackerRank](https://www.hackerrank.com/) - [LeetCode](https://leetcode.com/explore/) - [exercism](https://exercism.io/) - [CodeWars](https://www.codewars.com/) - [Project Euler](https://projecteuler.net/) ## Special data structures in Python Resources on competitive programming on Python is limited. Most of the competitive programmers write in Cpp and Java, for their speed in execution. Moreover, examples in Python shown on GeeksForGeeks is transliterated from C++/Java, which ends up verbose and difficult to modify. This serves as a resource to bridge Python programmers into competitive programming. This is not a guide for getting started in Python. It assumes that you already know how to use the basic built-in Data Structures - String `st = "abcd"` - List arr = `[4,5,6,7,8,5]` - Dictionaries `d = {}` This is not an introduction to data structures and algorithms. It is best if you have a rough idea of the big-O notation. Otherwise, think of the big-O as the type of increase of space or time required by the computation. This understanding is useful to help you predict whether your algorithm can finish running in a specified time and space. ## List and its alternatives See [documentation](https://docs.python.org/3/tutorial/datastructures.html) Basic operations on a python list - Reading from a list given an index (aka random access) `arr[4]` - Appending to the list from right `arr.append(1)` - Popping from the list from right `arr.pop()` The above operations take O(1) time. You can modify a list in other ways as follows - Popping from the list from left `del arr[0]` - Reversing an array `arr = arr[::-1]` - Appending to the list from left `arr.insert(0, x)` The operations take O(n) time to run. If you want it faster, you have to use `deque`, which will be explained. The following are the best practices to make the use of lists concise., using the In-built functions and concepts ## Enumerate See [documentation](https://docs.python.org/3/library/functions.html#enumerate) You want to track both the index and element when iterating through the array. ```python arr = [3,1,4,1,5,9,2,6,5,3,5,9] for i in range(len(arr)): arr[i] = arr[i] + i arr # [3, 2, 6, 4, 9, 14, 8, 13, 13, 12, 15, 20] ``` You can use enumerate instead ```python arr = [3,1,4,1,5,9,2,6,5,3,5,9] for i,ar in enumerate(arr): arr[i] = ar + i arr # [3, 2, 6, 4, 9, 14, 8, 13, 13, 12, 15, 20] ``` The benefit is to avoid the use of awkward `range(len())`. Moreover, without the need to use arr[i] in the loop you can reduce the nesting and make the code clearer. ## List comprehensions See [documentation](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) You can iterate through a list with one line of code. For example, you want to convert a list of integers to a list of strings. ```python arr = [3,1,4,1,5,9,2] for i,ar in enumerate(arr): arr[i] = str(ar) arr # ['3', '1', '4', '1', '5', '9', '2'] ``` You can do this in one line with a list comprehension. ```python arr = [3,1,4,1,5,9,2] arr2 = [str(ar) for ar in arr] ``` You can imbue logic in list comprehensions. ```python arr3 = [str(ar) for ar in arr if ar > 5] arr4 = [str(ar) if ar > 5 else "x" for ar in arr] ``` ## 2D list Due to the way Python is structured, beginners usually make the mistake unknowingly. ```python arr = [0]*5 arr[1] = 9 arr # [0, 9, 0, 0, 0] ok arr2 = [[0]]*5 arr2[1][0] = 9 arr2 # [[9], [9], [9], [9], [9]] ``` This is how you should set up a 2D list. This behavior has been discussed on [stackoverflow](https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly). ```python arr3 = [[0 for _ in range(1)] for _ in range(5)] arr3[1][0] = 9 arr3 # [[0], [9], [0], [0], [0]] ``` For matrix operations, you may need to use the numpy package. When the matrix is defined to be a number (i.e. a numpy float), numeric operations on the matrix can be faster. However, this tutorial will not cover the use of packages from numpy. ## deque object See [documentation](https://docs.python.org/3/library/collections.html#collections.deque) You might want to append to both sides of a list. deque is a doubly-linked list. deque is a list-like container with fast appends and pops on either end. ```python from collections import deque de = deque(arr) ``` Popping and appending to the list on either side is now O(1). ```python de.append(1) de.pop() # 1 de.appendleft(0) de.popleft() # 0 ``` Reading from the list (aka random access) de[3] now takes O(n) time. ____ **Example problem: leetcode 862: Shortest Subarray with Sum at Least K** Return the **length** of the shortest, non-empty, contiguous subarray of `A` with sum at least `K`. If there is no non-empty subarray with sum at least `K`, return `-1`. Example 1: ``` Input: A = [1], K = 1 Output: 1 ``` Example 2: ``` Input: A = [1,2], K = 4 Output: -1 ``` Example 3: ``` Input: A = [2,-1,2], K = 3 Output: 3 ``` Note: ``` 1 <= A.length <= 50000 -10 ^ 5 <= A[i] <= 10 ^ 5 1 <= K <= 10 ^ 9 ``` **Solution:** We can rephrase this as a problem about the prefix sums of A. Let `P[i] = A[0] + A[1] + ... + A[i-1]``. We want the smallest `y-x` such that `y > x` and `P[y] - P[x] >= K`. Motivated by that equation, let `opt(y)`` be the largest `x` such that P[x] <= P[y] - K`. We need two key observations: - If `x1 < x2` and `P[x2] <= P[x1]``, then `opt(y)`` can never be `x1`, as if `P[x1] <= P[y] - K`, then `P[x2] <= P[x1] <= P[y] - K` but `y - x2` is smaller. This implies that our candidates `x` for `opt(y)` will have increasing values of `P[x]`. - If `opt(y1) = x`, then we do not need to consider this `x` again. For if we find some `y2 > y1` with `opt(y2) = x`, then it represents an answer of `y2 - x` which is worse (larger) than `y1 - x`. **Algorithm:** Maintain a "monoqueue" of indices of `P`: a deque of indices `x_0, x_1, ...`` such that `P[x_0], P[x_1], ...` is increasing. When adding a new index `y`, we'll pop `x_i` from the end of the deque so that `P[x_0], P[x_1], ..., P[y]` will be increasing. If `P[y] >= P[x_0] + K`, then (as previously described), we don't need to consider this `x_0` again, and we can pop it from the front of the deque. ```python class Solution(object): def shortestSubarray(self, A, K): N = len(A) P = [0] for x in A: P.append(P[-1] + x) #Want smallest y-x with Py - Px >= K ans = N+1 # N+1 is impossible monoq = collections.deque() #opt(y) candidates, represented as indices of P for y, Py in enumerate(P): #Want opt(y) = largest x with Px <= Py - K while monoq and Py <= P[monoq[-1]]: monoq.pop() while monoq and Py - P[monoq[0]] >= K: ans = min(ans, y - monoq.popleft()) monoq.append(y) return ans if ans < N+1 else -1 ``` ____ ## bisect functions See [documentation](https://docs.python.org/3/library/bisect.html) You have a list you know is sorted. You are given a new number, and you want to find the first index which element is greater than or equal to the new number. ```python arr = [1,1,2,3,4,5,5,5,5,9] x = 5 ``` One way is to iterate through the entire array. ```python for index,ar in enumerate(arr): if ar >= x: print(index) break ``` Instead of "greater than or equal to" you want the first element that is "greater than". ```python for index2,ar in enumerate(arr): if ar > x: print(index2) break else: print(index2) ``` The other way is to implement a binary search. Instead of implementing the binary search yourself, you can use the functions from bisect. However, please be prepared to implement a binary search from scratch, as it may be required for interview and custom problems. ```python import bisect index = bisect.bisect_left(arr, x) index # 5 ``` This is for the case when you want the first element "greater than" rather than "greater than or equal to". ```python index2 = bisect.bisect_right(arr, x) index2 # 9 ``` ____ **Example problem: leetcode 475. Heaters** Winter is coming! During the contest, your first job is to design a standard heater with a fixed warm radius to warm all the houses. Every house can be warmed, as long as the house is within the heater's warm radius range. Given the positions of `houses` and `heaters` on a horizontal line, return the minimum radius standard of heaters so that those heaters could cover all houses. Notice that all the `heaters` follow your radius standard, and the warm radius will the same. Example 1: ``` Input: houses = [1,2,3], heaters = [2] Output: 1 Explanation: The only heater was placed in the position 2, and if we use the radius 1 standard, then all the houses can be warmed. ``` Example 2: ``` Input: houses = [1,2,3,4], heaters = [1,4] Output: 1 Explanation: The two heater was placed in the position 1 and 4. We need to use radius 1 standard, then all the houses can be warmed. ``` Example 3: ``` Input: houses = [1,5], heaters = [2] Output: 3 ``` Constraints: ``` 1 <= houses.length, heaters.length <= 3 * 104 1 <= houses[i], heaters[i] <= 109 ``` **Solution:** ```python def findRadius(self, houses, heaters): heaters.sort() return max(min(abs(house - heater) for i in [bisect.bisect(heaters, house)] for heater in heaters[i-(i>0):i+1]) for house in houses) ``` **Alternative Solution:** Go through houses and heaters in ascending order. My `i` points to the current closest heater. Go to the next heater if the current house coordinate is larger than or equal to the middle between the current and the next heater. ```python def findRadius(self, houses, heaters): heaters = sorted(heaters) + [float('inf')] i = r = 0 for x in sorted(houses): while x >= sum(heaters[i:i+2]) / 2.: i += 1 r = max(r, abs(heaters[i] - x)) return r ``` ____ ## heapq object See [documentation](https://docs.python.org/3/library/heapq.html) You want to track the smallest object in the list, and remove it when needed. One way is to continually sort the list and remove the smallest element. Sorting takes O(nlogn) ```python arr = [4,6,7,1] new = [[1], [], [9,9], [5]] for ne in new: arr.extend(ne) arr.sort() print(arr) del arr[0] ``` The other way is to identify the smallest element and remove it. This takes O(n) time for every operation. However, there is this special data structure that allows the small value. You can find several interactive visualisation on the Internet. ```python arr = [4,6,7,1] import heapq heapq.heapify(arr) print(arr) # [1, 4, 7, 6] ``` You see that the array is somewhat sorted, but not really. From the documentation: > Heaps are binary trees for which every parent node has a value less than or equal to any of its children. This implementation uses arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that its smallest element is always the root, heap[0] ```python heapq.heappush(arr,4) # push an element popped = heapq.heappop(arr) # returns the smallest element popped # 1 ``` This is a minimum heap. For max-heap, the recommended solution is to multiply and input. As heapq is used to maintain fast popping of the smallest element, the baseline method of iterating through the whole array for the for the smallest element will take O(n) time. In API-based questions, you may be tasked to get the smallest element repeatedly and doing it in O(n) may be too slow. With heapq, the pushing and popping of elements take O(logn) time. Note that creating a heap takes O(nlogn) time. ____ **Example problem: leetcode 703: K^th^ Largest Element in a Stream** Design a class to find the `k^th^` largest element in a stream. Note that it is the `k^th^` largest element in the sorted order, not the `k^th^` distinct element. Implement `KthLargest` class: - `KthLargest(int k, int[] nums)` Initializes the object with the integer `k` and the stream of integers `nums`. - `int add(int val)` Returns the element representing the `k^th^` largest element in the stream. Example 1: ``` Input ["KthLargest", "add", "add", "add", "add", "add"] [[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]] Output [null, 4, 5, 5, 8, 8] Explanation KthLargest kthLargest = new KthLargest(3, [4, 5, 8, 2]); kthLargest.add(3); // return 4 kthLargest.add(5); // return 5 kthLargest.add(10); // return 5 kthLargest.add(9); // return 8 kthLargest.add(4); // return 8 ``` Constraints: ``` 1 <= k <= 104 0 <= nums.length <= 104 -104 <= nums[i] <= 104 -104 <= val <= 104 At most 104 calls will be made to add. ``` **Solution:** Create a pq - keep it only having the k-largest elements by popping off small elements. With only k elements, the smallest item (self.pool[0]) will always be the kth largest. If a new value is bigger than the smallest, it should be added into your heap. If it's bigger than the smallest (that are already the kth largest), it will certainly be within the kth largest of the stream. ```python class KthLargest(object): def __init__(self, k, nums): self.pool = nums self.k = k heapq.heapify(self.pool) while len(self.pool) > k: heapq.heappop(self.pool) def add(self, val): if len(self.pool) < self.k: heapq.heappush(self.pool, val) elif val > self.pool[0]: heapq.heapreplace(self.pool, val) return self.pool[0] ``` ____ ## dict and its subclasses See [documentation](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) A dictionary is a hashmap of key-value pairs. - Creating a dictionary `d={}` - Populating a key with a value `d["foo"] = "bar"` - To check if a dictionary has a key `foo in d` - Deleting a key as well as its value `del d["foo"]` The above operations take O(1) time. You can obtain the keys by converting the dictionary into a list. ```python d = {"foo":"bar", "bar":"baz"} list(d) sorted(d) ``` If you want the values, you have to iterate through its keys. ```python [(k,v) for k,v in d.items()] ``` Some procedures using a dictionary can be implemented with less code. Concise code is more understandable and less prone to mistakes. ## defaultdict See [documentation](https://docs.python.org/3/library/collections.html#collections.defaultdict) For example, you have a list of directed edges you want to populate this into a dictionary. ```python edges = [[1,2], [2,1], [1,3]] d = {} for start, end in edges: if start in d: d[start].append(end) else: d[start] = [end] d # {1: [2, 3], 2: [1]} ``` You may use defaultdict to skip initializing the value for every key. ```python from collections import defaultdict d = defaultdict(list) for start, end in edges: d[start].append(end) d # defaultdict(, {1: [2, 3], 2: [1]}) ``` This makes your code neater. However, it assumes that every value is a list. This is useful in helping to construct graphs from a list of edges. Graphs are usually necessary to solve problems involving paths. ## Counter See [documentation](https://docs.python.org/3/library/collections.html#collections.Counter) For example, you want to count the number of each element in a list. ```python digits = [3,1,4,1,5,9,2,6,5,3,5,9] d = defaultdict(int) for digit in digits: d[digit] += 1 d # defaultdict(, {3: 2, 1: 2, 4: 1, 5: 3, 9: 2, 2: 1, 6: 1}) ``` There is a function `Counter` which does the work in one line after importing. ```python from collections import Counter d = Counter(digits) d # Counter({5: 3, 3: 2, 1: 2, 9: 2, 4: 1, 2: 1, 6: 1}) ``` ____ **Example problem: leetcode 1296. Divide Array in Sets of K Consecutive Numbers** Given an array of integers `nums` and a positive integer `k`, find whether it's possible to divide this array into sets of `k` consecutive numbers. Return `True` if its possible otherwise return `False`. Example 1: ``` Input: nums = [1,2,3,3,4,4,5,6], k = 4 Output: true Explanation: Array can be divided into [1,2,3,4] and [3,4,5,6]. ``` Example 2: ``` Input: nums = [3,2,1,2,3,4,3,4,5,9,10,11], k = 3 Output: true Explanation: Array can be divided into [1,2,3] , [2,3,4] , [3,4,5] and [9,10,11]. ``` Example 3: ``` Input: nums = [3,3,2,2,1,1], k = 3 Output: true ``` Example 4: ``` Input: nums = [1,2,3,4], k = 3 Output: false Explanation: Each array should be divided in subarrays of size 3. ``` Constraints: ``` 1 <= nums.length <= 10^5 1 <= nums[i] <= 10^9 1 <= k <= nums.length ``` **Solution:** ```python import collections class Solution: def isPossibleDivide(self, nums: List[int], k: int) -> bool: count = collections.Counter(nums) keys = sorted(count.keys()) for n in keys: if count[n] > 0: minus = count[n] for i in range(n,n+k): if count[i] < minus: return False count[i] -= minus return True ``` ____ ## set See [documentation](https://docs.python.org/3/library/stdtypes.html#set) Sometimes you want a dictionary without its values. We present the python set. You can create a set from a list, or define it in non-empty curly braces. ```python set1 = set(["one-only", "one-two"]) set2 = {"two-only", "one-two"} ``` Like a list, you can iterate through a set and return its length ```python print(len(set1)) for x in set1: print(x) ``` You can check whether an element is in a set ```python "one-two" in set1 ``` The use of a set can help you filter the unique items of a list ```python arr = [1,1,2,3,3] arr = list(set(arr)) ``` You can add elements to a set (or multiple elements with update) ```python set1.add("orange") set1.update(["apple", "orange"]) ``` You can delete an item from a set. ```python set1.remove("apple") ``` You can find the union of two sets. In the following, the elements in set3 is made up of elements that appear either in set1 or set2 or both. ```python set3 = set1.union(set2) set3 = set1 | set2 set3 # {'one-only', 'two-only', 'one-two', 'orange'} ``` You can find the intersection of two sets. In the following, the elements in set3 is made up of elements that appear either in both set1 and set2. ```python set3 = set1.intersection(set2) set3 = set1 & set2 set3 # {'one-two'} ``` You can take the difference of one set from another. ```python set3 = set1.difference(set2) set3 = set1 - set2 # {'one-only', 'orange'} ``` You can take the union and exclude the intersection of a pair of sets. ```python set3 = set1.symmetric_difference(set2) set3 = set1 ^ set2 # {'two-only', 'one-only', 'orange'} ```