Python Vocab 101
Learning a new programming language can be daunting and confusing, because you are essentially learning a new way of talking to a computing machine (a.k.a. the computer) so that it does your bidding.
With that in mind, what better way to learn a new language than to get to know its vocabulary (i.e. its functions and methods)?
We will focus on basic vocabulary that is often used for text processing, but it is quite general, so I am pretty sure it will be useful for other things as well.
Importing and installing a package
Python is widely popular for its vast number of packages, which do pretty much everything you need done. To use a package in your environment, simply call import. If the package is not installed yet, you can install it with pip install or conda install, depending on your environment.
%pip install spacy
%pip install pandas
In some environments, such as Jupyter or Colab notebooks, you need to add “%” or “!” before the pip command. In a regular terminal, run pip install without the prefix.
import pandas as pd
import matplotlib.pyplot as plt
import spacy
Putting “as” after the package name gives the package a shorter alias to use when you call it.
For example, instead of using pandas.read_csv(), you can do pd.read_csv().
You can also do a selective import using the format from package_name import name, which pulls in a single module, class, or function.
from spacy import tokenizer
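If a package has a hyphen in its install name, the import name usually uses an underscore instead, because Python module names cannot contain hyphens. The install name and the import name can differ in other ways too, so check the package's documentation when in doubt.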
%pip install -U sentence-transformers
import sentence_transformers
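If you are not sure whether a module can be imported in your current environment, one option is to ask Python before importing it. The sketch below uses find_spec from the standard library's importlib.util. The helper name is_installed and the nonsense module name are made up for illustration.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment.

    is_installed is a hypothetical helper name, not part of any library.
    """
    return find_spec(module_name) is not None


# json ships with Python, so it should always be found.
print(is_installed("json"))

# A made-up name that should not exist anywhere.
print(is_installed("a_module_that_does_not_exist_123"))
```

Remember to pass the import name (with underscores), not the install name (with hyphens).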
Popular functions and methods
Python offers built-in functions and methods that are extremely useful and will make your life easier. Some of them are below.
A function has the following syntax: function(argument)
A method has the following syntax: object.method()
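To see the difference side by side, here is a small sketch using one built-in function and one string method. The variable names are only for illustration.

```python
word = "python"

# Function: the object is passed in as an argument.
length = len(word)

# Method: called on the object itself with dot notation.
shouted = word.upper()

print(length)   # 6
print(shouted)  # PYTHON
```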
print() will print out the argument given to the function
x = 'Hello World'
print(x)
Hello World
Using print(), we can also quickly find out whether a certain value exists in an object by using the in operator
print('World' in x)
True
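The in operator is not limited to strings. As a rough sketch (the variable names are made up for this example), here is how it behaves on a string, a list, and a dictionary:

```python
greeting = "Hello World"
numbers = [1, 2, 3, 4]
capitals = {"Australia": "Canberra"}

in_string = "World" in greeting          # substring check
in_list = 5 in numbers                   # item check
in_dict_keys = "Australia" in capitals   # a dictionary checks its keys
in_dict_values = "Canberra" in capitals.values()  # ask for values explicitly
lowercase_missing = "world" not in greeting       # the check is case sensitive

print(in_string, in_list, in_dict_keys, in_dict_values, lowercase_missing)
```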
type() will return the type of the object such as str for string, int for integer, bool for boolean, etc.
type(x)
str

y = True
type(y)
bool

dictionary = {'Country': 'Australia', 'Capital': 'Canberra'}
type(dictionary)
dict
len() will return the number of elements in an object: characters in a string, items in a list, and key-value pairs in a dictionary
sentence = "The sun shines brighter than yesterday"
lists = [1,2,3,4]
print(len(sentence))
print(len(lists))
print(len(dictionary))
38
4
2
The upper() method will convert your string to upper case. It returns a new string and leaves the original unchanged.
sentence.upper()
'THE SUN SHINES BRIGHTER THAN YESTERDAY'
Conversely, lower() will convert your string to lower case
sentence.lower()
'the sun shines brighter than yesterday'
strip() will remove whitespace from the beginning and end of your string
a = "   I believe I can fly.   "
a.strip()
'I believe I can fly.'
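strip() has two siblings, lstrip() and rstrip(), which trim only one side, and all three accept an optional set of characters to remove instead of whitespace. A small sketch with a made-up string:

```python
raw = "  --Hello--  "

both = raw.strip()        # trims both ends: '--Hello--'
left = raw.lstrip()       # trims the left only: '--Hello--  '
right = raw.rstrip()      # trims the right only: '  --Hello--'
dashes = both.strip("-")  # trims the given characters: 'Hello'

print(repr(both), repr(left), repr(right), repr(dashes))
```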
split() will split your string into a list. With no argument, it splits on whitespace.
a.split()
['I', 'believe', 'I', 'can', 'fly.']
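split() also accepts a separator, and join() does the reverse by gluing a list of strings back together. The sample strings below are invented for this sketch.

```python
csv_line = "Australia,Canberra,English"

# Split on a specific separator instead of whitespace.
fields = csv_line.split(",")

# join() is called on the separator and takes the list as its argument.
rejoined = " | ".join(fields)

# With no argument, runs of whitespace count as one separator.
# With an explicit " ", every single space is a separator.
collapsed = "a  b".split()
literal = "a  b".split(" ")

print(fields)
print(rejoined)
print(collapsed, literal)
```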
append() will add an item at the end of the list.
a_list = a.split()
a_list.append('sky')
print(a_list)
['I', 'believe', 'I', 'can', 'fly.', 'sky']
remove() will remove the first occurrence of the given value from the list.
a_list.remove('I')
print(a_list)
['believe', 'I', 'can', 'fly.', 'sky']
You can also use pop() to remove an item at the given position in the list. If no argument is specified, it will remove the last item.
a_list.pop()
print(a_list)
['believe', 'I', 'can', 'fly.']

a_list.pop(0)
print(a_list)
['I', 'can', 'fly.']
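Two details are worth knowing here: pop() hands back the item it removed, and remove() raises a ValueError when the value is not in the list. A minimal sketch, using a fresh list so it runs on its own:

```python
words = ["believe", "I", "can", "fly."]

# pop() returns the removed item, so you can keep it.
last = words.pop()

# remove() raises ValueError if the value is missing,
# so guard it when you are not sure the value is there.
try:
    words.remove("sky")
    removed = True
except ValueError:
    removed = False

print(last)     # fly.
print(words)    # ['believe', 'I', 'can']
print(removed)  # False
```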
The index() method returns the index of the first occurrence of the argument in the list. Python starts indexing at 0.
a_list.index('can')
1
Dictionary objects are made up of key-value pairs. You can get all the keys and values by calling keys() and values() respectively.
dictionary.keys()
dict_keys(['Country', 'Capital'])

dictionary.values()
dict_values(['Australia', 'Canberra'])
To add another key-value pair to the dictionary, use the syntax dictionary['key'] = 'value'
dictionary['Language'] = 'English'
dictionary
{'Country': 'Australia', 'Capital': 'Canberra', 'Language': 'English'}
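The same syntax overwrites the value when the key already exists, and items() lets you walk through keys and values together. A small sketch with a separate dictionary, so the one above stays untouched:

```python
country_info = {"Country": "Australia", "Capital": "Canberra"}

# A new key adds a pair.
country_info["Language"] = "English"

# An existing key overwrites the old value.
country_info["Language"] = "Australian English"

# items() yields (key, value) pairs in insertion order.
lines = [f"{key}: {value}" for key, value in country_info.items()]

for line in lines:
    print(line)
```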
Depending on the object type, list() returns different values. For a string, it returns each character; for a list, each item; and for a dictionary, the keys.
list(a)
[' ', ' ', ' ', 'I', ' ', 'b', 'e', 'l', 'i', 'e', 'v', 'e', ' ', 'I', ' ', 'c', 'a', 'n', ' ', 'f', 'l', 'y', '.', ' ', ' ', ' ']

list(a_list)
['I', 'can', 'fly.']

list(dictionary)
['Country', 'Capital', 'Language']
Accessing values
Depending on the object type, you access the value of an object differently.
# Build a list
a = "I believe I can fly. I believe I can touch the sky"
a_list = a.split()
print(a_list, end="\n\n====\n\n")
# Build a dictionary
dictionary = {'Country': 'Australia' , 'Capital': 'Canberra', 'Language': 'English', 'Currency': 'Dollar'}
print(dictionary, end="\n\n====\n\n")
# Build a DataFrame
# Define lists
country = ['Australia','New Zealand','United Kingdom','Canada','Japan','South Korea','Singapore','India']
dollar = [True, True, False, True, False, False, True, False]
# Create a dictionary
new_dict = {'Country':country, 'Dollar Currency': dollar}
# Convert to DataFrame
df = pd.DataFrame(new_dict)
print(df)
['I', 'believe', 'I', 'can', 'fly.', 'I', 'believe', 'I', 'can', 'touch', 'the', 'sky']

====

{'Country': 'Australia', 'Capital': 'Canberra', 'Language': 'English', 'Currency': 'Dollar'}

====

          Country  Dollar Currency
0       Australia             True
1     New Zealand             True
2  United Kingdom            False
3          Canada             True
4           Japan            False
5     South Korea            False
6       Singapore             True
7           India            False
List
For a list object, you simply use the syntax list[i], where i is an index number. Remember that Python starts indexing at 0.
a_list[0]
'I'

a_list[3]
'can'
Supplying negative values will return items from the end of the list.
a_list[-1]
'sky'

a_list[-4:-2]
['can', 'touch']
When you supply a range, the last index is not included. So if you say 2:5, it will return the items at index 2, 3, and 4.
a_list[2:5]
['I', 'can', 'fly.']
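You can also leave out one end of the range, or add a third number as a step. Here is a short sketch on a smaller list, built fresh so the example stands alone:

```python
words = "I believe I can fly".split()

first_two = words[:2]         # from the start up to, not including, index 2
from_third = words[2:]        # from index 2 to the end
every_other = words[::2]      # a step of 2 takes every second item
reversed_words = words[::-1]  # a step of -1 walks the list backwards

print(first_two)
print(from_third)
print(every_other)
print(reversed_words)
```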
Dictionary
For a dictionary, you have to specify a key to return the corresponding value, and keys are case sensitive.
print(dictionary['Country'])
print(dictionary['Currency'])
Australia
Dollar
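Because keys are case sensitive, asking for a key that does not exist raises a KeyError. If you would rather get a fallback value, the get() method is one way to do it. A minimal sketch with its own small dictionary:

```python
country_info = {"Country": "Australia", "Currency": "Dollar"}

# Square brackets raise KeyError for a missing (or wrongly cased) key.
try:
    country_info["country"]
    found = True
except KeyError:
    found = False

# get() returns None, or a default you choose, instead of raising.
missing = country_info.get("country")
fallback = country_info.get("country", "unknown")
present = country_info.get("Country", "unknown")

print(found, missing, fallback, present)
```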
Data Frame
With plain square brackets, a Data Frame needs a range (a slice) to return rows by position. Supplying a single index number will not work here, because df[0] looks for a column named 0. The slice syntax is the same as for a list object.
df[0:3]
If you want to use one index number, you will need to use the iloc indexer, which takes square brackets rather than parentheses
df.iloc[[1]]
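Here is a rough sketch of a few common ways to pull values out of a data frame, assuming pandas is installed. It builds a smaller frame of its own, and the variable names are only for illustration.

```python
import pandas as pd

small_df = pd.DataFrame({
    "Country": ["Australia", "New Zealand", "Japan"],
    "Dollar Currency": [True, True, False],
})

row_as_series = small_df.iloc[1]    # single brackets: one row as a Series
row_as_frame = small_df.iloc[[1]]   # double brackets: a one-row DataFrame
cell = small_df.iloc[1, 0]          # row 1, column 0
column = small_df["Country"]        # one column, selected by name

# A boolean column can be used as a filter to keep matching rows.
dollar_rows = small_df[small_df["Dollar Currency"]]

print(cell)
print(row_as_frame.shape)
print(len(dollar_rows))
```

The double-bracket form is handy when you want the result to stay a data frame, for example to display it as a table.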
list() will return the column names of the data frame
list(df)
['Country', 'Dollar Currency']
Using .shape after the data frame object will tell you its dimensions. The first value is the number of rows (entries) and the second value is the number of columns in the data frame.
Alternatively, you can use .info() to get a more comprehensive summary of the data frame.
df.shape
(8, 2)

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 2 columns):
Country            8 non-null object
Dollar Currency    8 non-null bool
dtypes: bool(1), object(1)
memory usage: 152.0+ bytes
Another way to look at the data frame is to use the .T attribute, which shows the data frame transposed, so rows become columns and columns become rows.
df.T
Conclusion
One of the things I have found important in learning a new programming language, or any language in general, is getting to know the key words and the word types (parts of speech) such as nouns, verbs, and pronouns. In the case of Python, you want to know the popular methods and functions to get you started, and to understand basic objects such as the list, the dictionary, and the data frame.
As with any learning process, practice will help you get used to these key concepts, and you will discover more vocabulary as you go along.
Thanks for reading!