This Python series will teach you some of the most important Python libraries for security and automation. Each installment focuses on one or more libraries. A library is a collection of functions and methods that have already been written and that you can use in your own programs. Knowing which libraries are available to you is one of the most valuable skills you can have as a programmer: it saves a tremendous amount of time and adds very useful features to your programs.
To follow along properly you will need a solid understanding of basic Python data types, structures and syntax: functions, strings, integers, booleans, variables, lists, dictionaries and tuples. If you know what these things are, you should be good to follow along. Remember, if you ever get stuck you can check my resources page for places to help improve your understanding, as well as YouTube, Stack Overflow and Google. We will begin with the re library, which stands for Regular Expressions.
What is a Regular Expression?
A Regular Expression (regex) is a sequence of characters that you use to define a search pattern. One common way we encounter this is when we press Command+F (Ctrl+F on Windows and Linux), which opens a text box and searches for text that matches the letters we type into it.
With the re module we can create much more advanced searches for our Python programs that match not only exact letters but a specific pattern. For example, we can create a script that searches for emails, phone numbers or IP addresses regardless of their values, because they follow a certain pattern. Here’s a simple example below:
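A minimal sketch of such a search, assuming a made-up message and US-style phone numbers written as three digits, three digits and four digits separated by dashes (the variable name message and the numbers themselves are illustrative):

```python
import re

# Sample text to search; the wording and numbers are invented for illustration.
message = "Call me at 415-555-1011 tomorrow, or at 415-555-9999 after 6pm."

# \d matches a single digit, so this pattern reads:
# three digits, a dash, three digits, a dash, four digits.
PhoneNumRegex = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")

# findall returns every non-overlapping match as a list of strings.
match = PhoneNumRegex.findall(message)
print(match)
```

The pattern is written as a raw string (the r prefix) so that Python passes the backslashes through to the re module unchanged.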
It starts off by importing the re module. Next, I define a variable and save the message I want to search. Keep in mind this text could be any length; it is small here for the purposes of an example. Next I create the regular expression using the function re.compile, specifying the pattern that I want and saving it to PhoneNumRegex. Lastly, I use the findall method to get the results, save them to match and print them.
In this example I only used “\d” to represent numeric digits, but Python has several other special sequences that represent different types of characters in text.
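As a quick illustration, the sketch below runs the six shorthand character classes from the re module against a short made-up string. The sample string and the variable names are assumptions chosen only for the demonstration.

```python
import re

# Made-up sample containing letters, a digit, an underscore,
# a space and punctuation.
sample = "ID_7 ok!"

# Shorthand character classes and what each one matches.
classes = {
    r"\d": "any numeric digit",
    r"\D": "any character that is not a digit",
    r"\w": "any letter, digit or underscore",
    r"\W": "any character that is not a letter, digit or underscore",
    r"\s": "any whitespace (space, tab, newline)",
    r"\S": "any character that is not whitespace",
}

# Collect every single-character match for each class.
results = {pattern: re.findall(pattern, sample) for pattern in classes}

for pattern, meaning in classes.items():
    print(pattern, "=", meaning, "->", results[pattern])
```

Each uppercase class is the inverse of its lowercase counterpart, so between them a pair always accounts for every character in the string.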
Python also has special characters that specify how many times a pattern needs to match:
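The sketch below tries each of the standard quantifiers (?, *, +, {n}, {n,m} and {n,}) on small made-up strings. The test strings and variable names are illustrative assumptions.

```python
import re

# ?  : the preceding item is optional (zero or one time).
optional = re.findall(r"colou?r", "color colour")

# *  : the preceding item may repeat zero or more times.
zero_or_more = re.findall(r"go*gle", "ggle gogle google")

# +  : the preceding item must appear one or more times.
one_or_more = re.findall(r"go+gle", "ggle gogle google")

# {n} : the preceding item must appear exactly n times.
# This is a shorter way to write the \d\d\d-\d\d\d-\d\d\d\d phone pattern.
exact = re.findall(r"\d{3}-\d{3}-\d{4}", "Call 415-555-1011 today")

# {n,m} : between n and m times; {n,} : n or more times.
ranged = re.findall(r"\d{2,3}", "7 42 123")
at_least = re.findall(r"\d{2,}", "7 42 123 20240")

for name, found in [("?", optional), ("*", zero_or_more), ("+", one_or_more),
                    ("{3}", exact), ("{2,3}", ranged), ("{2,}", at_least)]:
    print(name, "->", found)
```

Quantifiers are greedy by default, which means they match as many repetitions as the rest of the pattern allows.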
By combining these characters in the re module you can create scripts that search for many different types of words or phrases in a text. The example below is from a script I wrote that takes the text copied to the clipboard and returns all the phone numbers and emails within that text:
I’ll explain the pyperclip module in a later tutorial, but for now just know that it lets you use the clipboard's copy and paste functions within your scripts.
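A rough sketch of how a script like that could be put together is shown here; it is not the original script. The two patterns are simplified assumptions (US-style phone numbers and a loose email shape), the names phone_regex, email_regex and find_contacts are hypothetical, and a sample string stands in for the clipboard so the code runs without extra packages.

```python
import re

# Assumed phone format: optional parentheses around the area code, then
# 3 digits and 4 digits, separated by an optional space, dot or dash.
phone_regex = re.compile(r"(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}")

# Loose email shape: name characters, @, domain characters, a dot,
# then at least two letters. Real address rules are more permissive.
email_regex = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def find_contacts(text):
    """Return all phone numbers, then all emails, found in text."""
    return phone_regex.findall(text) + email_regex.findall(text)


# Stand-in for clipboard contents; invented for illustration.
sample_text = ("Reach Ana at 415-555-1011 or ana@example.com; "
               "the office line is (415) 555-0000.")
found = find_contacts(sample_text)
print("\n".join(found))

# To work on the clipboard instead (assumes the third-party pyperclip
# package is installed):
#     import pyperclip
#     found = find_contacts(pyperclip.paste())
#     pyperclip.copy("\n".join(found))
```

Keeping the matching in a function that takes plain text makes it easy to try on any string before wiring it up to the clipboard.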