Khalid Khan Saturday, June 25, 2022

Python security pitfalls and how to avoid them



Khalid Khan, Developer Relations Engineer

Khalid is the Developer Relations Engineer at Codiga. He is passionate about software engineering, startups, and developer advocacy. He is also an MLH Coach and an organizer and member of numerous hackathons and developer communities.


Python's syntax is easy to learn, but writing secure Python takes more care. Python is a widespread programming language, used heavily in machine learning and artificial intelligence, and its standard library and frameworks have earned it a good reputation for security. Even so, certain features in Python are easy to misuse. Misusing them may result in serious security breaches, leading to you losing your job... just kidding! Here are the top five security pitfalls in Python that you should watch out for.

We will also discuss setting up automated code reviews to avoid these security pitfalls.

Benefits of avoiding security pitfalls

  • Protects your business
  • Protects personal information
  • Inspires customer confidence

Unsafe deserialization with pickle

Pickle is a module provided by Python for serializing and deserializing Python objects such as lists and tuples. It is also used to store objects on disk. If you do not know what serialization is, it is the process of converting Python objects into byte streams that can be saved or transferred over a network; deserialization is the reverse.

This is a fantastic module unless the data being deserialized comes from an untrusted or tampered source. Unpickling can run arbitrary code chosen by whoever produced the data, which makes it a significant security risk.

Below is an example of serialization with pickle:

import pickle

data = {'name': 'example', 'values': [1, 2, 3]}  # sample object to serialize
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)
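To see why loading an untrusted pickle is risky, here is a minimal and deliberately harmless sketch. The Payload class is hypothetical; it uses the `__reduce__` hook to tell pickle which callable to invoke while loading. The callable here is `os.getpid`, but whoever crafts the bytes could name any importable function instead.

```python
import os
import pickle


class Payload:
    """Hypothetical object whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle records "call os.getpid with no arguments" instead of object state.
        return (os.getpid, ())


blob = pickle.dumps(Payload())

# Loading the bytes calls os.getpid(); no Payload instance is ever rebuilt.
result = pickle.loads(blob)
print(result == os.getpid())
```

The value returned by `pickle.loads` is whatever the named callable returned, which shows that loading is not a passive operation.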

This issue can be avoided by not using pickle for data from untrusted sources and using JSON or a similar data-only format for serialization and deserialization instead.

Here is an example of serialization with JSON:

import json

data = {'name': 'example', 'values': [1, 2, 3]}  # sample object to serialize
# Writing a JSON file
with open('data.json', 'w') as f:
    json.dump(data, f)
# Reading a JSON file
with open('data.json', 'r') as f:
    data = json.load(f)
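One consequence of using JSON is that only basic types (dicts, lists, strings, numbers, booleans, and None) come back from a load, so custom objects need an explicit conversion step. A minimal sketch, assuming a hypothetical User dataclass that is not part of the original example:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    """Hypothetical record used only for this illustration."""
    name: str
    roles: list


original = User(name="ada", roles=["admin", "dev"])

# Serialize: convert the object to plain dicts and lists first.
text = json.dumps(asdict(original))

# Deserialize: json.loads only builds basic types, so rebuild the object explicitly.
restored = User(**json.loads(text))
print(restored == original)
```

Because the loader never instantiates classes on its own, the code decides which objects get built from the data.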

Avoid the assert statement

Assert statements in Python are useful for testing. They let you check whether a condition is true; if it is not, Python raises an AssertionError.

Relying on asserts for security checks in production Python code is a serious mistake. The assert statement only runs when the __debug__ constant is True, which is the default. When Python is started with the -O option, __debug__ is False and the compiler removes every assert statement, leaving the code without those checks.

def share_data(user, command):
  assert user.has_permissions(command), f'{user} is not authorized'

If the above code runs in production with optimization enabled (for example, python -O), the compiler removes the assertion and execution goes straight to the protected code. The protected code then runs even though the user is not authorized.
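The effect can be checked with a small experiment. This sketch runs the same two-line program in a fresh interpreter twice, once normally and once with the -O flag; the program text and the helper name run are made up for the illustration.

```python
import subprocess
import sys

# A tiny program whose only guard is an assert that always fails.
program = "assert False, 'access denied'\nprint('secure code reached')"


def run(*flags):
    """Run the program in a fresh interpreter and return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, *flags, "-c", program],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()


normal = run()         # assert is active: the program stops with AssertionError
optimized = run("-O")  # assert is stripped: the print call runs
print(normal)
print(optimized)
```

In the normal run the interpreter exits with an error before printing anything, while the optimized run reaches the line the assert was meant to protect.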

To avoid this issue, go back to an explicit if statement and raise an exception. The condition is not removed even when __debug__ is False, so the code stays guarded.

Here is the example with the correct code:

def share_data(user, command):
    if not user.has_permissions(command):  # safer, as it will not be removed by the compiler
        raise PermissionError(f'{user} is not authorized')  # suitable error class
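A short usage sketch of the explicit check follows. The User class and its permission model are hypothetical stand-ins for whatever your application provides, and the return value is only a placeholder for the protected work.

```python
class User:
    """Hypothetical user with a set of commands it may run."""

    def __init__(self, name, allowed):
        self.name = name
        self.allowed = set(allowed)

    def has_permissions(self, command):
        return command in self.allowed

    def __str__(self):
        return self.name


def share_data(user, command):
    # Explicit check: present regardless of interpreter optimization flags.
    if not user.has_permissions(command):
        raise PermissionError(f"{user} is not authorized")
    return f"{command} executed for {user}"  # placeholder for the protected work


alice = User("alice", ["export"])
print(share_data(alice, "export"))

try:
    share_data(alice, "delete")
except PermissionError as err:
    print(err)
```

Raising PermissionError also gives callers a specific exception type to catch, which a bare AssertionError does not.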

XML parsing

XML is a markup language like HTML. While HTML describes how a page is displayed, XML describes and structures data, using tags to mark up the components of a document. XML is a convenient way to structure data, but parsing XML from untrusted sources exposes you to security threats such as Denial of Service (DoS) and XML External Entity (XXE) injection attacks.

Here is an example of incorrect code:

from xml.etree.ElementTree import parse
et = parse(xml_source)  # xml_source: placeholder for a file name or file object holding untrusted XML
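To illustrate the entity-expansion side of the DoS risk, here is a small and harmless sketch using the standard library parser. The document defines two nested entities; attacks of this kind (often called "billion laughs") nest many more levels so that a tiny file expands into an enormous amount of text. The entity names and content are made up for the illustration.

```python
from xml.etree.ElementTree import fromstring

# Entity b expands to three copies of entity a; deeper nesting grows exponentially.
doc = """<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
]>
<data>&b;</data>"""

root = fromstring(doc)
print(root.text)
```

The parser expands the entities on its own, so the text of the element is longer than anything written inside the data tag.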

This issue can be avoided by using a hardened wrapper around the XML parsers, such as the third-party defusedxml package. It offers drop-in replacements for the standard parsing functions that reject these dangerous constructs.

Example of correct code:

from defusedxml.ElementTree import parse
et = parse(xml_source)  # xml_source: placeholder for a file name or file object holding untrusted XML
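As a rough sketch of the difference, the snippet below feeds a document that declares an entity to defusedxml. It assumes the defusedxml package is installed (for example with pip install defusedxml) and relies on its default settings, under which entity declarations are refused.

```python
from defusedxml import EntitiesForbidden
from defusedxml.ElementTree import fromstring

doc = """<!DOCTYPE data [
  <!ENTITY a "ha">
]>
<data>&a;</data>"""

try:
    fromstring(doc)
    rejected = False
except EntitiesForbidden:
    # With default settings, defusedxml refuses documents that declare entities.
    rejected = True

print(rejected)
```

Catching the exception lets the application log and discard the suspicious document instead of expanding it.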

Dynamic code execution

There are multiple ways to run strings as Python code. The two built-in functions for this, eval and exec, are widely regarded as dangerous. They are closely related: eval evaluates a single expression and returns its value, while exec executes arbitrary statements. Both are bad practice when any part of the string comes from a user, because they can be abused to run code you never intended, leading to potential security issues.

The exposure can be reduced by limiting what eval can see: pass explicit globals and locals dictionaries instead of letting it use the current scope. This narrows the attack surface but is not a complete sandbox, so the safest choice is to never pass untrusted input to eval or exec. Note that although eval only accepts expressions, a function call is an expression, so eval can still trigger side effects:

eval('print("I am a Danger")') # prints "I am a Danger", returns None

These functions can execute virtually any Python code, which is what makes them so dangerous. Here is how you can limit what eval has access to by passing its globals and locals explicitly.

x = 10
eval('x * 10')                      # uses the current scope: 100
eval('x + 5', {'x': 2})             # uses only the given globals: 7
eval('x + 5', {'x': 2}, {'x': 1})   # locals take precedence over globals: 6
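When the goal is only to turn a string into a Python value, a different tool from the standard library, ast.literal_eval, may be a better fit than a restricted eval. This is a swapped-in alternative rather than something the examples above use; the sample strings are made up.

```python
import ast

# Literals (numbers, strings, lists, dicts, ...) are parsed without running code.
config = ast.literal_eval("{'retries': 3, 'hosts': ['a', 'b']}")
print(config["retries"])

try:
    # A function call is not a literal, so it is rejected instead of executed.
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Since literal_eval never calls functions or looks up names, the question of which globals to expose does not arise.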

Use of unsafe YAML load

yaml.load is a function from the PyYAML package. It can construct arbitrary Python objects from a YAML document. That is fine for files you wrote yourself, but depending on the Loader used, a crafted YAML file from an untrusted source can make it instantiate objects or call functions. This issue can be solved by using the safe_load function.

Below is a simplified look at how PyYAML defines load:

def load(stream, Loader=None):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    if Loader is None:
        Loader = FullLoader

    loader = Loader(stream)
    return loader.get_single_data()

Here is the code for the safe_load function:

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags. This is known
    to be safe for untrusted input.
    """
    return load(stream, SafeLoader)
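The difference shows up as soon as a document uses a Python-specific tag. This is a minimal sketch, assuming PyYAML is installed; the sample documents are made up, and the second one asks the loader to call a function.

```python
import yaml  # third-party: PyYAML

# Plain data loads into basic Python types.
settings = yaml.safe_load("name: demo\nports: [80, 443]")
print(settings)

# A document that asks the loader to call a Python function.
hostile = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(hostile)
    rejected = False
except yaml.YAMLError:
    # SafeLoader has no constructor for python/* tags, so it raises an error.
    rejected = True

print(rejected)
```

Ordinary configuration data is unaffected by the switch, so safe_load is a reasonable default for most files.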

How to check insecure code?


To avoid these issues and keep track of best practices, consider setting up automated code reviews. With Codiga, you can review every change made to the code and get feedback on code quality and best practices.

Automated code reviews need little to no human effort to provide regular code review and analysis. They can be easily integrated with most code collaboration tools, such as GitHub, GitLab, and Bitbucket.

Here is a list of rules you should consider while coding in Python.


How to set up Automated Code Reviews

Codiga provides a platform that integrates easily into your current workflow. It can be connected to several platforms; the steps below show how to integrate Codiga with GitHub.

All you have to do is configure our GitHub application. We are available on Bitbucket and GitLab as well.

Step 1: Click Configure on our GitHub application


Step 2: Select and log in to your GitHub account


Step 3: Select the repository you want to provide access to


You are now ready to get seamless automated code reviews and code analysis. Codiga can be configured for the other supported platforms as well.

Wrapping up

As this post has shown, several modules and functions in Python need to be used with care, because misusing them may result in data breaches or security attacks. Follow best practices and avoid getting trapped in these security pitfalls.

The easiest way to avoid such issues is to get regular code reviews, so that you can spot and prevent these mistakes in the future. Code reviews can be manual or automated. Automated code reviews check every change consistently and require less human effort. Platforms like Codiga help developers get automated code reviews and reduce the risks in their code.
