professional-programming/antipatterns/code-antipatterns.md

13 KiB

Table of Contents

Antipatterns

Most of those are antipatterns in the Python programming language, but some of them might be more generic.

Strict email validation

It is almost impossible to strictly validate an email. Even if you were writing or using a regex that follows RFC5322, you would have false positives when trying to validate actual emails that don't follow the RFC.

What's more, validating an email provides very weak guarantees. A stronger, more meaningful validation would be to send an email and validate that the user received it.

To sum up, don't waste your time trying to validate an email if you don't need to (or just check that there's a @ in it). If you need to, send an email with a token and validate that the user received it.

Late returns

Returning early reduces cognitive overhead, and improve readability by killing indentation levels.

Bad:

def toast(bread):
    if bread.kind != 'brioche':
        if not bread.is_stale:
            toaster.toast(bread)

Good:

def toast(bread):
    if bread.kind == 'brioche' or bread.is_stale:
        return

    toaster.toast(bread)

Hacks comment

Bad:

# Gigantic hack (written by Louis de Funes) 04-01-2015
toaster.restart()

There's multiple things wrong with this comment:

  • Even if it is actually a hack, no need to say it in a comment. It lowers the perceived quality of a codebase and impacts developer motivation.
  • Putting the author and the date is totally useless when using source control (git blame).
  • This does not explain why it's temporary.
  • It's impossible to easily grep for temporary fixes.
  • Louis de Funès would never write a hack.

Good:

# Need to restart toaster to prevent burning bread
# TODO: replace with proper fix
toaster.restart()
  • This clearly explains the nature of the temporary fix.
  • Using TODO is an ubiquitous pattern that allows easy grepping and plays nice with most text editors.
  • The perceived quality of this temporary fix is much higher.

Repeating arguments in function name

Bad:

def get_by_color(color):
    return Toasters.filter_by(color=color)

Putting the argument name in both the function name and in arguments is, in most cases and for most interpreted languages, redundant.

Good:

def get(color=None):
    if color:
        return Toasters.filter_by(color=color)

Repeating class name in method name

Bad:

class Toasters(object):
    def get_toaster(self, toaster_id):
        pass

This is bad because it's unnecessarily redundant (Toasters.get_toaster(1)). According to the single responsibility principle, a class should focus on one area of responsibility. So the Toasters class should only focus on toasters object.

Good:

class Toasters(object):
    def get(self, toaster_id):
        pass

Which produces much more concise code:

toaster = Toasters.get(1)

Repeating function name in docstring

Bad:

def test_return_true_if_toast_is_valid():
    """Verify that we return true if toast is valid."""
    assert is_valid(Toast('brioche')) is true

Why is it bad?

  • The docstring and function name are not DRY.
  • There's no actual explanation of what valid means.

Good:

def test_valid_toast():
    """Verify that 'brioche' are valid toasts."""
    assert is_valid(Toast('brioche')) is true

Or, another variation:

def test_brioche_are_valid_toast():
    assert is_valid(Toast('brioche')) is true

Unreadable response construction

TODO

Bad:

def get_data():
    returned = {}
    if stuff:
        returned['toaster'] = 'toaster'
    if other_stuff:
        if the_other_stuff:
            returned['toast'] = 'brioche'
    else:
        returned['toast'] = 'bread'
    return returned

Good:

def get_data():
    returned = {
        'toaster': '',
        'toast': '',
    }

Undeterministic tests

When testing function that don't behave deterministically, it can be tempting to run them multiple time and average their results.

Bad:

def function():
    if random.random() > .4:
        return True
    else:
        return False


def test_function():
    number_of_true = 0
    for _ in xrange(1000):
        returned = function()
        if returned:
            number_of_true += 1

    assert 30 < number_of_true < 50

There are multiple things that are wrong with this approach:

  • This is a flaky test. Theoretically, this test could still fail.
  • This example is simple enough, but function might be doing some computationally expensive task, which would make this test severely inefficient.
  • The test is quite difficult to understand.

Good:

@mock.patch('random.random')
def test_function(mock_random):
    mock_random.return_value = 0.7
    assert function() is True

This is a deterministic test that clearly tells what's going on.

Unbalanced boilerplate

One thing to strive for in libraries is have as little boilerplate as possible, but not less.

Not enough boilerplate: you'll spend hours trying to understand specific behaviors that are too magical/implicit. You will need flexibility and you won't be able to get it. Boilerplate is useful insofar as it increases transparency.

Too much boilerplate: users of your library will be stuck using outdated patterns. Users will write library to generate the boilerplate required by your library.

I think Flask and SQLAlchemy do a very good job at keeping this under control.

Inconsistent use of verbs in functions

Bad:

def get_toasters(color):
    """Get a bunch of toasters."""
    return filter(lambda toaster: toaster.color == color, TOASTERS)


def find_toaster(id_):
    """Return a single toaster."""
    toasters = filter(lambda toaster: toaster.id == id_, TOASTERS)
    assert len(toasters) == 1
    return toasters[1]


def find_toasts(color):
    """Find a bunch of toasts."""
    return filter(lambda toast: toast.color == color, TOASTS)

The use of verb is inconsistent in this example. get is used to return a possibly empty list of toasters, and find is used to return a single toaster (or raise an exception) in the second function or a possibly empty list of toasts in the third function.

This is based on personal taste but I have the following rule:

  • get prefixes function that return at most one object (they either return none or raise an exception depending on the cases)
  • find prefixes function that return a possibly empty list (or iterable) of objects.

Good:

def find_toasters(color):
    """Find a bunch of toasters."""
    return filter(lambda toaster: toaster.color == color, TOASTERS)


def get_toaster(id_):
    """Return a single toaster."""
    toasters = filter(lambda toaster: toaster.id == id_, TOASTERS)
    assert len(toasters) == 1
    return toasters[1]


def find_toasts(color):
    """Find a bunch of toasts."""
    return filter(lambda toast: toast.color == color, TOASTS)

Opaque function arguments

A few variants of what I consider code that is difficult to debug:

def create(toaster_params):
    name = toaster_params['name']
    color = toaster_params.get('color', 'red')


class Toaster(object):

    def __init__(self, params):
        self.name = params['name']


# Probably the worst of all
def create2(*args, **kwargs):
    name = kwargs['name']

Why is this bad?

  • It's really easy to make a mistake, especially in interpreted languages such as Python. For instance, if I call create({'name': 'hello', 'ccolor': 'blue'}), I won't get any error, but the color won't be the one I expect.
  • It increases cognitive load, as I have to understand where the object is coming from to introspect its content.
  • It makes the job of static analyzer harder or impossible.

Granted, this pattern is sometimes required (for instance when the number of params is too large, or when dealing with pure data).

A better way is to be explicit:

def create(name, color='red'):
    pass  # ...

Hiding formatting

Bad:

# main.py

from utils import format_query


def get_user(user_id):
    url = get_url(user_id)
    return requests.get(url)


# utils.py


def get_url(user_id):
    return 'http://127.0.0.1/users/%s' % user_id

I consider this an antipattern because it hides the request formatting from the developer, making it more complex to see what url look like. In this extreme example, the formatting function is a one-liner which sounds a bit overkill for a function.

Good:

def get_user(user_id):
    url = 'http://127.0.0.1/users/%s' % user_id
    return requests.get(url)

Even if you were duplicating the logic once or twice it might still be fine, because:

  • You're unlikely to re-use anywhere else outside this file.
  • Putting this inline makes it easier for follow the flow. Code is written to be read primarily by computers.

Returning nothing instead of raising NotFound exception

Bad in certain cases:

def get_toaster(toaster_id):
    try:
        return do_get_toaster(toaster_id)
    except NotFound:
        return None


def toast(toaster_id):
    toaster = get_toaster(toaster_id)
    ...
    toaster.toast("brioche")

It all depends on the caller, but in this cases I'd argue that it is bad practice to return nothing when the toaster identified by toaster_id can't be found, for two main reasons.

First reason: when we provide an identifier, we expect it to return something. Once again, this depends on the caller (for instance, we could try to see if a user exists by checking an email for instance). In this simple example it's ok because the toaster.toast() will fail immediately, but what if we were never calling it and creating some other unrelated objects? We would be doing things that we should never be doing if the object did not exist:

def toast(toaster_id, user):
    toaster = get_toaster(toaster_id)
    # We should never do this! The toaster might not even exists.
    send_welcome_email(user)
    bill_new_toaster(user)

Second reason: toaster.toast will fail anyway if toaster is none (in Python with AttributeError: NoneType has no attribute toast). In this abstract example it's ok because the two lines are next to each other, but the actual toaster.toast() call might happen further down the stack - and it will be very difficult for the developer to understand where the error is coming from.

def toast(toaster_id, user):
    toaster = get_toaster(toaster_id)
    do_stuff_a(toaster)


def do_stuff_a(toaster):
    ...
    do_stuff_b(toaster)
    ...


def do_stuff_b(toaster):
    # Here is the actual call where toaster is called - but we should
    # have failed early if it's not there.
    toaster.toast()

What's the correct things to do?

  • If you expect the object to be there, make sure to raise if you don't find it.
  • If you're using SQLAlchemy, use one() to force raising an exception if the object can't be found. Don't use first or one_or_none().

Having a library that contains all utils

Bad:

def get_current_date():
    ...


def create_csv(...):
    ...


def upload_to_sftp(...):
    ...

util or tools or lib modules that contain all sorts of utilities have a tendency to become bloated and unmaintainable. Prefer to have small, dedicated files.

This will keep your imports logical (lib.date_utils, lib.csv_utils, lib.sftp), make it easier for the reader to identify all the utilities around a specific topic, and test files easy to keep organized.