Add antipatterns repo

This commit is contained in:
Charles-Axel Dein 2020-07-21 09:02:44 +02:00
parent 311255c312
commit 6f941a8ab3
13 changed files with 1686 additions and 0 deletions

antipatterns/README.md

@@ -0,0 +1,18 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Antipatterns](#antipatterns)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Antipatterns
============
This is a list of antipatterns.
* [Code antipatterns](./code-antipatterns.md)
* [Python antipatterns](./python-antipatterns.md)
* [SQLAlchemy antipatterns](./sqlalchemy-antipatterns.md)
* [Error handling antipatterns](./error-handling-antipatterns.md)
* [Tests antipatterns](./tests-antipatterns.md)


@@ -0,0 +1,481 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Antipatterns](#antipatterns)
- [Strict email validation](#strict-email-validation)
- [Late returns](#late-returns)
- [Hacks comment](#hacks-comment)
- [Repeating arguments in function name](#repeating-arguments-in-function-name)
- [Repeating class name in method name](#repeating-class-name-in-method-name)
- [Repeating function name in docstring](#repeating-function-name-in-docstring)
- [Unreadable response construction](#unreadable-response-construction)
- [Non-deterministic tests](#non-deterministic-tests)
- [Unbalanced boilerplate](#unbalanced-boilerplate)
- [Inconsistent use of verbs in functions](#inconsistent-use-of-verbs-in-functions)
- [Opaque function arguments](#opaque-function-arguments)
- [Hiding formatting](#hiding-formatting)
- [Returning nothing instead of raising NotFound exception](#returning-nothing-instead-of-raising-notfound-exception)
- [Having a library that contains all utils](#having-a-library-that-contains-all-utils)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Antipatterns
Most of these are antipatterns in the Python programming language, but some of
them might be more generic.
Strict email validation
-----------------------
It is almost impossible to strictly validate an email address. Even if you were
writing or using a regex that follows
[RFC5322](http://tools.ietf.org/html/rfc5322#section-3.4), you would still
wrongly reject real addresses that don't follow the RFC.
What's more, validating an email provides very weak guarantees. A stronger,
more meaningful validation would be to send an email and validate that the user
received it.
To sum up, don't waste your time trying to validate an email if you don't need
to (or just check that there's a `@` in it). If you need to, send an email with
a token and validate that the user received it.
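Here's a minimal sketch of the pragmatic check (the regex is deliberately
loose, not an attempt at RFC compliance):

```python
import re

def is_plausible_email(address):
    """Cheap sanity check only; real validation is sending a token."""
    return re.match(r'^[^@\s]+@[^@\s]+$', address) is not None
```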
Late returns
------------
Returning early reduces cognitive overhead and improves readability by killing
indentation levels.
Bad:
```python
def toast(bread):
    if bread.kind != 'brioche':
        if not bread.is_stale:
            toaster.toast(bread)
```
Good:
```python
def toast(bread):
    if bread.kind == 'brioche' or bread.is_stale:
        return
    toaster.toast(bread)
```
Hacks comment
-------------
Bad:
```python
# Gigantic hack (written by Louis de Funes) 04-01-2015
toaster.restart()
```
There are multiple things wrong with this comment:
* Even if it is actually a hack, no need to say it in a comment. It lowers the
perceived quality of a codebase and impacts developer motivation.
* Putting the author and the date is totally useless when using source control
(`git blame`).
* This does not explain why it's temporary.
* It's impossible to easily grep for temporary fixes.
* [Louis de Funès](https://en.wikipedia.org/wiki/Louis_de_Fun%C3%A8s) would never
write a hack.
Good:
```python
# Need to restart toaster to prevent burning bread
# TODO: replace with proper fix
toaster.restart()
```
* This clearly explains the nature of the temporary fix.
* Using `TODO` is a ubiquitous pattern that allows easy grepping and plays
  nice with most text editors.
* The perceived quality of this temporary fix is much higher.
## Repeating arguments in function name
Bad:
```python
def get_by_color(color):
    return Toasters.filter_by(color=color)
```
Putting the argument name in both the function name and the argument list is,
in most cases and for most interpreted languages, redundant.
Good:
```python
def get(color=None):
    if color:
        return Toasters.filter_by(color=color)
```
## Repeating class name in method name
Bad:
```python
class Toasters(object):
    def get_toaster(self, toaster_id):
        pass
```
This is bad because it's unnecessarily redundant (`Toasters.get_toaster(1)`). According to the single responsibility principle, a class should focus on one area of responsibility, so the `Toasters` class should only focus on toaster objects.
Good:
```python
class Toasters(object):
    def get(self, toaster_id):
        pass
```
Which produces much more concise code:
```python
toaster = Toasters.get(1)
```
Repeating function name in docstring
------------------------------------
Bad:
```python
def test_return_true_if_toast_is_valid():
    """Verify that we return True if toast is valid."""
    assert is_valid(Toast('brioche')) is True
```
Why is it bad?
* The docstring and function name are not DRY.
* There's no actual explanation of what valid means.
Good:
```python
def test_valid_toast():
    """Verify that 'brioche' is a valid toast."""
    assert is_valid(Toast('brioche')) is True
```
Or, another variation:
```python
def test_brioche_are_valid_toast():
    assert is_valid(Toast('brioche')) is True
```
Unreadable response construction
--------------------------------
Building a response dict through scattered conditional assignments makes it
hard to see the response's final shape.
Bad:
```python
def get_data():
    returned = {}
    if stuff:
        returned['toaster'] = 'toaster'
    if other_stuff:
        if the_other_stuff:
            returned['toast'] = 'brioche'
        else:
            returned['toast'] = 'bread'
    return returned
```
Good:
```python
def get_data():
    returned = {
        'toaster': 'toaster' if stuff else '',
        'toast': '',
    }
    if other_stuff:
        returned['toast'] = 'brioche' if the_other_stuff else 'bread'
    return returned
```
Non-deterministic tests
-----------------------
When testing functions that don't behave deterministically, it can be tempting
to run them multiple times and average the results.
Bad:
```python
def function():
    if random.random() > .4:
        return True
    else:
        return False

def test_function():
    number_of_true = 0
    for _ in range(1000):
        returned = function()
        if returned:
            number_of_true += 1

    # function() returns True with probability 0.6, so we expect
    # about 600 True results out of 1000 runs.
    assert 550 < number_of_true < 650
```
There are multiple things that are wrong with this approach:
* This is a flaky test. Theoretically, this test could still fail.
* This example is simple enough, but `function` might be doing some
computationally expensive task, which would make this test severely
inefficient.
* The test is quite difficult to understand.
Good:
```python
@mock.patch('random.random')
def test_function(mock_random):
    mock_random.return_value = 0.7
    assert function() is True
```
This is a deterministic test that clearly shows what's going on.
Unbalanced boilerplate
----------------------
One thing to strive for in libraries is to have as little boilerplate as
possible, but not less.
Not enough boilerplate: you'll spend hours trying to understand specific
behaviors that are too magical/implicit. You will need flexibility and you
won't be able to get it. Boilerplate is useful insofar as it increases
[transparency](http://www.catb.org/esr/writings/taoup/html/ch01s06.html).
Too much boilerplate: users of your library will be stuck using outdated
patterns. Users will write libraries to generate the boilerplate required by
your library.
I think Flask and SQLAlchemy do a very good job at keeping this under control.
Inconsistent use of verbs in functions
--------------------------------------
Bad:
```python
def get_toasters(color):
    """Get a bunch of toasters."""
    return filter(lambda toaster: toaster.color == color, TOASTERS)

def find_toaster(id_):
    """Return a single toaster."""
    toasters = list(filter(lambda toaster: toaster.id == id_, TOASTERS))
    assert len(toasters) == 1
    return toasters[0]

def find_toasts(color):
    """Find a bunch of toasts."""
    return filter(lambda toast: toast.color == color, TOASTS)
```
Verbs are used inconsistently in this example: `get` returns a possibly empty
list of toasters, while `find` returns a single toaster (or raises an
exception) in the second function and a possibly empty list of toasts in the
third.
This is based on personal taste, but I follow this rule:
* `get` prefixes functions that return at most one object (they either return
  `None` or raise an exception, depending on the case).
* `find` prefixes functions that return a possibly empty list (or iterable) of
  objects.
Good:
```python
def find_toasters(color):
    """Find a bunch of toasters."""
    return filter(lambda toaster: toaster.color == color, TOASTERS)

def get_toaster(id_):
    """Return a single toaster."""
    toasters = list(filter(lambda toaster: toaster.id == id_, TOASTERS))
    assert len(toasters) == 1
    return toasters[0]

def find_toasts(color):
    """Find a bunch of toasts."""
    return filter(lambda toast: toast.color == color, TOASTS)
```
Opaque function arguments
-------------------------
A few variants of what I consider code that is difficult to debug:
```python
def create(toaster_params):
    name = toaster_params['name']
    color = toaster_params.get('color', 'red')

class Toaster(object):
    def __init__(self, params):
        self.name = params['name']

# Probably the worst of all
def create2(*args, **kwargs):
    name = kwargs['name']
```
Why is this bad?
* It's really easy to make a mistake, especially in interpreted languages such
as Python. For instance, if I call `create({'name': 'hello', 'ccolor':
'blue'})`, I won't get any error, but the color won't be the one I expect.
* It increases cognitive load, as I have to understand where the object is
coming from to introspect its content.
* It makes the job of static analyzers harder or impossible.
Granted, this pattern is sometimes required (for instance when the number of
params is too large, or when dealing with pure data).
A better way is to be explicit:
```python
def create(name, color='red'):
    pass  # ...
```
Hiding formatting
-----------------
Bad:
```python
# main.py
from utils import get_url

def get_user(user_id):
    url = get_url(user_id)
    return requests.get(url)

# utils.py
def get_url(user_id):
    return 'http://127.0.0.1/users/%s' % user_id
```
I consider this an antipattern because it hides the formatting from the
developer, making it harder to see what `url` looks like. In this extreme
example, the formatting logic is a one-liner, which hardly justifies a separate
function.
Good:
```python
def get_user(user_id):
    url = 'http://127.0.0.1/users/%s' % user_id
    return requests.get(url)
```
Even if you were duplicating the logic once or twice, it might still be fine,
because:
* You're unlikely to re-use it anywhere outside this file.
* Putting this inline makes it easier to follow the flow. Code is written to
  be read primarily by humans.
## Returning nothing instead of raising NotFound exception
Bad in certain cases:
```python
def get_toaster(toaster_id):
    try:
        return do_get_toaster(toaster_id)
    except NotFound:
        return None

def toast(toaster_id):
    toaster = get_toaster(toaster_id)
    ...
    toaster.toast("brioche")
```
It all depends on the caller, but in this case I'd argue that it is bad practice to return nothing when the toaster identified by `toaster_id` can't be found, for two main reasons.

**First reason**: when we provide an identifier, we expect it to return something. Once again, this depends on the caller (we could, for instance, be checking whether a user exists by looking up an email). In this simple example it's ok because `toaster.toast()` will fail immediately, but what if we were never calling it and creating some other unrelated objects? We would be doing things that we should never do if the object did not exist:
```python
def toast(toaster_id, user):
    toaster = get_toaster(toaster_id)
    # We should never do this! The toaster might not even exist.
    send_welcome_email(user)
    bill_new_toaster(user)
```
**Second reason**: `toaster.toast` will fail anyway if `toaster` is `None` (in Python, with `AttributeError: 'NoneType' object has no attribute 'toast'`). In this abstract example it's ok because the two lines are next to each other, but the actual `toaster.toast()` call might happen further down the stack - and it will be very difficult for the developer to understand where the error is coming from.
```python
def toast(toaster_id, user):
    toaster = get_toaster(toaster_id)
    do_stuff_a(toaster)

def do_stuff_a(toaster):
    ...
    do_stuff_b(toaster)
    ...

def do_stuff_b(toaster):
    # Here is where toaster is actually used - but we should
    # have failed early if it's not there.
    toaster.toast()
```
What's the correct thing to do?
* If you expect the object to be there, make sure to raise if you don't find it.
* If you're using SQLAlchemy, use `one()` to force raising an exception if the
  object can't be found (see the sketch below). Don't use `first()` or
  `one_or_none()`.
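A minimal sketch (`session` and `Toaster` as in the SQLAlchemy examples
elsewhere in this repository):

```python
def get_toaster(toaster_id):
    # one() raises sqlalchemy.orm.exc.NoResultFound (or
    # MultipleResultsFound) instead of silently returning None the way
    # first() or one_or_none() would.
    return session.query(Toaster).filter_by(id=toaster_id).one()
```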
## Having a library that contains all utils
Bad:
```python
def get_current_date():
    ...

def create_csv(rows):
    ...

def upload_to_sftp(path):
    ...
```
`util` or `tools` or `lib` modules that contain all sorts of utilities have a tendency to become bloated and unmaintainable. Prefer small, dedicated files.
This will keep your imports logical (`lib.date_utils`, `lib.csv_utils`, `lib.sftp`), make it easier for the reader to identify all the utilities around a specific topic, and keep test files organized. A sketch of such a layout follows.
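For instance, the functions above could be split like this (module and
parameter names are just illustrative):

```python
# lib/date_utils.py
def get_current_date():
    ...

# lib/csv_utils.py
def create_csv(rows):
    ...

# lib/sftp.py
def upload_to_sftp(path):
    ...
```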


@@ -0,0 +1,30 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Database anti-patterns](#database-anti-patterns)
- [Using `VARCHAR` instead of `TEXT` (PostgreSQL)](#using-varchar-instead-of-text-postgresql)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Database anti-patterns
======================
Using `VARCHAR` instead of `TEXT` (PostgreSQL)
----------------------------------------------
Unless you absolutely need to restrict the width of a text column for data
consistency reasons, don't do it.
This
[benchmark](http://www.depesz.com/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/)
shows that there's fundamentally no difference in performance between
`char(n)`, `varchar(n)`, `varchar` and `text`. Here's why you should pick
`text`:
* `char(n)`: takes more space than necessary when dealing with values shorter
than n.
* `varchar(n)`: it's difficult to change the width.
* `varchar` is just like `text`.
* `text` does not have the width problem that `char(n)` and `varchar(n)` have,
  and has a cleaner name than `varchar`.
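If you define the schema through SQLAlchemy, this boils down to the following
sketch (the model is illustrative):

```python
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Toaster(Base):
    __tablename__ = 'toasters'

    id = Column(Integer, primary_key=True)
    name = Column(Text)         # good: no arbitrary width to migrate later
    color = Column(String(16))  # avoid, unless 16 is a real consistency rule
```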


@@ -0,0 +1,331 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Error handling anti-patterns](#error-handling-anti-patterns)
- [Hiding exceptions](#hiding-exceptions)
- [Raising unrelated/unspecific exception](#raising-unrelatedunspecific-exception)
- [Unconstrained defensive programming](#unconstrained-defensive-programming)
- [Unnecessarily catching and re-raising exceptions](#unnecessarily-catching-and-re-raising-exceptions)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Error handling anti-patterns
Hiding exceptions
-----------------
There are multiple variations of this anti-pattern:
```python
# Silence all exceptions
def toast(bread):
    try:
        toaster = Toaster()
        toaster.insert(bread)
        toaster.toast()
    except:
        pass

# Silence some exceptions
def toast(bread):
    try:
        toaster = Toaster()
        toaster.insert(bread)
        toaster.toast()
    except ValueError:
        pass
```
It depends on the context, but in most cases this is a bad pattern:
* **Debugging** those silent errors will be really difficult, because they won't show up in logs or in exception reporting tools such as Sentry. Say you have an undefined variable in `Toaster.insert()`: it will raise `NameError`, which will be caught and ignored, and you will never know about this developer error.
* **The user experience** will randomly degrade without anybody knowing about it, including the user.
* **Identifying** those errors will be impossible. Say the toasting code does an HTTP request to another service, and that service starts misbehaving. There won't be any exception or metric that lets you identify it.
An article even named this [the most diabolical Python antipattern](https://realpython.com/blog/python/the-most-diabolical-python-antipattern/).
The following full example:
```python
from collections import namedtuple

Bread = namedtuple('Bread', 'color')

class ToastException(Exception):
    pass

def toast(bread):
    try:
        put_in_toaster(bread)
    except:
        raise ToastException('Could not toast bread')

def put_in_toaster(bread):
    brad.color = 'light_brown'  # Note the typo

toast(Bread('yellow'))
```
will raise this cryptic, impossible-to-debug error:
```
Traceback (most recent call last):
  File "python-examples/reraise_exceptions.py", line 19, in <module>
    toast(Bread('yellow'))
  File "python-examples/reraise_exceptions.py", line 12, in toast
    raise ToastException('Could not toast bread')
__main__.ToastException: Could not toast bread
```
Sometimes it's tempting to think that graceful degradation is about silencing
exceptions. It's not.
* Graceful degradation needs to happen at the **highest level** of the code, so that the user can get a very explicit error message (e.g. "we're having issues with X, please retry in a moment"). That requires knowing that there was an error, which you can't tell if you're silencing the exception.
* You need to know when graceful degradation happens. You also need to be
alerted if it happens too often. This requires adding monitoring (using
something like statsd) and logging (Python's `logger.exception` automatically
adds the exception stacktrace to the log message for instance). Silencing an
exception won't make the error go away: all things being equal, it's better
for something to break hard, than for an error to be silenced.
* It is tempting to conflate silencing an exception with fixing it. Say you're getting sporadic timeouts from a service. You might think: let's ignore those timeouts and just do something else, like return an empty response. But this is very different from (1) actually finding the root cause of those timeouts (e.g. a specific edge case impacting certain objects) and (2) doing proper graceful degradation (e.g. asking users to retry later because the request failed).
In other words, ask yourself: would it be a problem if every single action was failing? If you're silencing the error, how would you know it's happening for every single action?
Here are a number of better ways to do this:
**Log and create metrics**
```python
import statsd

def toast(bread):
    # Note: no exception handling here.
    toaster = Toaster()
    toaster.insert(bread)
    toaster.toast()

def main():
    try:
        toast('brioche')
    except:
        logger.exception('Could not toast bread')
        statsd.count('toast.error', 1)
```
**Very important**: adding logging and metrics is not sufficient if they are difficult to consume; they won't help the developer who's debugging. There needs to be automatic alerting associated with those stats, and the logs need to be surfaced somewhere, ideally next to the higher-level exception (e.g. if our `main` function above is used in a web interface, the web interface could say "additionally, the following logs were generated" and display them). For instance, Sentry can be configured to surface `logger.exception` errors the same way exceptions are surfaced. Otherwise the developer will still have to read the code to understand what's going on. Also, this won't work with sporadic errors: those need to be dealt with properly, and until then, it's better to let them go to your usual alerting tool.
Note that knowing where to catch is very important too. If you're catching
inside the `toast` function, you might be hiding things a caller would need to
know. Since this function does not return anything, how would you tell the
difference between a success and a failure? You can't. That's why you want to
let it raise, and catch only in the caller, where you have the context to know
how you'll handle the exception.
**Re-raise immediately**
```python
def toast(bread):
    try:
        toaster = Toaster()
        toaster.insert(bread)
        toaster.toast()
    except:
        raise ToastException('Could not toast bread %r' % bread)

def main():
    # Note: no exception handling here.
    toast('brioche')
```
**Be very specific about the exceptions that are caught**
```python
import statsd

def toast(bread):
    try:
        toaster = Toaster()
    except ValueError:
        # Note that even though we catch, we're still logging and creating
        # a metric.
        logger.exception('Could not get toaster')
        statsd.count('toaster.instantiate.error', 1)
        return

    toaster.insert(bread)
    toaster.toast()

def main():
    toast('brioche')
```
Raising unrelated/unspecific exception
--------------------------------------
Most languages have predefined exceptions, including Python. It is important to make sure that the right exception is raised from a semantic standpoint.
Bad:
```python
def validate(toast):
    if isinstance(toast, Brioche):
        # RuntimeError is too broad
        raise RuntimeError('Invalid toast')

def validate(toast):
    if isinstance(toast, Brioche):
        # SystemError should only be used for internal interpreter errors
        raise SystemError('Invalid toast')
```
Good:
```python
def validate(toast):
    if isinstance(toast, Brioche):
        raise TypeError('Invalid toast')
```
`TypeError` is perfectly meaningful here, and clearly conveys the context around the error.
## Unconstrained defensive programming
While defensive programming can be a very good technique to make the code more resilient, it can seriously backfire when misused. This is very similar to the anti-pattern of carelessly silencing exceptions (see [hiding exceptions](#hiding-exceptions) in this document).
One example is to handle an edge case as a generic case at a very low level. Consider the following example:
```python
def get_user_name(user_id):
    url = 'http://127.0.0.1/users/%s' % user_id
    response = requests.get(url)
    if response.status_code == 404:
        return 'unknown'
    return response.text
```
While this may look like a very good example of defensive programming (we're returning `'unknown'` when we can't find the user), this can have terrible repercussions, very similar to the ones caused by an unrestricted bare `try... except`:
* A new developer might not know about this magical convention, and assume that `get_user_name` is guaranteed to return a true user name.
* The external service that we're getting user names from might start failing and returning 404s. We would then silently return `'unknown'` as the user name for all users, with serious consequences.
A much cleaner way is to raise an exception on 404 and let the caller decide how it wants to handle users that are not found.
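A minimal sketch of that approach (the `UserNotFound` exception is
illustrative):

```python
import requests

class UserNotFound(Exception):
    pass

def get_user_name(user_id):
    url = 'http://127.0.0.1/users/%s' % user_id
    response = requests.get(url)
    if response.status_code == 404:
        # Let the caller decide how to handle a missing user.
        raise UserNotFound(user_id)
    response.raise_for_status()
    return response.text
```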
Unnecessarily catching and re-raising exceptions
------------------------------------------------
Bad:
```python
def toast(bread):
    try:
        put_in_toaster(bread)
    except InvalidBreadException:
        raise ToastException('Could not toast')
```
Side note: unconditionally catching all exceptions is considered even worse (see [hiding exceptions](https://github.com/charlax/antipatterns/blob/master/code-antipatterns.md#hiding-exceptions)).
This is a better pattern because we explicitly state what happened in the exception message:
```python
def toast(bread):
    try:
        put_in_toaster(bread)
    except InvalidBreadException:
        raise ToastException('Cannot toast this bread type')
```
If we need to do some cleanup or extra logging, it's better to just raise the original exception again. The developer will know exactly what happened.
```python
def toast(bread):
    try:
        put_in_toaster(bread)
    except:
        print('Got exception while trying to toast')
        raise  # note the absence of a specific exception
```
Here's what would happen:
```
Got exception while trying to toast
Traceback (most recent call last):
  File "reraise_exceptions_good.py", line 20, in <module>
    toast(Bread('yellow'))
  File "reraise_exceptions_good.py", line 10, in toast
    put_in_toaster(bread)
  File "reraise_exceptions_good.py", line 17, in put_in_toaster
    brad.color = 'light_brown'  # Note the typo
NameError: global name 'brad' is not defined
```
Another way to show how absurd the anti-pattern becomes at scale is through an example:
```python
def call_1():
    try:
        call_2()
    except Call2Exception:
        raise Call1Exception()

def call_2():
    try:
        call_3()
    except Call3Exception:
        raise Call2Exception()

def call_3():
    ...
```
Another problem with this pattern is that all this catching and re-raising is mostly useless, since you will still need to catch at the outermost level. In other words:
> Error handling, and recovery are best done at the outer layers of your code base. This is known as the end-to-end principle. The end-to-end principle argues that it is easier to handle failure at the far ends of a connection than anywhere in the middle. If you have any handling inside, you still have to do the final top level check. If every layer atop must handle errors, so why bother handling them on the inside?
*[Write code that is easy to delete, not easy to extend](http://programmingisterrible.com/post/139222674273/write-code-that-is-easy-to-delete-not-easy-to)*
A better way:
```python
# This is the highest level function, where we have enough
# context to know how to handle the exceptions.
def call_1():
    try:
        call_2()
    except Call2Exception:
        # handle it...
        pass
    except Call3Exception:
        # handle it...
        pass

def call_2():
    # Do not handle anything here.
    call_3()

def call_3():
    ...
```
More resources:
* The [Hiding exceptions](https://github.com/charlax/antipatterns/blob/master/code-antipatterns.md#hiding-exceptions) anti-pattern.

Binary image file (17 KiB) not shown.


@@ -0,0 +1,115 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [MVCS Antipatterns](#mvcs-antipatterns)
- [Creating entities for association tables](#creating-entities-for-association-tables)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
MVCS Antipatterns
=================
In simple terms, Model-View-Controller-Services adds a few more layers to the
MVC pattern. The main one is the service layer, which owns all the core
business logic and manipulates the repository layer.
Creating entities for association tables
----------------------------------------
You'll often need association tables, for instance to set up a many-to-many
relationship between users and their toasters. Let's assume that a toaster can
be owned by multiple users.

It might be tempting to create a `UserToaster` entity for this relationship,
especially if the relationship has some complex attributes associated with it
(for instance, since when the user has owned the toaster).
Let me pull a few quotes from [Domain-Driven
Design](http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215) by Eric Evans:
> Design a portion of the software system to reflect the domain model in a very
> literal way, so that mapping is obvious.
> Object-oriented programming is powerful because it is based on a modeling
> paradigm, and it provides implementations of the model constructs. As far as
> the programmer is concerned, objects really exist in memory, they have
> associations with other objects, they are organized into classes, and they
> provide behavior available by messaging.
> Does an object represent something with continuity and identity— something
> that is tracked through different states or even across different
> implementations? Or is it an attribute that describes the state of something
> else? This is the basic distinction between an ENTITY and a VALUE OBJECT.
> Defining objects that clearly follow one pattern or the other makes the
> objects less ambiguous and lays out the path toward specific choices for
> robust design.
Evans, Eric (2003-08-22). Domain-Driven Design: Tackling Complexity in the
Heart of Software. Pearson Education. Kindle Edition.
Entities should model business processes, not persistence details
([source](http://blog.sapiensworks.com/post/2013/05/13/7-Biggest-Pitfalls-When-Doing-Domain-Driven-Design.aspx/)).
* In that case, `UserToaster` does not map to anything the business is using.
  In plain English, somebody might ask "what toasters does user A own?" or
  "who owns toaster B, and since when?" Nobody would ask "give me the
  UserToaster for user A".
* The association table can be considered an implementation detail that should
  not (in most cases) leak into the domain layer. All the code should deal with
  the simpler logic of "users having toasters", not with UserToaster objects
  being an association between a user and a toaster. This makes the code more
  intuitive and natural.
* It will be easier to serialize a "user having toasters" than to serialize
  a UserToaster association.
* It makes it very easy to force the calling site to take care of business
  logic. For instance, you might get all `UserToaster` objects and then filter
  on whether they were bought, by going through each `UserToaster` and keeping
  those whose `were_bought` is True. At some point, you might be doing the same
  thing in multiple places, which decreases maintainability. If you hid that
  logic in the repository instead (e.g. a `find_bought_toasters` method), you
  wouldn't have that issue.
So in that case, I would recommend the following (a sketch follows the list):
* Create a `User` and `Toaster` entity.
* Put the association properties on the entity that makes sense; for instance,
  `owned_since` would go on `Toaster`, even though in the database it's stored
  on the association table.
* If filtering on association properties is required, put this logic in
  repositories. In plain English, you would ask "give me all the toasters user
  A owned in December", not "give me all the UserToaster owned by user A in
  December".
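Here's a minimal sketch of that setup (SQLAlchemy-style declarative mapping;
all names are illustrative):

```python
from sqlalchemy import Column, DateTime, ForeignKey, Integer, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

# Plain association table: a persistence detail, not a UserToaster entity.
user_toasters = Table(
    'user_toasters', Base.metadata,
    Column('user_id', Integer, ForeignKey('users.id'), primary_key=True),
    Column('toaster_id', Integer, ForeignKey('toasters.id'), primary_key=True),
    Column('owned_since', DateTime),
)

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    # Domain code just sees "a user has toasters".
    toasters = relationship('Toaster', secondary=user_toasters)

class Toaster(Base):
    __tablename__ = 'toasters'
    id = Column(Integer, primary_key=True)
```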
Note that Domain-Driven Design recommends avoiding many-to-many relationships
altogether:
> In real life, there are lots of many-to-many associations, and a great number
> are naturally bidirectional. The same tends to be true of early forms of
> a model as we brainstorm and explore the domain. But these general
> associations complicate implementation and maintenance. Furthermore, they
> communicate very little about the nature of the relationship.
> There are at least three ways of making associations more tractable.
> 1. Imposing a traversal direction
> 2. Adding a qualifier, effectively reducing multiplicity
> 3. Eliminating nonessential associations
Evans, Eric (2003-08-22). Domain-Driven Design: Tackling Complexity in the
Heart of Software (Kindle Locations 1642-1647). Pearson Education. Kindle
Edition.
Imposing a traversal direction is probably the best solution. In our example,
it's not immediately evident which direction is the right one (a toaster being
owned by a user, or a user owning toasters), because that depends on what this
application is doing. If we're working on an app that lets a connected user see
their toasters, then we would force the relationship to be unidirectional:
user -> toasters.
Sources:
* [7 Biggest Pitfalls When Doing Domain Driven
Design](http://blog.sapiensworks.com/post/2013/05/13/7-Biggest-Pitfalls-When-Doing-Domain-Driven-Design.aspx/)
* [Domain-Driven Design: Tackling Complexity in the Heart of
Software](http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215)


@@ -0,0 +1,259 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Python Antipatterns](#python-antipatterns)
- [Redundant type checking](#redundant-type-checking)
- [Restricting version in setup.py dependencies](#restricting-version-in-setuppy-dependencies)
- [Unwieldy if... else instead of dict](#unwieldy-if-else-instead-of-dict)
- [Overreliance on kwargs](#overreliance-on-kwargs)
- [Overreliance on list/dict comprehensions](#overreliance-on-listdict-comprehensions)
- [Mutable default arguments](#mutable-default-arguments)
- [Using `is` to compare objects](#using-is-to-compare-objects)
- [Instantiating exception with a dict](#instantiating-exception-with-a-dict)
- [Not strictly pinning all packages](#not-strictly-pinning-all-packages)
- [Reference](#reference)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Python Antipatterns
===================
Redundant type checking
-----------------------
Bad:
```python
def toast(bread):
    if bread is None:
        raise TypeError('Need bread to toast.')
    if bread.is_toastable:
        toaster.toast(bread)
```
In this case, checking against `None` is totally useless because in the next
line, `bread.is_toastable` would raise `AttributeError: 'NoneType' object has
no attribute 'is_toastable'`. This is not a general rule, but in this case
I would definitely argue that adding the type checks hurts readability and adds
very little value to the function.
Good:
```python
def toast(bread):
    if bread.is_toastable:
        toaster.toast(bread)
```
Restricting version in setup.py dependencies
--------------------------------------------
Read those articles first:
* [setup.py vs.
requirements.txt](https://caremad.io/2013/07/setup-vs-requirement/)
* [Pin Your Packages](http://nvie.com/posts/pin-your-packages/)
* [Better Package Management](http://nvie.com/posts/better-package-management/)
**Summary: The main point is that `setup.py` should not specify explicit version
requirements (good: `flask`, bad: `flask==1.1.1`).**
In a few words, if library `lib1` requires `flask==1.1.1` and library `lib2`
requires `flask==1.1.2`, then you'll have a conflict and won't be able to use
them both in application `app`.
Yet in 99.999% of the cases, you don't need a specific version of flask, so:
* `lib1` should just require `flask` in `setup.py` (no version specified, not
even with inequality operators: `flask<=2` is bad for instance)
* `lib2` should just require `flask` in `setup.py` (same)
`app` will be happy using `lib1` and `lib2` with whatever version of `flask` it
wants.
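As a sketch, `lib1`'s `setup.py` would look something like this (the package
metadata is illustrative):

```python
# lib1/setup.py
from setuptools import find_packages, setup

setup(
    name='lib1',
    version='1.0.0',
    packages=find_packages(),
    # No version constraint: let the application pin flask.
    install_requires=['flask'],
)
```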
`app`'s `requirements.txt` should be as specific as possible, ideally strictly
pinning (`==`) every single dependency. This way the app's stability will be
very predictable, because the same package versions will always be installed.
Usually apps only use `requirements.txt`, not `setup.py`, because `pip install
-r requirements.txt` is used when deploying.
The only reason to pin a dependency in a library is a known incompatibility,
and even then it should be a very temporary move, because it will prevent
people from upgrading.
Ruby has a pretty similar dichotomy with [Gemspec and
gemfile](http://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/).
Unwieldy if... else instead of dict
-----------------------------------
Bad:
```python
import operator as op

def get_operator(value):
    """Return an operator function based on a string.

    e.g. ``get_operator('+')`` returns a function which takes two arguments
    and returns their sum.
    """
    if value == '+':
        return op.add
    elif value == '-':
        return op.sub
    elif value == '*':
        return op.mul
    elif value == '/':
        return op.truediv
    else:
        raise ValueError('Unknown operator %s' % value)
```
Note: the operator module is a standard library module that defines base
operator functions. For instance, `operator.add(1, 1) == 2`.
This huge switch-like expression will soon become really difficult to read and
maintain. A more pythonic way is to use a dict to store the mapping.
Another reason is that to get 100% line and branch coverage, you would have to
create as many tests as there are mappings. Yet you could consider that the
value-to-operator mapping is part of the codebase's configuration, not its
behavior, and thus shouldn't be tested.
Good:
```python
import operator as op

OPERATORS = {
    '+': op.add,
    '-': op.sub,
    '*': op.mul,
    '/': op.truediv,
}

def get_operator(value):
    """Return an operator function based on a string."""
    operator = OPERATORS.get(value)
    if operator:
        return operator
    else:
        raise ValueError('Unknown operator %s' % value)
```
Overreliance on kwargs
----------------------
TODO
Overreliance on list/dict comprehensions
----------------------------------------
Comprehensions are great for simple transformations, but once they nest or mix
filtering with transformation, a plain loop is often more readable.
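A sketch of the kind of comprehension that is better written as a loop:

```python
matrix = [[1, 2], [3, 4], [5, 6]]

# Hard to scan: nesting plus filtering plus transformation in one line.
flat = [x * 2 for row in matrix for x in row if x % 2 == 0]

# The same logic as a loop reads top to bottom.
flat = []
for row in matrix:
    for x in row:
        if x % 2 == 0:
            flat.append(x * 2)
```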
Mutable default arguments
-------------------------
Python evaluates default argument values once, when the function is defined,
so a mutable default value is shared across all calls.
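The classic illustration of the problem, and the usual `None`-sentinel fix:

```python
# Bad: the same list object is shared across all calls.
def add_toast(toast, toasts=[]):
    toasts.append(toast)
    return toasts

add_toast('brioche')  # ['brioche']
add_toast('bread')    # ['brioche', 'bread'] - surprise!

# Good: use None as a sentinel and create a fresh list per call.
def add_toast_fixed(toast, toasts=None):
    if toasts is None:
        toasts = []
    toasts.append(toast)
    return toasts
```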
Using `is` to compare objects
-----------------------------
`is` compares object identity, not equality. Outside of singletons like
`None`, whether two equal objects are also the same object is an
implementation detail you should not rely on.
[Why you should almost never use “is” in
Python](http://blog.lerner.co.il/why-you-should-almost-never-use-is-in-python/)
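A quick illustration:

```python
a = [1, 2]
b = [1, 2]

print(a == b)  # True: equal values
print(a is b)  # False: two distinct list objects
print(a is a)  # True: same object
```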
Instantiating exception with a dict
-----------------------------------
Example:
```python
def main():
    raise Exception({'msg': 'This is not a toaster', 'level': 'error'})
```
Why is this an antipattern? Exceptions are meant to be read by human beings.
Thus, their first argument should be a human-readable error message, like this:
```python
class NotAToasterException(Exception):
    pass

def main():
    raise NotAToasterException('This is not a toaster')
```
Most tools expect this, most importantly
[Sentry](https://getsentry.com/welcome/), which is used to get alerts when
exceptions are raised in production. An `Exception`'s message should be short
so that it can be displayed on a single line.
If you need to attach custom metadata to an exception, the proper way is to
have a custom constructor:
```python
class NotAToasterException(Exception):
    def __init__(self, message, level):
        super(NotAToasterException, self).__init__(message)
        self.message = message
        self.level = level

def main():
    raise NotAToasterException('This is not a toaster', 'error')
```
## Not strictly pinning all packages
Example of bad `requirements.txt` file:
```
sqlalchemy
flask
```
Another example of a bad `requirements.txt` file:
```
sqlalchemy>=0.9
flask>0.10
```
Good:
```
Flask==0.10.1
Jinja2==2.8
MarkupSafe==0.23
SQLAlchemy==1.0.12
Werkzeug==0.11.4
itsdangerous==0.24
```
This ensures that there's absolutely no ambiguity about which package versions will be installed in production. If you forget to pin even a single package, you may end up with different versions in your testing/dev environment and in your production environment.
The proper way to update a package and its dependencies is to use another tool, for instance [pip-tools](https://github.com/nvie/pip-tools). If you have multiple applications and want to mass-update a package, you'll have to write a script to do so. Keep things simple and explicit, then use scripts on top to instrument your processes.
**Reference**
* [Pin Your Packages](http://nvie.com/posts/pin-your-packages/)
# Reference
* [Pythonic Pitfalls](http://nafiulis.me/potential-pythonic-pitfalls.html)
* [Python Patterns](https://github.com/faif/python-patterns)
* [The Little Book of Python
Anti-Patterns](http://docs.quantifiedcode.com/python-anti-patterns/)
* [How to make mistakes in
Python](http://www.oreilly.com/programming/free/files/how-to-make-mistakes-in-python.pdf)


@@ -0,0 +1,19 @@
from collections import namedtuple

Bread = namedtuple('Bread', 'color')

class ToastException(Exception):
    pass

def toast(bread):
    try:
        put_in_toaster(bread)
    except:
        raise ToastException('Could not toast bread')

def put_in_toaster(bread):
    brad.color = 'light_brown'  # Note the typo

toast(Bread('yellow'))


@@ -0,0 +1,20 @@
from collections import namedtuple

Bread = namedtuple('Bread', 'color')

class ToastException(Exception):
    pass

def toast(bread):
    try:
        put_in_toaster(bread)
    except:
        print('Got exception while trying to toast')
        raise

def put_in_toaster(bread):
    brad.color = 'light_brown'  # Note the typo

toast(Bread('yellow'))


@@ -0,0 +1,18 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Not paging a database query](#not-paging-a-database-query)
- [Returning whole objects where primary key would suffice](#returning-whole-objects-where-primary-key-would-suffice)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Not paging a database query
---------------------------
TODO
Returning whole objects where primary key would suffice
-------------------------------------------------------
TODO


@@ -0,0 +1,176 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [SQLAlchemy Anti-Patterns](#sqlalchemy-anti-patterns)
- [Abusing lazily loaded relationships](#abusing-lazily-loaded-relationships)
- [Explicit session passing](#explicit-session-passing)
- [Implicit transaction handling](#implicit-transaction-handling)
- [Loading the full object when checking for object existence](#loading-the-full-object-when-checking-for-object-existence)
- [Using identity as comparator](#using-identity-as-comparator)
- [Returning `None` instead of raising a `NoResultFound` exception](#returning-none-instead-of-raising-a-noresultfound-exception)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
SQLAlchemy Anti-Patterns
========================
This is a list of what I consider [SQLAlchemy](http://www.sqlalchemy.org/)
anti-patterns.
Abusing lazily loaded relationships
-----------------------------------
Bad:
```python
class Customer(Base):
    @property
    def has_valid_toast(self):
        """Return True if customer has at least one valid toast."""
        return any(toast.kind == 'brioche' for toast in self.toaster.toasts)
```
This suffers from severe performance inefficiencies:
* The toaster will be loaded, as well as its toasts. This involves creating and
  issuing the SQL query, waiting for the database to return, and instantiating
  all those objects.
* `has_valid_toast` does not actually care about those objects; it just returns
  a boolean.
A better way would be to issue a SQL `EXISTS` query so that the database
handles this check and only returns a boolean.
Good:
```python
class Customer(Base):
    @property
    def has_valid_toast(self):
        """Return True if customer has at least one valid toast."""
        query = (session.query(Toaster)
                 .join(Toast)
                 .with_parent(self)
                 .filter(Toast.kind == 'brioche'))
        return session.query(query.exists()).scalar()
```
This query might not always be the fastest if those relationships are small
and eagerly loaded.
Explicit session passing
------------------------
Threading the session explicitly through every function clutters signatures
and couples all the code to session handling.
Bad:
```python
def toaster_exists(toaster_id, session):
    ...
```
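A sketch of one alternative, using SQLAlchemy's thread-local scoped session so
that callers don't have to pass `session` around (`engine` and `Toaster` as in
the other examples):

```python
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(bind=engine))

def toaster_exists(toaster_id):
    # No session argument: the scoped session registry provides it.
    query = Session.query(Toaster).filter_by(id=toaster_id)
    return Session.query(query.exists()).scalar()
```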
Implicit transaction handling
-----------------------------
TODO
Loading the full object when checking for object existence
----------------------------------------------------------
Bad:
```python
def toaster_exists(toaster_id):
    return bool(session.query(Toaster).filter_by(id=toaster_id).first())
```
This is inefficient because it:
* Queries all the columns from the database (including any eagerly loaded
  joins).
* Instantiates and maps all that data onto the Toaster model.
The database query would look something like this. You can see that all columns
are selected to be loaded by the ORM.
```sql
SELECT toasters.id AS toasters_id, toasters.name AS toasters_name,
toasters.color AS toasters_color
FROM toasters
WHERE toasters.id = 1
LIMIT 1 OFFSET 0
```
And then it just checks if the result is truthy.
Here's a better way to do it:
```python
def toaster_exists(toaster_id):
    query = session.query(Toaster).filter_by(id=toaster_id)
    return session.query(query.exists()).scalar()
```
In this case, we just ask the database whether a record with this id exists.
This is obviously much more efficient.
```sql
SELECT EXISTS (SELECT 1
FROM toasters
WHERE toasters.id = 1) AS anon_1
```
Using identity as comparator
----------------------------
Bad:
```python
toasters = session.query(Toaster).filter(Toaster.deleted_at is None).all()
```
Unfortunately this won't work at all: this query will return all toasters,
including the ones that were deleted.
The way SQLAlchemy works is that it overrides the magic comparison methods
(`__eq__`, `__lt__`, etc.). All comparison methods can be overridden except the
identity operator (`is`), which checks for object identity.
What this means is that the expression `Toaster.deleted_at is None` is
immediately evaluated by the Python interpreter, and since (presumably)
`Toaster.deleted_at` is a `sqlalchemy.orm.attributes.InstrumentedAttribute`,
it's not `None`, and thus the query is equivalent to:
```python
toasters = session.query(Toaster).filter(True).all()
```
This obviously renders the filter inoperative and returns all records.
There are two ways to fix it:
```python
toasters = session.query(Toaster).filter(Toaster.deleted_at == None).all()
```
Here we use the equality operator, which Python allows overriding. Behind the
scenes, Python calls `Toaster.deleted_at.__eq__(None)`, which gives SQLAlchemy
the opportunity to return a comparator that, when coerced to a string,
evaluates to `deleted_at IS NULL`.
Most linters will issue a warning for equality comparison against `None`, so
you can also do this (my preferred solution):
```python
toasters = session.query(Toaster).filter(Toaster.deleted_at.is_(None)).all()
```
See docs for
[is_](http://docs.sqlalchemy.org/en/rel_1_0/core/sqlelement.html#sqlalchemy.sql.operators.ColumnOperators.is_).
## Returning `None` instead of raising a `NoResultFound` exception
See [Returning nothing instead of raising NotFound exception](https://github.com/charlax/antipatterns/blob/master/code-antipatterns.md#returning-nothing-instead-of-raising-notfound-exception).


@@ -0,0 +1,39 @@
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(bind=engine)
Base = declarative_base()

class Toaster(Base):
    __tablename__ = 'toasters'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    color = Column(String)

def toaster_exists_bad(toaster_id):
    session = Session()
    return bool(session.query(Toaster).filter_by(id=toaster_id).first())

def toaster_exists_good(toaster_id):
    session = Session()
    query = session.query(Toaster).filter_by(id=toaster_id)
    return session.query(query.exists()).scalar()

def main():
    Base.metadata.create_all(engine)
    toaster_exists_bad(1)
    toaster_exists_good(2)

if __name__ == "__main__":
    main()


@@ -0,0 +1,180 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Table of Contents
- [Test antipatterns](#test-antipatterns)
- [Testing implementation](#testing-implementation)
- [Testing configuration](#testing-configuration)
- [Testing multiple things](#testing-multiple-things)
- [Repeating integration tests for minor variations](#repeating-integration-tests-for-minor-variations)
- [Over-reliance on centralized fixtures](#over-reliance-on-centralized-fixtures)
- [Over-reliance on replaying external requests](#over-reliance-on-replaying-external-requests)
- [Inefficient query testing](#inefficient-query-testing)
- [Assertions in loop](#assertions-in-loop)
- [Inverted testing pyramid](#inverted-testing-pyramid)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Test antipatterns
=================
Testing implementation
----------------------
TODO
Testing configuration
---------------------
TODO
Testing multiple things
-----------------------
TODO
Repeating integration tests for minor variations
------------------------------------------------
TODO
Over-reliance on centralized fixtures
-------------------------------------
Bad:
```python
# fixtures.py
toaster = Toaster(color='black')
toaster_with_color_blue = Toaster(color='blue')
toaster_with_color_red = Toaster(color='red')

# test.py
from fixtures import toaster, toaster_with_color_blue

def test_stuff():
    toaster_with_color_blue.toast('brioche')
```
The problem with centralized fixtures is that they tend to grow exponentially.
Every single test case has specific fixture requirements, and every single
permutation gets its own fixture. This makes the fixture file really difficult
to maintain.
The other problem is that it becomes very easy for two unrelated tests to use
the same fixture. Now let's say one of those tests' requirements changes, and
you have to change the fixture as well. In that case, you might break the other
test, which will slow you down and defeat the purpose of keeping tests an
efficient part of the developer flow.
Lastly, this separates the setup from the running part of the tests. It makes
it more difficult for a new engineer to understand what is specific about this
test's setup without having to open the ``fixtures`` file.
Here's a more explicit way to do this. Most fixtures libraries allow you to
override default parameters, so that you can make clear what setup is specific
to each test.
```python
def test_stuff():
    toaster_with_color_blue = Toaster(color='blue')
    toaster_with_color_blue.toast('brioche')
```
Over-reliance on replaying external requests
--------------------------------------------
TODO
Inefficient query testing
-------------------------
Bad:
```python
def test_find():
    install_fixture('toaster_1')  # color is red
    install_fixture('toaster_2')  # color is blue
    install_fixture('toaster_3')  # color is blue
    install_fixture('toaster_4')  # color is green

    results = find(color='blue')

    assert len(results) == 2
    for r in results:
        assert r.color == 'blue'
```
This is inefficient because we're installing four fixtures, even though we
could probably get away with only one. In most cases this would achieve the
same thing at a much lower cost:
```python
def test_find():
    install_fixture('toaster_2')  # color is blue

    results = find(color='blue')

    # Note the implicit assertion that the list is not empty, because we're
    # accessing its first item.
    assert results[0].color == 'blue'
```
The general rule here is that tests should require the minimum amount of code
to verify the behavior.
I would also recommend not doing this kind of integration testing for queries
going to the database, but sometimes it's a good tradeoff.
Assertions in loop
------------------
Bad:
```python
def test_find():
    results = find(color='blue')
    for r in results:
        assert r.color == 'blue'
```
This is bad because in most languages, if the list is empty then the test will
pass, which is probably not what we want to test.
Good:
```python
def test_find():
    results = find(color='blue')
    assert len(results) == 5
    for r in results:
        assert r.color == 'blue'
```
This is also a matter of tradeoff, but in most cases I'd try to make sure the
list contains only one item. If we're checking more than one item, that hints
at our test trying to do too many things.
## Inverted testing pyramid
![Test Pyramid](/images/test-pyramid.png)
*The [test pyramid](https://martinfowler.com/bliki/TestPyramid.html). Image courtesy of Martin Fowler.*
Building lots of automated, comprehensive end-to-end tests has been tried many times, and has almost never worked.
* End-to-end tests try to do too many things, are highly non-deterministic, and as a result very flaky.
* Debugging end-to-end test failures is painful and slow - this is usually where most of the time is wasted.
* Building and maintaining e2e tests is very costly.
* The time to run e2e tests is much, much longer than unit tests.
Focused testing (e.g. unit, component, etc.) prior to rollout, combined with business monitoring and alerting, is much more efficient.
More reading on this topic:
* [End-To-End Testing Considered Harmful](http://www.alwaysagileconsulting.com/articles/end-to-end-testing-considered-harmful/), Always Agile Consulting
* [Just Say No to More End-to-End Tests](https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html), Google Testing Blog
* [Testing Strategies in a Microservice Architecture](https://martinfowler.com/articles/microservice-testing/#testing-end-to-end-tips), section titled "Writing and maintaining end-to-end tests can be very difficult", Toby Clemson, MartinFowler.com
* [Introducing the software testing ice-cream cone (anti-pattern)](https://watirmelon.blog/2012/01/31/introducing-the-software-testing-ice-cream-cone/), Alister Scott
* [TestPyramid](https://martinfowler.com/bliki/TestPyramid.html), Martin Fowler
* [professional-programming's testing section](https://github.com/charlax/professional-programming#testing)