Everything I've learnt about writing good Python code

In the past 6 months, I've 10xed the amount of Python code I've written. In this article, I'll show you a few easy, actionable tips for writing better and more maintainable code. I've been lucky enough to have Jason (@jxnlco on twitter) review a good chunk of my code, and I've found that these few things have made a massive difference to my code quality:

Use the @classmethod decorator
Learn common libraries
Write simpler functions
Be a bit lazier - earn the abstraction
Decouple your implementation

Use the classmethod decorator

You should be using the @classmethod decorator when an object can be set up in several different ways and each way needs its own logic. A good example of this style is the Instructor API, which has clear, explicit ways for you to instantiate the different API providers.

Let's compare two versions of the API. The first is the API that the library used before its v1.0.0 release and the second is the more recent version.

# Pre-V1.0.0
import instructor
from openai import OpenAI

client = instructor.patch(OpenAI())

# Post V1
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

Because we're using a named constructor like from_openai (the same pattern that a classmethod gives you on your own classes) to define explicitly the client that we want to patch, we get better code readability and improved autocomplete out of the box. This is great for developer productivity.

If you ever want to migrate to a different provider that doesn't support the OpenAI standard, you need to switch to a different constructor and make that change explicitly in the code. If the two providers behave very differently, this helps you to catch subtle bugs that you might otherwise have missed.

This is important because the more complex your code's logic is, the more likely it is to have a strange bug. Explicitly declaring the specific clients you're using in your code base leaves you with fewer complex edge cases to deal with.

Using Classmethods

I recently worked on a script that generated embeddings with different models from SentenceTransformers, OpenAI and Cohere. This was tricky because each of these models needs to be used differently, even when initialising them, and I finally settled on the code below.

import enum


class Provider(enum.Enum):
    HUGGINGFACE = "HuggingFace"
    OPENAI = "OpenAI"
    COHERE = "Cohere"


class EmbeddingModel:
    def __init__(
        self,
        model_name: str,
        provider: Provider,
        max_limit: float = 20,  # float so that float("inf") is a valid value
        expected_iterations: float = float("inf"),
    ):
        self.model_name = model_name
        self.provider = provider
        self.max_limit = max_limit
        self.expected_iterations = expected_iterations

    @classmethod
    def from_hf(cls, model_name: str):
        return cls(
            model_name,
            provider=Provider.HUGGINGFACE,
            max_limit=float("inf"),
        )

    @classmethod
    def from_openai(cls, model_name: str, max_limit=20):
        return cls(model_name, provider=Provider.OPENAI, max_limit=max_limit)

    @classmethod
    def from_cohere(cls, model_name: str):
        return cls(model_name, provider=Provider.COHERE)

There are a few things which make the code above good:

Easier To Read: I can determine which provider I'm using when I instantiate the class - EmbeddingModel.from_hf makes it clear that it's the SentenceTransformers package that's being used
Less Redundancy: I only need to pass in the values that each specific model needs. This makes it easy to add additional configuration parameters down the line and be confident that they won't mess up existing functionality
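
To make the readability point concrete, here's a minimal sketch of how these constructors read at the call site. It uses a trimmed-down copy of the class above, and the model names are placeholders made up for illustration rather than real model identifiers.

```python
import enum


class Provider(enum.Enum):
    HUGGINGFACE = "HuggingFace"
    OPENAI = "OpenAI"


class EmbeddingModel:
    # Trimmed-down copy of the class above, kept only to show the call sites.
    def __init__(self, model_name: str, provider: Provider, max_limit: float = 20):
        self.model_name = model_name
        self.provider = provider
        self.max_limit = max_limit

    @classmethod
    def from_hf(cls, model_name: str):
        # from_hf fills in max_limit itself, so the caller only passes a name.
        return cls(model_name, provider=Provider.HUGGINGFACE, max_limit=float("inf"))

    @classmethod
    def from_openai(cls, model_name: str, max_limit: float = 20):
        return cls(model_name, provider=Provider.OPENAI, max_limit=max_limit)


# Placeholder model names, chosen only for illustration.
local_model = EmbeddingModel.from_hf("my-local-model")
hosted_model = EmbeddingModel.from_openai("my-hosted-model", max_limit=50)

print(local_model.provider, local_model.max_limit)
print(hosted_model.provider, hosted_model.max_limit)
```

Each call site names its provider and only mentions the parameters that matter for that provider.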

Learn Common Libraries

This might be overstated, but I think everyone should take some time to at least read through the basic functions in commonly used libraries. Some general pairings I've found useful are

Handling data -> Pandas
Retrying/exception handling -> Tenacity
Caching data -> diskcache
Validating objects -> Pydantic
Printing/formatting things to the console -> Rich
Working with generators -> itertools, which has a great selection of things like islice and automatic batching
Writing common counters/dictionary insertion logic -> collections
Caching results/working with curried functions -> functools
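
As a rough illustration of the last three entries, here's a small sketch that only uses the standard library. The word list and the helper names (naturals, fib) are made up for the example.

```python
from collections import Counter, defaultdict
from functools import lru_cache
from itertools import islice

words = ["apple", "bob", "apple", "cat", "bob", "apple"]

# Counter replaces a hand-written "if key in dict" counting loop.
counts = Counter(words)

# defaultdict removes the "create the list if it's missing" boilerplate.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)


def naturals():
    # An infinite generator, used to show that islice never exhausts it.
    n = 0
    while True:
        yield n
        n += 1


# islice takes the first n items of any iterable without materialising it.
first_five = list(islice(naturals(), 5))


# lru_cache memoises a pure function, so repeated calls hit the cache.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(counts.most_common(1))  # [('apple', 3)]
print(first_five)             # [0, 1, 2, 3, 4]
print(fib(50))                # 12586269025
```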

If a commonly used library provides some functionality, you should use it. It's rarely going to be beneficial to spend hours writing your own version unless it's for educational purposes. The simple but effective hack I've found has been to use a variant of the following prompt.

I want to do <task>. How can I do so with <commonly used library>.

ChatGPT has a nasty habit of trying to roll its own implementation of everything. I fell for this recently when I had to log the results of an experiment. ChatGPT suggested that I use the csv module and manually compute the set of all keys in my data before writing it to a .csv file, as seen below.

import csv

data = [{"key1": 2, "key4": 10}, {"key3": 3}, {"key4": 4}]

keys = set()
for obj in data:
    keys.update(obj.keys())

with open("output.csv", "w", newline="") as csvfile:
    # A set has no guaranteed order, so sort the keys for stable columns
    writer = csv.DictWriter(csvfile, fieldnames=sorted(keys))
    writer.writeheader()
    writer.writerows(data)

After spending 30 minutes testing and fixing some bugs with this version, I discovered to my dismay that a Pandas DataFrame has a built-in to_csv method for writing to csv, and that I could generate a DataFrame from a list of objects, as seen below.

import pandas as pd

data = [{"key1": 2, "key4": 10}, {"key3": 3}, {"key4": 4}]
df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)

What's great about using a pandas DataFrame is that you now get a lot of added functionality for free, like

Want it as a markdown table? - just use df.to_markdown() (this needs the tabulate package installed)
Want a list of dictionaries, one per row? - just use df.to_dict(orient="records")
Want to get a json formatted string? - just use df.to_json()

That's a huge reduction in the potential issues with my code because I'm now using a library method that other people have spent time and effort to write and test. Popular libraries are also well supported across the ecosystem, allowing you to take advantage of other integrations down the line (e.g. LanceDB supporting Pydantic models).

Write Simpler Functions

I think that there are three big benefits to writing simpler functions that don't mutate state:

They're easier to reason about since they're much smaller in size
They're easier to test because we can mock the inputs and assert on the outputs
They're easier to refactor because we can swap out different components easily

I recently had a pretty complex problem to solve: take in a dataset of rows, each containing some text, and embed every sentence inside it. I took some time and wrote an initial draft that looked something like this.

def get_dataset_batches(data, dataset_mapping: dict[str, int], batch_size=100):
    """
    In this case, dataset_mapping maps a sentence to an
    id that we can use to identify it by. It starts out
    as an empty dictionary and is filled in as we go
    """
    batch = []
    for row in data:
        s1, s2 = row["text"]
        if s1 not in dataset_mapping:
            dataset_mapping[s1] = len(dataset_mapping)
            batch.append(s1)
            if len(batch) == batch_size:
                yield batch
                batch = []

        if s2 not in dataset_mapping:
            dataset_mapping[s2] = len(dataset_mapping)
            batch.append(s2)
            if len(batch) == batch_size:
                yield batch
                batch = []

    if batch:
        yield batch

Instead of doing this, a better method might be to break up our function into a few smaller functions, as seen below.

from typing import Iterable


def get_unique_sentences(data):
    seen = set()
    for row in data:
        s1, s2 = row["text"]
        if s1 not in seen:
            seen.add(s1)
            yield s1

        if s2 not in seen:
            seen.add(s2)
            yield s2


def get_sentence_to_id_mapping(sentences: Iterable[str]):
    return {s: i for i, s in enumerate(sentences)}


def generate_sentence_batches(sentence_mapping: dict[str, int], batch_size=100):
    batch = []
    for sentence in sentence_mapping:
        batch.append([sentence, sentence_mapping[sentence]])
        if len(batch) == batch_size:
            yield batch
            # Only start a new batch once the current one has been yielded
            batch = []

    if batch:
        yield batch

I wrote my own batching function here but you should really be using itertools.batched if you're running Python 3.12 and above.
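
For reference, here's a hedged sketch of what that could look like. The fallback function is my own stand-in for older Python versions, built on itertools.islice, and it is only meant to mimic the basic behaviour of itertools.batched (tuples of up to n items).

```python
import itertools
from typing import Iterable, Iterator


def batched(items: Iterable, batch_size: int) -> Iterator[tuple]:
    """Yield tuples of up to batch_size items (fallback for Python < 3.12)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls at most batch_size items; an empty tuple means we're done.
    while batch := tuple(itertools.islice(iterator, batch_size)):
        yield batch


# Prefer the standard library version when it exists (Python 3.12+).
if hasattr(itertools, "batched"):
    batched = itertools.batched

sentences = ["a", "b", "c", "d", "e"]
batches = list(batched(sentences, 2))
print(batches)  # [('a', 'b'), ('c', 'd'), ('e',)]
```

Note that the batches come back as tuples rather than lists, and the last one may be shorter than the batch size.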

We can then call the functions above using a main function like

def main(data):
    sentences = get_unique_sentences(data)
    s2id = get_sentence_to_id_mapping(sentences)
    batches = generate_sentence_batches(s2id)
    return batches

In the second case we're not squeezing everything into a single function. It's clear exactly what is happening in each function which makes it easy for people to understand your code.

Additionally, because we don't mutate state in between our functions and instead generate a new value, we are able to mock and test each of these functions individually, allowing for more stable code to be written in the long run.
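
Here's a minimal sketch of what testing those pieces individually could look like. It assumes each row looks like {"text": (sentence_1, sentence_2)}, which is my guess at the data shape, and it uses plain assert-based test functions rather than any particular test framework.

```python
def get_unique_sentences(data):
    # Assumes each row looks like {"text": (sentence_1, sentence_2)}.
    seen = set()
    for row in data:
        for sentence in row["text"]:
            if sentence not in seen:
                seen.add(sentence)
                yield sentence


def get_sentence_to_id_mapping(sentences):
    return {s: i for i, s in enumerate(sentences)}


def test_unique_sentences_keeps_first_seen_order():
    data = [{"text": ("a", "b")}, {"text": ("b", "c")}]
    assert list(get_unique_sentences(data)) == ["a", "b", "c"]


def test_mapping_assigns_sequential_ids():
    assert get_sentence_to_id_mapping(["a", "b", "c"]) == {"a": 0, "b": 1, "c": 2}


# Each function only depends on its inputs, so the tests need no setup or mocks.
test_unique_sentences_keeps_first_seen_order()
test_mapping_assigns_sequential_ids()
```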

It helps to have one main function call a sequence of other functions and have those functions be as flat as possible. This means that ideally between each of these calls, we minimise mutation or usage of some shared variable and only do so when there's an expensive piece of computation involved.

Earn the Abstraction

I think it's easy to complicate a codebase with premature abstractions over time - just look at Java. After all, it's a natural reflex as we work towards writing code that is DRY. But abstracting too early often makes it difficult for you to adapt your code down the line.

For instance, an easy way to avoid this initially is to just return a simple dictionary if the returned value is only being used by a single function.

import pandas as pd


def extract_attributes(data):
    # process_data stands in for whatever processing step you already have
    new_data = process_data(data)
    return {"key1": new_data["key1"], "key2": new_data["key2"]}


def main():
    data = pd.read_csv("data.csv")
    attributes = extract_attributes(data)
    return attributes

In this example, there's limited utility in declaring an entire dataclass, as in the version below, because it adds overhead and complexity to the function.

from dataclasses import dataclass
from typing import List


@dataclass
class Attributes:
    key1: List[int]
    key2: List[int]


def extract_attributes(data):
    new_data = process_data(data)
    return Attributes(key1=new_data["key1"], key2=new_data["key2"])


def main():
    data = pd.read_csv("data.csv")
    attributes = extract_attributes(data)
    return attributes

Decouple your implementation

Another example I like a lot is scoring values. Say we want to calculate the recall for a list of predictions that we've made, where we have a single known label as our ground truth and a list of other labels as our model's predictions. We might implement it as

def calculate_recall_predictions(labels, predictions):
    scores = []
    for label, preds in zip(labels, predictions):
        # calculate_recall scores one label against its list of predictions
        scores.append(calculate_recall(label, preds))
    return scores

But what if we'd like to work with other metrics down the line like precision, NDCG or Mean Reciprocal Rank? Wouldn't we then have to declare 4 different functions for this?

def calculate_recall_predictions(labels, predictions):
    scores = []
    for label, preds in zip(labels, predictions):
        scores.append(calculate_recall(label, preds))
    return scores


def calculate_ndcg_predictions(labels, predictions):
    scores = []
    for label, preds in zip(labels, predictions):
        scores.append(calculate_ndcg(label, preds))
    return scores

A better solution is to parameterise the scoring function itself. If you look at the different metrics we're calling, they all take in a single label and a list of predictions. We're also doing the same thing in each function, which is to score the predictions with respect to a specific label.

This means that we could rewrite this as seen below

def score(labels, predictions, score_fn):
    return [score_fn(label, pred) for label, pred in zip(labels, predictions)]
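
As a quick sanity check, here's a sketch of score in use. The two metric functions are simplified stand-ins I've written for the single-label case described above, not the original implementations.

```python
def calculate_recall(label, predictions):
    # With a single ground-truth label, recall is 1 if we retrieved it, else 0.
    return 1.0 if label in predictions else 0.0


def calculate_mrr(label, predictions):
    # Reciprocal rank of the label, or 0 if it was never predicted.
    if label not in predictions:
        return 0.0
    return 1.0 / (predictions.index(label) + 1)


def score(labels, predictions, score_fn):
    return [score_fn(label, pred) for label, pred in zip(labels, predictions)]


labels = ["a", "b"]
predictions = [["a", "x", "y"], ["x", "b", "y"]]

recall_scores = score(labels, predictions, calculate_recall)
mrr_scores = score(labels, predictions, calculate_mrr)
print(recall_scores)  # [1.0, 1.0]
print(mrr_scores)     # [1.0, 0.5]
```

Swapping the metric is now a one-argument change at the call site instead of a new function.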

In fact, we could even go one step further and just represent all of the metrics that we want as a single dictionary.

import itertools

SIZES = [3, 10, 15, 25]
metrics = {"recall": calculate_recall, "mrr": calculate_mrr}
evals = {}


def predictions_at_k(score_fn, k: int):
    def wrapper(chunk_id, predictions):
        # Score only the top-k predictions with the metric we were given
        return score_fn(chunk_id, predictions[:k])

    return wrapper


for metric, k in itertools.product(metrics.keys(), SIZES):
    evals[f"{metric}@{k}"] = predictions_at_k(score_fn=metrics[metric], k=k)

We can then take a given label and list of predictions and calculate the result as

import pandas as pd


def score(labels, predictions):
    # One row per (label, predictions) pair, one column per entry in evals
    return pd.DataFrame(
        [
            {name: metric_fn(label, pred) for name, metric_fn in evals.items()}
            for label, pred in zip(labels, predictions)
        ]
    )

This is an extremely flexible function which gives us a pandas DataFrame out of the box. All we need to do if we want to add an extra metric is to add a new entry to the metrics dictionary. If we want to evaluate our results at a new cutoff of k items, we just need to update the SIZES list too.
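
To see the whole thing end to end, here's a self-contained sketch with toy data. The metric functions are simplified stand-ins, SIZES is shortened to keep the output small, and score_rows is a hypothetical variant that returns plain dictionaries so the example runs without pandas (wrapping the result in pd.DataFrame would give the table form).

```python
import itertools


def calculate_recall(label, predictions):
    # Simplified stand-in: 1 if the label was retrieved, else 0.
    return 1.0 if label in predictions else 0.0


def calculate_mrr(label, predictions):
    # Simplified stand-in: reciprocal rank of the label, 0 if missing.
    if label not in predictions:
        return 0.0
    return 1.0 / (predictions.index(label) + 1)


SIZES = [1, 3]
metrics = {"recall": calculate_recall, "mrr": calculate_mrr}


def predictions_at_k(score_fn, k: int):
    def wrapper(label, predictions):
        # Only the top-k predictions are passed to the metric.
        return score_fn(label, predictions[:k])

    return wrapper


evals = {
    f"{name}@{k}": predictions_at_k(metrics[name], k)
    for name, k in itertools.product(metrics, SIZES)
}


def score_rows(labels, predictions):
    # One dictionary per example, keyed by "<metric>@<k>".
    return [
        {name: metric_fn(label, preds) for name, metric_fn in evals.items()}
        for label, preds in zip(labels, predictions)
    ]


rows = score_rows(["a", "b"], [["a", "x", "y"], ["x", "b", "y"]])
for row in rows:
    print(row)
# {'recall@1': 1.0, 'recall@3': 1.0, 'mrr@1': 1.0, 'mrr@3': 1.0}
# {'recall@1': 0.0, 'recall@3': 1.0, 'mrr@1': 0.0, 'mrr@3': 0.5}
```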

Conclusion

Everyone needs to write enough bad code to start writing better code. The path to writing better code is paved with a lot of PR reviews and reading better code examples. I've definitely written my share of bad code and I hope that this article helps you to see some interesting ways to write better code.