The contents of this post were originally meant to be a part of Pandas Merging 101, but due to the nature and size of the content required to fully do justice to this topic, it has been moved to its own QnA.

Given two simple DataFrames:

```
import pandas as pd

left = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]})
right = pd.DataFrame({'col1': ['X', 'Y', 'Z'], 'col2': [20, 30, 50]})

left
#   col1  col2
# 0    A     1
# 1    B     2
# 2    C     3

right
#   col1  col2
# 0    X    20
# 1    Y    30
# 2    Z    50
```

The cross product of these frames can be computed, and will look something like:

```
A 1 X 20
A 1 Y 30
A 1 Z 50
B 2 X 20
B 2 Y 30
B 2 Z 50
C 3 X 20
C 3 Y 30
C 3 Z 50
```

What is the most performant method of computing this result?

Let's start by establishing a benchmark. The easiest method for solving this is using a temporary "key" column: both DataFrames are assigned a temporary "key" column with the same value (say, 1), and `merge` then performs a many-to-many JOIN on "key".

While the many-to-many JOIN trick works for reasonably sized DataFrames, you will see relatively lower performance on larger data.
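The temporary-key trick described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code used for the benchmark; the function name `cartesian_product_basic` is an assumption, and `left` and `right` are the frames from the question.

```python
import pandas as pd


def cartesian_product_basic(left, right):
    # Hypothetical helper name. Both frames get the same constant "key",
    # so the join on "key" pairs every left row with every right row.
    return (
        left.assign(key=1)
            .merge(right.assign(key=1), on='key')
            .drop(columns='key')  # remove the temporary column again
    )


left = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]})
right = pd.DataFrame({'col1': ['X', 'Y', 'Z'], 'col2': [20, 30, 50]})

result = cartesian_product_basic(left, right)
print(result)
```

Because both inputs share column names, `merge` disambiguates them with its default `_x` and `_y` suffixes. On pandas 1.2 and later, `left.merge(right, how='cross')` gives the same pairing without the temporary column.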

A faster implementation will require NumPy. There are several well-known NumPy implementations of the 1D cartesian product, and we can build on some of these performant solutions to get our desired output. My favourite, however, is @senderle's first implementation.
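For reference, a 1D cartesian product helper in the spirit of the `np.ix_`-based approach might look like the sketch below. Whether it matches @senderle's implementation line for line is an assumption; the `np.asarray` conversion is an addition so that plain lists are accepted too.

```python
import numpy as np


def cartesian_product(*arrays):
    # Sketch only; assumed to resemble the np.ix_-based approach.
    arrays = [np.asarray(a) for a in arrays]
    la = len(arrays)
    dtype = np.result_type(*arrays)
    # One axis per input array, plus a last axis holding one slot per input.
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    # np.ix_ returns open-mesh views; assignment broadcasts each one
    # across all the other axes.
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    # Flatten to one row per combination.
    return arr.reshape(-1, la)


pairs = cartesian_product(np.arange(3), np.arange(2))
print(pairs)
```

Each row of `pairs` is one combination, with the first input varying slowest.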

## Generalizing: CROSS JOIN on Unique or Non-Unique Indexed DataFrames

This trick will work on any kind of DataFrame. We compute the cartesian product of the DataFrames' numeric indices using the aforementioned `cartesian_product` and use this to reindex the DataFrames.
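The positional reindexing step can be sketched like this. The function name `cartesian_product_generalized` is an assumption, and `cartesian_product` is repeated only so the block runs on its own. The example deliberately uses a non-unique index on the left frame.

```python
import numpy as np
import pandas as pd


def cartesian_product(*arrays):
    # Same np.ix_-based helper sketch as before, repeated for self-containment.
    arrays = [np.asarray(a) for a in arrays]
    la = len(arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=np.result_type(*arrays))
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)


def cartesian_product_generalized(left, right):
    # Hypothetical name. Works on row positions (0..n-1), so index labels
    # never matter, whether they are unique or not.
    idx = cartesian_product(np.arange(len(left)), np.arange(len(right)))
    return pd.DataFrame(
        np.column_stack([left.values[idx[:, 0]], right.values[idx[:, 1]]]))


left = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]})
right = pd.DataFrame({'col1': ['X', 'Y', 'Z'], 'col2': [20, 30, 50]})
left.index = ['s1', 's2', 's1']  # non-unique labels on purpose

result = cartesian_product_generalized(left, right)
print(result)
```

Note that the result carries default integer column labels, and frames with mixed dtypes (as here) come back with `object` columns, because `.values` on such a frame is an object array.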

This solution can generalise to multiple DataFrames.
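As a sketch of that generalisation, the same positional indexing can be applied to any number of frames. The name `cartesian_product_multi` and the third frame `extra` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def cartesian_product(*arrays):
    # np.ix_-based helper sketch, repeated so this block is self-contained.
    arrays = [np.asarray(a) for a in arrays]
    la = len(arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=np.result_type(*arrays))
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)


def cartesian_product_multi(*dfs):
    # Hypothetical name. Column i of idx holds row positions into dfs[i].
    idx = cartesian_product(*[np.arange(len(df)) for df in dfs])
    return pd.DataFrame(
        np.column_stack([df.values[idx[:, i]] for i, df in enumerate(dfs)]))


left = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]})
right = pd.DataFrame({'col1': ['X', 'Y', 'Z'], 'col2': [20, 30, 50]})
extra = pd.DataFrame({'col1': ['P', 'Q'], 'col2': [7, 8]})  # made-up third frame

result = cartesian_product_multi(left, right, extra)
print(result.shape)
```

The row count is the product of the input lengths, so memory use grows quickly as frames are added.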

## Further Simplification
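For the two-DataFrame case this section covers, the row-position pairs can be built directly with `np.ogrid` and `np.broadcast_arrays`, with no separate helper. A minimal sketch, assuming the function name `cartesian_product_simplified`:

```python
import numpy as np
import pandas as pd


def cartesian_product_simplified(left, right):
    # Hypothetical name. np.ogrid gives a (la, 1) column and a (1, lb) row
    # of positions; broadcasting expands both to shape (la, lb).
    la, lb = len(left), len(right)
    ia2, ib2 = np.broadcast_arrays(*np.ogrid[:la, :lb])
    # Flattening in row-major order pairs every left position with
    # every right position.
    return pd.DataFrame(
        np.column_stack([left.values[ia2.ravel()], right.values[ib2.ravel()]]))


left = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]})
right = pd.DataFrame({'col1': ['X', 'Y', 'Z'], 'col2': [20, 30, 50]})

result = cartesian_product_simplified(left, right)
print(result)
```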

A simpler solution not involving @senderle's `cartesian_product` is possible when dealing with just two DataFrames. Using `np.broadcast_arrays`, we can achieve almost the same level of performance.

## Performance Comparison

Benchmarking these solutions on some contrived DataFrames with unique indices, we have

Do note that timings may vary based on your setup, data, and choice of `cartesian_product` helper function as applicable.

Performance Benchmarking Code

This is the timing script. All functions called here are defined above.
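A minimal timing sketch along these lines, assuming the standard-library `timeit` module rather than any particular benchmarking tool. The names `cross_merge`, `cross_broadcast`, and `make_frame` are hypothetical, and the frame sizes and repeat counts are arbitrary choices.

```python
import timeit

import numpy as np
import pandas as pd


def cross_merge(left, right):
    # Temporary-key many-to-many join.
    return (left.assign(key=1)
                .merge(right.assign(key=1), on='key')
                .drop(columns='key'))


def cross_broadcast(left, right):
    # Positional pairing built with np.broadcast_arrays.
    ia2, ib2 = np.broadcast_arrays(*np.ogrid[:len(left), :len(right)])
    return pd.DataFrame(
        np.column_stack([left.values[ia2.ravel()], right.values[ib2.ravel()]]))


def make_frame(n):
    # Contrived all-integer frame with a unique default index (assumed test data).
    return pd.DataFrame(np.arange(2 * n).reshape(n, 2), columns=['col1', 'col2'])


timings = {}
for n in (10, 100):
    df = make_frame(n)
    for name, fn in (('merge', cross_merge), ('broadcast', cross_broadcast)):
        # Best of three repeats, each running the function three times.
        timings[(name, n)] = min(
            timeit.repeat(lambda: fn(df, df), number=3, repeat=3))

for (name, n), seconds in sorted(timings.items()):
    print(f'{name:>10}  n={n:<5d} {seconds:.5f}s')
```

Taking the minimum over repeats reduces noise from other processes; larger sizes are needed before differences between the approaches become meaningful.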

## Continue Reading

Jump to other topics in Pandas Merging 101 to continue learning:

- Merging basics - basic types of joins
- Index-based joins
- Generalizing to multiple DataFrames
- Cross join (you are here)