# Matplotlib scatter plot with different text at each data point

I am trying to make a scatter plot and annotate data points with different numbers from a list. So, for example, I want to plot `y` vs `z` and annotate each point with the corresponding number from `n`.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax1 = plt.subplots()
ax1.scatter(z, y, marker='o')  # scatter() takes marker=, not fmt=
```

Any ideas?

I'm not aware of any plotting method that takes an array or list of labels, but you could use `annotate()` while iterating over the values in `n`.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

# Place the i-th label at the i-th data point
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))

plt.show()
```
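As a variation on the loop above, the labels can be nudged away from the markers so they do not overlap them. This is only a sketch: the 5-point offset, the alignment, and the font size are arbitrary choices, not something the question asked for.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

labels = []
for zi, yi, txt in zip(z, y, n):
    # Shift each label 5 points right and up from its data point
    # (the offset values are an arbitrary choice for illustration).
    label = ax.annotate(
        str(txt),
        xy=(zi, yi),
        xytext=(5, 5),
        textcoords="offset points",
        ha="left",
        va="bottom",
        fontsize=9,
    )
    labels.append(label)
```

Call `plt.show()` afterwards to display the figure.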

There are a lot of formatting options for `annotate()`; see the matplotlib website.

Tuesday, June 1, 2021

There are several ways to animate a matplotlib plot. In the following let's look at two minimal examples using a scatter plot.

### (a) use interactive mode `plt.ion()`

For an animation to take place we need an event loop. One way of getting an event loop is to use `plt.ion()` ("interactive on"). We first draw the figure and then update the plot in a loop. Inside the loop, we redraw the canvas and add a short pause so the window can process other events (such as mouse interactions). Without this pause the window would freeze. Finally, we call `plt.waitforbuttonpress()` to keep the window open after the animation has finished.

```python
import matplotlib.pyplot as plt
import numpy as np

plt.ion()
fig, ax = plt.subplots()
x, y = [], []
sc = ax.scatter(x, y)
plt.xlim(0, 10)
plt.ylim(0, 10)

plt.draw()
for i in range(1000):
    # Add one random point per iteration and redraw
    x.append(np.random.rand(1)*10)
    y.append(np.random.rand(1)*10)
    sc.set_offsets(np.c_[x, y])
    fig.canvas.draw_idle()
    plt.pause(0.1)

plt.waitforbuttonpress()
```

### (b) using `FuncAnimation`

Much of the above can be automated using `matplotlib.animation.FuncAnimation`. The FuncAnimation will take care of the loop and the redrawing and will constantly call a function (in this case `animate()`) after a given time interval. The animation will only start once `plt.show()` is called, thereby automatically running in the plot window's event loop.

```python
import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np

fig, ax = plt.subplots()
x, y = [], []
sc = ax.scatter(x, y)
plt.xlim(0, 10)
plt.ylim(0, 10)

def animate(i):
    # Called once per frame: add one random point and update the scatter
    x.append(np.random.rand(1)*10)
    y.append(np.random.rand(1)*10)
    sc.set_offsets(np.c_[x, y])

ani = matplotlib.animation.FuncAnimation(fig, animate,
                frames=2, interval=100, repeat=True)
plt.show()
```
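If the points are known in advance, one possible variant is to precompute them and let the frame index `i` decide how many to show. This is a sketch under that assumption: the `points` array, the seed, and the frame count of 50 are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
# Hypothetical data: 50 random (x, y) pairs in the plotted range.
points = rng.uniform(0, 10, size=(50, 2))

fig, ax = plt.subplots()
sc = ax.scatter([], [])
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

def animate(i):
    # On frame i, show the first i + 1 precomputed points.
    sc.set_offsets(points[: i + 1])
    return (sc,)

# Keep a reference to the animation so it is not garbage-collected.
ani = FuncAnimation(fig, animate, frames=len(points),
                    interval=100, repeat=False)
```

As in the example above, nothing moves until `plt.show()` is called. With `repeat=False` the animation stops after the last point has been added.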
Saturday, June 5, 2021

Basically, you want a density estimate of some sort. There are multiple ways to do this:

1. Use a 2D histogram of some sort (e.g. `matplotlib.pyplot.hist2d` or `matplotlib.pyplot.hexbin`) (You could also display the results as contours--just use `numpy.histogram2d` and then contour the resulting array.)

2. Make a kernel-density estimate (KDE) and contour the results. A KDE is essentially a smoothed histogram. Instead of a point falling into a particular bin, it adds a weight to surrounding bins (usually in the shape of a gaussian "bell curve").

Using a 2D histogram is simple and easy to understand, but fundamentally gives "blocky" results.
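The contour variant mentioned in option 1 could look roughly like the following. It is a sketch on made-up sample data; the bin count of 20 and the use of filled contours are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1977)
# Hypothetical sample data: 200 correlated x, y points.
x, y = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200).T

counts, xedges, yedges = np.histogram2d(x, y, bins=20)

# Contour at the bin centers. histogram2d indexes counts as [x_bin, y_bin],
# while contourf expects Z as [row = y, column = x], hence the transpose.
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
cs = ax.contourf(xcenters, ycenters, counts.T)
fig.colorbar(cs, ax=ax, label="points per bin")
```

Call `plt.show()` to display the result. The contours are still driven by the binned counts, so the choice of `bins` affects them just as it affects the histogram.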

There are some wrinkles to doing the second one "correctly" (i.e. there's no one correct way). I won't go into the details here, but if you want to interpret the results statistically, you need to read up on it (particularly the bandwidth selection).

At any rate, here's an example of the differences. I'm going to plot each one similarly, so I won't use contours, but you could just as easily plot the 2D histogram or gaussian KDE using a contour plot:

```python
import numpy as np
import matplotlib.pyplot as plt
# gaussian_kde lives in scipy.stats; the scipy.stats.kde submodule is deprecated
from scipy.stats import gaussian_kde

np.random.seed(1977)

# Generate 200 correlated x,y points
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T

nbins = 20

fig, axes = plt.subplots(ncols=2, nrows=2, sharex=True, sharey=True)

axes[0, 0].set_title('Scatterplot')
axes[0, 0].plot(x, y, 'ko')

axes[0, 1].set_title('Hexbin plot')
axes[0, 1].hexbin(x, y, gridsize=nbins)

axes[1, 0].set_title('2D Histogram')
axes[1, 0].hist2d(x, y, bins=nbins)

# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
k = gaussian_kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))

axes[1, 1].set_title('Gaussian KDE')
axes[1, 1].pcolormesh(xi, yi, zi.reshape(xi.shape))

fig.tight_layout()
plt.show()
```

One caveat: With very large numbers of points, `scipy.stats.gaussian_kde` will become very slow. It's fairly easy to speed it up by making an approximation: just take the 2D histogram and blur it with a Gaussian filter of the right radius and covariance. I can give an example if you'd like.
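A rough sketch of that approximation, assuming an isotropic blur with a hand-picked width, is shown below. The helper name `fast_density`, the bin count, and `sigma` (measured in bins) are all illustrative; this does not attempt the "right radius and covariance" that a proper bandwidth selection would give.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fast_density(x, y, bins=100, sigma=3.0):
    """Approximate a KDE by blurring a 2D histogram (sigma is in bins)."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Isotropic Gaussian blur; sigma is a hand-picked assumption here.
    smoothed = gaussian_filter(counts, sigma=sigma)
    return smoothed, xedges, yedges

rng = np.random.default_rng(1977)
# Hypothetical large sample: 100,000 correlated x, y points.
x, y = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 100_000).T
density, xedges, yedges = fast_density(x, y)
```

The result can be drawn with `plt.pcolormesh(xedges, yedges, density.T)`. The cost is dominated by the histogram and the filter, so it grows with the number of bins rather than with the number of grid evaluations times the number of points.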

One other caveat: If you're doing this in a non-Cartesian coordinate system, none of these methods apply! Getting density estimates on a spherical shell is a bit more complicated.

Tuesday, July 27, 2021

This works:

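```python
import matplotlib.pyplot as plt
import numpy as np

s = [u'+', u'+', u'o']
col = ['r', 'r', 'g']
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# One scatter call per point, each with its own marker and color
for _s, c, _x, _y in zip(s, col, x, y):
    plt.scatter(_x, _y, marker=_s, c=c)

plt.xlim(0, 4)
plt.ylim(0, 8)

plt.show()
```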

Update

It seems you can use a variety of colors in a single call to the `scatter` function. The multiple-color feature is confirmed in the API documentation, but the documentation does not say that you can pass an iterable for the `marker` kwarg. Your code works if you remove `marker=s`.
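A middle ground between one call per point and a single call is one `scatter` call per distinct marker, with the colors for that group passed as a list. The sketch below assumes the same `s`, `col`, `x`, and `y` as above; the grouping helper variables are illustrative.

```python
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np

s = ['+', '+', 'o']
col = ['r', 'r', 'g']
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Collect the indices of the points that share each marker.
groups = defaultdict(list)
for i, marker in enumerate(s):
    groups[marker].append(i)

fig, ax = plt.subplots()
paths = {}
for marker, idx in groups.items():
    # One scatter call per marker; colors are still passed as a list.
    paths[marker] = ax.scatter(x[idx], y[idx], marker=marker,
                               c=[col[i] for i in idx])

ax.set_xlim(0, 4)
ax.set_ylim(0, 8)
```

Call `plt.show()` to display it. With many points and only a few marker types, this keeps the number of `scatter` calls equal to the number of markers.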

Friday, July 30, 2021

Here's a work-around along the same lines suggested by Etienne. The key idea is to set up the plot, then use a separate call to `points3d()` to plot the points in each size class.

```r
library(rgl)

# Break data.frame into a list of data.frames, each to be plotted
# with points of a different size
size <- as.numeric(cut(iris$Petal.Width, 7))
irisList <- split(iris, size)

# Setup the plot
with(iris, plot3d(Sepal.Length, Sepal.Width, Petal.Length, col=Species, size=0))

# Use a separate call to points3d() to plot points of each size
for(i in seq_along(irisList)) {
    with(irisList[[i]], points3d(Sepal.Length, Sepal.Width,
                                 Petal.Length, col=Species, size=i))
}
```

(FWIW, it does appear that there's no way to get `plot3d()` to do this directly. The problem is that `plot3d()` uses the helper function `material3d()` to set point sizes and as shown below, `material3d()` only wants to take a single numeric value.)

```r
material3d(size = 1:7)
# Error in rgl.numeric(size) : size must be a single numeric value
```
Sunday, October 17, 2021