Asked  6 Months ago    Answers:  5   Viewed   66 times

I am trying to make a scatter plot and annotate data points with different numbers from a list. So, for example, I want to plot y vs z and annotate each point with the corresponding number from n.

import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(z, y, marker='o')

Any ideas?

 Answers

55

I'm not aware of any plotting method that takes arrays or lists of labels, but you could call annotate() while iterating over the values in n.

import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

# Label each point (z[i], y[i]) with the matching entry of n
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))

plt.show()

There are a lot of formatting options for annotate(); see the matplotlib documentation for the full list.
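
As a rough illustration of those formatting options, the sketch below offsets each label slightly from its point so the text does not sit on top of the marker. The specific offset, font size, and color are arbitrary choices for illustration, and the Agg backend line is only there so the snippet runs without a display; drop it and call plt.show() to view the figure interactively.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use an interactive window
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

annotations = []
for zi, yi, label in zip(z, y, n):
    ann = ax.annotate(
        str(label),
        xy=(zi, yi),                 # the data point being labelled
        xytext=(5, 5),               # shift the text 5 points right and 5 points up
        textcoords="offset points",  # interpret xytext as an offset from xy
        fontsize=9,                  # illustrative styling, not required
        color="darkred",
    )
    annotations.append(ann)

fig.canvas.draw()  # render once so the annotations are laid out
```

Using textcoords="offset points" keeps the gap between marker and label constant regardless of the axis limits.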

Tuesday, June 1, 2021
 
Raef
answered 6 Months ago
62

There are several ways to animate a matplotlib plot. In the following let's look at two minimal examples using a scatter plot.

(a) use interactive mode plt.ion()

For an animation to take place we need an event loop. One way of getting the event loop is to use plt.ion() ("interactive on"). One then needs to first draw the figure and can then update the plot in a loop. Inside the loop, we need to draw the canvas and introduce a little pause for the window to process other events (like the mouse interactions etc.). Without this pause the window would freeze. Finally we call plt.waitforbuttonpress() to let the window stay open even after the animation has finished.

import matplotlib.pyplot as plt
import numpy as np

plt.ion()
fig, ax = plt.subplots()
x, y = [],[]
sc = ax.scatter(x,y)
plt.xlim(0,10)
plt.ylim(0,10)

plt.draw()
for i in range(1000):
    x.append(np.random.rand(1)*10)
    y.append(np.random.rand(1)*10)
    sc.set_offsets(np.c_[x,y])
    fig.canvas.draw_idle()
    plt.pause(0.1)

plt.waitforbuttonpress()

(b) using FuncAnimation

Much of the above can be automated using matplotlib.animation.FuncAnimation. The FuncAnimation will take care of the loop and the redrawing and will constantly call a function (in this case animate()) after a given time interval. The animation will only start once plt.show() is called, thereby automatically running in the plot window's event loop.

import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np

fig, ax = plt.subplots()
x, y = [],[]
sc = ax.scatter(x,y)
plt.xlim(0,10)
plt.ylim(0,10)

def animate(i):
    x.append(np.random.rand(1)*10)
    y.append(np.random.rand(1)*10)
    sc.set_offsets(np.c_[x,y])

ani = matplotlib.animation.FuncAnimation(fig, animate,
                frames=2, interval=100, repeat=True)
plt.show()

Saturday, June 5, 2021
 
supermitch
answered 6 Months ago
67

Basically, you're wanting a density estimate of some sort. There are multiple ways to do this:

  1. Use a 2D histogram of some sort (e.g. matplotlib.pyplot.hist2d or matplotlib.pyplot.hexbin) (You could also display the results as contours--just use numpy.histogram2d and then contour the resulting array.)

  2. Make a kernel-density estimate (KDE) and contour the results. A KDE is essentially a smoothed histogram. Instead of a point falling into a particular bin, it adds a weight to surrounding bins (usually in the shape of a gaussian "bell curve").

Using a 2D histogram is simple and easy to understand, but fundamentally gives "blocky" results.

There are some wrinkles to doing the second one "correctly" (i.e. there's no one correct way). I won't go into the details here, but if you want to interpret the results statistically, you need to read up on it (particularly the bandwidth selection).
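
To make the bandwidth point a bit more concrete, here is a minimal sketch of how scipy.stats.gaussian_kde exposes it through the bw_method argument. The scalar factors 0.1 and 1.0 are arbitrary values picked for illustration, not recommendations; the default is Scott's rule.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1977)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)  # shape (200, 2)

# gaussian_kde expects shape (n_dims, n_points), hence the transpose
kde_default = gaussian_kde(data.T)                 # bandwidth from Scott's rule
kde_narrow = gaussian_kde(data.T, bw_method=0.1)   # small factor: spiky estimate
kde_wide = gaussian_kde(data.T, bw_method=1.0)     # large factor: heavily smoothed

# The factor scales the data covariance to give the kernel covariance
for name, k in [("default", kde_default), ("narrow", kde_narrow), ("wide", kde_wide)]:
    print(name, k.factor)

# Evaluate each estimate at the origin to see how much the choice matters
origin = np.array([[0.0], [0.0]])
print(kde_default(origin), kde_narrow(origin), kde_wide(origin))
```

The same data can give quite different density values depending on the factor, which is why the bandwidth choice matters for any statistical interpretation.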

At any rate, here's an example of the differences. I'm going to plot each one similarly, so I won't use contours, but you could just as easily plot the 2D histogram or gaussian KDE using a contour plot:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

np.random.seed(1977)

# Generate 200 correlated x,y points
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T

nbins = 20

fig, axes = plt.subplots(ncols=2, nrows=2, sharex=True, sharey=True)

axes[0, 0].set_title('Scatterplot')
axes[0, 0].plot(x, y, 'ko')

axes[0, 1].set_title('Hexbin plot')
axes[0, 1].hexbin(x, y, gridsize=nbins)

axes[1, 0].set_title('2D Histogram')
axes[1, 0].hist2d(x, y, bins=nbins)

# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
k = gaussian_kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))

axes[1, 1].set_title('Gaussian KDE')
axes[1, 1].pcolormesh(xi, yi, zi.reshape(xi.shape))

fig.tight_layout()
plt.show()
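
The contour variant mentioned earlier (numpy.histogram2d followed by a contour plot) might look roughly like the sketch below. It regenerates similar data with a different random generator, so the points are not identical to the ones above, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1977)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T

nbins = 20
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# contour() needs one coordinate per value, so use bin centres rather than edges
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
# histogram2d indexes counts as [x_bin, y_bin]; contour expects Z[row=y, col=x]
ax.contour(xcenters, ycenters, counts.T)
ax.plot(x, y, 'k.', markersize=2)
ax.set_title('Contoured 2D histogram')
fig.canvas.draw()
```

The transpose is the step that is easiest to get wrong: without it the contours are mirrored about the diagonal.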


One caveat: With very large numbers of points, scipy.stats.gaussian_kde will become very slow. It's fairly easy to speed it up by making an approximation--just take the 2D histogram and blur it with a Gaussian filter of the right radius and covariance. I can give an example if you'd like.
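
One possible shape for that approximation is sketched below, using scipy.ndimage.gaussian_filter on a fine 2D histogram. This is a simplification under stated assumptions: the bin count and sigma_bins are arbitrary illustrative values, and a single isotropic sigma ignores the data covariance, so it is not equivalent to what gaussian_kde computes.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1977)
# Many points, where an exact KDE evaluation would be slow
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 100_000)
x, y = data.T

nbins = 200  # assumption: fine grid so the blur, not the binning, sets the smoothness
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins, density=True)

sigma_bins = 3.0  # assumption: smoothing radius measured in bins, chosen by eye
# mode="constant" pads with zeros, i.e. no density outside the histogram range
density = gaussian_filter(counts, sigma=sigma_bins, mode="constant")

print(density.shape, density.max())
```

The resulting array can be displayed with pcolormesh or contour in the same way as the histogram or KDE grids above (remember the transpose when passing it to contour).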

One other caveat: If you're doing this in a non-cartesian coordinate system, none of these methods apply! Getting density estimates on a spherical shell is a bit more complicated.

Tuesday, July 27, 2021
 
relyt
answered 4 Months ago
49

This works:

import numpy as np
import matplotlib.pyplot as plt

s = ['+', '+', 'o']
col = ['r', 'r', 'g']
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

for _s, c, _x, _y in zip(s, col, x, y):
    plt.scatter(_x, _y, marker=_s, c=c)

plt.xlim(0, 4)
plt.ylim(0, 8)

plt.show()


Update

It seems you can use a variety of colors in a single call to the scatter function. The API documentation confirms that multiple colors are supported, but it does not say that you can pass an iterable for the marker kwarg. Your code works if you remove marker=s.
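
Given that, a possible middle ground is one scatter call per distinct marker rather than per point, passing a list of colors to each call. The sketch below is one way to do that; the groups dictionary is a hypothetical helper for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use an interactive window
import matplotlib.pyplot as plt

markers = ['+', '+', 'o']
colors = ['r', 'r', 'g']
x = [1, 2, 3]
y = [4, 5, 6]

# Collect the points that share a marker (hypothetical helper structure)
groups = {}
for m, c, xi, yi in zip(markers, colors, x, y):
    g = groups.setdefault(m, {"x": [], "y": [], "c": []})
    g["x"].append(xi)
    g["y"].append(yi)
    g["c"].append(c)

fig, ax = plt.subplots()
collections = {}
for m, g in groups.items():
    # One call per marker; c takes one color per point in the group
    collections[m] = ax.scatter(g["x"], g["y"], marker=m, c=g["c"])

ax.set_xlim(0, 4)
ax.set_ylim(0, 8)
fig.canvas.draw()
```

With many points and only a few marker types, this makes far fewer scatter calls than looping over every point.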

Friday, July 30, 2021
 
Floris
answered 4 Months ago
55

Here's a work-around along the same lines suggested by Etienne. The key idea is to set up the plot, then use a separate call to points3d() to plot the points in each size class.

library(rgl)

# Break data.frame into a list of data.frames, each to be plotted
# with points of a different size
size <- as.numeric(cut(iris$Petal.Width, 7))
irisList <- split(iris, size)

# Setup the plot
with(iris, plot3d(Sepal.Length, Sepal.Width, Petal.Length, col=Species, size=0))

# Use a separate call to points3d() to plot points of each size
for(i in seq_along(irisList)) {
    with(irisList[[i]], points3d(Sepal.Length, Sepal.Width, 
                                 Petal.Length, col=Species, size=i))
}

(FWIW, it does appear that there's no way to get plot3d() to do this directly. The problem is that plot3d() uses the helper function material3d() to set point sizes and as shown below, material3d() only wants to take a single numeric value.)

material3d(size = 1:7)
# Error in rgl.numeric(size) : size must be a single numeric value

Sunday, October 17, 2021
 
Kai
answered 2 Months ago