# Converting RGB to grayscale/intensity

When converting from RGB to grayscale, it is often said that specific weights should be applied to the R, G, and B channels: 0.2989, 0.5870, and 0.1140.

It is said that the reason for this is the human eye's different sensitivity to these three colors. Sometimes it is also said that these are the values used to compute the NTSC signal.

However, I didn't find a good reference for this on the web. What is the source of these values?


The specific numbers in the question are from CCIR 601 (see the Wikipedia link below).

If you convert RGB -> grayscale with slightly different numbers / different methods, you won't see much difference at all on a normal computer screen under normal lighting conditions -- try it.
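As an illustration, here is a small sketch (the helper names `gray_601` and `gray_709` are made up) comparing the CCIR 601 weights from the question with the Rec. 709 weights that appear later in this answer; for most pixels the two grays land within a few levels of each other:

```python
# Compare the CCIR 601 weights from the question with the Rec. 709
# weights used for Y further down. Helper names are made up.
def gray_601(r, g, b):
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def gray_709(r, g, b):
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

r, g, b = 180, 120, 60
print(gray_601(r, g, b))  # ~131.1
print(gray_709(r, g, b))  # ~128.4
```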

Here are some more links on color in general:

Wikipedia Luma

Bruce Lindbloom's outstanding web site

chapter 4 on Color in Colin Ware's book "Information Visualization", ISBN 1-55860-819-2; this long link to Ware on books.google.com may or may not work

cambridgeincolor: excellent, well-written "tutorials on how to acquire, interpret and process digital photographs using a visually-oriented approach that emphasizes concept over procedure"

Should you run into "linear" vs "nonlinear" RGB, here's part of an old note to myself on this. Repeat, in practice you won't see much difference.

### RGB -> ^gamma -> Y -> L*

In color science, the common RGB values, as in html rgb( 10%, 20%, 30% ), are called "nonlinear" or Gamma corrected. "Linear" values are defined as

```
Rlin = R^gamma,  Glin = G^gamma,  Blin = B^gamma
```

where gamma is 2.2 for many PCs. The usual R G B values are sometimes written as R' G' B' (R' = Rlin ^ (1/gamma); purists tongue-click), but here I'll drop the '.

Brightness on a CRT display is proportional to RGBlin = RGB ^ gamma, so 50% gray on a CRT is quite dark: .5 ^ 2.2 = 22% of maximum brightness. (LCD displays are more complex; furthermore, some graphics cards compensate for gamma.)

To get the measure of lightness called `L*` from RGB, first divide R G B by 255, and compute

```
Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma
```

This is `Y` in XYZ color space; it is a measure of color "luminance". (The real formulas are not exactly x^gamma, but close; stick with x^gamma for a first pass.)

Finally,

```
L* = 116 * Y^(1/3) - 16
```

"... aspires to perceptual uniformity [and] closely matches human perception of lightness." -- Wikipedia Lab color space
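The whole chain above can be sketched in a few lines, using the x^gamma approximation with gamma = 2.2 as described rather than the exact sRGB formulas (the helper name `lightness_Lstar` is made up):

```python
# Sketch of the RGB -> ^gamma -> Y -> L* chain above, using the
# x^gamma approximation (gamma = 2.2), not the exact sRGB transfer curve.
gamma = 2.2

def lightness_Lstar(r, g, b):
    # "Linearize": divide R G B by 255, then apply the gamma power
    rl, gl, bl = [(c / 255.0) ** gamma for c in (r, g, b)]
    # Y, the luminance in XYZ
    Y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    # L*, the (approximately) perceptually uniform lightness
    return 116 * Y ** (1.0 / 3.0) - 16

# 50% gray on a CRT is quite dark: .5 ^ 2.2 is about 22% of max brightness
print(0.5 ** 2.2)                      # ~0.2176
print(lightness_Lstar(128, 128, 128))  # ~54: mid gray is about L* = 54
print(lightness_Lstar(255, 255, 255))  # ~100: white
```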

Tuesday, June 1, 2021


`double` images have values in the range [0,1] (floating point); `uint8` images have values in the range `[0, 2^8-1]` (integers only). If you simply cast to `uint8`, your values between 0 and 1 all become 0 or 1, which is black or nearly black.

Use `im2uint8` or `im2double` to convert images, these functions automatically rescale your values to the appropriate range.
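For a rough picture of what that rescaling does, here is a NumPy sketch (the helper names `to_double` and `to_uint8` are made up; MATLAB's `im2double`/`im2uint8` also handle other input classes):

```python
import numpy as np

# Rough NumPy picture of the im2double / im2uint8 idea: rescale between
# the [0, 255] integer range and the [0, 1] float range, instead of casting.
def to_double(img_u8):
    return img_u8.astype(np.float64) / 255.0

def to_uint8(img_f):
    return np.round(img_f * 255.0).astype(np.uint8)

a = np.array([0, 64, 128, 255], dtype=np.uint8)
d = to_double(a)          # 0 -> 0.0, 64 -> ~0.251, 255 -> 1.0
back = to_uint8(d)        # round-trips to [0, 64, 128, 255]

# Plain casting, by contrast, truncates [0, 1] floats to 0 or 1:
print(np.array([0.2, 0.9]).astype(np.uint8))  # [0 0]
```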

Tuesday, August 31, 2021


Your values are way, way outside the bounds of the colorspace.

From the docs:

Coordinates in all of these color spaces are floating point values. In the YIQ space, the Y coordinate is between 0 and 1, but the I and Q coordinates can be positive or negative. In all other spaces, the coordinates are all between 0 and 1.

I'm guessing you're expecting them to be byte values between 0 and 255? Just divide by 255 first:

```
import colorsys

r, g, b = 192, 64, 1
r, g, b = [x / 255.0 for x in (r, g, b)]
h, l, s = colorsys.rgb_to_hls(r, g, b)
r, g, b = colorsys.hls_to_rgb(h, l, s)
r, g, b = [x * 255.0 for x in (r, g, b)]
print(r, g, b)
```

This will give you:

```
192.0 64.0 1.0
```

What if you want to understand why you get such ridiculous errors when you go way outside the bounds of the colorspace?

Well, first, read the two documents linked from the docs to understand why values outside the colorspace are meaningless, and why you're asking it to generate values to impossible constraints. Then, to figure out exactly why it fails in the way it does instead of some other way, you need to know which algorithm it's using—which is pretty easy, since the docs link to the source code, which is pure Python and pretty readable.

(PS, I can imagine what 9650% lightness might look like, but I'm curious about -100% saturation. Probably something Lovecraftian.)

Friday, October 8, 2021


ImageMagick's "convert" command can generate a histogram.

```
$ convert image.png -define histogram:unique-colors=true -format %c histogram:info:-

19557: (  0,  0,  0) #000000 gray(0,0,0)
 1727: (  1,  1,  1) #010101 gray(1,1,1)
 2868: (  2,  2,  2) #020202 gray(2,2,2)
 2066: (  3,  3,  3) #030303 gray(3,3,3)
 1525: (  4,  4,  4) #040404 gray(4,4,4)
...
```

Depending on your language of choice and how you want the colors represented, you could go lots of directions from here. Here's a quick Ruby example though:

```
out = `convert /tmp/lbp_desert1.png -define histogram:unique-colors=true -format %c histogram:info:- |
  sed -e 's/.*: (//' |
  sed -e 's/).*//'`

out.split("\n")
   .map { |row| row.split(",").map(&:to_i) }

# => [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4] .....
```
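If the image is already loaded in Python as a NumPy array, you can get similar counts without shelling out to `convert` at all; a minimal sketch for a grayscale image (the toy array here is made up):

```python
import numpy as np

# Counting unique gray levels directly, assuming the image is already a
# NumPy array (e.g. from cv2.imread or PIL).
img = np.array([[0, 0, 1],
                [2, 2, 2]], dtype=np.uint8)

values, counts = np.unique(img, return_counts=True)
for v, c in zip(values, counts):
    print(f"{c}: gray({v})")
# 2: gray(0)
# 1: gray(1)
# 3: gray(2)
```

For an RGB image you would reshape to `(-1, 3)` and pass `axis=0` to `np.unique` to count whole colors rather than channel values.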
Monday, October 11, 2021


Here's my attempt at recreating this using OpenCV Python. It's a rather hack-ish solution that is a bit computationally intensive, but it certainly gets the job done.

First, create a mask where pixels that are zero correspond to those pixels you want to keep at high resolution and pixels that are one correspond to those pixels you want to blur. To make things simple, I would create a circle of dark pixels that define the high resolution pixels.

With this mask, one tool that I can suggest to make this work is to use the distance transform on this mask. For each point in the binary mask, the corresponding output point in the distance transform is the distance from this point to the closest zero pixel. As such, as you venture far away from the zero pixels in the mask, the greater the distance would be.
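To make that concrete, here is a tiny brute-force sketch of what a distance transform computes (the grid and variable names are made up for illustration; OpenCV's `cv2.distanceTransform` computes the same thing far more efficiently):

```python
import numpy as np

# Brute-force distance transform: for each pixel, the distance to the
# nearest zero pixel of the mask.
mask = np.ones((5, 5), dtype=np.uint8)
mask[2, 2] = 0  # one "keep sharp" pixel in the centre

zeros = np.argwhere(mask == 0)          # coordinates of zero pixels
ys, xs = np.indices(mask.shape)
dist = np.full(mask.shape, np.inf)
for zy, zx in zeros:
    d = np.sqrt((ys - zy) ** 2 + (xs - zx) ** 2)
    dist = np.minimum(dist, d)

print(dist[2, 2])   # 0.0  (at the zero pixel itself)
print(dist[2, 3])   # 1.0  (one step away)
print(dist[0, 0])   # ~2.83 (far away -> more blur there)
```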

Therefore, the farther away you go from a zero pixel in this mask, the more blur you apply. Using this idea, I simply wrote a loop through the image: at each point, create a blur mask (averaging, Gaussian, or anything related) whose size is proportional to the distance in the distance transform, and blur this point with that mask. Any values that are zero in this mask should have no blur applied to them. For all of the other points, we use the values in the mask to guide us in collecting a neighbourhood of pixels centred at this point and perform a blur. The larger the distance, the larger the pixel neighbourhood, and thus the stronger the blur.

To simplify things, I'm going to use an averaging mask. Specifically, for each value in the distance transform, the size of this mask will be `M x M` where `M` is:

``````M = d / S
``````

`d` is a distance value from the distance transform and `S` is a scale factor that scales down the value of `d` so that the averaging can be more feasible. This is because the distance transform can get quite large as you go farther away from a zero pixel, and so the scale factor makes the averaging more realistic. Formally, for each pixel in our output, we collect a neighbourhood of `M x M` pixels, get an average and set this to be our output.

One intricacy that we need to keep in mind is that when we collect pixels where the centre of the neighbourhood is along the border of the image, we need to make sure that we collect pixels within the boundaries of the image so any locations that go outside of the image, we skip.

Now it's time to show some results. For reference, I used the Camera Man image, a standard test image that is very popular (not reproduced here). I'm also going to set the mask to be a circle of radius 25 centred at row 70, column 100. Without further ado, here's the code, fully commented. I'll let you parse through the comments yourself.

```
import cv2  # Import relevant libraries
import numpy as np

# Read in the image as grayscale - substitute your own path
img = cv2.imread('cameraman.png', 0)

height = img.shape[0]  # Get the dimensions
width = img.shape[1]

# Define a mask of ones, the same size as the image
mask = np.ones((height, width), dtype='uint8')

# Draw circle at x = 100, y = 70 of radius 25 and fill this in with 0
cv2.circle(mask, (100, 70), 25, 0, -1)

# Apply distance transform to mask
out = cv2.distanceTransform(mask, cv2.DIST_L2, 3)

# Define scale factor
scale_factor = 10

# Create output image that is the same as the original
filtered = img.copy()

# Create floating point copy for precision
img_float = img.copy().astype('float')

# Number of channels
if len(img_float.shape) == 3:
    num_chan = img_float.shape[2]
else:
    # If there is a single channel, make the images 3D with a singleton
    # dimension to allow for loop to work properly
    num_chan = 1
    img_float = img_float[:,:,None]
    filtered = filtered[:,:,None]

# For each pixel in the input...
for y in range(height):
    for x in range(width):

        # If distance transform is 0, skip
        if out[y,x] == 0.0:
            continue

        # Calculate M = d / S
        mask_val = np.ceil(out[y,x] / scale_factor)

        # If M is too small, set the mask size to the smallest possible value
        if mask_val <= 3:
            mask_val = 3

        # Get beginning and ending x and y coordinates for neighbourhood
        # and ensure they are within bounds
        beginx = x - int(mask_val / 2)
        if beginx < 0:
            beginx = 0

        beginy = y - int(mask_val / 2)
        if beginy < 0:
            beginy = 0

        endx = x + int(mask_val / 2)
        if endx >= width:
            endx = width - 1

        endy = y + int(mask_val / 2)
        if endy >= height:
            endy = height - 1

        # Get the coordinates of where we need to grab pixels
        xvals = np.arange(beginx, endx + 1)
        yvals = np.arange(beginy, endy + 1)
        (col_neigh, row_neigh) = np.meshgrid(xvals, yvals)
        col_neigh = col_neigh.astype('int')
        row_neigh = row_neigh.astype('int')

        # Get the pixels now - for each channel, do the foveation
        for ii in range(num_chan):
            chan = img_float[:,:,ii]
            pix = chan[row_neigh, col_neigh].ravel()

            # Calculate the average and set it to be the output
            filtered[y,x,ii] = int(np.mean(pix))

# Remove singleton dimension if required for display and saving
if num_chan == 1:
    filtered = filtered[:,:,0]

# Show the image
cv2.imshow('Output', filtered)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

The output I get is the foveated version of the Camera Man image: sharp inside the circle, increasingly blurred away from it (output image not reproduced here).

Friday, October 15, 2021