# Relative frequencies / proportions with dplyr

Suppose I want to calculate the proportion of different values within each group. For example, using the `mtcars` data, how do I calculate the relative frequency of the number of gears within each level of `am` (automatic/manual) in one go with `dplyr`?

```r
library(dplyr)
data(mtcars)
mtcars <- as_tibble(mtcars)  # tbl_df() is deprecated in current dplyr

# count frequency
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# am gear  n
#  0    3 15
#  0    4  4
#  1    4  8
#  1    5  5
```

What I would like to achieve:

```
am gear  n  rel.freq
 0    3 15 0.7894737
 0    4  4 0.2105263
 1    4  8 0.6153846
 1    5  5 0.3846154
```


Try this:

```r
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154
```

From the dplyr vignette:

> When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll up a dataset.

Thus, after `summarise`, the last grouping variable specified in `group_by` (`gear`) is peeled off. In the `mutate` step, the data are grouped by the remaining grouping variable(s), here `am`, so `sum(n)` is the total within each level of `am`. You can check the grouping at each step with `groups()`.
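A minimal sketch of that check, using `group_vars()` (a dplyr companion to `groups()` that returns the grouping as a character vector). It assumes a reasonably recent dplyr, which also prints a short message about the grouping it kept after `summarise()`:

```r
library(dplyr)

# Count cars per (am, gear); summarise() drops the last grouping
# variable ("gear"), so the result is still grouped by "am"
counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

group_vars(counts)  # "am"

# mutate() therefore computes sum(n) separately within each level of am
rel <- counts %>% mutate(freq = n / sum(n))
```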

The outcome of the peeling is of course dependent on the order of the grouping variables in the `group_by` call. You may wish to do a subsequent `group_by(am)` to make your code more explicit.
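A sketch of the more explicit version, assuming dplyr >= 1.0.0 (for the `.groups` argument of `summarise()`). The first pipeline uses `count()` and regroups by `am` by hand; the second keeps `summarise()` but states the peeling with `.groups = "drop_last"`. Both should give the `rel.freq` column from the question:

```r
library(dplyr)

# count() on an ungrouped data frame returns one row per (am, gear) with column n
rel_freq <- mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%                  # denominator grouping stated explicitly
  mutate(rel.freq = n / sum(n)) %>%
  ungroup()

# Same result, spelling out the "peel off the last group" behaviour
rel_freq2 <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(rel.freq = n / sum(n)) %>%
  ungroup()
```

Calling `ungroup()` at the end keeps the leftover grouping from silently affecting later steps.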

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.

Tuesday, June 1, 2021


Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It peels off grouping in the reverse order in which you applied it, so you can just use:

```r
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)
```

Note that this will give a different answer if you use `group_by(gear, cyl)` in the second line.
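To make the difference concrete, here is a small sketch that runs both orderings. Because the second `summarise()` works on whatever grouping is left, the first version ends with one row per `cyl` and the second with one row per `gear`:

```r
library(dplyr)

by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%      # "gear" peeled off, still grouped by cyl
  summarise(newvar2 = sum(newvar) + 5) # one row per cyl

by_gear_cyl <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(newvar = sum(wt)) %>%      # "cyl" peeled off, still grouped by gear
  summarise(newvar2 = sum(newvar) + 5) # one row per gear

names(by_cyl_gear)  # "cyl"  "newvar2"
names(by_gear_cyl)  # "gear" "newvar2"
```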

And to get your first attempt working:

```r
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))

df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)
```
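The single pipeline and the two-step version should agree; the explicit `group_by(cyl)` just makes the intended grouping visible instead of relying on the peeling order. A quick sketch that checks this:

```r
library(dplyr)

# Single pipeline relying on summarise() peeling off "gear"
piped <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)

# Two-step version with an explicit regrouping by cyl
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))

df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)

all.equal(piped$newvar2, df2$newvar2)  # TRUE
```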
Tuesday, June 22, 2021


There is an elegant solution for this at the following link:

http://haacked.com/archive/2004/06/29/current-directory-for-windows-service-is-not-what-you-expect.aspx/

As my service runs both as a console application and as a service, I just called

```csharp
Directory.SetCurrentDirectory(AppDomain.CurrentDomain.BaseDirectory);
```

before running it as a service, e.g.:

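```csharp
using System;
using System.IO;

static class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            // A Windows service starts with its current directory set to
            // %WINDIR%\System32; point it back at the executable's folder.
            Directory.SetCurrentDirectory(AppDomain.CurrentDomain.BaseDirectory);
            RunAsService();
        }
        else
        {
            RunAsConsole();
        }
    }

    // RunAsService() and RunAsConsole() are the application's own methods (not shown).
}
```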
Sunday, August 1, 2021


We can use `mutate` within `do`:

```r
data %>%
  group_by(let) %>%
  do(mutate(., mean.by.letter = mean(.$x)))
```
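`do()` is superseded in current dplyr, and a grouped `mutate()` already evaluates `mean(x)` once per group and recycles it over that group's rows. A sketch of the equivalent without `do()`; the data frame below, with its `let` and `x` columns, is hypothetical and only stands in for the asker's `data`:

```r
library(dplyr)

# Hypothetical example data: a letter column `let` and a numeric column `x`
data <- data.frame(
  let = c("a", "a", "b", "b", "b"),
  x   = c(1, 3, 2, 4, 6)
)

# Grouped mutate(): mean(x) is computed within each letter
res <- data %>%
  group_by(let) %>%
  mutate(mean.by.letter = mean(x)) %>%
  ungroup()

res$mean.by.letter  # 2 2 4 4 4
```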
Monday, August 9, 2021


After ordering the dataset by the `freq` column (`arrange(desc(freq))`), we can select the top 3 rows with `slice`, pass them to `ggplot`, specify the `x` and `y` variables in `aes`, and draw the bars with `geom_bar`:

```r
library(ggplot2)
library(dplyr)

df %>%
  arrange(desc(freq)) %>%
  slice(1:3) %>%
  ggplot(., aes(x = type, y = freq)) +
  geom_bar(stat = 'identity')
```

Another option is `top_n`, a convenience wrapper that uses `filter` and `min_rank` to select the top `n` (here 3) observations of the `freq` column; the result is then plotted with `ggplot` as above.

```r
top_n(df, n = 3, freq) %>%
  ggplot(., aes(x = type, y = freq)) +
  geom_bar(stat = 'identity')
```

Thursday, August 19, 2021
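One caveat: because `top_n` filters on `min_rank`, tied values at the cutoff can return more than `n` rows. Since dplyr 1.0.0, `slice_max()` supersedes it and can guarantee exactly `n` rows with `with_ties = FALSE`. A sketch with a hypothetical `df` (columns `type` and `freq`, two types tied for third place); the selected rows can be piped into the same `ggplot` call:

```r
library(dplyr)

# Hypothetical data: one row per type; C and D are tied for third place
df <- data.frame(
  type = c("A", "B", "C", "D", "E"),
  freq = c(40, 30, 20, 20, 5)
)

# top_n() keeps every row ranked within the top 3, so the tie adds a row
top_ties <- top_n(df, n = 3, freq)

# slice_max() with with_ties = FALSE returns exactly n rows
top3 <- slice_max(df, freq, n = 3, with_ties = FALSE)

nrow(top_ties)  # 4
nrow(top3)      # 3
```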