Asked  7 Months ago    Answers:  5   Viewed   42 times

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of number of gears by am (automatic/manual) in one go with dplyr?

library(dplyr)
data(mtcars)
mtcars <- tbl_df(mtcars)

# count frequency
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# am gear  n
#  0    3 15 
#  0    4  4 
#  1    4  8  
#  1    5  5 

What I would like to achieve:

am gear  n rel.freq
 0    3 15      0.7894737
 0    4  4      0.2105263
 1    4  8      0.6153846
 1    5  5      0.3846154

 Answers

30

Try this:

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154

From the dplyr vignette:

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check grouping in each step with groups.

The outcome of the peeling is of course dependent of the order of the grouping variables in the group_by call. You may wish to do a subsequent group_by(am), to make your code more explicit.

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.

Tuesday, June 1, 2021
 
buymypies
answered 7 Months ago
44

Taking Dickoa's answer one step further -- as Hadley says "summarise peels off a single layer of grouping". It peels off grouping from the reverse order in which you applied it so you can just use

mtcars %>%
 group_by(cyl, gear) %>%
 summarise(newvar = sum(wt)) %>%
 summarise(newvar2 = sum(newvar) + 5)

Note that this will give a different answer if you use group_by(gear, cyl) in the second line.

And to get your first attempt working:

df1 <- mtcars %>%
 group_by(cyl, gear) %>%
 summarise(newvar = sum(wt))

df2 <- df1 %>%
 group_by(cyl) %>%
 summarise(newvar2 = sum(newvar)+5)
Tuesday, June 22, 2021
 
Precastic
answered 6 Months ago
20

There is an elegant solution for this from the following link.

http://haacked.com/archive/2004/06/29/current-directory-for-windows-service-is-not-what-you-expect.aspx/

As my service is running both as console/service I just called

Directory.SetCurrentDirectory(AppDomain.CurrentDomain.BaseDirectory) 

before running it as Service E.g.

static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                Directory.SetCurrentDirectory(AppDomain.CurrentDomain.BaseDirectory);
                RunAsService();
            }
            else
            {
                RunAsConsole();
            }
        }
Sunday, August 1, 2021
 
Tapha
answered 4 Months ago
16

We can use within do

data %>%
    group_by(let ) %>% 
    do(mutate(., mean.by.letter = mean(.$x)))
Monday, August 9, 2021
 
jsuggs
answered 4 Months ago
96

After we order the dataset based on the 'freq' column (arrange(...)), we can the top 3 values with slice, use ggplot, specify the 'x' and 'y' variables in the aes, and plot the bar with geom_bar

 library(ggplot2)
 library(dplyr)
 df %>% 
    arrange(desc(freq)) %>%
    slice(1:3) %>%
    ggplot(., aes(x=type, y=freq))+
              geom_bar(stat='identity')

Or another option is top_n which is a convenient wrapper that uses filter and min_rank to select the top 'n' (3) observations in 'freq' column and use ggplot as above.

top_n(df, n=3, freq) %>%
          ggplot(., aes(x=type, y=freq))+
              geom_bar(stat='identity')

enter image description here

Thursday, August 19, 2021
 
Neil Stockton
answered 4 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share