# Count number of rows in a data frame in R based on group [duplicate]

I have a data frame in `R` like this:

``````  ID   MONTH-YEAR   VALUE
110   JAN. 2012     1000
111   JAN. 2012     2000
.         .
.         .
121   FEB. 2012     3000
131   FEB. 2012     4000
.           .
.           .
``````

So, for each month of each year there are `n` rows, and they can be in any order (that is, the rows for a given month are not necessarily contiguous). I want to count how many rows there are for each `MONTH-YEAR`, i.e. how many rows there are for JAN. 2012, how many for FEB. 2012, and so on. Something like this:

`````` MONTH-YEAR   NUMBER OF ROWS
JAN. 2012     10
FEB. 2012     13
MAR. 2012     6
APR. 2012     9
``````

I tried to do this:

``````n_row <- nrow(dat1_frame %.% group_by(MONTH-YEAR))
``````

but it does not produce the desired output. How can I do that?

11

Here's an example that shows how `table(.)` (or, more closely matching your desired output, `data.frame(table(.))`) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

``````mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))

mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000
``````

Here's the calculation of the number of rows per group, in two output display formats:

``````table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
``````
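If you want column names closer to the ones in the question, or want to stay with `dplyr` as in the attempted code, a sketch along these lines may help. It rebuilds the same sample data; the output column name `NUMBER.OF.ROWS` is my own choice, and the `dplyr` part only runs if that package is installed.

```r
# Same sample data as above, built with data.frame() for brevity.
mydf <- data.frame(
  ID = c(110L, 111L, 121L, 131L, 141L),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"),
  VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L),
  stringsAsFactors = FALSE
)

# Base R: naming the table() argument names the group column;
# responseName names the count column (an illustrative name).
counts <- as.data.frame(
  table(MONTH.YEAR = mydf$MONTH.YEAR),
  responseName = "NUMBER.OF.ROWS",
  stringsAsFactors = FALSE
)
print(counts)

# dplyr equivalent of the attempted group_by() approach, if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  counts_dplyr <- dplyr::count(mydf, MONTH.YEAR, name = "NUMBER.OF.ROWS")
  print(counts_dplyr)
}
```

Both results list the groups in alphabetical order. To get calendar order, you would need to convert `MONTH.YEAR` to a factor with its levels in the order you want before counting.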
Tuesday, June 1, 2021

17

The reason is that you assigned the 2-column `matrix` returned by `apply` to a single new column. So, the result is a `matrix` stored in a single column. You can convert it back to a normal data.frame with

`````` do.call(data.frame, df)
``````
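To see what that means, here is a small sketch with made-up data (the names `d`, `id` and `m` are hypothetical, not from the question). A 2-column matrix assigned to one name is stored as a single matrix column, and `do.call(data.frame, ...)` spreads it into ordinary columns.

```r
# Hypothetical data frame with one ordinary column.
d <- data.frame(id = 1:2)

# Assigning a 2-column matrix to a single name keeps it as one matrix column.
d$m <- matrix(1:4, ncol = 2)
print(ncol(d))          # the data frame still reports 2 columns
print(is.matrix(d$m))   # the second "column" is itself a matrix

# Passing the columns to data.frame() expands the matrix into separate columns.
flat <- do.call(data.frame, d)
print(ncol(flat))
print(str(flat))
```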

A more straightforward method is to assign to 2 columns. I use `lapply` instead of `apply` because the columns may be of different classes. `apply` converts its input to a `matrix`, so with mixed classes every column becomes 'character' class. `lapply`, by contrast, returns a `list` and preserves the `class` of each column.

``````df[paste0('new.letters', names(df)[2:3])] <- lapply(df[2:3], fun.split)
``````
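The class difference can be checked with a small sketch. The data frame `mixed` and its columns are hypothetical and stand in for the question's data; `fun.split` is not reproduced here.

```r
# Hypothetical data frame with one numeric and one character column.
mixed <- data.frame(num = c(1.5, 2.5), txt = c("a", "b"),
                    stringsAsFactors = FALSE)

# apply() first converts the data frame to a matrix, so every
# column is seen as character.
apply_classes <- apply(mixed, 2, class)
print(apply_classes)

# lapply()/sapply() work column by column on the list of columns,
# so each column keeps its own class.
lapply_classes <- sapply(mixed, class)
print(lapply_classes)
```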
Friday, August 13, 2021

93

Here is some Ruby code I wrote when I needed something similar; comments are included. It adds a `count_table` command to the `HBase` shell. The first parameter is the table name and the second is a hash of properties, the same as for the `scan` shell command.

``````count_table 'your.table', { COLUMNS => 'your.family' }
``````

I also recommend adding caching, as for `scan`:

``````count_table 'your.table', { COLUMNS => 'your.family', CACHE => 10000 }
``````

And here is the source:

``````# Arguments are the same as for the scan command.
# Examples:
#
# count_table 'test.table', { COLUMNS => 'f:c1' }
# --- Counts f:c1 columns in 'test.table'.
#
# count_table 'other.table', { COLUMNS => 'f' }
# --- Counts 'f' family rows in 'other.table'.
#
# count_table 'test.table', { CACHE => 1000 }
# --- Counts rows with caching.
#
def count_table(tablename, args = {})
  table = @shell.hbase_table(tablename)

  # Run the scanner
  scanner = table._get_scanner(args)

  count = 0
  iter = scanner.iterator

  # Iterate over the results
  while iter.hasNext
    row = iter.next
    count += 1
  end

  # Return the counter
  return count
end
``````
Thursday, November 11, 2021

50

We can try using `data.table` methods

``````dt[, v1 := Reduce(`+`, lapply(.SD, function(x) x!=0)), .SDcols = 1:3]
dt[, result2 := round((Reduce(`*`, lapply(.SD, function(x)
replace(x, x==0, 1))))^(1/v1), 2), .SDcols = 1:3][, v1 := NULL][]
#    a   b   c Result result2
#1: 0.5 0.0 0.9   0.67    0.67
#2: 0.3 0.4 0.5   0.39    0.39
#3: 0.0 0.1 0.1   0.10    0.10
#4: 0.6 0.0 0.0   0.60    0.60
``````

Another, less efficient option is to group by row number and then do the calculation on each row

``````dt[, result2 := {
u1 <- unlist(.SD)
round(prod(u1[u1!=0])^(1/sum(u1!=0)), 2)} , 1:nrow(dt), .SDcols = 1:3]
dt
#     a   b   c Result result2
#1: 0.5 0.0 0.9   0.67    0.67
#2: 0.3 0.4 0.5   0.39    0.39
#3: 0.0 0.1 0.1   0.10    0.10
#4: 0.6 0.0 0.0   0.60    0.60
``````

NOTE: Both of these are `data.table` methods.

Or another option contributed by @DavidArenburg

``````dt[, Result := round(Reduce(`*`, replace(.SD, .SD == 0, 1))^(1/rowSums(.SD != 0)), 2)]
``````

Another vectorized option is to convert to `matrix`

``````library(matrixStats)
m1 <- as.matrix(setDF(dt)[1:3])
round(rowProds(replace(m1, !m1, 1))^(1/rowSums(m1!=0)), 2)
# 0.67 0.39 0.10 0.60
``````
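As a cross-check, the same calculation (which appears to be the geometric mean of the non-zero values in each row) can be sketched in base R without extra packages. The input values are copied from columns `a`, `b` and `c` in the printed output above; the helper name `geo_mean_nonzero` is mine.

```r
# Values of columns a, b, c as printed in the output above.
m <- matrix(c(0.5, 0.0, 0.9,
              0.3, 0.4, 0.5,
              0.0, 0.1, 0.1,
              0.6, 0.0, 0.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical helper: geometric mean of the non-zero entries of a vector.
geo_mean_nonzero <- function(x) {
  nz <- x[x != 0]
  prod(nz)^(1 / length(nz))
}

# Apply the helper to each row and round to 2 decimals.
result <- round(apply(m, 1, geo_mean_nonzero), 2)
print(result)
```

This row-by-row version is likely to be slower than the vectorized options on large data, so treat it as a readable reference rather than a replacement.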
Friday, November 12, 2021

75

Based on the OP's clarification, it could be

``````out <- reshape(dat[setdiff(names(dat), 'item_type')], idvar = c('person_id', 'gender'), direction = 'wide', timevar = 'item_id')
dim(out)
#  2000 16006

out[1:3, c(1:3, 16000:16006)]
#   person_id gender item_trans.1 item_trans.15998 item_trans.15999 item_trans.16000 item_trans.16001 item_trans.16002 item_trans.16003
#1          1   MALE     5.091636               NA               NA               NA               NA               NA               NA
#32         2   MALE           NA               NA               NA               NA               NA               NA               NA
#64         3 FEMALE           NA               NA               NA               NA               NA               NA               NA
#   item_trans.16004
#1                NA
#32               NA
#64               NA
``````
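For a sense of what the `reshape` call does, here is a toy sketch. The column names match the ones used above, but the values are made up, since the original `dat` is not shown.

```r
# Hypothetical long-format data: one row per person/item combination.
dat <- data.frame(
  person_id  = c(1, 1, 2),
  gender     = c("MALE", "MALE", "FEMALE"),
  item_id    = c(1, 2, 1),
  item_type  = c("x", "y", "x"),
  item_trans = c(5.1, 3.2, 4.4),
  stringsAsFactors = FALSE
)

# Drop item_type, then spread item_trans into one column per item_id,
# keeping one row per person_id/gender pair.
out <- reshape(dat[setdiff(names(dat), "item_type")],
               idvar = c("person_id", "gender"),
               timevar = "item_id",
               direction = "wide")
print(out)
```

Items a person has no row for come out as `NA`, which is consistent with the many `NA` values in the output above.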
Saturday, December 4, 2021