Asked  7 Months ago    Answers:  5   Viewed   45 times

I have a data frame in R like this:

  ID   MONTH-YEAR   VALUE
  110   JAN. 2012     1000
  111   JAN. 2012     2000
         .         .
         .         .
  121   FEB. 2012     3000
  131   FEB. 2012     4000
         .           .
         .           .

So, for each month of each year there are n rows, and they can be in any order (meaning the rows for a given month are not contiguous but scattered through the data frame). I want to calculate how many rows there are for each MONTH-YEAR, i.e. how many rows there are for JAN. 2012, how many for FEB. 2012, and so on. Something like this:

 MONTH-YEAR   NUMBER OF ROWS
 JAN. 2012     10
 FEB. 2012     13
 MAR. 2012     6
 APR. 2012     9

I tried to do this:

n_row <- nrow(dat1_frame %.% group_by(MONTH-YEAR))

but it does not produce the desired output. How can I do that?

 Answers

11

Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L), 
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012", 
                                      "FEB. 2012", "FEB. 2012", 
                                      "MAR. 2012"), 
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)), 
                  .Names = c("ID", "MONTH.YEAR", "VALUE"), 
                  class = "data.frame", row.names = c(NA, -5L))

mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000

Here's the calculation of the number of rows per group, in two output display formats:

table(mydf$MONTH.YEAR)
# 
# FEB. 2012 JAN. 2012 MAR. 2012 
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
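Note that table() sorts the groups alphabetically, which is why FEB. 2012 comes before JAN. 2012 above. If the months should appear in the order they first occur in the data, one possible base R sketch is to fix the factor levels before counting. This assumes that the first row of each month occurs in the desired order; otherwise the levels would have to be supplied by hand. The names month_year, counts and NUMBER.OF.ROWS are made up for illustration.

```r
# Same sample data as above, rebuilt so this block runs on its own.
mydf <- data.frame(
  ID = c(110L, 111L, 121L, 131L, 141L),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"),
  VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L),
  stringsAsFactors = FALSE
)

# Use first-appearance order as the factor levels so table() keeps that order
# instead of sorting the labels alphabetically.
month_year <- factor(mydf$MONTH.YEAR, levels = unique(mydf$MONTH.YEAR))

# One row per group, with column names close to the requested output.
counts <- as.data.frame(table(month_year), stringsAsFactors = FALSE)
names(counts) <- c("MONTH.YEAR", "NUMBER.OF.ROWS")
counts
```

With the sample data this should list JAN. 2012, FEB. 2012 and MAR. 2012 in that order, with counts 2, 2 and 1.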
Tuesday, June 1, 2021
 
StampyCode
answered 7 Months ago
17

This happens because you assigned the two-column matrix returned by apply to a single new column, so the result is a matrix stored in one column. You can convert it back to a regular data.frame with

 do.call(data.frame, df)
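A minimal sketch of that behaviour, using made-up data since the original data frame is not shown here. The helper split_pair is hypothetical and only stands in for whatever function produced the two-column result.

```r
df <- data.frame(id = 1:3, x = c("a1", "b2", "c3"), stringsAsFactors = FALSE)

# Hypothetical helper: returns two values for each input string.
split_pair <- function(v) c(substr(v, 1, 1), substr(v, 2, 2))

# sapply() gives a 2 x 3 matrix; transposed, it is 3 x 2 and is stored
# as ONE column of the data frame.
df$parts <- t(sapply(df$x, split_pair, USE.NAMES = FALSE))
ncol(df)            # 3: id, x, and the matrix column "parts"
is.matrix(df$parts) # TRUE

# do.call(data.frame, df) expands the matrix column into ordinary columns.
flat <- do.call(data.frame, df)
ncol(flat)          # 4: id, x, parts.1, parts.2
```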

A more straightforward method is to assign two columns at once. I use lapply instead of apply because the columns may be of different classes. apply converts its input to a matrix, so with mixed classes every column becomes 'character'. lapply returns a list and preserves the class of each column.

df[paste0('new.letters', names(df)[2:3])] <- lapply(df[2:3], fun.split)
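The class difference can be checked on a small made-up data frame (the names mixed, via_apply and via_lapply are for illustration only):

```r
mixed <- data.frame(n = 1:3, s = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() first converts the data frame to a matrix; with a character column
# present, the whole matrix becomes character.
via_apply <- apply(mixed, 2, class)

# lapply() works column by column, so each column keeps its own class.
via_lapply <- sapply(lapply(mixed, identity), class)

via_apply   # both columns report "character"
via_lapply  # n is "integer", s is "character"
```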
Friday, August 13, 2021
 
Baba
answered 4 Months ago
93

Here is Ruby code I wrote when I needed the same thing, with comments throughout. It adds a count_table command to the HBase shell. The first parameter is the table name and the second is a hash of options, the same as for the scan shell command.

The direct answer to your question is

count_table 'your.table', { COLUMNS => 'your.family' }

I also recommend adding a cache setting, as you would for scan:

count_table 'your.table', { COLUMNS => 'your.family', CACHE => 10000 }

And here is the source:

# Arguments are the same as for the scan command.
# Examples:
#
# count_table 'test.table', { COLUMNS => 'f:c1' }
# --- Counts f:c1 columns in 'test.table'.
#
# count_table 'other.table', { COLUMNS => 'f' }
# --- Counts 'f' family rows in 'other.table'.
#
# count_table 'test.table', { CACHE => 1000 }
# --- Count rows with caching.
#
def count_table(tablename, args = {})

    table = @shell.hbase_table(tablename)

    # Run the scanner
    scanner = table._get_scanner(args)

    count = 0
    iter = scanner.iterator

    # Iterate results
    while iter.hasNext
        row = iter.next
        count += 1
    end

    # Return the counter
    return count
end
Thursday, November 11, 2021
 
David Miani
answered 4 Weeks ago
50

We can try using data.table methods

dt[, v1 := Reduce(`+`, lapply(.SD, function(x) x!=0)), .SDcols = 1:3]
dt[, result2 := round((Reduce(`*`, lapply(.SD, function(x) 
    replace(x, x==0, 1))))^(1/v1), 2), .SDcols = 1:3][, v1 := NULL][]
#    a   b   c Result result2
#1: 0.5 0.0 0.9   0.67    0.67
#2: 0.3 0.4 0.5   0.39    0.39
#3: 0.0 0.1 0.1   0.10    0.10
#4: 0.6 0.0 0.0   0.60    0.60

Another, less efficient option is to group by row index and compute the result for each row separately

dt[, result2 := {
           u1 <- unlist(.SD)
           round(prod(u1[u1!=0])^(1/sum(u1!=0)), 2)} , 1:nrow(dt), .SDcols = 1:3]
dt
#     a   b   c Result result2
#1: 0.5 0.0 0.9   0.67    0.67
#2: 0.3 0.4 0.5   0.39    0.39
#3: 0.0 0.1 0.1   0.10    0.10
#4: 0.6 0.0 0.0   0.60    0.60

NOTE: Both of these are data.table methods.

Or another option contributed by @DavidArenburg

dt[, Result := round(Reduce(`*`, replace(.SD, .SD == 0, 1))^(1/rowSums(.SD != 0)), 2)]

Another vectorized option is to convert to matrix

library(matrixStats)
m1 <- as.matrix(setDF(dt)[1:3])
round(rowProds(replace(m1, !m1, 1))^(1/rowSums(m1!=0)), 2)
#[1] 0.67 0.39 0.10 0.60
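
Judging from the expressions above, what is being computed is the geometric mean of the non-zero values in each row. As a cross-check, here is a plain base R sketch of that calculation, assuming the four rows shown in the printed output; the names m and geo_mean_nonzero are made up for illustration.

```r
# The a, b, c values from the printed output above, one row per record.
m <- matrix(c(0.5, 0.0, 0.9,
              0.3, 0.4, 0.5,
              0.0, 0.1, 0.1,
              0.6, 0.0, 0.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# Geometric mean of the non-zero entries of one row:
# product of the non-zero values, raised to 1 / (number of non-zero values).
geo_mean_nonzero <- function(r) {
  nz <- r[r != 0]
  prod(nz)^(1 / length(nz))
}

result <- round(apply(m, 1, geo_mean_nonzero), 2)
result
```

This should agree with the Result column above (0.67, 0.39, 0.10, 0.60), although it will be slower than the vectorized versions on large data.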
Friday, November 12, 2021
 
Null
answered 4 Weeks ago
75

Based on the OP's clarification, it could be

out <- reshape(dat[setdiff(names(dat), 'item_type')], idvar = c('person_id', 'gender'), direction = 'wide', timevar = 'item_id')
dim(out)
#[1]  2000 16006

out[1:3, c(1:3, 16000:16006)]
#   person_id gender item_trans.1 item_trans.15998 item_trans.15999 item_trans.16000 item_trans.16001 item_trans.16002 item_trans.16003
#1          1   MALE     5.091636               NA               NA               NA               NA               NA               NA
#32         2   MALE           NA               NA               NA               NA               NA               NA               NA
#64         3 FEMALE           NA               NA               NA               NA               NA               NA               NA
#   item_trans.16004
#1                NA
#32               NA
#64               NA
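
For readers without the original data, the same reshape() call can be tried on a tiny made-up data frame. The column names follow the call above; the values are invented for illustration.

```r
# Long format: one row per person and item.
dat <- data.frame(
  person_id  = c(1, 1, 2),
  gender     = c("MALE", "MALE", "FEMALE"),
  item_id    = c(1, 2, 1),
  item_trans = c(5.1, 2.3, 4.0),
  stringsAsFactors = FALSE
)

# Wide format: one row per person, one item_trans.<item_id> column per item.
out <- reshape(dat, idvar = c("person_id", "gender"),
               timevar = "item_id", direction = "wide")
dim(out)    # 2 rows, 4 columns
names(out)  # person_id, gender, item_trans.1, item_trans.2
```

Person 2 has no row for item 2, so item_trans.2 is NA for that person, which is the same reason the large output above is mostly NA.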
Saturday, December 4, 2021
 
Tushar Garg
answered 3 Days ago