Asked  7 Months ago    Answers:  5   Viewed   202 times

There is an option in R to get control over digit display. For example:

options(digits=10)

is supposed to display calculation results with 10 digits until the end of the R session. In R's help file, the digits parameter is defined as follows:

digits: controls the number of digits to print when printing numeric values. It is a suggestion only. Valid values are 1...22 with default 7

So, it says this is a suggestion only. What if I would like to always display exactly 10 digits, no more and no less?

My second question is: what if I would like to display more than 22 digits, e.g. 100 digits for more precise calculations? Is that possible with base R, or do I need an additional package/function for that?

Edit: Thanks to jmoy's suggestion, I tried sprintf("%.100f",pi) and it gave

[1] "3.1415926535897931159979634685441851615905761718750000000000000000000000000000000000000000000000000000"

which has 48 decimals. Is this the maximum limit R can handle?

 Answers

74

The reason it is only a suggestion is that you could quite easily write a print function that ignored the options value. The built-in printing and formatting functions do use the options value as a default.
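
As a rough illustration of why digits is only a suggestion, the sketch below (base R only) contrasts print under options(digits = 10) with sprintf and formatC, which always emit the requested number of decimals. The variable names are made up for the example.

```r
# Temporarily raise the suggested number of significant digits.
old <- options(digits = 10)

print(pi)     # 3.141592654 (10 significant digits)
print(0.5)    # 0.5 (fewer digits: trailing zeros are not padded)

# For a fixed number of decimals, format explicitly instead.
fixed_half <- sprintf("%.10f", 0.5)                    # "0.5000000000"
fixed_pi   <- formatC(pi, digits = 10, format = "f")   # "3.1415926536"
print(fixed_half)
print(fixed_pi)

options(old)  # restore the previous setting
```

Note that sprintf and formatC return character strings, so they are meant for display rather than further arithmetic.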

As to the second question: since R uses finite-precision (double) arithmetic, your answers aren't accurate beyond 15 or 16 significant digits, so in general more aren't required. The gmp and rcdd packages deal with multiple-precision arithmetic (via an interface to the GMP library), but this is mostly related to big integers rather than more decimal places for your doubles.
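
A small base-R sketch of where double precision runs out may help here. It assumes the usual IEEE 754 doubles, which is what .Machine reports on common platforms; the variable names are illustrative only.

```r
print(.Machine$double.digits)   # 53 bits in the significand
print(.Machine$double.eps)      # about 2.22e-16

# 0.1 has no exact binary representation, so digits beyond roughly the
# 17th significant one only describe the stored approximation.
tenth <- sprintf("%.20f", 0.1)   # "0.10000000000000000555"
print(tenth)

# Classic consequence of rounding error:
sum_equal <- (0.1 + 0.2) == 0.3                   # FALSE
sum_close <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE, compares with a tolerance
print(c(sum_equal, sum_close))
```

This is also why sprintf("%.100f", pi) in the question ends in a long run of zeros: it prints the exact value of the stored double, not more digits of pi.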

Mathematica or Maple will allow you to give as many decimal places as your heart desires.

EDIT:
It might be useful to think about the difference between decimal places and significant figures. If you are doing statistical tests that rely on differences beyond the 15th significant figure, then your analysis is almost certainly junk.

On the other hand, if you are just dealing with very small numbers, that is less of a problem, since R can handle numbers as small as .Machine$double.xmin (usually about 2.2e-308).

Compare these two analyses.

x1 <- rnorm(50, 1, 1e-15)
y1 <- rnorm(50, 1 + 1e-15, 1e-15)
t.test(x1, y1)  #Should throw an error

x2 <- rnorm(50, 0, 1e-15)
y2 <- rnorm(50, 1e-15, 1e-15)
t.test(x2, y2)  #ok

In the first case, differences between numbers only occur after many significant figures, so the data are "nearly constant". In the second case, although the size of the differences between numbers is the same, the differences are large compared to the magnitude of the numbers themselves.
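
The same point can be sketched directly: the spacing between representable doubles scales with their magnitude. The snippet below also wraps the first analysis in tryCatch so the script keeps running; the seed and variable names are arbitrary choices for this illustration.

```r
# Spacing between doubles is relative to magnitude.
lost_near_one  <- (1 + 1e-16) == 1   # TRUE: 1e-16 is below half the spacing at 1
lost_near_zero <- (0 + 1e-16) == 0   # FALSE: 1e-16 is representable on its own
print(c(lost_near_one, lost_near_zero))

# Capture the outcome of the first analysis instead of stopping the script.
set.seed(1)  # arbitrary seed, only for reproducibility
x1 <- rnorm(50, 1, 1e-15)
y1 <- rnorm(50, 1 + 1e-15, 1e-15)
res1 <- tryCatch(t.test(x1, y1), error = function(e) conditionMessage(e))
print(res1)  # expected to be an error message about essentially constant data
```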


As mentioned by e3bo, you can use multiple-precision floating point numbers using the Rmpfr package.

library(Rmpfr)
mpfr("3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825")

These are slower and more memory intensive to use than regular (double precision) numeric vectors, but can be useful if you have a poorly conditioned problem or unstable algorithm.
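
For the 100-digit case from the question, something along these lines might work. It is only a sketch: it assumes the Rmpfr package is installed, and the number of guard bits is an arbitrary choice.

```r
if (requireNamespace("Rmpfr", quietly = TRUE)) {
  # Roughly log2(10) ~ 3.32 bits are needed per decimal digit; add a few guard bits.
  bits <- ceiling(100 * log2(10)) + 8
  pi_hp <- Rmpfr::Const("pi", prec = bits)   # pi computed at the requested precision
  pi_str <- format(pi_hp, digits = 100)      # character string, 100 significant digits
  print(pi_str)
} else {
  message("Rmpfr is not installed; install.packages('Rmpfr') would be needed first.")
}
```

Passing the constant through an ordinary double first (for example mpfr(pi, bits)) would only reproduce the 53-bit approximation, which is why the value is computed at high precision from the start.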

Tuesday, June 1, 2021
 
Vlad
answered 7 Months ago
34

Background: Some answers suggested on this page (e.g., signif, options(digits=...)) do not guarantee that a certain number of decimals are displayed for an arbitrary number. I presume this is a design feature in R whereby good scientific practice involves showing a certain number of digits based on principles of "significant figures". However, in many domains (e.g., APA style, business reports) formatting requirements dictate that a certain number of decimal places are displayed. This is often done for consistency and standardisation purposes rather than being concerned with significant figures.

Solution:

The following code shows exactly two decimal places for the number x.

format(round(x, 2), nsmall = 2)

For example:

format(round(1.20, 2), nsmall = 2)
# [1] "1.20"
format(round(1, 2), nsmall = 2)
# [1] "1.00"
format(round(1.1234, 2), nsmall = 2)
# [1] "1.12"

A more general function is as follows, where x is the number and k is the number of decimals to show. trimws removes leading and trailing white space, which is useful if you have a vector of numbers, because format pads the results to a common width.

specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))

E.g.,

specify_decimal(1234, 5)
# [1] "1234.00000"
specify_decimal(0.1234, 5)
# [1] "0.12340"
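
As a quick check on a vector, the sketch below compares specify_decimal with two base-R alternatives, sprintf and formatC. The input values are arbitrary.

```r
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall = k))

values <- c(1, 10.5, 100.123)

via_format  <- specify_decimal(values, 2)                  # "1.00" "10.50" "100.12"
via_sprintf <- sprintf("%.2f", values)
via_formatC <- formatC(values, format = "f", digits = 2)

print(via_format)
print(identical(via_format, via_sprintf))   # TRUE for these inputs
print(identical(via_format, via_formatC))   # TRUE for these inputs
```

Without trimws, format would pad the shorter entries with leading spaces so that all strings have the same width.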
Tuesday, June 1, 2021
 
mdevils
answered 7 Months ago
99

You need to pass the digits argument of the xtable function correctly.

table1 <- xtable(t3,caption="Table showing the Mean discharge
and mean gage height on each year on each month",digits=c(0,0,0,3,4))

Each element of that vector gives the number of decimal places for one column, including the first column with the row.names, so the vector has one more element than the data has columns.
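
A minimal sketch with a made-up two-column data frame (standing in for t3, which is not shown here) illustrates the length of the digits vector. It assumes the xtable package is installed.

```r
if (requireNamespace("xtable", quietly = TRUE)) {
  # Hypothetical data standing in for t3.
  demo <- data.frame(year = c(2001, 2002), discharge = c(1.5, 2.25))

  # One entry for the row names plus one per column: length is ncol(demo) + 1.
  xt <- xtable::xtable(demo, caption = "Demo table", digits = c(0, 0, 3))

  # print() on an xtable object writes LaTeX by default; capture it as text.
  latex_lines <- capture.output(print(xt))
  print(latex_lines)
}
```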

Thursday, July 22, 2021
 
cbcp
answered 5 Months ago
13

When it comes to the CONLL format, I presume you mean the CONLL2000 chunking task format, like this:

   He        PRP  B-NP
   reckons   VBZ  B-VP
   the       DT   B-NP
   current   JJ   I-NP
   account   NN   I-NP
   deficit   NN   I-NP
   will      MD   B-VP
   narrow    VB   I-VP
   to        TO   B-PP
   only      RB   B-NP
   #         #    I-NP
   1.8       CD   I-NP
   billion   CD   I-NP
   in        IN   B-PP
   September NNP  B-NP
   .         .    O

There are three columns in the CONLL chunking task format:

  1. token (i.e. word)
  2. POS tag
  3. BIO (begin, inside, outside) of chunk/phrase tag
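
To make the three-column layout concrete, here is a small sketch that reads a few rows of the example above into a data frame. The column names token, pos and chunk are labels chosen for this illustration, not part of the format.

```r
conll_text <- "He        PRP  B-NP
reckons   VBZ  B-VP
#         #    I-NP
.         .    O"

conll <- read.table(
  text = conll_text,
  col.names = c("token", "pos", "chunk"),
  colClasses = "character",
  quote = "",          # treat quote characters as ordinary tokens
  comment.char = ""    # "#" is a token here, not a comment marker
)

# Tokens that begin a chunk carry a "B-" tag in the third column.
chunk_starts <- conll$token[startsWith(conll$chunk, "B-")]
print(chunk_starts)   # "He" "reckons"
```

Disabling comment.char matters for this data, since otherwise the row whose token is "#" would be dropped as a comment.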

Sadly, if you use the Stanford MaxEnt tagger, it only gives you the token and POS information; it provides no BIO chunk information.

java -cp stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/left3words-wsj-0-18.tagger -textFile short.txt -outputFormat tsv 2> /dev/null

With the above command, the Stanford POS tagger already gives you the tab-separated format, just without the third column (see http://nlp.stanford.edu/software/pos-tagger-faq.shtml):

   He        PRP
   reckons   VBZ
   the       DT
   ...

To get the BIO column, you would need either:

  • a statistical chunker or
  • a full parser

See http://www-nlp.stanford.edu/links/statnlp.html for a list of chunkers and parsers. If you want to stick with Stanford tools, I suggest the Stanford parser; it gives you the bracketed parse format, so you will need some post-processing to get it into the CONLL2000 format. See http://nlp.stanford.edu/software/lex-parser.shtml

Monday, October 4, 2021
 
Momhain
answered 2 Months ago
64

The quote argument of write.table also accepts a numeric vector giving the indices of the columns to quote, so

write.table(df, file = "test.txt", sep = "|", quote = 2)

works for this example, producing

"c1"|"c2"
"1"|0.01|"A"
"2"|0.02|"B"
"3"|0.03|"C"
"4"|0.04|"D"
"5"|0.05|"E"
Monday, November 22, 2021
 
Morrison Chang
answered 1 Week ago