Asked 4 months ago · Answers: 5 · Viewed 359 times

Given the following array, I want to replace commas with dots:

array(['0,140711', '0,140711', '0,0999', '0,0999', '0,001', '0,001',
       '0,140711', '0,140711', '0,140711', '0,140711', '0,140711',
       '0,140711', 0L, 0L, 0L, 0L, '0,140711', '0,140711', '0,140711',
       '0,140711', '0,140711', '0,1125688', '0,140711', '0,1125688',
       '0,140711', '0,1125688', '0,140711', '0,1125688', '0,140711',
       '0,140711', '0,140711', '0,140711', '0,140711', '0,140711',
       '0,140711', '0,140711', '0,140711', '0,140711', '0,140711',
       '0,140711', '0,140711', '0,140711', '0,140711', '0,140711',
       '0,140711', '0,140711', '0,140711', '0,140711'], dtype=object)

I've been trying different ways but I can't figure out how to do this. I have also imported it as a pandas DataFrame, but applying the function doesn't change it:

df
      1-8        1-7
H0   0,140711   0,140711
H1     0,0999     0,0999
H2      0,001      0,001
H3   0,140711   0,140711
H4   0,140711   0,140711
H5   0,140711   0,140711
H6          0          0
H7          0          0
H8   0,140711   0,140711
H9   0,140711   0,140711
H10  0,140711  0,1125688
H11  0,140711  0,1125688
H12  0,140711  0,1125688
H13  0,140711  0,1125688
H14  0,140711   0,140711
H15  0,140711   0,140711
H16  0,140711   0,140711
H17  0,140711   0,140711
H18  0,140711   0,140711
H19  0,140711   0,140711
H20  0,140711   0,140711
H21  0,140711   0,140711
H22  0,140711   0,140711
H23  0,140711   0,140711 

df.applymap(lambda x: str(x.replace(',','.')))

Any suggestions how to solve this?

 Answers


You need to assign the result of your operation back, as the operation isn't in-place. Alternatively, you can use apply, or stack and unstack, with the vectorised str.replace to do this faster:

In [5]:
df.apply(lambda x: x.str.replace(',','.'))

Out[5]:
          1-8        1-7
H0   0.140711   0.140711
H1     0.0999     0.0999
H2      0.001      0.001
H3   0.140711   0.140711
H4   0.140711   0.140711
H5   0.140711   0.140711
H6          0          0
H7          0          0
H8   0.140711   0.140711
H9   0.140711   0.140711
H10  0.140711  0.1125688
H11  0.140711  0.1125688
H12  0.140711  0.1125688
H13  0.140711  0.1125688
H14  0.140711   0.140711
H15  0.140711   0.140711
H16  0.140711   0.140711
H17  0.140711   0.140711
H18  0.140711   0.140711
H19  0.140711   0.140711
H20  0.140711   0.140711
H21  0.140711   0.140711
H22  0.140711   0.140711
H23  0.140711   0.140711

In [4]:    
df.stack().str.replace(',','.').unstack()

Out[4]:
          1-8        1-7
H0   0.140711   0.140711
H1     0.0999     0.0999
H2      0.001      0.001
H3   0.140711   0.140711
H4   0.140711   0.140711
H5   0.140711   0.140711
H6          0          0
H7          0          0
H8   0.140711   0.140711
H9   0.140711   0.140711
H10  0.140711  0.1125688
H11  0.140711  0.1125688
H12  0.140711  0.1125688
H13  0.140711  0.1125688
H14  0.140711   0.140711
H15  0.140711   0.140711
H16  0.140711   0.140711
H17  0.140711   0.140711
H18  0.140711   0.140711
H19  0.140711   0.140711
H20  0.140711   0.140711
H21  0.140711   0.140711
H22  0.140711   0.140711
H23  0.140711   0.140711

The key thing here is to assign the result back:

df = df.stack().str.replace(',','.').unstack()
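Also note that after the replacement the cells are still strings. A minimal sketch (column names copied from the question, data abbreviated) that also converts them to real floats afterwards:

```python
import pandas as pd

# Abbreviated frame mimicking the question's data
df = pd.DataFrame({'1-8': ['0,140711', '0,0999'],
                   '1-7': ['0,140711', '0,001']})

# Assign the result back -- str.replace is not in-place
df = df.apply(lambda x: x.str.replace(',', '.'))

# The cells are still strings; convert to floats if needed
df = df.apply(pd.to_numeric)
print(df.dtypes)
```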

Sunday, June 13, 2021
 
Daveel
answered 4 Months ago

You may use

df['col_b_PY'] = df['col_a'].str.extract(r"([a-zA-Z'-]+\s+PY)\b")
df['col_c_LG'] = df['col_a'].str.extract(r"([a-zA-Z'-]+\s+LG)\b")

Or, to extract all matches and join them with a space:

df['col_b_PY'] = df['col_a'].str.extractall(r"([a-zA-Z'-]+\s+PY)\b").unstack().apply(lambda x:' '.join(x.dropna()), axis=1)
df['col_c_LG'] = df['col_a'].str.extractall(r"([a-zA-Z'-]+\s+LG)\b").unstack().apply(lambda x:' '.join(x.dropna()), axis=1)

Note you need to use a capturing group in the regex pattern so that extract can actually extract the text:

Extract capture groups in the regex pat as columns in a DataFrame.

Note the \b word boundary is necessary to match PY / LG as a whole word.

Also, if you want to only start a match from a letter, you may revamp the pattern to

r"([a-zA-Z][a-zA-Z'-]*\s+PY)\b"
r"([a-zA-Z][a-zA-Z'-]*\s+LG)\b"
   ^^^^^^^^          ^

where [a-zA-Z] will match a letter and [a-zA-Z'-]* will match 0 or more letters, apostrophes or hyphens.
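A quick sketch of the difference with the plain re module (the sample string here is made up):

```python
import re

s = "see 'word PY"

# Loose pattern: the match may start with an apostrophe or hyphen
loose = re.search(r"([a-zA-Z'-]+\s+PY)\b", s).group(1)

# Revamped pattern: the match must start with a letter
strict = re.search(r"([a-zA-Z][a-zA-Z'-]*\s+PY)\b", s).group(1)

print(loose)   # 'word PY
print(strict)  # word PY
```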

Python 3.7 with Pandas 0.24.2:

pd.set_option('display.width', 1000)
pd.set_option('display.max_columns', 500)

df = pd.DataFrame({
    'col_a': ['Python PY is a general-purpose language LG',
             'Programming language LG in Python PY',
             'Its easier LG to understand  PY',
             'The syntax of the language LG is clean PY',
             'Python PY is a general purpose PY language LG']
    })
df['col_b_PY'] = df['col_a'].str.extractall(r"([a-zA-Z'-]+\s+PY)\b").unstack().apply(lambda x:' '.join(x.dropna()), axis=1)
df['col_c_LG'] = df['col_a'].str.extractall(r"([a-zA-Z'-]+\s+LG)\b").unstack().apply(lambda x:' '.join(x.dropna()), axis=1)

Output:

                                           col_a              col_b_PY     col_c_LG
0     Python PY is a general-purpose language LG             Python PY  language LG
1           Programming language LG in Python PY             Python PY  language LG
2                Its easier LG to understand  PY        understand  PY    easier LG
3      The syntax of the language LG is clean PY              clean PY  language LG
4  Python PY is a general purpose PY language LG  Python PY purpose PY  language LG
Thursday, August 5, 2021
 
Ujjawal Khare
answered 3 Months ago

Regex is overkill for replacing just a single character. Why not just do this instead?

$form = str_replace(',', '.', $form);
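(That's PHP; the Python equivalent on a plain string is the same idea. In both, the replacement returns a new string, so assign the result:)

```python
value = '0,140711'
value = value.replace(',', '.')  # str.replace returns a new string
print(value)  # 0.140711
```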
Thursday, August 12, 2021
 
Ramacciotti
answered 2 Months ago

You probably have something in your csv (a stray character somewhere) that prevents the columns from being converted to numbers, because the dec = "," parameter should work. See this example with your data:

text <- "3,063E+01 1,775E-02 6,641E-07 3,747E-02"
read.table(text=text, dec = ",")
     V1      V2        V3      V4
1 30.63 0.01775 6.641e-07 0.03747

Now, if you can't identify the problem (find what is preventing R from identifying your columns as numeric), you could use gsub.

df <- read.table(text=text)
df <- sapply(df, gsub, pattern = ",", replacement= ".")
df <- sapply(df, as.numeric)
     V1        V2        V3        V4 
3.063e+01 1.775e-02 6.641e-07 3.747e-02 
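For what it's worth, pandas has the same knob as R's dec argument: the decimal=',' parameter of read_csv, which parses comma-decimal columns as floats directly. A sketch with the same sample row:

```python
import io
import pandas as pd

text = "3,063E+01 1,775E-02 6,641E-07 3,747E-02"

# decimal=',' makes the parser treat the comma as the decimal separator
df = pd.read_csv(io.StringIO(text), sep=' ', decimal=',', header=None)
print(df.dtypes)  # all float64
```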
Friday, August 20, 2021
 
Elliott Frisch
answered 2 Months ago

You could do something like the following:

target_value = 15
df['max_duration'] = df.groupby('Date')['Duration'].transform('max')
(df.query('max_duration == Duration')
   .assign(dist=lambda df: np.abs(df['Value'] - target_value))
   .assign(min_dist=lambda df: df.groupby('Date')['dist'].transform('min'))
   .query('min_dist == dist')
   .loc[:, ['Date', 'ID']])

Results:

        Date ID
4   1/1/2018  e
11  1/2/2018  e
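A self-contained version of the same idea (the frame below is invented to match the column names the answer assumes, since the original data isn't shown):

```python
import numpy as np
import pandas as pd

target_value = 15

# Hypothetical data with the columns the answer assumes
df = pd.DataFrame({
    'Date':     ['1/1/2018'] * 3 + ['1/2/2018'] * 3,
    'ID':       ['a', 'b', 'e', 'c', 'd', 'e'],
    'Duration': [5, 9, 9, 7, 7, 7],
    'Value':    [10, 20, 16, 14, 30, 15],
})

# Keep the rows with the longest Duration per Date, then the row
# whose Value is closest to target_value within each Date
df['max_duration'] = df.groupby('Date')['Duration'].transform('max')
result = (df.query('max_duration == Duration')
            .assign(dist=lambda df: np.abs(df['Value'] - target_value))
            .assign(min_dist=lambda df: df.groupby('Date')['dist'].transform('min'))
            .query('min_dist == dist')
            .loc[:, ['Date', 'ID']])
print(result)
```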
Saturday, August 28, 2021
 
Carlo Pellegrini
answered 2 Months ago