
I installed Spark, and when I try to run it I get the error: WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped

Can someone help me with that?



The same problem occurred for me because the Python path was not added to the system environment. I added it to the environment and now it works perfectly.

Adding a PYTHONPATH environment variable with a value like:

%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip

helped resolve this issue. Just check which py4j version you have in your spark/python/lib folder and substitute it for <version>.
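If you are unsure which py4j version your install ships, a quick plain-Python check can list it (the SPARK_HOME default below is an assumption; adjust it to your machine):

```python
# List the py4j zip bundled with the Spark install, so the right
# filename can be put on PYTHONPATH. "/opt/spark" is an assumed default.
import glob
import os

spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
candidates = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
print(candidates)  # the filename tells you the exact py4j version
```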

Friday, June 25, 2021
answered 4 Months ago

Let's start with a couple of imports

from pyspark.sql.functions import col, lit, coalesce, greatest

Next, define a minus-infinity literal:

minf = lit(float("-inf"))

Map columns and pass the result to greatest:

rowmax = greatest(*[coalesce(col(x), minf) for x in ['v2','v3','v4']])

Finally, apply it with withColumn:

df1.withColumn("rowmax", rowmax)

with result:

+---+---+---+----+------+
| v1| v2| v3|  v4|rowmax|
+---+---+---+----+------+
|foo|1.0|3.0|null|   3.0|
|bar|2.0|2.0| -10|   2.0|
|baz|3.3|1.2|null|   3.3|
+---+---+---+----+------+

You can use the same pattern with different row-wise operations by replacing minf with the appropriate neutral element. For example:

rowsum = sum([coalesce(col(x), lit(0)) for x in ['v2','v3','v4']])


from operator import mul
from functools import reduce

rowproduct = reduce(
    mul,
    [coalesce(col(x), lit(1)) for x in ['v2','v3','v4']]
)
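The "coalesce with a neutral element, then reduce" idea can be sketched in plain Python as well (the names here are illustrative, not part of the PySpark answer):

```python
from functools import reduce

def row_reduce(values, op, neutral):
    # coalesce(x, neutral): a null (None) value is replaced by the
    # neutral element, so it cannot affect the result of the reduction
    return reduce(op, (v if v is not None else neutral for v in values))

row_max = row_reduce([1.0, 3.0, None], max, float("-inf"))     # 3.0
row_sum = row_reduce([1.0, 3.0, None], lambda a, b: a + b, 0)  # 4.0
```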

Your own code could be significantly simplified with udf:

from pyspark.sql.types import DoubleType
from pyspark.sql.functions import udf

def get_max_row_with_None_(*cols):
    return float(max(x for x in cols if x is not None))

get_max_row_with_None = udf(get_max_row_with_None_, DoubleType())
df1.withColumn("rowmax", get_max_row_with_None('v2','v3','v4'))

Replace minf with lit(float("inf")) and greatest with least to get the smallest value per row.
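In plain Python the least variant looks like this (a sketch of the semantics only, not PySpark code):

```python
def row_min(values):
    # coalesce(x, +inf) followed by least(...): nulls can never win the minimum
    return min(v if v is not None else float("inf") for v in values)

print(row_min([1.0, 3.0, None]))  # 1.0
```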

Saturday, July 3, 2021
answered 4 Months ago

It's a bug:

The link above also provides the workaround: using a java.time.format.DateTimeFormatterBuilder with a java.time.temporal.ChronoField for the milliseconds field:

String text = "20170925142051591";
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    // date/time part of the pattern
    .appendPattern("yyyyMMddHHmmss")
    // milliseconds
    .appendValue(ChronoField.MILLI_OF_SECOND, 3)
    // create formatter
    .toFormatter();
// now it works
LocalDateTime dateTime = LocalDateTime.parse(text, formatter);

Unfortunately, there seems to be no way to parse this using only DateTimeFormatter.ofPattern(String).

Wednesday, July 7, 2021
answered 4 Months ago

If you want to calculate RMSE by group, a slight adaptation of the solution I proposed to your question will do:

import pyspark.sql.functions as psf

def compute_RMSE(expected_col, actual_col):
    rmse = (old_df
            .withColumn("squarederror",
                        psf.pow(psf.col(actual_col) - psf.col(expected_col),
                                psf.lit(2)))
            .groupby('start_month', 'start_week')
            .agg(psf.avg(psf.col("squarederror")).alias("mse"))
            .withColumn("rmse", psf.sqrt(psf.col("mse"))))
    return rmse


compute_RMSE("col1", "col2")
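The grouped-RMSE logic can be checked with a small plain-Python sketch on toy data (the column and group names below are assumptions matching the snippet above):

```python
# Group rows by (start_month, start_week), then take sqrt(mean(squared error))
import math
from collections import defaultdict

rows = [
    {"start_month": 1, "start_week": 1, "col1": 2.0, "col2": 3.0},
    {"start_month": 1, "start_week": 1, "col1": 4.0, "col2": 4.0},
    {"start_month": 2, "start_week": 5, "col1": 1.0, "col2": 3.0},
]

groups = defaultdict(list)
for r in rows:
    groups[(r["start_month"], r["start_week"])].append((r["col2"] - r["col1"]) ** 2)

rmse = {k: math.sqrt(sum(v) / len(v)) for k, v in groups.items()}
print(rmse)
```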
Saturday, July 31, 2021
answered 3 Months ago

See this Microsoft Knowledge Base article:


  • You run a Microsoft .NET Framework 4-based application that is stored on a network share.
  • The application calls a static method in the System.Configuration.ConfigurationManager class. For example, the application calls the ConfigurationManager.GetSection method.

In this scenario, a System.Security.SecurityException exception is thrown and then the application crashes.


The issue occurs because the method fails to access the configuration section from an application located on a network share.

You can request the hotfix from that site.

Saturday, October 9, 2021
answered 1 Week ago