Asked 7 months ago

I want to eliminate all the whitespace from a string, on both ends, and in between words.

I have this Python code:

def my_handle(self):
    sentence = ' hello  apple  '
    sentence = sentence.strip()  # strip() returns a new string; it does not modify sentence in place

But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?

 Answers


If you want to remove leading and trailing whitespace, use str.strip():

>>> sentence = ' hello  apple'
>>> sentence.strip()
'hello  apple'

If you want to remove all space characters, use str.replace():

(NB: this removes only the ordinary ASCII space character ' ' (U+0020), not other whitespace such as tabs or newlines.)

>>> sentence = ' hello  apple'
>>> sentence.replace(" ", "")
'helloapple'
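Because the question asks about all whitespace, here is a minimal sketch that also removes tabs and newlines. str.split() with no argument splits on any run of whitespace, so joining the pieces with an empty string drops the whitespace entirely; the re version does the same with the \s character class. The two function names are hypothetical helpers, not standard-library functions.

```python
import re

def remove_all_whitespace(text: str) -> str:
    # str.split() with no argument splits on runs of any whitespace
    # (spaces, tabs, newlines, ...) and discards empty pieces at both ends.
    return "".join(text.split())

def remove_all_whitespace_re(text: str) -> str:
    # In a str pattern, \s matches Unicode whitespace characters.
    return re.sub(r"\s+", "", text)

sentence = " hello \t apple\n"
print(remove_all_whitespace(sentence))     # helloapple
print(remove_all_whitespace_re(sentence))  # helloapple
```

For ordinary text both give the same result; the split/join form avoids importing re.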

If you want to collapse runs of whitespace into a single space (this also strips both ends), use str.split() together with str.join():

>>> sentence = ' hello  apple'
>>> " ".join(sentence.split())
'hello apple'
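The same collapsing can also be written as a regular-expression substitution, which is convenient if you later want a different separator or pattern. This is a minimal sketch; collapse_whitespace is a hypothetical helper name.

```python
import re

def collapse_whitespace(text: str) -> str:
    # Replace each run of whitespace (spaces, tabs, newlines, ...) with one
    # space, then drop the single space that may remain at either end.
    return re.sub(r"\s+", " ", text).strip()

print(repr(collapse_whitespace(" hello  apple")))  # 'hello apple'
print(repr(collapse_whitespace("a\t\tb\nc ")))     # 'a b c'
```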
Tuesday, June 1, 2021
 
jakubos
answered 7 Months ago

Probably the best way is to handle leading and trailing white space when you read your data file. If you use read.csv or read.table, you can set the parameter strip.white = TRUE (it only takes effect when a sep is specified, which read.csv does by default).

If you want to clean strings afterwards you could use one of these functions:

# Returns string without leading white space
trim.leading <- function (x)  sub("^\\s+", "", x)

# Returns string without trailing white space
trim.trailing <- function (x) sub("\\s+$", "", x)

# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
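For comparison with the Python question above, here is a sketch of the same three regex helpers in Python. The names are hypothetical translations of the R functions; re.sub replaces every match, like R's gsub, and the ^ anchor means the leading-space pattern can only match once anyway.

```python
import re

# Hypothetical Python counterparts of the R helpers; \s matches any whitespace.
def trim_leading(x: str) -> str:
    return re.sub(r"^\s+", "", x)

def trim_trailing(x: str) -> str:
    return re.sub(r"\s+$", "", x)

def trim(x: str) -> str:
    return re.sub(r"^\s+|\s+$", "", x)

s = "  hello  apple  "
print(repr(trim_leading(s)))   # 'hello  apple  '
print(repr(trim_trailing(s)))  # '  hello  apple'
print(repr(trim(s)))           # 'hello  apple'
```

In everyday Python code, str.lstrip(), str.rstrip() and str.strip() do the same job without a regex.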

To use one of these functions on myDummy$country:

 myDummy$country <- trim(myDummy$country)

To 'show' the white space you could use:

 paste(myDummy$country)

which prints the strings surrounded by quotation marks ("), making leading and trailing white space easier to spot.
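The Python analogue of this trick is repr(), which quotes each string and shows tabs and newlines as escape sequences. A minimal sketch; show_whitespace and the sample country values are made up for illustration.

```python
def show_whitespace(values):
    # repr() wraps each string in quotes and escapes \t and \n,
    # so stray leading or trailing white space becomes visible.
    return [repr(v) for v in values]

countries = ["Canada", " Canada", "Canada\t"]  # hypothetical sample data
for shown in show_whitespace(countries):
    print(shown)
```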

Tuesday, June 1, 2021
 
Bere
answered 7 Months ago
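// Requires jQuery: $(...) parses the markup and .text() returns the combined text of the elements.
var myContent = '<div id="test">Hello <span>world!</span></div>';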

alert($(myContent).text());

That displays "Hello world!". Does that answer your question?

See http://jsfiddle.net/D2tEf/ for an example.
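Outside the browser, the same "text without tags" result can be sketched in Python with the standard library's html.parser.HTMLParser. This is an analogue, not what the jQuery snippet or the jsfiddle does; TextExtractor and html_to_text are hypothetical names. Note that it would also collect the contents of script and style elements.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []  # text chunks in document order

    def handle_data(self, data: str) -> None:
        # Called for each run of text between tags.
        self.parts.append(data)

def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)

print(html_to_text('<div id="test">Hello <span>world!</span></div>'))  # Hello world!
```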

Saturday, July 3, 2021
 
msg
answered 5 Months ago

This solution uses two regexes. The first regex splits the entire file/string into three chunks:

  1. The first chunk (captured into group $1) is everything from the start of the string up to and including the first <html> start tag.
  2. The second chunk (captured into group $2) is everything after the first <html> start tag up to the start of the last </html> end tag.
  3. The third chunk (captured into group $3) is the last </html> end tag plus everything after it up to the end of the file/string.
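To make the three chunks concrete, here is a small sketch that applies a one-line equivalent of the outer pattern to a made-up sample string, prints each group, and then strips inner html tags from the middle chunk the same way the function does.

```python
import re

# One-line equivalent of the verbose outer pattern used in the function.
p_outer = re.compile(r"^(.*?<html[^>]*>)(.*)(</html\s*>.*)$", re.DOTALL | re.IGNORECASE)
p_inner = re.compile(r"</?html[^>]*>", re.IGNORECASE)

sample = "head<html><body><html>inner</html></body></html>tail"  # made-up input
m = p_outer.match(sample)
for i in (1, 2, 3):
    print(i, repr(m.group(i)))
# 1 'head<html>'
# 2 '<body><html>inner</html></body>'
# 3 '</html>tail'

stripped = m.group(1) + p_inner.sub("", m.group(2)) + m.group(3)
print(stripped)  # head<html><body>inner</body></html>tail
```

The lazy first group stops at the first <html ...> tag, while the greedy middle group runs to the last </html>, which is what leaves the outermost pair intact.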

The function first tries to match the first regex against the input text. If it matches, the contents of the outermost html element (captured in group 2) are stripped of any <html> start and </html> end tags using the second regex. The string is then reassembled from the three chunks, with the middle chunk now free of inner html tags.

import re

def stripInnermostHTMLtags(text):
    '''Strip all but the outermost HTML start and end tags.
    '''
    # Regex to match outermost HTML element and its contents.
    p_outer = re.compile(r"""
        ^                 # Anchor to start of string.
        (.*?<html[^>]*>)  # $1: Outer HTML start tag.
        (.*)              # $2: Outer HTML element contents.
        (</html\s*>.*)    # $3: Outer HTML end tag.
        $                 # Anchor to end of string.
        """, re.DOTALL | re.VERBOSE | re.IGNORECASE)
    # Split text into outermost HTML tags and its contents.
    m = p_outer.match(text)
    if m:
        # Regex to match HTML element start or end tag.
        p_inner = re.compile(r"</?html[^>]*>", re.IGNORECASE)
        # Strip contents of any/all HTML start and end tags.
        contents = p_inner.sub("", m.group(2))
        # Put string back together stripped of inner HTML tags.
        text = m.group(1) + contents + m.group(3)
    return text

Note that this solution correctly handles any attributes that may be in the HTML start tags. Note also that this solution does NOT handle HTML tags having attributes with values containing the > character (but this should be very rare).

Friday, August 20, 2021
 
Arman
answered 4 Months ago

Fortran's TRIM intrinsic removes only trailing blanks, and it never touches blanks in the middle of the string (trim/strip functions in almost all languages and libraries behave this way). If you want to remove all spaces in the string, you have to write your own routine that iterates through the string.

Example:

program Test

    implicit none

    ! Variables
    character(len=200) :: string

    ! Body of Test
    string = 'Hello World              7    9'
    print *, trim(string)
    call StripSpaces(string)
    print *, trim(string)

contains

    ! Removes every blank from string in place by shifting the
    ! non-blank characters left; the freed tail is filled with blanks.
    subroutine StripSpaces(string)
        character(len=*), intent(inout) :: string
        integer :: stringLen
        integer :: last, actual

        stringLen = len(string)
        last = 1      ! position currently being filled
        actual = 1    ! position of the next character to pull from

        do while (actual < stringLen)
            if (string(last:last) == ' ') then
                actual = actual + 1
                string(last:last) = string(actual:actual)
                string(actual:actual) = ' '
            else
                last = last + 1
                if (actual < last) &
                    actual = last
            end if
        end do

    end subroutine StripSpaces

end program Test

This was tested with the Intel Fortran compiler, not with gfortran, but I expect it to work there as well.
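For readers coming from the Python question, here is a sketch of the same in-place idea. It is a simpler read/write two-pointer variant, not a line-by-line translation of StripSpaces. Python strings are immutable, so it works on a list of characters and blank-fills the freed tail to mimic a fixed-length Fortran CHARACTER variable; strip_spaces_in_place is a hypothetical name.

```python
def strip_spaces_in_place(chars):
    """Remove every ' ' from a list of characters, in place."""
    write = 0  # next slot to receive a non-blank character
    for read in range(len(chars)):
        if chars[read] != " ":
            chars[write] = chars[read]
            write += 1
    # Blank-fill the freed tail so the list keeps its original length.
    for i in range(write, len(chars)):
        chars[i] = " "

text = "Hello World              7    9"
buf = list(text)
strip_spaces_in_place(buf)
print(repr("".join(buf).rstrip()))  # 'HelloWorld79'
```

Each character is read once and written at most once, so the pass is linear in the string length.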

Wednesday, October 20, 2021
 
clean_coding
answered 2 Months ago