Asked  7 Months ago    Answers:  5   Viewed   20 times

Using Python 2.7.1, I am trying to use a regular expression to extract a word from inside a pattern.

I have a string that looks like this:

someline abc
someother line
name my_user_name is valid
some more lines

I want to extract the word "my_user_name". I do something like this:

import re
s = #that big string
p = re.compile("name .* is valid", re.flags)
p.match(s) #this gives me <_sre.SRE_Match object at 0x026B6838>

How do I extract my_user_name now?



You need to capture the part you want with a group in the regex. Search for the pattern and, if it is found, retrieve the string using group(index). Assuming the necessary validity checks are performed:

>>> p = re.compile("name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> result.group(1)     # group(1) returns the 1st capture (the text matched inside the parentheses).
                        # group(0) returns the entire matched text.
'my_user_name'
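
The "valid checks" mentioned above can be sketched as follows. This is only a minimal illustration: the sample text mirrors the question, and the helper name extract_name is made up for this example. search() returns None when nothing matches, so the result is tested before group(1) is called.

```python
import re

# Sample text mirroring the question; the variable names are illustrative.
text = """someline abc
someother line
name my_user_name is valid
some more lines"""

pattern = re.compile(r"name (.*) is valid")


def extract_name(s):
    """Return the captured word, or None if the pattern is absent."""
    # search() scans the whole string; match() only tries the very start.
    result = pattern.search(s)
    if result is None:
        return None
    return result.group(1)


print(extract_name(text))
print(extract_name("no such line here"))
```

Returning None instead of raising keeps the caller in charge of what a missing line means.
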
Tuesday, June 1, 2021
answered 7 Months ago

I am not familiar with 2to3, but from all the comments, it looks like the correct tool for the job.

That said, perhaps we can use this question as an excuse for a short lesson in some vim basics.

First, you want a pattern that matches the correct lines. I think that ^\s*print\> will do:

  • ^ matches start of line (and $ matches end of line).
  • \s matches whitespace (space or tab).
  • * means 0 or more of the previous atom (as many as possible, or "greedy").
  • print is a literal string.
  • \> matches end-of-word (zero width). You might use a (literal) space or \s\+ instead.

Next, you need to identify the part to be enclosed in parentheses. Since * is greedy, .* will match to the end of the line; there is no need to anchor it on the right. Use \(\s*print\) and \(.*\) to capture the pieces, so that you can refer to them as \1 and \2 in the replacement.
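
For comparison, the same two-group idea can be sketched outside vim with Python's re.sub. This is not part of the vim recipe, only an illustration under assumptions: the sample source lines are invented, [ \t] stands in for vim's \s (space or tab), and \b plays the role of \>.

```python
import re

# Hypothetical Python 2 style source lines, used only as sample input.
source = "print 'hello'\n    print x, y\nprinter = 3\n"

# Group 1 keeps the indentation plus 'print'; group 2 is the rest of the line.
# [ \t] is used instead of \s so that a match never runs across a newline.
# \b stops 'printer' from being treated as a print statement.
pattern = re.compile(r"^([ \t]*print)\b[ \t]+(.*)", re.MULTILINE)

# \1 and \2 in the replacement refer back to the two captured groups.
converted = pattern.sub(r"\1(\2)", source)
print(converted)
```

As the answer says, 2to3 is the better tool for real code; a regex like this does not understand strings, comments, or trailing commas.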

Now, put the pieces together. There are many variants, and I have not tried to "golf" this one:

:%s/^\(\s*print\)\s\+\(.*\)/\1(\2)

Some people prefer the "very magic" version, where only a-z, A-Z, 0-9, and _ are treated as literal characters; then you do not need to escape the parentheses or the plus:

:%s/\v^(\s*print)\s+(.*)/\1(\2)

Thursday, August 12, 2021
answered 4 Months ago

You can extract all files matching your pattern from many tar files as follows:

  1. Use glob to get you a list of all of the *.tar or *.gz files in a given folder.

  2. For each tar file, get a list of the files in each tar file using the getmembers() function.

  3. Use a regular expression (or a simple if "xxx" in test) to filter the required files.

  4. Pass this list of matching files to the members parameter in the extractall() function.

  5. Exception handling is added to catch badly encoded tar files.

For example:

import tarfile
import glob
import re

reT = re.compile(r'.*?_sl_H.*?')

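for tar_filename in glob.glob(r'my_source_folder/*.tar'):
    try:
        t = tarfile.open(tar_filename, 'r')
    except (IOError, tarfile.TarError) as e:
        # Skip archives that cannot be opened or are badly encoded.
        print(e)
    else:
        # Extract only the members whose names match the pattern.
        t.extractall('outdir', members=[m for m in t.getmembers() if reT.search(m.name)])
        t.close()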
Thursday, August 26, 2021
answered 4 Months ago

You need to replace the last \D with (?!\d).

In your testing, you used a multiline string input, whereas in the code you test individual strings that have nothing after the 2. \D is a consuming pattern: there must be a non-digit character for it to match. (?!\d) is a negative lookahead, a non-consuming pattern that only requires that the next character is not a digit.

Another solution is to replace the last \D with a word boundary, \b, but you have to use a raw string literal to avoid issues with escaping (i.e. use r'pattern').
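
A small sketch of the difference between the three options. The question's real pattern is not shown here, so the patterns and sample strings below are hypothetical and only chosen to show how the three endings behave.

```python
import re

# Hypothetical patterns: each looks for a non-digit followed by the digit 2.
consuming = re.compile(r"\D2\D")      # needs an actual non-digit character after the 2
lookahead = re.compile(r"\D2(?!\d)")  # only requires that no digit follows the 2
boundary = re.compile(r"\D2\b")       # requires a word boundary after the 2


def check(text):
    """Return (consuming, lookahead, boundary) match results as booleans."""
    return (
        bool(consuming.search(text)),
        bool(lookahead.search(text)),
        bool(boundary.search(text)),
    )


for sample in ["v2", "v25", "v2 beta", "v2a"]:
    print(sample, check(sample))
```

Note that \b and (?!\d) are not interchangeable: for "v2a" the lookahead matches, because "a" is not a digit, while the word boundary does not, because "2" and "a" are both word characters.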

Monday, October 4, 2021
answered 2 Months ago

It seems you have encountered a bug in Python. This other question details the problem and workarounds. You can elect to use one of those workarounds, or update to Python 2.6.5 or 2.7b2.

One of the workarounds suggests copying the patched module from the fixed Python.

Best of luck!

Monday, November 29, 2021
answered 1 Week ago