Asked 7 months ago | Answers: 5 | Viewed 107 times

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know what the few characters directly before (AAA) and after (ZZZ) the part I'm interested in (1234) will be.

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

How can I do the same thing in Python?

 Answers


Using regular expressions (see the documentation of Python's re module for further reference):

import re

text = 'gfgfdAAA1234ZZZuijjk'

m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)

# found: 1234

or:

import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = '' # apply your error handling

# found: 1234
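Since the question starts from a sed substitution, here is a minimal sketch of the most literal Python translation, assuming you want the whole string replaced by the captured group just as sed does: re.sub with the same pattern and a \1 backreference. Note that the greedy .* in the group captures up to the last ZZZ, whereas the .+? used above stops at the first one.

```python
import re

text = 'gfgfdAAA1234ZZZuijjk'

# Same pattern as the sed command; r'\1' refers back to capture group 1.
# Raw strings keep the backslashes intact.
found = re.sub(r'.*AAA(.*)ZZZ.*', r'\1', text)
print(found)  # 1234
```

Like sed, re.sub returns the input unchanged when the pattern does not match, so a missing AAA/ZZZ is not reported as an error; re.search with a check on the match object, as above, is the safer choice when that matters.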
Tuesday, June 1, 2021
 
Sauleil
answered 7 Months ago

If the delimiters are different (e.g. [foo] and [/foo]), take a look at this post from Justin Cook. His code is reproduced below:

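function get_string_between($string, $start, $end){
    // Prepend a space so a match at the very start is position 1, not 0;
    // strpos() returns false when there is no match, and false == 0.
    $string = ' ' . $string;
    $ini = strpos($string, $start);
    if ($ini == 0) return '';
    $ini += strlen($start);
    $fin = strpos($string, $end, $ini);
    if ($fin === false) return ''; // added: closing delimiter not found
    $len = $fin - $ini;
    return substr($string, $ini, $len);
}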

$fullstring = 'this is my [tag]dog[/tag]';
$parsed = get_string_between($fullstring, '[tag]', '[/tag]');

echo $parsed; // (result = dog)
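For the Python question above, a rough equivalent of get_string_between might look like the sketch below (a port written for illustration, not Justin Cook's code). Python's str.find returns -1 rather than false when nothing is found, so the leading-space trick is not needed.

```python
def get_string_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or ''."""
    ini = text.find(start)
    if ini == -1:                  # opening delimiter not found
        return ''
    ini += len(start)
    fin = text.find(end, ini)      # look for `end` only after `start`
    if fin == -1:                  # closing delimiter not found
        return ''
    return text[ini:fin]


fullstring = 'this is my [tag]dog[/tag]'
print(get_string_between(fullstring, '[tag]', '[/tag]'))  # dog
```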
Tuesday, June 1, 2021
 
qitch
answered 7 Months ago

Try this test:

any(substring in string for substring in substring_list)

It will return True if any of the substrings in substring_list is contained in string.

Note that there is a Python analogue of Marc Gravell's answer in the linked question:

# Python 2 only: itertools.imap was removed in Python 3
from itertools import imap
any(imap(string.__contains__, substring_list))

In Python 3, you can use map directly instead:

any(map(string.__contains__, substring_list))

The version above using a generator expression is probably clearer, though.
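A small demonstration of these forms in Python 3, using made-up sample values for string and substring_list (they are not from the original question):

```python
# Hypothetical sample data
string = 'This is a Test string'
substring_list = ['foo', 'Test', 'bar']

# Generator expression: any() stops at the first substring that is found.
print(any(substring in string for substring in substring_list))  # True

# In Python 3, map() is lazy as well, so this also short-circuits.
print(any(map(string.__contains__, substring_list)))  # True

# The `in` test is case-sensitive: 'test' does not match 'Test'.
print(any(s in string for s in ['test', 'xyz']))  # False
```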

Tuesday, July 27, 2021
 
njai
answered 5 Months ago

It's hard to suggest an optimal solution without seeing the actual data, but you can try these things:

  • Generate a single pattern matching all values. This way you would only need to search the string once (instead of once per value).
  • Skip escaping values unless they contain special characters (like '^' or '*').
  • Assign the result directly to temp, avoiding unnecessary copying with temp.extend().
import regex  # third-party module: pip install regex

# 'str' is a built-in name, so use 'string' instead
string = 'This is a Test string from which I want to match multiple substrings'
values = ['test', 'test2', 'Multiple', 'ring', 'match']
pattern = r'\b({})\b'.format('|'.join(map(regex.escape, values)))

# unique matches, lowercased
matches = set(map(str.lower, regex.findall(pattern, string, regex.IGNORECASE)))

# arrange the results as they appear in `values`
temp = [x.upper() for x in values if x.lower() in matches]
print(temp)  # ['TEST', 'MULTIPLE', 'MATCH']
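If you'd rather not depend on the third-party regex module, the standard library re module also provides escape, findall and IGNORECASE, so a sketch of the same approach using only the standard library could look like this. Since Python 3.7, re.escape only escapes characters that have a special meaning in a pattern, so plain words like 'test' pass through unchanged.

```python
import re

string = 'This is a Test string from which I want to match multiple substrings'
values = ['test', 'test2', 'Multiple', 'ring', 'match']

# \b word boundaries keep 'ring' from matching inside 'string'
pattern = r'\b({})\b'.format('|'.join(map(re.escape, values)))

# One pass over the string; findall returns the text of group 1 for each match
matches = set(map(str.lower, re.findall(pattern, string, re.IGNORECASE)))

# Keep the order of `values`
temp = [x.upper() for x in values if x.lower() in matches]
print(temp)  # ['TEST', 'MULTIPLE', 'MATCH']
```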
Wednesday, August 25, 2021
 
tadman
answered 4 Months ago

Find the range of the two strings and return the substring in between:

NSString *s = @"hi how are... you";

NSRange r1 = [s rangeOfString:@"how"];
NSRange r2 = [s rangeOfString:@"you"];
NSRange rSub = NSMakeRange(r1.location + r1.length, r2.location - r1.location - r1.length);
NSString *sub = [s substringWithRange:rSub];
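The same range arithmetic, sketched in Python for comparison with the rest of this page (the marker strings are the ones from the snippet above). str.find returns -1 where rangeOfString: would report a location of NSNotFound, and slicing takes a start and an end index rather than a location and a length.

```python
s = 'hi how are... you'
start_marker, end_marker = 'how', 'you'

sub = None  # stays None if either marker is missing
r1 = s.find(start_marker)
if r1 != -1:
    start = r1 + len(start_marker)   # like r1.location + r1.length
    r2 = s.find(end_marker, start)   # search for the end marker only after the start
    if r2 != -1:
        sub = s[start:r2]            # slice [start, r2) instead of (location, length)

print(repr(sub))  # ' are... '
```

Searching for the closing marker from start, rather than from the beginning of the string as the Objective-C snippet does, avoids a negative length when the closing marker also appears earlier in the string. In the Objective-C version, checking r1.location and r2.location against NSNotFound before building the range guards against missing markers.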
Saturday, September 11, 2021
 
iammichael
answered 3 Months ago