Asked  7 Months ago    Answers:  5   Viewed   43 times

Is there a fast algorithm for finding the Largest Common Substring in two strings or is it an NPComplete problem?

In PHP I can find a needle in a haystack:

<?php

if (strstr("there is a needle in a haystack", "needle")) {
    echo "found<br>n";
}
?>

I guess I could do this in a loop over one of the strings but that would be very expensive! Especially since my application of this is to search a database of email and look for spam (i.e. similar emails sent by the same person).

Does anyone have any PHP code they can throw out there?

 Answers

58

I have since found a relevant wikipedia article. It is not a NP complete problem, it can be done in O(mn) time using a dynamic programming algorithm.

In PHP I found the similar_text function very useful. Here's a code sample to retrieve a series of text emails and loop through them and find ones that are 90% similar to each other. Note: Something like this is NOT scalable:

<?php
// Gather all messages by a user into two identical associative arrays
$getMsgsRes = mysql_query(SELECT * FROM email_messages WHERE from = '$someUserID');
while($msgInfo = mysql_fetch_assoc($getMsgsRes))
{
    $msgsInfo1[] = $msgInfo;
    $msgsInfo2[] = $msgInfo;
}

// Loop over msgs and compare each one to every other
foreach ($msgsInfo1 as $msg1)
    foreach ($msgsInfo2 as $msg2)
        similar_text($msg1['msgTxt'],$msg2['msgTxt'],$similarity_pst);
        if ($similarity_pst > 90)
            echo "{$msg1['msgID']} is ${similarity_pst}% to {$msg2['msgID']}n";
?>
Wednesday, March 31, 2021
 
IvanH
answered 7 Months ago
59

The reason why your mail is being sent to Spam folder is either because of the content of your email or that the receiving side is not able to verify if the email actually came from the stated domain in the from address, i.e., if the sender (you) are authorized to send email on behalf of heygee.com.

Content part is easy to correct. You need to avoid bad grammar, ambiguous links (e.g links which say google.com but point to example.com), etc. Your message should be well worded (exclude those words frequently found in spam mails), and preferably include an unsubscribe link as well (if sent to a mailing list).

Now comes the second and difficult part. The domain that you are writing in your from address should be the same domain from which you are executing this mail script or should be authorized by this domain's TXT records to send mail on its behalf. The simplest way to go about this would be (provided you have DNS access to the sending domain name) to add a TXT SPF record permitting the IP of the server your script resides on to send mail on its behalf. Example SPF record:

"v=spf1 ip6:1080::8:800:200C:417A/96 -all"

The above record means Allow any IPv6 address between 1080::8:800:0000:0000 and 1080::8:800:FFFF:FFFF.

Checkout: SPF (Wikipedia)

Also, you may have a look here http://www.openspf.org/

Now if you don't have DNS access, then simply change the domain name of the from address to the domain name of the server and it should fix it.

Hope it helps.

Saturday, May 29, 2021
 
relipse
answered 5 Months ago
52

I would have the following line of protection:

1. authentication.

  • First authenticate user using something like for example lightopenid. When you use google as openid provider for instance then you have a very good chance that the users isn't a spammer, because they take measures against spammers.

2. Akismet/Sblam

  • Like glenn pointed out you can use akismet which is free for personal sites. Or you could try sblam which is free to use.

3. Captcha

  • You could back the mailinglist up by captcha for even further protection.

4. authorization.

  • Ban openid urls which spam.
Saturday, May 29, 2021
 
Eddas
answered 5 Months ago
54

If you must use Streams:

IntStream.of(arr).sorted().skip(N-M)

Otherwise use a PriorityQueue and write yourself an inverting Comparator. Insertion will be O(N(log(N)) and removal of M elements will be O(M(log(N)). Not what you asked for, but maybe close enough.

Thursday, August 19, 2021
 
odbhut.shei.chhele
answered 2 Months ago
43
template<bool, typename T1, typename T2>
struct is_cond {
    typedef T1 type;
};

template<typename T1, typename T2>
struct is_cond<false, T1, T2> {
    typedef T2 type;
};

template<typename T1, typename T2>
struct largest {
     typedef typename is_cond< (sizeof(T1)>sizeof(T2)), T1, T2>::type type;
};
Wednesday, September 22, 2021
 
altermativ
answered 1 Month ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :