How to combine these two regex passes into one? - php

I have a few thousand strings that have one of these two forms:
SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord
SomeT1tle-ThatL00ks L1k3.that - 4.5% KnownWord
The SomeT1tle-ThatL00ks L1ke.this part may contain uppercase and lowercase characters, digits, periods, dashes, and spaces. It is always followed by a space-dash-space pattern.
I want to pull out the Title (the part before the space-dash-space separator) and the Amount, which is right before KnownWord.
So for these two strings I'd like:
SomeT1tle-ThatL00ks L1k3.this, $3.57 and
SomeT1tle-ThatL00ks L1k3.that, 4.5%.
This code works (using Perl-compatible regular expressions):
$my_string = "SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord";
$pattern_title = "/^(.*?)\x20\x2d\x20/";
$pattern_amount = "/([0-9.$%]+) KnownWord$/";
preg_match_all($pattern_title, $my_string, $matches_title);
preg_match_all($pattern_amount, $my_string, $matches_amount);
echo $matches_title[1][0] . " " . $matches_amount[1][0] . "<br>";
I tried putting both patterns together:
$pattern_together_doesnt_work = "/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/";
but the first part of the pattern always matches the whole thing, even with the "lazy" part (.*? rather than .*). I can't negative-match spaces and dashes, because the title itself can contain either.
Any hints?

Use this pattern
/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/
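For what it's worth, this combined pattern does separate the two parts as long as you read the capture groups rather than the whole match. Here is a minimal sketch in Python, whose re module accepts the pattern unchanged (the names COMBINED and extract are illustrative, not from the question). With preg_match_all the same groups would be $matches[1][0] and $matches[2][0], while $matches[0][0] holds the entire matched string; that the asker was looking at index 0 is only a guess.

```python
import re

# Same pattern as above: lazy title, " - " separator, amount, fixed trailing word.
COMBINED = re.compile(r"^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$")

def extract(line):
    """Return (title, amount), or None when the line does not fit the pattern."""
    m = COMBINED.match(line)
    return m.groups() if m else None

print(extract("SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord"))
print(extract("SomeT1tle-ThatL00ks L1k3.that - 4.5% KnownWord"))
```

The lazy (.*?) stops at the first space-dash-space, so dashes and spaces inside the title are fine as long as they never appear together as " - ".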

Related

split a value into two and then reverse the value in php

I have a value like 73b6424b. I want to split it into two parts, 73b6 and 424b, then swap the two parts and concatenate them to get 424b73b6. This is how I have done it so far:
$substr_device_value = '73b6424b'; // quoted: a bare 73b6424b is not valid PHP
$first_value = substr($substr_device_value, 0, 4);
$second_value = substr($substr_device_value, 4, 4); // third argument is a length, not an end offset
$final_value = $second_value.$first_value;
Is there an easier way than what I have done? If so, please suggest an approach.
You may use
preg_replace('~^(.{4})(.{4})$~', '$2$1', $s)
See the regex demo
Details
^ - matches the string start position
(.{4}) - captures any 4 chars into Group 1 ($1)
(.{4}) - captures any 4 chars into Group 2 ($2)
$ - end of string.
The '$2$1' replacement pattern swaps the values.
NOTE: If you want to pre-validate the data before swapping, you may replace . pattern with a more specific one, say, \w to only match word chars, or [[:alnum:]] to only match alphanumeric chars, or [0-9a-z] if you plan to only match strings containing digits and lowercase ASCII letters.
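The same swap can be sketched in Python to see the two groups at work (swap_halves is an illustrative name; Python writes the backreferences as \2\1 where PHP uses $2$1):

```python
import re

def swap_halves(value):
    # Swaps the two 4-character halves; a string that is not exactly
    # 8 characters long does not match and is returned unchanged.
    return re.sub(r"^(.{4})(.{4})$", r"\2\1", value)

print(swap_halves("73b6424b"))
```

Because of the ^ and $ anchors, input of the wrong length passes through untouched rather than being partially rewritten.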

RegEx to match expression even if it is not 100% the same

How would you write a regular expression pattern that matches a string even if it is 90% accurate?
For example:
$search_string = "Two years in, the <a href='site.com'>company</a> has expanded to 35 cities, five of which are outside the U.S. ";
$subject = "Two years in,the company has expanded to 35 cities, five of which are outside the U.S.";
The end result is that the $search_string matches the $subject and returns true even though they are not 100% the same.
You can make parts of the regex pattern optional. For example:
$search_string = "A tiny little bear";
$regex = "/A ([a-zA-Z]+)? little bear/";
The ? character makes the group before it optional, and [a-zA-Z]+ means the group contains one or more letters.
Thus, using preg_match, you get a check that is not 100% restrictive.
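To see what the optional group does and does not allow, here is a small Python sketch. PATTERN is the regex from this answer; TIGHTER is a hypothetical variant, not part of the answer, that moves the space inside the optional group so the word and its trailing space disappear together.

```python
import re

PATTERN = re.compile(r"A ([a-zA-Z]+)? little bear")    # pattern from the answer
TIGHTER = re.compile(r"A (?:[a-zA-Z]+ )?little bear")  # assumed variant: the space is optional too

for text in ("A tiny little bear", "A brown little bear", "A little bear"):
    print(text, bool(PATTERN.search(text)), bool(TIGHTER.search(text)))
```

With the original pattern, "A little bear" does not match, because the spaces on both sides of the optional word are still required.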
In case anyone comes around looking for the right way to do it:
$search_string = "Two years in, the <a href='site.com'>company</a> has expanded to 35 cities, five of which are outside the U.S. ";
$subject = "Two years in,the company has expanded to 35 cities, five of which are outside the U.S.";
similar_text ($search_string,$subject,$sim);
echo 'text is: ' .round($sim). '% similar';
Result:
text is: 85% similar
You can use the result to decide what counts as a match in your particular circumstances, like so:
similar_text($search_string,$subject,$sim);
if($sim >=85){
echo 'MATCH';
}
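The same thresholding idea can be sketched in Python with difflib.SequenceMatcher. Note that this is a different similarity algorithm from PHP's similar_text, so the numbers will not agree exactly; is_similar and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def is_similar(a, b, threshold=0.8):
    """Return (matched, ratio), where ratio runs from 0.0 to 1.0."""
    ratio = SequenceMatcher(None, a, b).ratio()
    return ratio >= threshold, ratio

search_string = ("Two years in, the <a href='site.com'>company</a> has expanded "
                 "to 35 cities, five of which are outside the U.S. ")
subject = ("Two years in,the company has expanded to 35 cities, "
           "five of which are outside the U.S.")

matched, ratio = is_similar(search_string, subject)
print("MATCH" if matched else "NO MATCH", round(ratio * 100))
```

Whatever function you use, the threshold has to be tuned on your own data.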
Just for grins, I tried this out using Perl.
All the warnings about using regex to parse html apply:
(Should not use on html).
This will split the Search string on either html or entities or whitespace.
After that, the parts are joined with .*? using the modifiers (?is).
This is not a true partial matching substring regex because
it requires all the parts to exist.
This does overcome the distance or content between them however.
Possibly, with a little algorithm work, it could be tweaked in such
a way that parts are optional, in the form of clustering.
use strict;
use warnings;
my $search_string = "Two years in, the <a href='site.com'>company</a> has expanded to 35 cities, five of which are outside the U.S. ";
my $subject = "Two years in,the company has expanded to 35 cities, five of which are outside the U.S.";
## Trim leading/trailing whitespace from $search_string
$search_string =~ s/^\s+|\s+$//g;
## Split the $search_string on html tags or entities or whitespaces ..
my @SearchParts = split m~
\s+|
(?i)[&%](?:[a-z]+|(?:\#(?:[0-9]+|x[0-9a
-f]+)));|<(?:script(?:\s+(?:"[\S\s]*?"|'
[\S\s]*?'|[^>]*?)+)?\s*>[\S\s]*?</script
\s*|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:(?:
(?:"[\S\s]*?")|(?:'[\S\s]*?'))|(?:[^>]*?
))+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE
[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:-
-[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTI
TY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>
~x, $search_string;
## Escape the metacharacters from SearchParts
@SearchParts = grep { $_ = quotemeta } @SearchParts;
## Join the SearchParts into a regex
my $rx = '(?si)(?:' . ( join '.*?', @SearchParts ) . ')';
## Try to match SearchParts in the $subject
if ( $subject =~ /$rx/ )
{
print "Match in subject:\n'$&' \n";
}
Output:
Match in subject:
'Two years in,the company has expanded to 35 cities, five of which are outside the U.S.'
edit:
As a side note, each element of @SearchParts could be further split//
once again (on each character), joining with .*?.
This would get into the realm of a true partial match.
Not quite there though as each character is required to match.
The order is maintained, but each one would have to be optional.
Usually, without capture groups, there is no way to tell the percentage
of actual letters matched.
If you were to use Perl however, it's fairly easy to count in
regex Code construct (?{..}) where a counter can be incremented.
I guess, at that point it becomes non-portable. Better to use C++.

Regexp that defines a string with varying numbers of three different character groups

I'm creating in PHP a $pattern for a preg_match($pattern, $password) function, and it should be a regexp that defines a string made this way:
at least $str_lenght long AND
that has at least $num_capitals capital letters AND
that has at least $num_numerals numbers AND
that has at least $num_symbols symbols from this range: !@#$%^&*()-+?
How can I do it?
You can build your regex this way using lookaheads:
$re = '/(?=(.*?[A-Z]){' . $num_capitals . '})(?=(.*?[0-9]){' . $num_numerals .
'})(?=(.*?[!@#$%^&*()+?-]){' . $num_symbols . '}).{' . $str_lenght . ',}/';
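Here is a rough Python sketch of the same lookahead construction, so the pieces can be tried out in isolation. The function name and the snake_case parameter names are made up for illustration, and, unlike the PHP above, the sketch adds ^ and $ anchors and uses non-capturing groups.

```python
import re

def build_password_pattern(str_length, num_capitals, num_numerals, num_symbols):
    # Each lookahead requires at least N characters of one class anywhere
    # in the string; the final .{N,} enforces the minimum length.
    return re.compile(
        "^"
        "(?=(?:.*?[A-Z]){" + str(num_capitals) + "})"
        "(?=(?:.*?[0-9]){" + str(num_numerals) + "})"
        "(?=(?:.*?[!@#$%^&*()+?-]){" + str(num_symbols) + "})"
        ".{" + str(str_length) + ",}$"
    )

pattern = build_password_pattern(8, 2, 1, 1)
for candidate in ("ABcdef1!", "Abcdef1!", "AB1!"):
    print(candidate, bool(pattern.match(candidate)))
```

All three lookaheads start from the beginning of the string, so the order in which the character classes appear in the password does not matter.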
@anubhava's answer thoroughly beats mine, but I'll leave this here as an alternative approach.
. matches all characters, {5,} repeats it 5+ times. However, since we aren't making one long expression, I would still use the faster strlen(). Demo:
.{5,}
For the rest, I would match a character class and use preg_match_all() which will return the total number of matches (may be 0).
Here are the 3 character classes you want:
[A-Z]
[0-9] OR \d (in Unicode mode, i.e. with the u modifier, \d can also match digits from other scripts, such as Arabic-Indic digits)
[!@#$%^&*()+?-]
An example implementation:
$count = preg_match_all('/[A-Z]/', 'FOObar', $matches);
// $count = 3;
Please note that in the final character class ([!@#$%^&*()+?-]), ^ must not come first and - must not be in the middle; otherwise you'll need to escape them with \ because they have special meanings.
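Put together, the counting approach might look like the following Python sketch, where len(re.findall(...)) plays the role of the count returned by preg_match_all (meets_policy and its parameter names are illustrative):

```python
import re

def meets_policy(password, str_length, num_capitals, num_numerals, num_symbols):
    # A plain length check plus one character-class count per rule.
    return (
        len(password) >= str_length
        and len(re.findall(r"[A-Z]", password)) >= num_capitals
        and len(re.findall(r"[0-9]", password)) >= num_numerals
        and len(re.findall(r"[!@#$%^&*()+?-]", password)) >= num_symbols
    )

print(len(re.findall(r"[A-Z]", "FOObar")))
print(meets_policy("ABcdef1!", 8, 2, 1, 1))
```

Keeping the rules separate also makes it easy to report which requirement failed.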
Try this:
$strRegex = '/((?=.*\d).{'.$num_numerals.'})((?=.*[a-z]).{'.$num_lower.'})((?=.*[A-Z]).{'.$num_upper.'})((?=.*[@#$%]).{'.$num_symbols.'})/';
Or:
$strRegex = '/((?=.*\d).{'.$num_numerals.'})((?=.*[a-z]).{1,'.$num_lower.'})((?=.*[A-Z]).{'.$num_upper.'})((?=.*[@#$%]).{'.$num_symbols.'})/';
Based on $num_lower you can limit the number of characters allowed in the password.
With the second pattern, the password will accept from 1 to $num_lower lowercase characters.

PHP: Split a postcode into two parts?

I need to split a UK postcode into two. I have some code that gets the first half but it doesn't cover everything (such as gir0aa). Does anyone have anything better that validates all UK postcodes then breaks it into the first and second half? Thanks.
function firstHalf($postcode) {
if(preg_match('/^(([A-PR-UW-Z]{1}[A-IK-Y]?)([0-9]?[A-HJKS-UW]?[ABEHMNPRVWXY]?|[0-9]?[0-9]?))\s?([0-9]{1}[ABD-HJLNP-UW-Z]{2})$/i',$postcode))
return preg_replace('/^([A-Z]([A-Z]?\d(\d|[A-Z])?|\d[A-Z]?))\s*?(\d[A-Z][A-Z])$/i', '$1', $postcode);
}
This turns ig62ts into ig6, or cm201ln into cm20.
The incode is always a single digit followed by two alpha characters, so the easiest way to split is to chop off the last three characters, allowing it to be validated easily.
Trim any spaces: they're used purely for ease of human readability.
The first part that remains is then the outcode. This can be a single alpha character followed by 1 or 2 digits; two alpha characters followed by 1 or 2 digits; or one or two characters followed by a single digit, followed by an additional alpha character.
There are a couple of notable exceptions: SAN TA1 is a recognised postcode, as is GIR 0AA; but these are the only two that don't follow the standard pattern.
To test if a postcode is valid, a regexp isn't really adequate... you need to do a lookup to retrieve that information.
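The "chop off the last three characters" approach can be sketched in Python as follows. This only checks the shape of the incode; it says nothing about whether the postcode actually exists, and split_postcode is an illustrative name.

```python
import re

def split_postcode(postcode):
    """Return (outcode, incode), or None if the incode is not digit + two letters."""
    compact = re.sub(r"\s+", "", postcode).upper()   # spaces are for humans only
    outcode, incode = compact[:-3], compact[-3:]
    if not outcode or not re.fullmatch(r"[0-9][A-Z]{2}", incode):
        return None
    return outcode, incode

for code in ("ig62ts", "cm201ln", "GIR 0AA", "SAN TA1"):
    print(code, split_postcode(code))
```

GIR 0AA splits fine this way, but SAN TA1 is rejected because its last three characters are not a digit followed by two letters, so it would need a special case.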
If you do not care about validation, then based on the information here (at the bottom of the page there are different regexps, including yours) http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom you can use the following for everything except Anguilla:
$str = "BX3 2BB";
preg_match('#^(.*?)(\s+)?(\d\w{2})$#', $str, $matches); // lazy (.*?) keeps the separating space out of part 1
echo "Part #1 = " . $matches[1];
echo "<br>Part #2 = " . $matches[3];

PHP preg_match with regex: only single hyphens and spaces between words continue

I was trying to write a regex that allows single hyphens and single spaces only within words, but not at the beginning or at the end of the words.
I thought I had this sorted from the answer I got yesterday, but I just realised there is a small error which I don't quite understand.
Why won't it accept inputs like these:
'forum-category-b forum-category-a'
'forum-category-b Counter-terrorism'
'forum-category-a Preventing'
'forum-category-a Preventing Violent'
'forum-category-a International-Research-and-Publications'
'International-Research-and-Publications forum-category-b forum-category-a'
but it takes,
'forum-category-b'
'Counter-terrorism forum-category-a'
'Preventing forum-category-a'
'Preventing Violent forum-category-a'
'International-Research-and-Publications forum-category-b'
Why is that? How can I fix it? Below is the regex with the initial test, but ideally it should accept all the input combinations above:
$aWords = array(
'a',
'---stack---over---flow---',
' stack over flow',
'stack-over-flow',
'stack over flow',
'stacoverflow'
);
foreach($aWords as $sWord) {
if (preg_match('/^(\w+([\s-]\w+)?)+$/', $sWord)) {
echo 'pass: ' . $sWord . "\n";
} else {
echo 'fail: ' . $sWord . "\n";
}
}
and reject inputs like the ones below:
---stack---over---flow---
stack-over-flow- stack-over-flow2
stack over flow
Thanks.
Your pattern does not do what you want. Let's break it apart:
^(\w+([\s-]\w+)?)+$
It matches strings that consist solely of one or more sequences of the pattern:
\w+([\s-]\w+)?
...which is a sequence of word characters, followed optionally by one other sequence of word characters, separated by one space or dash character.
In other words, your pattern searches for strings like:
xxx-xxxyyy-yyyzzz zzz
...but you intended to write a pattern that would find:
xxx-xxxxxx-xxxxxx yyy
In your examples, this one is matched:
Counter-terrorism forum-category-a
...but it is interpreted as the following sequence:
(Counter(-terroris)) (m( foru)) (m(-categor)) (y(-a))
As you can see, the pattern did not really find the words you are looking for.
This example is not matched:
forum-category-a Preventing Violent
...since the pattern cannot form groups of "word characters, space-or-dash, word-characters" when it encounters a single word character followed by space or dash:
(forum(-categor)) (y(-a)) <Mismatch: Found " " but expected "\w">
If you would add another character to "forum-category-a", say "forum-category-ax", it would match again, since it could split at the "ax":
(forum(-categor)) (y(-a)) (x( Preventin)) (g( Violent))
What you are actually interested in is a pattern like
^(\w+(-\w+)*)(\s\w+(-\w+)*)*$
...which would find a sequence of words that may contain dashes, separated by spaces:
(forum(-category)(-a)) ( Preventing) ( Violent)
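As a quick check of this pattern, here is a small Python sketch. The accepted list is taken from the question; the rejected list contains two inputs from the question plus two assumed ones (a double space and a leading space) added for illustration.

```python
import re

# Words that may contain single inner hyphens, separated by single whitespace.
WORDS = re.compile(r"^(\w+(-\w+)*)(\s\w+(-\w+)*)*$")

accepted = [
    "forum-category-b forum-category-a",
    "forum-category-b Counter-terrorism",
    "forum-category-a Preventing",
    "forum-category-a Preventing Violent",
    "forum-category-a International-Research-and-Publications",
    "International-Research-and-Publications forum-category-b forum-category-a",
]
rejected = [
    "---stack---over---flow---",
    "stack-over-flow- stack-over-flow2",
    "stack  over flow",    # assumed example: two consecutive spaces
    " stack over flow",    # assumed example: leading space
]

for s in accepted + rejected:
    print("pass" if WORDS.match(s) else "fail", repr(s))
```

Keep in mind that \w also matches digits and underscores, so "stack_over-flow2" passes as well.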
By the way, I tested this using a Python script, and while trying to match your pattern against the example string "International-Research-and-Publications forum-category-b forum-category-a", the regular expression engine seemed to run into an infinite loop...
import re
expr = re.compile(r'^(\w+([\s-]\w+)?)+$')
expr.match('International-Research-and-Publications forum-category-b forum-category-a')
The part of your pattern ([\s-]\w+)? is the issue. It only allows for one repetition (the trailing ?). Try changing the last ? to * and see if that helps.
Nope, I still believe that's the problem. The original pattern is looking for "word" or "word[space_hyphen]word" repeated 1+ times. Which is weird because the pattern should fall within another match. But switching the question mark worked for me.
There should be only one answer to this problem:
/^((?<=\w)[ -]\w|[^ -])+$/
There is only one rule as stated, \w[ -]\w, and that's it. It works at per-character granularity and cannot be anything else. Add the [^ -] for the rest.
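A short Python sketch of this per-character pattern, for comparison (PER_CHAR is an illustrative name, and the test strings other than those from the question are assumptions):

```python
import re

# Each step consumes either a space/hyphen that sits between two word
# characters (together with the word character after it), or any single
# character that is not a space or hyphen.
PER_CHAR = re.compile(r"^((?<=\w)[ -]\w|[^ -])+$")

tests = [
    "forum-category-a Preventing Violent",
    "---stack---over---flow---",
    "stack-over-flow- stack-over-flow2",
    "stack  over flow",    # assumed example: two consecutive spaces
    "stack.over!flow",     # assumed example: punctuation other than space/hyphen
]
for s in tests:
    print("pass" if PER_CHAR.match(s) else "fail", repr(s))
```

Since [^ -] accepts anything that is not a space or hyphen, punctuation such as "." or "!" is let through; swap it for \w if only word characters should be allowed.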
