I am trying to use a regex to detect strings that follow the pattern {any_number}{x-}{large|medium|small}, for a clothing site I am building in PHP.
I have managed to match the sizes against a preconfigured set of strings by using:
$searchFor = '7x-large';
$regex = '/\b'.$searchFor.'\b/';
//Basically, it's finding the letters
//surrounded by a word-boundary (the \b bits).
//So, to find the position:
preg_match($regex, $opt_name, $match, PREG_OFFSET_CAPTURE);
I even managed to detect weird sizes like 41 1/2 with regex, but I am not an expert and I am having a hard time on this.
I have come up with
preg_match("/^(?<![\/\d])([xX\-])(large|medium|small)$/", '7x-large', $match);
but it won't work.
Could you pinpoint what I am doing wrong?
It sounds like you also want to match half sizes. You can use something like this:
$theregex = '~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~';
if (preg_match($theregex, $yourstring,$m)) {
// Yes! It matches!
// the match is $m[0]
}
else { // nah, no luck...
}
Note that the (?i) makes it case-insensitive.
This also assumes you are validating that an entire string conforms to the pattern. If you want to find the pattern as a substring of a larger string, remove the ^ and $ anchors:
$theregex = '~(?i)\d+(?:\.5)?x-(?:large|medium|small)~';
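As a quick check of the two variants above, here is a minimal sketch. The function names isSizeLabel and findSizeLabels and the sample strings are hypothetical and only for illustration; the patterns themselves are the two shown above.

```php
<?php
// Hypothetical helpers wrapping the two patterns from the answer.

// Anchored: the whole string must be a size label.
function isSizeLabel(string $s): bool
{
    return preg_match('~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~', $s) === 1;
}

// Unanchored: collect every size label found inside a longer string.
function findSizeLabels(string $s): array
{
    preg_match_all('~(?i)\d+(?:\.5)?x-(?:large|medium|small)~', $s, $m);
    return $m[0]; // full matches only
}

var_dump(isSizeLabel('7x-large'));       // bool(true)
var_dump(isSizeLabel('7.5X-Medium'));    // bool(true), thanks to (?i)
var_dump(isSizeLabel('shirt 7x-large')); // bool(false), the anchors reject extra text
print_r(findSizeLabels('shirt 7x-large, also 2x-small'));
```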
Look at the specification you have and build it up piece by piece. You want "{any_number}{x-}{large|medium|small}".
"{any_number}" would be \d+. This does not allow fractional numbers such as 12.34, but the question does not specify whether they are required.
"{x-}" is a simple string x-
"{large|medium|small}" is a choice between three alternatives large|medium|small.
Joining the pieces together gives \d+x-(large|medium|small). Note the parentheses around the alternation; without them the expression would be interpreted as (\d+x-large)|medium|small.
You mention "weird sizes like 41 1/2" but without specifying how "weird" the numbers to be matched are. You need a precise specification of what you include in "weird" before you can extend the regular expression.
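To see why the grouping matters, here is a small sketch that assembles the pattern from the three pieces with and without the parentheses. The variable and function names are hypothetical, and the patterns are left unanchored on purpose so the difference is visible.

```php
<?php
// The three pieces from the specification (names are illustrative only).
$number = '\d+';
$prefix = 'x-';
$sizes  = 'large|medium|small';

// With parentheses the alternation applies only to the size word.
$grouped = '/' . $number . $prefix . '(' . $sizes . ')/';
// Without them the pattern reads as (\d+x-large)|medium|small.
$ungrouped = '/' . $number . $prefix . $sizes . '/';

function matchesPattern(string $pattern, string $s): bool
{
    return preg_match($pattern, $s) === 1;
}

var_dump(matchesPattern($grouped, '7x-medium'));  // bool(true)
var_dump(matchesPattern($grouped, 'medium'));     // bool(false)
var_dump(matchesPattern($ungrouped, 'medium'));   // bool(true), usually not what you want
```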
Related
It's a basic preg_replace that detects phone numbers (and just long numbers). My problem is I want to avoid detecting numbers between double "", single '' and forward slashes //
$text = preg_replace("/(\+?[\d-\(\)\s]{8,25}[0-9]?\d)/", "<strong>$1</strong>", $text);
I poked around but nothing is working for me. Your help will be appreciated.
I predict that your pattern is going to let you down more than it is going to satisfy you (or you are very comfortable with "over-matching" within the scope of your project).
While my suggestion really blows out the pattern length, a (*SKIP)(*FAIL) technique will serve you well enough by consuming and discarding the substrings that require disqualification. There may be a way of dictating the pattern logic with lookaround instead, but with an initial pattern with so many potential holes in it and no sample data, there are just too many variables to make a confident suggestion.
Regex101 Demo
Code: (Demo)
$text = <<<TEXT
A number 555555555 then some more text and a quoted number "(123)4567890" and
then 1 2 3 4 6 (54) 3 -2 and forward slashed /+--------0/ versus
+--------0 then something more realistic '234 588 9191' no more text.
This is not closed by the same character on both
ends: "+012345678901/ which of course is a _necessary_ check?
TEXT;
echo preg_replace(
'~([\'"/])\+?[\d()\s-]{8,25}\d{1,2}\1(*SKIP)(*FAIL)|((?!\s)\+?[\d()\s-]{8,25}\d{1,2})~',
"<strong>$2</strong>",
$text);
Output:
A number <strong>555555555</strong> then some more text and a quoted number "(123)4567890" and
then <strong>1 2 3 4 6 (54) 3 -2</strong> and forward slashed /+--------0/ versus
<strong>+--------0</strong> then something more realistic '234 588 9191' no more text.
This is not closed by the same character on both
ends: "<strong>+012345678901</strong>/ which of course is a _necessary_ check?
For the technical breakdown, see the Regex101 link.
Otherwise, this is effectively checking for "phone numbers" (by your initial pattern) and if they are wrapped by ', ", or / then the match is ignored and the regex engine continues looking for matches AFTER that substring. I have added (?!\s) at the start of the second usage of your phone pattern so that leading spaces are omitted from the replacement.
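The same (*SKIP)(*FAIL) idea can be seen more easily on a much smaller, hypothetical case: mark runs of digits unless they are wrapped in double quotes. The function name and the sample text are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical, reduced example of the (*SKIP)(*FAIL) technique.
function markUnquotedNumbers(string $text): string
{
    // Left branch: consume a quoted number, then (*SKIP)(*FAIL) throws the
    // match away and makes the engine resume AFTER the consumed text.
    // Right branch: the digit runs we actually want to replace.
    return preg_replace('~"\d+"(*SKIP)(*FAIL)|\d+~', '[$0]', $text);
}

echo markUnquotedNumbers('call 12345 not "67890" but 555'), "\n";
// prints: call [12345] not "67890" but [555]
```

Without (*SKIP), the engine would retry one character later and match 67890 inside the quotes.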
If you're not validating, you might be trying to write an expression with fewer boundaries, such as:
^\+?[0-9()\s-]{8,25}[0-9]$
If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
I'm categorizing a few folders on my drives and I want to weed out low quality files using this regex (this works):
xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP
Now some filenames are in High Definition but still have DVD or XviD in their filenames but also 1080p, 720p, 1080i or 720i. I need a single regex to match the one above but exclude these words 1080p, 720p, 1080i or 720i.
Use two regexes.
First, check whether the filename matches:
1080p|720p|1080i|720i
Then, if no match is found for the above, check for matches against:
xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP
Regular expressions don't support inverse matching directly. You could use negative look-arounds, but I wouldn't say they're appropriate for this task: a negative lookahead placed to catch cases like 1080p-divx does not catch divx-10bit-1080p, so you couldn't achieve this with a simple regex.
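The two-step check described above could be sketched like this. The function name isLowQuality is hypothetical, and using the i flag instead of listing every case variant is an assumption on my part (it also accepts casings the original list does not spell out).

```php
<?php
// Hypothetical helper: true when a filename looks low quality.
function isLowQuality(string $filename): bool
{
    // Step 1: any high-definition marker rules the file out.
    if (preg_match('/1080p|720p|1080i|720i/i', $filename) === 1) {
        return false;
    }
    // Step 2: otherwise look for a low-quality marker.
    // Assumption: the i flag stands in for the hand-written case variants.
    return preg_match('/xvid|divx|480p|320p|dvdscr|pdtv|dvdrip/i', $filename) === 1;
}

var_dump(isLowQuality('movie.divx.avi'));             // bool(true)
var_dump(isLowQuality('movie.divx-10bit-1080p.mkv')); // bool(false)
var_dump(isLowQuality('movie.mkv'));                  // bool(false)
```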
You can use a negative lookahead for this
^(?!.*(?:1080p|720p|1080i|720i)).*(?:xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP)
This will match on your search strings, but fail if there is also 1080p|720p|1080i|720i in the string.
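A short sketch of how this single pattern behaves in PHP. The function name and filenames are hypothetical, and the case variants are collapsed into the i flag as an assumption; because the lookahead is checked once at the start and scans the whole string, the position of the HD marker does not matter.

```php
<?php
// Hypothetical wrapper around the negative-lookahead pattern.
function isLowQualitySingleRegex(string $filename): bool
{
    $pattern = '/^(?!.*(?:1080p|720p|1080i|720i)).*(?:xvid|divx|480p|320p|dvdscr|pdtv|dvdrip)/i';
    return preg_match($pattern, $filename) === 1;
}

var_dump(isLowQualitySingleRegex('Show.DVDRip.avi'));      // bool(true)
var_dump(isLowQualitySingleRegex('Show.DVDRip.720p.mkv')); // bool(false)
var_dump(isLowQualitySingleRegex('Show.720p.DVDRip.mkv')); // bool(false)
```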
You can do it like this:
<pre><?php
$subjects = array('Arrival of the train at La Ciotat station.avi',
'Gardenator II - multi - DVDrip - 720i.mkv',
'The adventures of Roberto the bear - divx.avi',
'Tokyo’s Ginza District - dvdrip.mkv');
$pattern = '~(?(DEFINE)(?<excl>(?>d(?>vd(?>rip|scr)|ivx)|pdtv|xvid|320p|480p)))
(?(DEFINE)(?<keep>(?>[^17]+?|1(?!080[ip])|7(?!20[ip]))))
^\g<keep>*\g<excl>\g<keep>*$ ~ix';
foreach($subjects as $subject) {
if (preg_match($pattern, $subject)) echo $subject."\n"; }
The main point is to avoid testing a lookahead at every character.
I found some partial help but cannot seem to fully accomplish what I need. I need to be able to do the following:
I need a regular expression to replace any 1 to 3 character words between two words that are longer than 3 characters with a match-anything expression:
For example:
walk to the beach ==> walk(.*)beach
If the 1 to 3 character word is not preceded by a word that's longer than 3 characters then I want to translate that 1 to 3 letter word to '<word> ?'
For example:
on the beach ==> on ?the ?beach
The simpler the rule the better (of course, if there's a more complicated alternative that performs better then I'll take that as well, as I anticipate heavy usage eventually).
This will be used in a PHP context most likely with preg_replace. Thus, if you can put it in that context then even better!
By the way so far I have got the following:
$string = preg_replace('/\s+/', '(.*)', $string);
$string = preg_replace('/\b(\w{1,3})(\.*)\b/', '${1} ?', $string);
but that results in:
walk to the beach ==> 'walk(.*)to ?beach'
which is not what I want. 'on the beach' seems to translate correctly.
I think you will need two replacements for that. Let's start with the first requirement:
$str = preg_replace('/(\w{4,})(?: \w{1,3})* (?=\w{4,})/', '$1(.*)', $str);
Of course, you need to replace those \w (which match letters, digits and underscores) with a character class of what you actually want to treat as a word character.
The second one is a bit tougher, because matches cannot overlap and lookbehinds cannot be of variable length. So we have to run this multiple times in a loop:
do
{
$str = preg_replace('/^\w{0,3}(?: \w{0,3})* (?!\?)/', '$0?', $str, -1, $count);
} while($count);
Here we match everything from the beginning of the string, as long as it's only up-to-3-letter words separated by spaces, plus one trailing space (only if it is not already followed by a ?). Then we put all of that back in place, and append a ?.
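Wrapped in a function, the loop can be tried on the examples from the question. This is only a sketch; the name markLeadingShortWords is hypothetical.

```php
<?php
// Hypothetical wrapper around the loop above.
function markLeadingShortWords(string $str): string
{
    do {
        // Match the leading run of short words plus one trailing space that is
        // not yet followed by "?", then put it back with "?" appended.
        $str = preg_replace('/^\w{0,3}(?: \w{0,3})* (?!\?)/', '$0?', $str, -1, $count);
    } while ($count);

    return $str;
}

echo markLeadingShortWords('on the beach'), "\n";      // on ?the ?beach
echo markLeadingShortWords('walk to the beach'), "\n"; // unchanged: starts with a long word
```

Each pass marks one more space, working from the rightmost unmarked space in the leading run back towards the start, which is why the loop ends once $count reaches 0.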
Update:
After all the talk in the comments, here is an updated solution.
After running the first line, we can assume that the only words of three letters or fewer left will be at the beginning or at the end of the string. All others will have been collapsed to (.*). Since you want to turn every space between those words into ' ?', you do not even need a loop (in fact these are the only spaces left):
$str = preg_replace('/ /', ' ?', $str);
(Do this right after my first line of code.)
This would give the following two results (in combination with the first line):
let us walk on the beach now go => let ?us ?walk(.*)beach ?now ?go
let us walk on the beach there now go => let ?us ?walk(.*)beach(.*)there ?now ?go
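Putting the two replacements together, a minimal sketch might look like the following. The function name toSearchPattern is hypothetical; the two preg_replace calls are the ones from the answer.

```php
<?php
// Hypothetical wrapper combining both steps of the updated solution.
function toSearchPattern(string $str): string
{
    // 1. Collapse short words sitting between two long words into (.*).
    $str = preg_replace('/(\w{4,})(?: \w{1,3})* (?=\w{4,})/', '$1(.*)', $str);
    // 2. Every space that is left becomes an optional space.
    return preg_replace('/ /', ' ?', $str);
}

echo toSearchPattern('walk to the beach'), "\n"; // walk(.*)beach
echo toSearchPattern('on the beach'), "\n";      // on ?the ?beach
echo toSearchPattern('let us walk on the beach now go'), "\n";
// let ?us ?walk(.*)beach ?now ?go
```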
Regex is something of a mystery to me. After searching on SO, I did download Espresso and went through the tutorial, but things still are not clicking for me. It may just be my specific need, but I haven't found any examples. What I want to do is find matches that consist of exactly two specific letters (uppercase, lowercase, or mixed) followed by a string of digits. Here are the cases I want to test against:
TL123
TL 123
tl123
tl 123
TLABC123
tlabc123
What I'm then trying to do is preg_replace the results for that match (and ultimately always return TL-123 - for example).
So, any letter or number combo after TL would return TL- and vice-versa. Any nudges in the right direction would be extremely helpful. Thanks!
Edit
It might actually be preg_match_all that I need for this.
To match the specified pattern, you can use:
TL(?:[^0-9]*)(\d+)
This will match TL followed by any run of non-digit characters (possibly empty) and then one or more digits.
You could use this with PHP's preg_replace() like:
$str = preg_replace('/TL(?:[^0-9]*)(\d+)/i', 'TL-$1', $str);
This example, of course, assumes that TL is the exact characters you want to match. If TL is just a placeholder and you could match anything, you could use the following:
preg_replace('/([a-z]{2})(?:[^0-9]*)(\d+)/i', '$1-$2', $str);
With this, I have it hardcoded to only allow 2 characters to match ({2}). You can modify this to any number if you need it to change.
Also, as you want the matched characters to always be uppercase, but can match lowercase, I would suggest to just use strtoupper() around the result (instead of a callback).
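Here is a minimal sketch of that suggestion, run against the test cases from the question. The function name normalizeCode is hypothetical, and it assumes the input is a single code rather than running text, since strtoupper() uppercases the whole string and the unanchored pattern would also pick up letters from surrounding words.

```php
<?php
// Hypothetical helper: normalize one code such as "tl 123" to "TL-123".
// Assumption: $code holds just the code, not a longer sentence.
function normalizeCode(string $code): string
{
    $result = preg_replace('/([a-z]{2})(?:[^0-9]*)(\d+)/i', '$1-$2', $code);
    return strtoupper($result);
}

foreach (['TL123', 'TL 123', 'tl123', 'tl 123', 'TLABC123', 'tlabc123'] as $case) {
    echo $case, ' => ', normalizeCode($case), "\n"; // each line ends in TL-123
}
```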
How can you match the following words by PHP, either by regex/globbing/...?
Examples
INNO, heppeh, isi, pekkep, dadad, mum
My attempt would be to make a regex which has 3 parts:
1st match match [a-zA-Z]*
[a-zA-Z]?
rotation of the 1st match // Problem here!
The part 3 is the problem, since I do not know how to rotate the match.
This suggests to me that regex is not the best solution here, since it would be very inefficient for long words.
I think regexes are a bad solution here. I'd do something with a condition like: ($word == strrev($word)).
Regexes are not suitable for finding palindromes of an arbitrary length.
However, if you are trying to find all of the palindromes in a large set of text, you could use regex to find a list of things that might be palindromes, and then filter that list to find the words that actually are palindromes.
For example, you can use a regex to find all words such that the first X characters are the reverse of the last X characters (for some small fixed value of X, like 2 or 3), and then run a secondary filter against all the matches to see if the whole word is in fact a palindrome.
In PHP once you get the string you want to check (by regex or split or whatever) you can just:
if ($string == strrev($string)) // it's a palindrome!
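The filter-then-verify approach could be sketched as follows. The function name findPalindromes and the sample sentence are hypothetical, and for brevity the regex pre-filter uses X = 1 (first letter equals last letter) rather than 2 or 3.

```php
<?php
// Hypothetical helper: return the palindromic words found in a text.
function findPalindromes(string $text): array
{
    // Cheap pre-filter: words of 2+ letters whose first and last letters match.
    preg_match_all('/\b([a-z])[a-z]*\1\b/i', $text, $m);

    // Exact check on the candidates only, ignoring case.
    $isPalindrome = function (string $word): bool {
        $lower = strtolower($word);
        return $lower === strrev($lower);
    };

    return array_values(array_filter($m[0], $isPalindrome));
}

print_r(findPalindromes('mum said dadad is not a kayak tent'));
// "tent" passes the regex pre-filter but is rejected by the strrev() check.
```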
I think this regexp can work:
$re = '~([a-z])(.?|(?R))\1~';