Regular expression shouldn't match a certain number - php

I want to match every string like this
<img src="whatever" whatever alt="whatever" whatever height="any number but not 162" whatever />
In other words, I want to match every string that, after the "link", contains anything except the number 162 (the entire number, not just the single characters).
I use this
function embed($strr) {
$strr = preg_replace('#<img.*src="([^"]+)"(?:[^1]+|1(?:$|[^6]|6(?:$|[^2]))) />#is', '[img]$1[/img]', $strr);
return $strr;
}
but this doesn't match every string that contains a 1 but not 162. How can I solve this?

Instead of a regular expression you can also use XPath, which is specifically designed to extract information from structured markup documents. To get all the img nodes in the document whose height attribute does not contain 162, you would use
//img[not(contains(@height, 162))]
which I personally think is much easier to read than the regex. If you only want to exclude the img nodes with a height of exactly 162, rather than every node that has 162 somewhere in the attribute (e.g. 2162 or 1623), you can just do
//img[@height != 162]
There are various XML/HTML parsers that allow you to use XPath. For a decent list, see
Best Methods to parse HTML
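As a rough sketch of how the second query could be run from PHP, the built-in DOM extension (DOMDocument and DOMXPath) is one option. The sample markup and variable names below are made up for illustration, and this assumes the DOM extension is enabled:

```php
<?php
// Sample markup, made up for illustration.
$html = '<p>'
      . '<img src="a.png" alt="a" height="162" />'
      . '<img src="b.png" alt="b" height="200" />'
      . '<img src="c.png" alt="c" height="1623" />'
      . '</p>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Collect the src of every img whose height is not exactly 162.
$sources = [];
foreach ($xpath->query('//img[@height != 162]') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Note that an img with no height attribute at all is not selected by this query, because there is no value to compare against.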

You can use a negative lookahead like this
height="(?!162)([^"]+)
See it here on Regexr
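(?!162) is a negative lookahead: it ensures that "162" does not follow at this position, but it does not consume any characters. Note that it also rejects values that merely start with 162, such as 1623; use (?!162") if only the exact value should be excluded.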
I am not sure what you exactly want to match, but I think you get the idea.
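One way the lookahead might be folded back into the embed function from the question is sketched below. This assumes that src appears before height inside the tag, as in the question's example, and that only the exact value 162 should be rejected; the sample tags are made up for illustration:

```php
<?php
function embed($strr) {
    // [^>]* keeps the match inside a single tag;
    // (?!162") rejects height="162" but still allows e.g. height="1623".
    $pattern = '#<img\b[^>]*\bsrc="([^"]+)"[^>]*\bheight="(?!162")[^"]*"[^>]*/>#i';
    return preg_replace($pattern, '[img]$1[/img]', $strr);
}

echo embed('<img src="a.png" alt="x" height="200" />'), "\n"; // replaced
echo embed('<img src="b.png" alt="x" height="162" />'), "\n"; // left untouched
```

Tags without a height attribute are also left untouched by this pattern, so it would need adjusting if those should be converted too.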

Related

Remove multiple occurences of unknown text between tags

I want to use mySQL, or PHP (if too tough in SQL), to get rid of all occurrences of any text between certain strings/tags.
I have a database field that looks like the following:
<chrd>F Gm<br><indx>Here's a little song I wrote You might want to sing it note for note...<br><chrd> Bb C F<br><text>Don't Worry Be Happy<br><text>In every life we have some trouble When you worry you make it double...<br><text>Don't Worry Be Happy
I want to remove the text between the tags <chrd> and <br> (tags included or not). I have tried
SELECT substring_index(substring_index(text, '<chrd>', -1), '<br>', 1),'') FROM songs;
but returns only the last occurrence ( Bb C F). How can I select all occurrences?
Also, the above returns all the text if there is a song with no chords. I would like it to return an empty string.
After I get rid of the chords, I will do multiple REPLACE to remove all the tags, so that I will be left with only the plain text and the lyrics. (This is OK, I can do)
Note: I don't know about regular expressions and procedures
As the <chrd> tags have no closing tags in your string, a DOM parser will be of no use.
There are ways to do this using a regular expression or splitting strings, but I have to warn you they could be unreliable. That said, the following works, using a regular expression:
$string="
<chrd>F Gm<br><indx>Here's a little song I wrote You might want to sing it note for note...<br><chrd> Bb C F<br><text>Don't Worry Be Happy<br><text>In every life we have some trouble When you worry you make it double...<br><text>Don't Worry Be Happy";
$regex='/\<chrd\>.*?\<br\>/';
$result = preg_replace($regex,'',$string);
echo $result;
The regex breakdown:
\<chrd\> : search for <chrd> tag
.*? : any character, 0 to unlimited times, as few as possible
\<br\> : until it hits <br> (included)
With a fiddle
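For the other half of the question (selecting every occurrence rather than removing it), preg_match_all with the same lazy pattern should collect all chord lines, and it yields an empty list for a song with no chords. A minimal sketch; the helper name chord_lines and the shortened sample strings are made up for illustration:

```php
<?php
// Hypothetical helper: return the text of every <chrd>...<br> segment.
function chord_lines(string $s): array {
    preg_match_all('/<chrd>(.*?)<br>/s', $s, $m);
    return array_map('trim', $m[1]); // $m[1] holds the captured text only
}

$song     = "<chrd>F Gm<br><indx>Here's a little song I wrote<br>"
          . "<chrd> Bb C F<br><text>Don't Worry Be Happy";
$noChords = "<text>Don't Worry Be Happy<br>";

print_r(chord_lines($song));     // two chord lines
print_r(chord_lines($noChords)); // empty array
```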

PHP preg_replace: find string part not starting with an exclamation point

I am working on some very messy Excel sheets, and trying to use PHP to find clues.
I have a MySQL database with all formulas from an Excel document and, as usual, the cell names from the current sheet do not have a "sheetname!" in front of them. To make the formulas searchable (and to find dead routes in them), I would like to rewrite every formula in the database so that each cell has its sheetname as a prefix.
Example:
=+(sheet_factory_costs!A17/sheet_employees!D23)+T12+W12
The database contains the name of the current sheet, and I would like to rewrite the formula above using that sheetname (let's call it "sheet_turnover"):
=+(sheet_factory_costs!A17 / sheet_employees!D23)+sheet_turnover!T12+sheet_turnover!W12
I try this in PHP with preg_replace, and I think I need the following rules:
Find one or two letters, directly followed by a number. This is always a cell address within formulas.
When there is a ! in the position before, there is already a sheetname. So I am only looking for the letters and numbers NOT preceded by an exclamation point.
The problem seems to be that the ! is also a special sign within patterns. Even if I try to escape it, it does not work:
$newformula =
preg_replace('/(?<\!)[A-Z]{1,2}[0-9]/',
'lala',
$oldformula);
(lala is my temporary marker to see if it is selecting the right cell addresses)
(and yes, the lala is only placed over the first digit, but that's no issue right now)
(and yes, all Excel $..$.. (permanent) markers have already been replaced. No need to build that in the formula)
Your negative lookbehind is malformed: you need to write it as (?<!!). However, you also need to use either a word boundary before it, or a (?<![A-Z]) lookbehind, to make sure there are no other letters before the [A-Z]{1,2}.
So, you may use
'~\b(?<!!)[A-Z]{1,2}[0-9]~'
See the regex demo. Replace with sheet_turnover!$0 where $0 is the whole match value.
Details
\b - a word boundary (it is necessary, or the A1 inside name!AA11 would still get matched)
(?<!!) - no ! immediately to the left of the current location
[A-Z]{1,2} - 1 or 2 letters
[0-9] - a digit.
Another approach is to match and skip the "wrong" contexts and then match and keep the "right" ones:
'~\w+![A-Z]{1,2}[0-9](*SKIP)(*F)|\b[A-Z]{1,2}[0-9]~'
See this regex demo.
Here, the \w+![A-Z]{1,2}[0-9](*SKIP)(*F)| part matches 1 or more word chars, then a !, then 1 or 2 uppercase ASCII letters and then a digit; (*SKIP)(*F) then discards that match and makes the engine resume searching after the end of the discarded text.
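To see both patterns in action on the formula from the question, here is a small sketch. The helper name add_sheet_prefix is made up for illustration, and it assumes the sheet name contains no $ or \ characters (which would be interpreted inside the replacement string):

```php
<?php
// Hypothetical helper: prefix every bare cell reference with "$sheet!".
function add_sheet_prefix(string $pattern, string $formula, string $sheet): string {
    // $0 in the replacement stands for the whole match.
    return preg_replace($pattern, $sheet . '!$0', $formula);
}

$lookbehind = '~\b(?<!!)[A-Z]{1,2}[0-9]~';
$skipFail   = '~\w+![A-Z]{1,2}[0-9](*SKIP)(*F)|\b[A-Z]{1,2}[0-9]~';

$oldformula = '=+(sheet_factory_costs!A17/sheet_employees!D23)+T12+W12';

echo add_sheet_prefix($lookbehind, $oldformula, 'sheet_turnover'), "\n";
echo add_sheet_prefix($skipFail, $oldformula, 'sheet_turnover'), "\n";
```

Both calls should leave the already-prefixed references alone and only touch T12 and W12.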

Detect cloth sizes with regex

I am trying to use regex to detect strings that follow the pattern {any_number}{x-}{large|medium|small}, for a clothing site I am building in PHP.
I have managed to match the sizes against a preconfigured set of strings by using:
$searchFor = '7x-large';
$regex = '/\b'.$searchFor.'\b/';
//Basically, it's finding the letters
//surrounded by a word-boundary (the \b bits).
//So, to find the position:
preg_match($regex, $opt_name, $match, PREG_OFFSET_CAPTURE);
I even managed to detect weird sizes like 41 1/2 with regex, but I am not an expert and I am having a hard time on this.
I have come up with
preg_match("/^(?<![\/\d])([xX\-])(large|medium|small)$/", '7x-large', $match);
but it won't work.
Could you pinpoint what I am doing wrong?
It sounds like you also want to match half sizes. You can use something like this:
$theregex = '~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~';
if (preg_match($theregex, $yourstring,$m)) {
// Yes! It matches!
// the match is $m[0]
}
else { // nah, no luck...
}
Note that the (?i) makes it case-insensitive.
This also assumes you are validating that an entire string conforms to the pattern. If you want to find the pattern as a substring of a larger string, remove the ^ and $ anchors:
$theregex = '~(?i)\d+(?:\.5)?x-(?:large|medium|small)~';
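A quick way to check both variants is to run them over a few inputs. The sample strings below are made up for illustration:

```php
<?php
$anchored  = '~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~';
$substring = '~(?i)\d+(?:\.5)?x-(?:large|medium|small)~';

// Whole-string validation.
$samples = ['7x-large', '7.5X-Small', 'x-large', '7x-huge'];
foreach ($samples as $s) {
    printf("%-12s %s\n", $s, preg_match($anchored, $s) ? 'match' : 'no match');
}

// Substring search, including the position of the match.
$opt_name = 'Shirt, blue, 7x-large, cotton';
preg_match($substring, $opt_name, $m, PREG_OFFSET_CAPTURE);
// $m[0][0] is the matched text, $m[0][1] its byte offset in $opt_name.
echo $m[0][0], ' at offset ', $m[0][1], "\n";
```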
Look at the specification you have and build it up piece by piece. You want "{any_number}{x-}{large|medium|small}".
"{any_number}" would be \d+. This does not allow fractional numbers such as 12.34, but the question does not specify whether they are required.
"{x-}" is a simple string x-
"{large|medium|small}" is a choice between three alternatives large|medium|small.
Joining the pieces together gives \d+x-(large|medium|small). Note the brackets around the alternation; without them the expression would be interpreted as (\d+x-large)|medium|small.
You mention "weird sizes like 41 1/2" but without specifying how "weird" the numbers to be matched are. You need a precise specification of what you include in "weird" before you can extend the regular expression.
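The effect of the brackets around the alternation is easy to check. A small sketch, with variable names made up for illustration:

```php
<?php
$ungrouped = '~\d+x-large|medium|small~';   // read as (\d+x-large)|medium|small
$grouped   = '~\d+x-(large|medium|small)~'; // number and x- required for every size

var_dump(preg_match($ungrouped, 'medium')); // int(1): "medium" alone is accepted
var_dump(preg_match($grouped, 'medium'));   // int(0): the number and x- are missing

preg_match($grouped, '3x-medium', $m);
echo $m[1], "\n"; // the size name captured by the group
```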

Regex match number consisting of specific range, and length?

I'm trying to match a number that may consist of [1-4], with a length of {1,1}.
I've tried multiple variations of the following, which won't work:
/^string\-(\d{1,1})[1-4]$/
Any guidelines? Thanks!
You should just use:
/^string-[1-4]$/
Match the start of the string followed by the word "string-", followed by a single number, 1 to 4 and the end of the string. This will match only this string and nothing else.
If this is part of a larger string and all you want is the one part you can use something like:
/string-[1-4]\b/
which matches pretty much the same as above just as part of a larger string.
You can (in either option) also wrap the character class ([1-4]) in parentheses to get that as a separate part of the matches array (when using preg_match/preg_match_all).
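For example, with the character class wrapped in parentheses, a sketch of both options could look like this (the test strings are made up for illustration):

```php
<?php
$whole = '/^string-([1-4])$/';  // the entire string must be "string-N"
$part  = '/string-([1-4])\b/';  // "string-N" anywhere in a larger string

if (preg_match($whole, 'string-3', $m)) {
    echo 'digit: ', $m[1], "\n"; // $m[1] holds the captured digit
}

var_dump(preg_match($whole, 'string-5'));           // out of range
var_dump(preg_match($whole, 'string-12'));          // too long
var_dump(preg_match($part, 'see string-2 here'));   // found inside a sentence
var_dump(preg_match($part, 'see string-25 here'));  // \b rejects a second digit
```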
This is not hard:
/^string-([1-4]{1})$/

Rotation in PHP's regex

How can you match the following words by PHP, either by regex/globbing/...?
Examples
INNO, heppeh, isi, pekkep, dadad, mum
My attempt would be to make a regex which has 3 parts:
1st match: [a-zA-Z]*
[a-zA-Z]?
rotation of the 1st match // Problem here!
The part 3 is the problem, since I do not know how to rotate the match.
This suggests to me that regex is not the best solution here, since it is very inefficient for long words.
I think regex is a bad solution. I'd do something with a condition like: ($word == strrev($word)).
Regexes are not suitable for finding palindromes of an arbitrary length.
However, if you are trying to find all of the palindromes in a large set of text, you could use regex to find a list of things that might be palindromes, and then filter that list to find the words that actually are palindromes.
For example, you can use a regex to find all words such that the first X characters are the reverse of the last X characters (for some small fixed value of X, like 2 or 3), and then run a secondary filter against all the matches to see if the whole word is in fact a palindrome.
In PHP once you get the string you want to check (by regex or split or whatever) you can just:
if ($string == strrev($string)) // it's a palindrome!
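Put together, a minimal sketch might pull the words out with a simple regex and keep only those that read the same backwards. The sample text reuses the words from the question plus one non-palindrome, and this assumes plain ASCII words, since strrev works on bytes:

```php
<?php
$text = 'INNO heppeh isi pekkep dadad mum hello';

// Step 1: extract candidate words.
preg_match_all('/[a-z]+/i', $text, $m);

// Step 2: keep only the words that equal their own reverse (case-insensitive).
$palindromes = array_values(array_filter($m[0], function ($word) {
    $word = strtolower($word);
    return $word === strrev($word);
}));

print_r($palindromes);
```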
I think this regexp can work:
$re = '~([a-z])(.?|(?R))\1~';
