I need a regular expression for php that outputs everything between <!--:en--> and <!--:-->.
So for <!--:en-->STRING<!--:--> it would output just STRING.
EDIT: oh and the following <!--:--> nedds to be the first one after <!--:en--> becouse there are more in the text..
The one you want is actually not too complicated:
/<!--:en-->(.*?)<!--:-->/gi
Your matches will be in capture group 1.
Explanation:
The .*? is a lazy quantifier. Basically, it means "keep matching until you find the shortest string that will still fit this pattern." This is what will cause the matching to stop at the first instance of <!--:-->, rather than sucking up everything until the last <!--:--> in the document.
Usage is something like preg_match("/<!--:en-->(.*?)<!--:-->/gi", $input) if I recall my PHP correctly.
If you have just that input
$input = '<!--:en-->STRING<!--:-->';
You can try with
$output = strip_tags($input);
Try:
^< !--:en-- >(.*)< !--:-- >$
I don't think any of the other characters need to be escaped.
<!--:en--\b[^>]*>(.*?)<!--:-->
This will match the things between your tags. This will break if you nest your tags, but you didnt say you were doing that :)
Related
I have the following content in a string (query from the DB), example:
$fulltext = "Thank you so much, {gallery}art-by-stephen{/gallery}. As you know I fell in love with it from the moment I saw it and I couldn’t wait to have it in my home!"
So I only want to extract what it is between the {gallery} tags, I'm doing the following but it does not work:
$regexPatternGallery= '{gallery}([^"]*){/gallery}';
preg_match($regexPatternGallery, $fulltext, $matchesGallery);
if (!empty($matchesGallery[1])) {
echo ('<p>matchesGallery: '.$matchesGallery[1].'</p>');
}
Any suggestions?
Try this:
$regexPatternGallery= '/\{gallery\}(.*)\{\/gallery\}/';
You need to escape / and { with a \ before it. And you where missing start and end / of the pattern.
http://www.phpliveregex.com/p/fn1
Similar to Andreas answer but differ in ([^"]*?)
$regexPatternGallery= '/\{gallery\}([^"]*?)\{\/gallery\}/';
Don't forget to put / at the beginning and the end of the Regex string. That's a must in PHP, different from other programming languages.
{,},/ are characters that can be confused as a Regex logic, so you have to escape it using \ like \{.
Use ? to make the string to non-greedy, thus saves memory. It avoids error when facing this kind of string "blabla {galery}you should only get this{/gallery} but you also got this instead.{/gallery} Rarely happens but be careful anyway".
Try this RegEx:
\{gallery\}(.*?)\{\/gallery\}
The problem with your RegEx was that you did not escape the / in the closing {gallery}. You also need to escape { and }.
You should use .*? for a lazy match, otherwise if there are 2 tags in one string, it will combine them. I.e. {gallery}by-joe{/gallery} and {gallery}by-tim{/gallery} would end up as:
by-joe{/gallery} and {gallery}by-tim
However, using a lazy match, you would get 2 results:
by-joe
by-tim
Live Demo on Regex101
I'm trying to make a replace in a string with a regex, and I really hope the community can help me.
I have this string :
031,02a,009,a,aaa,AZ,AZE,02B,975,135
And my goal is to remove the opposite of this regex
[09][0-9]{2}|[09][0-9][A-Za-z]
i.e.
a,aaa,AZ,AZE,135
(to see it in action : http://regexr.com?3795f )
My final goal is to preg_replace the first string to only get
031,02a,009,02B,975
(to see it in action : http://regexr.com?3795f )
I'm open to all solution, but I admit that I really like to make this work with a preg_replace if it's possible (It became something like a personnal challenge)
Thanks for all help !
As #Taemyr pointed out in comments, my previous solution (using a lookbehind assertion) was incorrect, as it would consume 3 characters at a time even while substrings weren't always 3 characters.
Let's use a lookahead assertion instead to get around this:
'/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*/'
The above matches the beginning of the string or a comma, then checks that what follows does not match one of the two forms you've specified to keep, and given that this condition passes, matches as many non-comma characters as possible.
However, this is identical to #anubhava's solution, meaning it has the same weakness, in that it can leave a leading comma in some cases. See this Ideone demo.
ltriming the comma is the clean way to go there, but then again, if you were looking for the "clean way to go," you wouldn't be trying to use a single preg_replace to begin with, right? Your question is whether it's possible to do this without using any other PHP functions.
The anwer is yes. We can take
'/(^|,)foo/'
and distribute the alternation,
'/^foo|,foo/'
so that we can tack on the extra comma we wish to capture only in the first case, i.e.
'/^foo,|,foo/'
That's going to be one hairy expression when we substitute foo with our actual regex, isn't it. Thankfully, PHP supports recursive patterns, so that we can rewrite the above as
'/^(foo),|,(?1)/'
And there you have it. Substituting foo for what it is, we get
'/^((?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*),|,(?1)/'
which indeed works, as shown in this second Ideone demo.
Let's take some time here to simplify your expression, though. [0-9] is equivalent to \d, and you can use case-insensitive matching by adding /i, like so:
'/^((?![09]\d{2}|[09]\d[a-z])[^,]*),|,(?1)/i'
You might even compact the inner alternation:
'/^((?![09]\d(\d|[a-z]))[^,]*),|,(?1)/i'
Try it in more steps:
$newList = array();
foreach (explode(',', $list) as $element) {
if (!preg_match('/[09][0-9]{2}|[09][0-9][A-Za-z]/', $element) {
$newList[] = $element;
}
}
$list = implode(',', $newList);
You still have your regex, see! Personnal challenge completed.
Try matching what you want to keep and then joining it with commas:
preg_match_all('/[09][0-9]{2}|[09][0-9][A-Za-z]/', $input, $matches);
$result = implode(',', $matches);
The problem you'll be facing with preg_replace is the extra-commas you'll have to strip, cause you don't just want to remove aaa, you actually want to remove aaa, or ,aaa. Now what when you have things to remove both at the beginning and at the end of the string? You can't just say "I'll just strip the comma before", because that might lead to an extra comma at the beginning of the string, and vice-versa. So basically, unless you want to mess with lookaheads and/or lookbehinds, you'd better do this in two steps.
This should work for you:
$s = '031,02a,009,a,aaa,AZ,AZE,02B,975,135';
echo ltrim(preg_replace('/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]+/', '', $s), ',');
OUTPUT:
031,02a,009,02B,975
Try this:
preg_replace('/(^|,)[1-8a-z][^,]*/i', '', $string);
this will remove all substrings starting with the start of the string or a comma, followed by a non allowed first character, up to but excluding the following comma.
As per #GeoffreyBachelet suggestion, to remove residual commas, you should do:
trim(preg_replace('/(^|,)[1-8a-z][^,]*/i', '', $string), ',');
<a href="/search?hl=en&pwst=1&sa=X&ei=RCPqTqkHycryA_bK_f0J&ved=0CCUQvwUoAQ&q=psychology&spell=1" class=spell><b><i>psychology</i></b></a>
Hi, I'm looking to create a regex which matches this anchor and returns the inner text of it.
This is what I've been trying as a regex but without success.
'/<a[^>]+class=\"spell\"[^>]*>(.*?)<\/a>/isU'
It's probably something really silly. Thanks.
Problem was missing quotes surrounding the class. Not proper html markup but I neglected to notice so I just changed my regex to have quotes as optional.
Final regex:
'/<a[^>]+class=\"?spell\"?[^>]*>(.*?)<\/a>/is'
The regex looks OK, although you don't need to escape the quotes. Perhaps PHP doesn't like it if you use unnecessary escapes, although I doubt it. The problem is more likely the way you're using the regex. Did you access group number 1?
if (preg_match('%<a[^>]+class="spell"[^>]*>(.*?)</a>%', $subject, $regs)) {
$result = $regs[1];
}
Your problem might be the combination of (.*?) and /isU modifier. That U alters the meaning of ? making your match group (.*) greedy actually. Then you will match parts beyond the <\/a> end marker, until it encounters another.
If you remove the /U it works as expected. With your given input text, at least.
Here are two options to fix your expression:
For starters, you can simplify your expression to:
class=\"spell\"[^>]*>(.*?)<\/a>
This captures
<b><i>psychology</i></b>
in Group 1. I assume this is what you want to achieve.
Then, if you want to capture "psychology" without the bold and italic tags, you can use:
class=\"spell\"[^>]*>\s*<(\w+)>?\s*<(\w+)>?\s*(.*?)<\/\2>\s*<\/\1>\s*<\/a>
This captures "psychology" in group 3.
In group 1, you will find the first optional tag, whether it be "b", "strong" or nothing.
In group 2, you will find the second optional tag, which was "i" in your example.
The multiple instances of \s* allow for optional space between the tags.
Is this what you were looking for?
I'm having this issue with a regular expression in PHP that I can't seem to crack. I've spent hours searching to find out how to get it to work, but nothing seems to have the desired effect.
I have a file that contains lines similar to the one below:
Total','"127','004"','"118','116"','"129','754"','"126','184"','"129','778"','"128','341"','"127','477"','0','0','0','0','0','0
These lines are inserted into INSERT queries. The problem is that values like "127','004" are actually supposed to be 127,004, or without any formatting: 127004. The latter is the actual value I need to insert into the database table, so I figured I'd use preg_replace() to detect values like "127','004" and replace them with 127004.
I played around with a Regular Expression designer and found that I could use the following to get my desired results:
Regular Expression
"(\d+)','(\d{3})"
Replace Expression
$1$2
The line on the top of this post would end up like this: (which is what I am after)
Total','127004','118116','129754','126184','129778','128341','127477','0','0','0','0','0','0
This, however, does not work in PHP. Nothing is being replaced at all.
The code I am using is:
$line = preg_replace("\"(\d+)','(\d{3})\"", '$1$2', $line);
Any help would be greatly appreciated!
There are no delimiters in your regex. Delimiters are required in order for PHP to know what is the pattern to match and what is a pattern modifier (e.g. i - case-insensitive, U - ungreedy, ...). Use a character that doesn't occur in your pattern, typically you'll see a slash '/' used.
Try this:
$line = preg_replace("/\"(\d+)','(\d{3})\"/", '$1$2', $line);
You forgot to wrap your regular expression in front-slashes. Try this instead:
"/\"(\d+)','(\d{3})\"/"
use preg_replace("#\"(\d+)','(\d+)\"#", '$1$2', $s); instead of yours
demo:
$str = 'bcs >Hello >If see below!';
$repstr = preg_replace('/>[A-Z0-9].*?see below[^,\.<]*/','',$str);
echo $repstr;
What I want this tiny programme to output is "bcs >Hello ",but in fact it's only "bcs "
What's wrong with my pattern?
I think the problem is that you're misinterpreting how a non-greedy quantifier acts. Once it's in operation, yes, it stops earlier than it would otherwise. But it isn't aware of what comes before it (or potentially the text that comes later, either). It's only concerned with it's current position. Hence, the regular expression you posted will match all of:
">Hello >If see below!"
Let's see how this works:
/>[A-Z0-9].*?see below[^,\.<]*/
The regex first looks for ">" in "bcs >Hello >If see below!", and finds the first one, which is the one right before "Hello". Ok, let's check the next part of the expression:
[A-Z0-9]
The next char is a H, which matches the pattern [A-Z0-9]. Still good! Next:
.*?
Now we match all non-newline chars until we get to the first instance to match the remaining expressions of "see below[^,.<]*". If we had used just a plain greedy quantifier, we could match through multiple cases of "see below[^,.<]*" until we matched the last possible one. (So if your string had continued on, and there'd been other text match that pattern, it would have captured that as well) The non-greedy quantifier doesn't mean that your whole pattern will return the smallest possible match of all possible matches in the string. It just dictates how that particular character match functions.
You might want to try the following pattern then:
/>[A-Z0-9][^>]*?see below[^,\.<]*/
Hopefully this clears it up!
Why don't you write it like this:
$str = 'bcs >Hello >If see below!';
$repstr = preg_replace('/>If see below[^,\.<]*/','',$str);
echo $repstr;
This might be a good alternative to what you have.
The problem with your regexp is that instead of selecting what you want, you are selecting what you don't want and replacing that with an empty string.
The best approach, in my opinion, is selecting what you want, that is what the code below does. What you end up with is what is what is matched by the first sub-pattern otherwise you get your string back.
$str = 'bcs >Hello >If see below!';
$repstr = preg_replace('/^([\w]+ >[\w]+).*?see below.*?$/i', '$1', $str);
var_dump($repstr);
I hope this helps.