I have this image tag inside an HTML page.
<img id="catImage" width="250" alt="" src="http://dev-server2/image2.png" />
I want to get the value of src and am not getting along with preg_match and all of this regex stuff. Is this one right?
preg_match(
"/<img id=\"catImage\" width=\"[0-9]+\" alt=\"\" src=\"([[a-zA-Z0-9]\/-._]*)\"/",
$artist_page["content"], $matches);
I get an empty array!
First and foremost, the portion of your regex that deals with the src attribute doesn't account for the colon that appears in the URL.
I'd suggest changing the src portion (and any other attribute values) to look instead for the close quote and capture everything between:
... src=\"([^\"]*)\" ....
Does this work?
'/<img id="catImage"[^>]+src="([^"]*)"/'
I'm still really new on regex but I thought I would throw my thoughts out there and get some criticism for it. Should the expression be something like (?<=(src=")).*(?=["])? (not quite PHP formatted, yet). This would grab the contents of the src attribute.
"/<img id=\"catImage\" width=\"[0-9]+\" alt=\"\" src=\"([a-zA-Z0-9/.:_-]*)\"/"
Should do. Note that I edited the character class [ ... ]. The hyphen (-) has a special meaning inside a class, so I put it last to make it a literal. I also added the : character (thanks @user333699). This hints, however, that you should not try to enumerate every valid URL character. Instead, match anything until you know that the entire value of the src attribute is matched:
"/<img id=\"catImage\" width=\"[0-9]+\" alt=\"\" src=\"([^\"]*)\"/"
I.e., anything that is not a quote (").
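A minimal sketch of the "anything but a quote" pattern in use. The `$content` string is a stand-in for `$artist_page["content"]`, assumed here for illustration only:

```php
<?php
// Sample input standing in for $artist_page["content"] (assumed for illustration).
$content = '<p><img id="catImage" width="250" alt="" src="http://dev-server2/image2.png" /></p>';

// [^"]* captures everything up to, but not including, the closing quote.
$pattern = '/<img id="catImage" width="[0-9]+" alt="" src="([^"]*)"/';

if (preg_match($pattern, $content, $matches)) {
    // $matches[0] is the whole matched text; $matches[1] is the first capture group.
    $src = $matches[1];
} else {
    $src = null;
}

var_dump($src);
```

Single-quoting the pattern avoids having to escape every double quote inside it.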
Note that $matches[0] holds the entire matched tag; the value of src is in the first capture group, $matches[1], so no further processing is needed after preg_match.
It might be worth diving into XPath, depending on what you really want to do with it.
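For comparison, a rough sketch of the XPath route using PHP's bundled DOM extension. The sample markup is assumed, and the approach relies on the element keeping its id of catImage:

```php
<?php
// Sample markup standing in for the real page (assumed for illustration).
$html = '<div><img id="catImage" width="250" alt="" src="http://dev-server2/image2.png" /></div>';

$doc = new DOMDocument();
// Collect parser warnings internally; real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string(...) yields the attribute value directly, or '' when nothing matches.
$src = $xpath->evaluate('string(//img[@id="catImage"]/@src)');

echo $src, PHP_EOL;
```

Unlike the regex, this does not depend on attribute order or on the exact spacing inside the tag.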
I've got a database with a lot of user-made entries that has grown over about 10 years. The users had the option to put HTML code in their content, and some didn't do that very well, so I have a lot of content in which the quotes around attribute values are missing. I need valid HTML for an export/import via XML.
I tried a replacement, but my regex doesn't work. Do you have an idea where my mistake is?
$out=preg_replace("/<a href=h(.)*>/","<a href=\"h$1\">",$out);
PS: If you have an idea how to automatically correct invalid HTML source, that would be a great alternative.
I think you wanted to use "/<a href=h(.*)>/" (mind the star inside the parentheses) since you want to capture all characters after the h and before the > inside the capture group.
You can also use <a href=([^"].*)> since the href may not start with h. This regex captures all href values that do not start with ".
Yet, all of these assume that the href is the last attribute in your a tag, i.e., that it is followed directly by >.
As a more general rule, I came up with (?<key>\w*)\s*=\s*(?<value>[^"][^\s>]*) that finds attribute-value pairs, separated by =. The values may not start with ", and they go until the next whitespace or >. Use this with caution, since it may fail in several circumstances: multi-line HTML, inline JavaScript, etc.
Whether it is a good idea to use RegEx for such a task is a different discussion.
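As a minimal sketch of the quoting fix discussed above, the following variant stops the value at the next whitespace or >, so the href does not have to be the last attribute. The sample input is invented, and the pattern assumes unquoted values contain no spaces:

```php
<?php
// Sample input (assumed): one unquoted href and one that is already quoted.
$out = '<a href=http://example.com/page>link</a> <a href="http://example.com/ok">ok</a>';

// The first character must not be a quote, whitespace or >;
// the value then runs up to the next whitespace or >.
$pattern = '/<a href=([^"\'\s>][^\s>]*)/';

$out = preg_replace($pattern, '<a href="$1"', $out);

echo $out, PHP_EOL;
```

Already-quoted values are left alone because the first character class excludes quotes.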
I can't work out the right solution.
My standard img replace code is:
preg_replace('~\[img](.*?)\[/img\]~s','<img src="$1" />',$text);
Of course it works. But I'm trying to replace the BBCode when width and height are set. Those are optional, so it should also work if only one dimension is set, or none.
The bbcode looks like: [img=12x12]link of the image[/img]
So the replacement should look like:
preg_replace('~\[img=(.*?)x(.*?)\](.*?)\[/img\]~s','<img width="$1" height="$2" src="$3" />',$text);
I guess I got it wrong. Does anybody know how to solve this?
Try this regex:
preg_replace('~\[img=?(\d+)?x?(\d+)?\](.*?)\[/img\]~s','<img width="$1" height="$2" src="$3" />',$text);
The way you coded it, it wouldn't match all 3 cases you wanted: [img], [img=NN], and [img=NNxNN]. It would only match in the case both dimensions were provided.
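One possible refinement, sketched with preg_replace_callback so that width and height are only emitted when they were actually given. The function name bbcode_img_to_html is hypothetical, and a single number is assumed to mean the width:

```php
<?php
// Hypothetical helper; converts [img], [img=W] and [img=WxH] tags.
function bbcode_img_to_html(string $text): string
{
    // Group 1: optional width, group 2: optional height, group 3: image URL.
    $pattern = '~\[img(?:=(\d+)(?:x(\d+))?)?\](.*?)\[/img\]~s';

    return preg_replace_callback($pattern, function (array $m): string {
        $attrs = '';
        // Optional groups that did not take part in the match are empty strings here.
        if ($m[1] !== '') {
            $attrs .= ' width="' . $m[1] . '"';
        }
        if ($m[2] !== '') {
            $attrs .= ' height="' . $m[2] . '"';
        }
        return '<img' . $attrs . ' src="' . $m[3] . '" />';
    }, $text);
}

echo bbcode_img_to_html('[img]a.png[/img] [img=12]a.png[/img] [img=12x34]a.png[/img]'), PHP_EOL;
```

This avoids the empty width="" and height="" attributes that a plain preg_replace would produce for a bare [img] tag.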
Your regexp should definitely work. I would have used \d+ though, which makes sure that each value exists and is numeric:
~\[img=(\d+)x(\d+)\](.*?)\[/img\]~s
What error are you getting with your code, or rather, which string do you expect to match that doesn't?
I am trying to remove a part of a document on the fly using preg_replace().
/* target example:
<li id="footer-poweredbyico">
<img src="//bits.wikimedia.org/skins-1.18/common/images/poweredby_mediawiki_88x31.png" alt="Powered by MediaWiki" width="88" height="31" />
</li>
*/
$reg = preg_quote('<li id="footer-poweredbyico">.*?</li>');
preg_replace($reg,"",$str);
Ignore any errors in PHP, this question is about how to format the regular expression correctly to remove anything matching the target example opening and closing tags. The contents of the containing HTML tags will be different each time, hence .*? (I think that's wrong).
The preg_quote function actually does the opposite of what you want: its purpose is to disable all regex-features in a string. So in your case, what you currently have is (roughly) looking for an actual .*? in your HTML, instead of looking for zero or more characters. What you want is:
$str = preg_replace('/<li id="footer-poweredbyico">.*?<\/li>/s', '', $str);
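To see why the original call matched nothing, here is a small sketch of what preg_quote does to the wildcard part of the pattern:

```php
<?php
// preg_quote escapes regex metacharacters so they match literally.
$quoted = preg_quote('.*?');

echo $quoted, PHP_EOL; // prints \.\*\?

// The escaped pattern only matches the literal text ".*?", not "any characters".
$matchesLiteral = preg_match('/' . $quoted . '/', 'a.*?b');
$matchesOther   = preg_match('/' . $quoted . '/', 'anything else');
```

preg_quote is meant for literal text that gets embedded in a pattern, not for the pattern as a whole.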
The .*? portion of your regex is being escaped. Therefore, it isn't matching anything. Try this.
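// Quote only the literal parts; add delimiters and the s flag so that . also matches newlines.
$reg = '/' . preg_quote('<li id="footer-poweredbyico">', '/') . '.*?' . preg_quote('</li>', '/') . '/s';
$str = preg_replace($reg, "", $str);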
You don't need this hack at all; read the FAQ entry:
"How can I edit / remove the Powered by MediaWiki image in the footer?"
preg_quote() will disable all the special characters you used, like .*?.
Try something like:
preg_replace('#<li id="footer-poweredbyico">.*?</li>#s', '', $str);
Now, the difficult question is whether to make this regex "greedy". Right now, it's ungreedy, which means it will break your page if there's another <li> inside the one you're trying to remove. But if you make it greedy, it will remove everything from the beginning of the <li> tag until the last </li> in the page, even if that belongs to a different <li> element. Neither is ideal. This is why a proper HTML parser usually does a better job at manipulating HTML.
But if the page is simple enough, a regex will work.
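The difference between the two modes can be seen on a small sample. The markup below is invented for illustration:

```php
<?php
// Sample input (assumed): the target item followed by a sibling item.
$str = '<ul><li id="footer-poweredbyico"><img src="a.png" /></li><li id="other">keep me</li></ul>';

// Lazy: stops at the first </li> after the opening tag.
$lazy = preg_replace('#<li id="footer-poweredbyico">.*?</li>#s', '', $str);

// Greedy: runs on to the last </li> in the input, taking the sibling with it.
$greedy = preg_replace('#<li id="footer-poweredbyico">.*</li>#s', '', $str);

echo $lazy, PHP_EOL;
echo $greedy, PHP_EOL;
```

Here the lazy version leaves the second item intact, while the greedy one empties the whole list.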
EDIT: Corrected a gross error, thanks to @Nilpo.
I'm useless with regular expressions and haven't been able to google myself a clear solution to this one.
I want to search+replace some text ($content) for any url inside the anchor's href with a new url (stored as the variable $newurl).
Change this:
<img alt="foobar" src="http://blogurl.com/files/2011/03/foobar_thumb.jpg" />
To this:
<img alt="foobar" src="http://blogurl.com/files/2011/03/foobar_thumb.jpg" />
I imagine using preg_replace would be best for this. Something like:
preg_replace('Look for href="any-url"',
'href="$newurl"',$content);
The idea is to get all images on a WordPress front page to link to their posts instead of to full sized images (which is how they default). Usually there would be only one url to replace, but I don't think it would hurt to replace all potential matches.
Hope all that made sense and thanks in advance!
Here is the gist of what I came up with. Hopefully it helps someone:
$content = get_the_content();
$pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";
$newurl = get_permalink();
$content = preg_replace($pattern,$newurl,$content);
echo $content;
Mucho thanko to @WiseGuyEh
This should do the trick:
(?<=href=("|'))[^"']+(?=("|'))
It uses lookahead and lookbehind to assert that anything it matches starts with href=" or href=' and makes sure that it ends with a single or double quote.
Note: the regex cannot determine whether this is a valid HTML document. If an href value opens with a single quote and closes with a double quote (or vice versa), it will not notice the error!
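A self-contained sketch of the lookaround replacement outside WordPress. The `$content` and `$newurl` values are invented stand-ins for get_the_content() and get_permalink(), and a callback is used so that a `$` or backslash in the new URL is never treated as a backreference:

```php
<?php
// Stand-ins for get_the_content() and get_permalink() (assumed sample values).
$content = '<a href="http://blogurl.com/files/2011/03/foobar.jpg">'
         . '<img alt="foobar" src="http://blogurl.com/files/2011/03/foobar_thumb.jpg" /></a>';
$newurl  = 'http://blogurl.com/2011/03/foobar-post/';

// Lookbehind: preceded by href=" or href='. Lookahead: followed by a quote.
// Only the URL between the quotes is part of the match itself.
$pattern = '/(?<=href=["\'])[^"\']+(?=["\'])/';

$content = preg_replace_callback($pattern, function (array $m) use ($newurl): string {
    return $newurl;
}, $content);

echo $content, PHP_EOL;
```

The src attribute of the image is untouched because the lookbehind only accepts href=.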
I need to preg_match for
src="http:// "
where the blank space following // is the rest of the URL, ending with the ". My adapted pattern doesn't seem to work:
preg_match('#src="(http://[^"]+)#', $data, $match);
And I am also struggling to get text that starts with > and ends with EITHER a full stop . or an exclamation mark ! or a question mark ? I have no idea how to do this one. An example of the text I want to preg_match for is:
blahblahblah>Hello world this is what I want.
I'm hoping a kind preg_match guru can tell me the answer and save me hours of headscratching.
Thanks for reading.
As for the URL:
preg_match('#src="(.*?)"#', $data, $match);
and for the second case, use />(.*?)(\.|!|\?)/
(.*?)" matches as few characters as possible (it is lazy, not greedy), so the capture stops at the first closing double quote.
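Both patterns from this answer, sketched against invented sample strings. A character class [.!?] is used in place of the alternation, which is equivalent here:

```php
<?php
// Sample inputs (assumed for illustration).
$data = '<img src="http://example.com/pic.png" alt="x">';
$text = 'blahblahblah>Hello world this is what I want.';

// Lazy capture: stops at the first double quote after src=".
preg_match('#src="(.*?)"#', $data, $urlMatch);
$url = $urlMatch[1];

// Capture from the first > up to the first ., ! or ?.
preg_match('/>(.*?)([.!?])/', $text, $textMatch);
$sentence = $textMatch[1] . $textMatch[2];

echo $url, PHP_EOL;
echo $sentence, PHP_EOL;
```

Note that the second pattern stops at the first terminator it meets, so a full stop inside an abbreviation or a number would cut the capture short.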
It seems that you want to parse a document or string that follows an HTML, DOM, XML or similar structure.
Use XPath: navigate to the tag and have it return the src attribute. This will save you a lot of trouble, and you can forget about regular expressions.
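A rough sketch of that idea with PHP's DOM extension, collecting the src of every img element. The sample markup is assumed:

```php
<?php
// Sample markup (assumed for illustration).
$html = '<p><img src="http://example.com/a.png"><img src="http://example.com/b.png"></p>';

$doc = new DOMDocument();
// Keep libxml quiet about markup it considers invalid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$sources = [];
// //img/@src selects the src attribute node of every img element.
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```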