I have this regex in PHP:
$regex = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/Ui';
It captures all image tag sources in a string, but I want to only capture JPG files. I've tried to mess around with (.*) but I've only proven that I suck at regex... Right now I'm filtering the array, but that feels too much like a hack when I could do it directly with a proper match.
Try this:
$regex = '/<img ([^>]* )?src=[\"\']([^\"\']*\.jpe?g)[\"\']/Ui';
I also removed the extra | in the character classes that was not needed.
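To see why the | is not needed: inside a character class it is a literal pipe rather than alternation, so the original class would also accept | as a "quote". A small sketch (the variable names and test strings are made up for illustration):

```php
<?php
// Inside [...] the pipe is a literal character, not "or".
$withPipe    = '/^["|\']$/';
$withoutPipe = '/^["\']$/';

var_dump(preg_match($withPipe, '|'));    // int(1): the pipe itself is accepted
var_dump(preg_match($withoutPipe, '|')); // int(0)
var_dump(preg_match($withoutPipe, '"')); // int(1)
```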
First, get all img tags with an HTML parser. Then, take those whose src attribute's value is matched by the regex \.(jpeg|jpg)$.
For example, using this parser (the file_get_html function below comes from the PHP Simple HTML DOM Parser library):
$html = file_get_html('http://example.foo.org/bar.html');
foreach ($html->find('img') as $img) {
if (preg_match('/\.(jpeg|jpg)$/i', $img->src)) {
//save $img or $img->src or whatever you need
}
}
Edit: I shortened the regular expression. You can also use \.jpe?g$.
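If you would rather not pull in a third-party parser, the same idea can be sketched with PHP's bundled DOM extension. This is only an illustration under assumptions: the sample markup and the $jpgs variable are made up, and the DOM extension must be enabled.

```php
<?php
// Sample markup is made up for illustration.
$html = '<p><img src="a.jpg"><img src="b.png"><img alt="x" src="c.JPEG"></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about sloppy HTML quiet
$doc->loadHTML($html);
libxml_clear_errors();

// Collect only the src values that end in .jpg or .jpeg (case-insensitive).
$jpgs = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if (preg_match('/\.jpe?g$/i', $src)) {
        $jpgs[] = $src;
    }
}
print_r($jpgs);
```

The regex only has to look at the attribute value here, so quoting style and attribute order in the tag stop mattering.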
Try:
$regex = '/<img[^>]*'.'src=[\"|\'](.*[.]jpg)[\"|\']/Ui';
You have to be careful to escape ' since you are using it as the PHP string delimiter.
Also, matching file names that end in .jpg or .jpeg would do it.
$regex = '/<img[^>]*src=["\']([^\'"]*)\.(jpg|jpeg)["\'][^>]*>/Ui';
Just need to search for the .jpg before the closing quotes I believe
$regex = '/<img[^>]*'.'src=[\"|\'](.*\.jpg)[\"|\']/Ui';
You all forgot that tags may have spaces between < and img
So a correct regexp should start with
/<\s*img
Trying to come up with a PHP regexp that would extract the content of the first [img]...[/img] tag in a text. Can be img or IMG as well.
Really appreciate any help.
Using my poor regexp skills, I came up with the following, which doesn't work:
/[img](.+)[/img/]
Here is one example of text that should work:
http://pt.wikipedia.org/wiki/Erich_von_D%C3%A4niken]Erich Von Daniken[/url][/align] [align=center][img]http://www.ceticismoaberto.com/wp-content/uploads/2012/04/erich_von_daniken_7.jpg[/img]
It should return only:
http://www.ceticismoaberto.com/wp-content/uploads/2012/04/erich_von_daniken_7.jpg
I am using a webpage to test the regexp:
http://www.myregextester.com/index.php
The PHP code I'm using is:
$message=$post["message"];
//try to locate the first image on the post text
if (preg_match("!http://[^?#]+\.(?:jpe?g|png|gif)!Ui", $message, $matches)) {
return $matches[0];
}
The regexp above didn't work for some cases, like the one I showed before, and that's why I'm trying a different approach.
You must escape all bracket characters, and perhaps you have carriage returns. Try this:
\[img\](.|\n)*\[/img\]
This should do the trick
/\[img\](.*?)\[\/img\]/i
[] characters should be escaped with \ because they are used by the regex parser.
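Put together with preg_match, a minimal sketch could look like this. The sample text and the $first variable are made up; the i modifier covers IMG as well, and s lets . match newlines in case the tag spans lines.

```php
<?php
// Sample text is made up; only the first [img]...[/img] pair is wanted.
$message = '[align=center][IMG]http://example.com/a.jpg[/IMG] and [img]http://example.com/b.png[/img]';

$first = null;
// preg_match stops at the first match; the lazy (.*?) stops at the first closing tag.
if (preg_match('/\[img\](.*?)\[\/img\]/is', $message, $matches)) {
    $first = trim($matches[1]);
}
echo $first, "\n";
```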
I need to do some cleanup on strings that look like this:
$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';
Notice the href attribute doesn't have a closing quote - I'm using the DOMParser on a large table of these to extract the text, and it borks on this.
I would like to look at the string in $author_name;
IF the first > does NOT have a " before it, replace it with "> to close the tag correctly. If it is okay, just skip and do the next step. Be sure not to replace the second > at all.
Using php regex, I haven't been able to find a working solution - I could chop up the whole thing and check its parts, but that would be slow and I think there must be a regex that can do what I want.
TIA
What you can do is, find the first closing tag, with or without the double-quote ("), and replace it with (">):
$author_name = preg_replace('/(.+?)"?>(.+?)/', '$1">$2', $author_name);
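A slightly more defensive variant, offered as a sketch rather than a drop-in fix: anchor the pattern at the start of the string and pass a limit of 1 to preg_replace, so that only the first > can ever be touched. The $fixed variable is just an illustrative name.

```php
<?php
$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';

// Anchor at the start, stop at the first ">", and allow an optional quote before it.
// The 4th argument (limit = 1) is a safeguard so later ">" characters are never replaced.
$fixed = preg_replace('/^([^>]*?)"?>/', '$1">', $author_name, 1);
echo $fixed, "\n";
```

A string whose quote is already closed goes through unchanged, because the optional quote is consumed by the match and written back by the replacement.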
http://www.barattalo.it/html-fixer/
Download that, then include it in your php.
The rest is quite easy:
$dirty_html = ".....bad html here......";
$a = new HtmlFixer();
$clean_html = $a->getFixedHtml($dirty_html);
It's common for people to want to use regular expressions, but you must remember that HTML is not regular.
I have been programming in C and C#, using a third-party regular expression library to identify link patterns. Yesterday I was asked to use PHP instead. I am not familiar with PHP regular expressions; I tried, but didn't get the expected result. I have to extract and replace the link in an image src of the form:
<img src="/a/b/c/d/binary/capture.php?id=main:slave:demo.jpg"/>
I only want the path in the src, but the quotation marks could be double or single, and the id could vary from case to case (here it is main:slave:demo.jpg).
I tried the following code:
$searchfor = '/src="(.*?)binary\/capture.php?id=(.+?)"/';
$matches = array();
while ( preg_match($searchfor, $stringtoreplace, $matches) == 1 ) {
// if a match is found, replace the source text and search again
$stringtoreplace= str_replace($matches, 'whatever', $stringtoreplace);
}
But it doesn't work. Is there anything I missed, or any mistake in the code above?
More specifically, let's say I have an image tag with the src
<img src="ANY_THING/binary/capture.php?id=main:slave:demo.jpg"/>
Here ANY_THING could be anything and "/binary/capture.php?id=" is fixed for all cases. The string after "id=" follows the pattern "main:slave:demo.jpg": the parts before the colons change from case to case, and the name of the JPEG varies too. I would expect it to be replaced with
<img src="/main/slave/demo.jpg"/>
Since I can only modify the PHP script at specific, limited times, I want to debug my code before making any changes. Thanks.
First of all, as you may know, regex shouldn't be used to manipulate HTML.
However, try:
$stringtoreplace = '<img src="/a/b/c/d/binary/capture.php?id=main:slave:demo.jpg"/>';
$new_str = preg_replace_callback(
// The regex to match
'/<img(.*?)src="([^"]+)"(.*?)>/i',
function($matches) { // callback
parse_str(parse_url($matches[2], PHP_URL_QUERY), $queries); // convert query strings to array
$matches[2] = '/'.str_replace(':', '/', $queries['id']); // replace the url
return '<img'.$matches[1].'src="'.$matches[2].'"'.$matches[3].'>'; // return the replacement
},
$stringtoreplace // str to replace
);
var_dump($new_str);
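As for why the pattern in the question never matches: the ? after capture.php is unescaped, so the regex reads it as "optional p" instead of a literal question mark (and the unescaped . matches any character). If you want to stay close to the original approach, a sketch with those escaped, and with either quote style accepted, might look like this. The $pattern variable is an illustrative name.

```php
<?php
$stringtoreplace = '<img src="/a/b/c/d/binary/capture.php?id=main:slave:demo.jpg"/>';

// \? and \. match the literal characters; (["\']) plus \1 accepts either quote style.
$pattern = '~src=(["\'])[^"\']*?/binary/capture\.php\?id=([^"\']+)\1~i';

$new_str = preg_replace_callback(
    $pattern,
    function ($m) {
        // main:slave:demo.jpg -> /main/slave/demo.jpg, keeping the original quote character
        return 'src=' . $m[1] . '/' . str_replace(':', '/', $m[2]) . $m[1];
    },
    $stringtoreplace
);
echo $new_str, "\n";
```

preg_replace_callback handles every matching src in one pass, so the while loop from the question is not needed.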
So I am using the preg_match() function to extract the SRC file name of an image and that part is working well.
The problem, though, is that it isn't stopping at the closing quotation mark of the src attribute. It keeps going and includes some other stuff in the image tag, but then just stops. I really just want the src URL only.
preg_match('/<img.+src=[\'"](?P<src>.+)[\'"].*>/i', $rss, $image);
echo $image['src'];
Try this; it may work:
$regex = '/src="(.+?)"/';
then preg_match($regex,$rss,$image);
I am doing this with:
preg_match('#src\s*=\s*"(.*?)"#', $stringToMatch, $result);
I don't advise parsing HTML like this, but this would work for you:
$str = '<img src="to/file/img.png" width="100">';
preg_match('~<img.*?src=[\'"]*(?P<src>[^\s\'"]*)~i', $str, $match);
print_r($match);
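The reason the original pattern overshoots is that .+ is greedy: it runs to the last quote in the tag and only backtracks as far as it must. A side-by-side sketch, using a made-up $rss sample and illustrative variable names:

```php
<?php
// Made-up sample with a second quoted attribute after src.
$rss = '<img class="thumb" src="http://example.com/pic.jpg" alt="A picture" />';

// Greedy: .+ runs to the last quote in the tag, then backtracks only as far as needed.
preg_match('/<img.+src=[\'"](?P<src>.+)[\'"].*>/i', $rss, $greedy);

// Negated class: the capture cannot cross a quote, so it stops at the end of src.
preg_match('/<img[^>]+src=[\'"](?P<src>[^\'"]+)[\'"]/i', $rss, $strict);

echo $greedy['src'], "\n"; // includes the alt attribute as well
echo $strict['src'], "\n"; // just the URL
```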
I have seen lots of similar queries to this, but am struggling to get them to work in my application because I still don't fully understand regular expressions!
I'm using the old FCKEditor WYSIWYG to upload an image, but need to store the src as the full URL rather than the relative path.
At the time I need to do the replace, I've already replaced quotes with \" so the pattern I'm looking for needs to be:
src=\"/userfiles/
This needs to be replaced with
src=\"http://mydomain.com/userfiles/
Thanks for your suggestions!!
You can actually do this with str_replace, which would be simpler, but here's a preg version:
$html = preg_replace('!src="/userfiles/!', 'src="http://mydomain.com/userfiles/', $html);
Here's the str_replace:
$html = str_replace('src="/userfiles/', 'src="http://mydomain.com/userfiles/', $html);
If there are spaces here and there, you'll need the preg version, and you'll want to add \s* in the places that may have spaces.
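For instance, a sketch of the whitespace-tolerant version, assuming plain (unescaped) double quotes in the markup; the sample $html is made up:

```php
<?php
// Made-up sample with stray spaces around "=".
$html = '<img src = "/userfiles/image/a.png"> <img src="/userfiles/b.png">';

// \s* tolerates optional whitespace; the replacement writes a normalized attribute.
$html = preg_replace('!src\s*=\s*"/userfiles/!', 'src="http://mydomain.com/userfiles/', $html);
echo $html, "\n";
```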