Using preg_match and stopping at image SRC - php

So I am using the preg_match() function to extract the SRC file name of an image, and that part is working well.
The problem, though, is that it isn't stopping at the closing quotation mark of the src attribute. It keeps going, includes some of the other stuff in the image tag, and then just stops. I really just want the src URL.
preg_match('/<img.+src=[\'"](?P<src>.+)[\'"].*>/i', $rss, $image);
echo $image['src'];

Try this; it may work:
$regex = '/src="(.+?)"/';
Then call preg_match($regex, $rss, $image); the URL is captured in $image[1].
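To see why the ? matters, here is a small sketch comparing a greedy .+ (as in the question) with the lazy .+? from this answer. The sample tag in $rss is made up for illustration.

```php
<?php
// Hypothetical sample input: an img tag with more attributes after src.
$rss = '<img src="to/file/img.png" alt="pic" width="100">';

// Greedy .+ runs to the end of the string, then backtracks only as far as
// the LAST double quote, so the other attributes end up in the capture.
preg_match('/src="(.+)"/', $rss, $greedy);

// Lazy .+? expands one character at a time and stops at the FIRST
// double quote after src=".
preg_match('/src="(.+?)"/', $rss, $lazy);

echo $greedy[1], "\n"; // to/file/img.png" alt="pic" width="100
echo $lazy[1], "\n";   // to/file/img.png
```

A negated class such as src="([^"]*)" has the same effect here, since it cannot cross a quote at all.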

I am doing this with:
preg_match('#src\s*=\s*"(.*?)"#', $stringToMatch, $result);

I don't advise parsing HTML like this, but this should work for you:
$str = '<img src="to/file/img.png" width="100">';
preg_match('~<img.*?src=[\'"]*(?P<src>[^\s\'"]*)~i', $str, $match);
print_r($match);
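Since parsing HTML with a regex is discouraged here, one alternative is PHP's built-in DOM extension. This is a swapped-in technique, not what the answer uses, and it assumes ext-dom is enabled (it is in most PHP builds). A minimal sketch using the same $str:

```php
<?php
$str = '<img src="to/file/img.png" width="100">';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is often not perfectly well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($str);
libxml_clear_errors();

// getAttribute() returns the attribute value without its quotes,
// whether the markup used single or double quotes.
$srcs = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}
print_r($srcs);
```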

Related

extracting only image src, not other 'src' tags in html with php

I've been able to use preg_match to get the src of any image tag, but in this case I only need the src of images with the class 'wp-post-image'. However, this code returns nothing for me:
$pattern = '<img(?:[^>]+src="(.+?)"[^>]+(?:id|class)="image"|[^>]+(?:id|class)="wp-post-image"[^>]+src="(.+?)")
';
preg_match($pattern,$results[$k]['description'], $matches);
$results[$k]['image'] = $matches[0];
print_r($results[$k]['image']);
The old version returns all image matches, including 4 that have the class I'm looking for, so maybe my syntax is just wrong?
Old version that returned all images:
$pattern = '%<img.*?src=["\'](.*?)["\'].*?/>%i';
preg_match($pattern,$results[$k]['description'], $matches);
$src = $matches[0];
//print_r($src);
Asking to parse HTML with regex on SO will get you flamed. Not without reason, but flamed nonetheless.
If you insist on using regex (which, if for nothing else, is good practice), I suggest using a regex sandbox to test out patterns on sample text. One I use is https://regex101.com/.
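The old version (which you say worked) looks for either single or double quotes around the src attribute. The new version only looks for double quotes, which is possibly why it's failing. The new pattern is also missing its delimiters: PHP takes the leading < as the opening delimiter and the first > as the closing one, so the rest of the pattern is read as modifiers and preg_match() returns false with an "Unknown modifier" warning.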
Rather than trying to write a more complicated regex, it may be easier to use your old regex -- which grabs all the image links -- along with an expanded capture, and then look through the captured links to sort out the ones you need:
$pattern = '%(<img.*?src=["\'].*?["\'].*?/>)%i';
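A sketch of that approach, under a few assumptions: the sample $description is invented, a second capture group is added around the src value, and preg_match_all() is used because preg_match() stops after the first match. The class check is a simple regex applied to each captured tag.

```php
<?php
// Hypothetical sample description containing two images.
$description = '<p><img class="wp-post-image" src="a.jpg" /> text <img class="other" src=\'b.png\' /></p>';

// Group 1 = the whole tag, group 2 = the src value.
$pattern = '%(<img.*?src=["\'](.*?)["\'].*?/>)%i';
preg_match_all($pattern, $description, $matches);

$wanted = [];
foreach ($matches[1] as $i => $tag) {
    // Keep only tags whose class attribute contains wp-post-image.
    if (preg_match('/class=["\'][^"\']*\bwp-post-image\b/i', $tag)) {
        $wanted[] = $matches[2][$i];
    }
}
print_r($wanted);
```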

preg_match and images URL

I have a little problem with the preg_match function in PHP; I think I will never learn how to use it. I want to extract the URL of an image from HTML without the name of the image. For example, if I have a link to an image like:
"/data/images/2013-10-03/someimage.jpg"
or
"http://something.com//data/images/2013-10-03/someimage.jpg"
How can I use the preg_match function to delete everything to the left of the last forward slash, so that I get only the image name from the URL?
Maybe it's smarter to use a different function, but I don't know which one.
P.S. Can you recommend a good tutorial for the preg_match function?
Maybe I forgot to say: I don't know how long the image name is or what exactly it is. I need a function that extracts only what is to the right of the last forward slash.
$pattern = '/[\w\-]+\.(jpg|png|gif|jpeg)/';
$subject = 'http://something.com//data/images/2013-10-03/someimage.png';
$result = preg_match($pattern, $subject, $matches);
echo $matches[0]; // someimage.png
No need for regex or anything fancy:
$var = "http://something.com/data/images/2013-10-03/someimage.jpg";
$image = basename($var);
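A short sketch of basename() next to its counterpart dirname(), using the URL from the question. The second URL with a ?w=100 query string is a made-up example showing why running parse_url() first can help.

```php
<?php
$url = 'http://something.com//data/images/2013-10-03/someimage.jpg';

// basename(): everything after the last slash.
$name = basename($url);

// dirname(): everything before the last slash.
$dir = dirname($url);

// Hypothetical URL with a query string: take only the path component
// first, so "?w=100" does not end up in the file name.
$withQuery = 'http://something.com/data/images/someimage.jpg?w=100';
$cleanName = basename(parse_url($withQuery, PHP_URL_PATH));

echo $name, "\n";      // someimage.jpg
echo $dir, "\n";       // http://something.com//data/images/2013-10-03
echo $cleanName, "\n"; // someimage.jpg
```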
You need to use preg_replace(). You can try it in an online tester to play with regular expressions; it is a fast way to learn regex: http://preg_replace.onlinephpfunctions.com/
For example, replace /\/someimage.jpg/ with '' (an empty string).
It will return http://something.com//data/images/2013-10-03 from http://something.com//data/images/2013-10-03/someimage.jpg.
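That pattern hard-codes the file name (and its unescaped . matches any character). A sketch that works for any file name by anchoring on the last slash, assuming the URL has no query string or fragment:

```php
<?php
$url = 'http://something.com//data/images/2013-10-03/someimage.jpg';

// Remove the last slash and everything after it, whatever the file name is.
// [^/]* cannot cross a slash, and $ ties the match to the end of the string.
$dir = preg_replace('~/[^/]*$~', '', $url);

// The opposite: greedy .* runs to the last slash, so only the name is left.
$name = preg_replace('~^.*/~', '', $url);

echo $dir, "\n";  // http://something.com//data/images/2013-10-03
echo $name, "\n"; // someimage.jpg
```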
You can use the Simple HTML DOM Parser to get the href of the a tags.
For example:
foreach ($html->find('a[class="your class"]') as $var) {
    echo $var->href; // $html is a Simple HTML DOM object, e.g. from file_get_html()
}
Hope this helps!
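If you would rather not depend on the third-party Simple HTML DOM library, PHP's built-in DOMXPath can do the same class-based lookup. This is a swapped-in technique; the markup and the class name your-class are hypothetical (note that a class value with a space, like "your class", actually lists two classes).

```php
<?php
// Hypothetical sample markup.
$markup = '<a class="your-class" href="/one">One</a><a class="other" href="/two">Two</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML without warnings
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match <a> elements whose space-separated class list contains "your-class".
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' your-class ')]";

$hrefs = [];
foreach ($xpath->query($query) as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```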

searching link with php regular expression

I have been programming in C and C#, using a third-party regular expression library to identify link patterns. But yesterday, for some reason, someone asked me to use PHP instead. I am not familiar with PHP regular expressions; I tried, but didn't get the expected result. I have to extract and replace the link in an image src of the form:
<img src="/a/b/c/d/binary/capture.php?id=main:slave:demo.jpg"/>
I only want the path in the src, but the quotes could be double or single, and the id can vary from case to case (here it is main:slave:demo.jpg).
I tried the following code:
$searchfor = '/src="(.*?)binary\/capture.php?id=(.+?)"/';
$matches = array();
while ( preg_match($searchfor, $stringtoreplace, $matches) == 1 ) {
// here, if a match is found, replace the source text and search again
$stringtoreplace= str_replace($matches, 'whatever', $stringtoreplace);
}
But it doesn't work. Is there anything I missed, or a mistake in the code above?
More specifically, let's say I have an image tag whose src is:
<img src="ANY_THING/binary/capture.php?id=main:slave:demo.jpg"/>
Here ANY_THING could be anything, and "/binary/capture.php?id=" is fixed for all cases. The string after "id=" follows the pattern "main:slave:demo.jpg": the parts before the colons change from case to case, and so does the name of the JPEG. I would expect it to be replaced as:
<img src="/main/slave/demo.jpg"/>
Since I only have the right to modify the PHP script at specific, limited times, I want to debug my code before making any modification. Thanks.
First of all, as you may know, regex shouldn't be used to manipulate HTML.
However, try:
$stringtoreplace = '<img src="/a/b/c/d/binary/capture.php?id=main:slave:demo.jpg"/>';
$new_str = preg_replace_callback(
// The regex to match
'/<img(.*?)src="([^"]+)"(.*?)>/i',
function($matches) { // callback
parse_str(parse_url($matches[2], PHP_URL_QUERY), $queries); // convert query strings to array
$matches[2] = '/'.str_replace(':', '/', $queries['id']); // replace the url
return '<img'.$matches[1].'src="'.$matches[2].'"'.$matches[3].'>'; // return the replacement
},
$stringtoreplace // str to replace
);
var_dump($new_str);
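The pattern above only accepts double quotes, while the question says the src may use single quotes too. A sketch of a variant, assuming the /binary/capture.php?id= part is fixed as described: it captures the opening quote and uses the backreference \2 to require the same closing quote, and [^>] keeps each match inside a single tag. The sample input, including the untouched logo.png tag, is made up.

```php
<?php
// Hypothetical input: one double-quoted src, one single-quoted src,
// and an unrelated image that must be left alone.
$html = '<img src="/a/b/c/d/binary/capture.php?id=main:slave:demo.jpg"/>'
      . "<img src='/x/binary/capture.php?id=foo:bar:pic.jpg'/>"
      . '<img src="logo.png">';

// Groups: 1 = attributes before src, 2 = opening quote, 3 = id value,
// 4 = rest of the tag. \2 forces the closing quote to match the opening one.
$pattern = '~<img([^>]*?)src=(["\'])[^"\']*?/binary/capture\.php\?id=([^"\']+)\2([^>]*)>~i';

$result = preg_replace_callback($pattern, function ($m) {
    // main:slave:demo.jpg -> /main/slave/demo.jpg
    $newSrc = '/' . str_replace(':', '/', $m[3]);
    return '<img' . $m[1] . 'src=' . $m[2] . $newSrc . $m[2] . $m[4] . '>';
}, $html);

echo $result, "\n";
```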

How to use substr at regexp of preg replace

I am making a BBCode for YouTube videos. Users can post a video as BBCode, e.g. [youtube]http://www.youtube.com/watch?v=ihK2pPcDSHM[/youtube], which is then converted to HTML. But instead of just the video, I also want to show the video's image. So I do it like this:
$string = preg_replace("~\[yt]http://www.youtube.com/watch\?v=(.*)\[/yt]~Uis","<img src=\"http://img.youtube.com/vi/\\1/0.jpg\" />", $string);
It shows the image, but when somebody posts a URL like:
http://www.youtube.com/watch?v=ihK2pPcDSHM&feature=channel
Then the image URL becomes http://img.youtube.com/vi/ihK2pPcDSHM&feature=channel/0.jpg, which does not lead to a valid image. I tried changing \\1 to ".substr('\\1', 0, 11)." but it had no effect.
Any suggestion to solve this? Thanks!
Try a different pattern like:
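~\[yt]http://www\.youtube\.com/watch\?v=([a-z0-9_-]+).*?\[/yt]~is
(Put the hyphen last in the character class; written as 0-9-_ it forms the range 9-_, which also matches characters such as =, ? and [.)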
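Plugged into the question's preg_replace() call, a pattern along these lines could look like the sketch below, with the hyphen placed last in the character class so that it is a literal hyphen. The sample BBCode string comes from the question; $1 is used in the replacement, which preg_replace() accepts just like \\1.

```php
<?php
$string = '[yt]http://www.youtube.com/watch?v=ihK2pPcDSHM&feature=channel[/yt]';

// ([a-z0-9_-]+) captures only video-ID characters (the i flag covers A-Z);
// .*? then skips extra parameters such as &feature=channel up to [/yt].
$string = preg_replace(
    '~\[yt]http://www\.youtube\.com/watch\?v=([a-z0-9_-]+).*?\[/yt]~is',
    '<img src="http://img.youtube.com/vi/$1/0.jpg" />',
    $string
);

echo $string, "\n";
```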
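Just tell your regex to stop at the first & character, and then skip whatever follows the ID up to the closing tag:
$string = preg_replace("~\[yt]http://www\.youtube\.com/watch\?v=([^&\[]+)[^\[]*\[/yt]~is","<img src=\"http://img.youtube.com/vi/\\1/0.jpg\" />", $string);
Without the [^\[]* part, a URL containing &feature=channel would not match at all, because [/yt] would have to follow the ID directly. The U modifier is dropped because it would make ([^&\[]+) lazy, so the group would capture only one character.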
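On the substr() idea from the question: it has no effect because PHP evaluates substr('\\1', 0, 11) once, before preg_replace() runs, on the literal two-character string \1, so the replacement string is unchanged. To call a PHP function on each captured ID, preg_replace_callback() can be used instead. A sketch, keeping the question's assumption that the ID is the first 11 characters after v=:

```php
<?php
$string = '[yt]http://www.youtube.com/watch?v=ihK2pPcDSHM&feature=channel[/yt]';

$string = preg_replace_callback(
    '~\[yt]http://www\.youtube\.com/watch\?v=(.*?)\[/yt]~is',
    function ($m) {
        // substr() now runs on the real captured text, once per match.
        // Assumes an 11-character video ID, as in the question.
        $id = substr($m[1], 0, 11);
        return '<img src="http://img.youtube.com/vi/' . $id . '/0.jpg" />';
    },
    $string
);

echo $string, "\n";
```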

Regex to capture JPG images only

I have this regex in PHP:
$regex = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/Ui';
It captures all image tag sources in a string, but I want to capture only JPG files. I've tried to mess around with (.*), but I've only proven that I suck at regex... Right now I'm filtering the array, but it feels too much like a hack when I could just do it with a proper match.
Try this:
$regex = '/<img ([^>]* )?src=[\"\']([^\"\']*\.jpe?g)[\"\']/Ui';
I also removed the extra | from the character classes. It was not needed: inside [...], a | is just a literal pipe character, not alternation.
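A quick way to check which sources this pattern keeps is to run it through preg_match_all() on a small sample; the $html below is made up. The src values land in capture group 2, because group 1 is the optional run of other attributes.

```php
<?php
// Hypothetical sample: two JPEG sources and one PNG.
$html = '<img src="a.jpg"> <img class="x" src=\'b.png\'> <img alt="" src="c.jpeg">';

$regex = '/<img ([^>]* )?src=[\"\']([^\"\']*\.jpe?g)[\"\']/Ui';
preg_match_all($regex, $html, $m);

// Group 2 holds the src value; the PNG tag produces no match.
print_r($m[2]);
```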
First, get all img tags with an HTML parser. Then, take those whose src attribute's value is matched by the regex \.(jpeg|jpg)$.
For example, using the Simple HTML DOM parser, which provides file_get_html():
$html = file_get_html('http://example.foo.org/bar.html');
foreach ($html->find('img') as $img) {
    if (preg_match('/\.(jpeg|jpg)$/', $img->src)) {
        // save $img or $img->src or whatever you need
    }
}
Edit: I shortened the regular expression. You can also use \.jpe?g$.
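If you already have the src values in an array (for example, collected with the loop above, or the array the question says it is filtering), preg_grep() applies the regex to every element in one call. The sample array is hypothetical:

```php
<?php
// Hypothetical src values collected earlier.
$srcs = ['a.jpg', 'b.png', 'c.JPEG', 'd.jpg?x=1'];

// preg_grep() keeps the elements that match; array_values() renumbers keys.
// The $ anchor rejects d.jpg?x=1, and the i flag accepts c.JPEG.
$jpgs = array_values(preg_grep('/\.jpe?g$/i', $srcs));

print_r($jpgs);
```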
Try:
$regex = '/<img[^>]*'.'src=[\"|\'](.*[.]jpg)[\"|\']/Ui';
You have to be careful to escape ' since you are also using it as the PHP string delimiter.
Searching for files that end in .jpg or .jpeg also works:
$regex = '/<img[^>]*src=["\']([^\'"]*)\.(jpg|jpeg)["\'][^>]*>/Ui';
I believe you just need to search for .jpg before the closing quote:
$regex = '/<img[^>]*'.'src=[\"|\'](.*\.jpg)[\"|\']/Ui';
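You all forgot that tags may have spaces between < and img, so a more tolerant regexp would start with:
/<\s*img
(Browsers themselves do not treat "< img" as a tag, so this only matters for malformed input.)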
