Here is my regex to scrape images from a page.
preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $html, $matches);
But it fails when the image URL looks like this:
src="//upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Adolescent_girl_sad_0001.jpg/200px-Adolescent_girl_sad_0001.jpg"
I think I need to add an OR operation to the regex above to allow image URLs starting with //.
The documentation says the | pipe performs an OR operation, but how do I add it to the regex above?
You could just avoid the wrath of the pony instead...
$dom = new DOMDocument();
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
$sources = array();
foreach($images as $img) $sources[] = $img->getAttribute("src");
Done!
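If you would rather stay with a regex, the OR the question asks about can also be written as an optional scheme. This is only a sketch: the sample $html and the example.com URL are made up for illustration, and the leading \b is dropped because there is no word boundary between a quote and a slash.

```php
<?php
// Hypothetical sample input; only the first URL comes from the question.
$html = '<img src="//upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Adolescent_girl_sad_0001.jpg/200px-Adolescent_girl_sad_0001.jpg"> '
      . '<img src="http://example.com/a.png">';

// (?:https?:)? makes the scheme optional, so both http://... and //... match.
preg_match_all('/(?:https?:)?\/\/\S+(?:png|jpg)\b/', $html, $matches);

print_r($matches[0]);
```

Like the original pattern, this scans raw text, so it will also pick up image URLs that are not inside an img tag.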
I am using preg_replace to alter only the image path, keeping the image name, for example:
<img src="http://www.ByPasspublishing.com/uploadedImages/TinyUploadedImage/SOC_Aggression_Define_Fig Territorial Aggression.jpg" />
to
Below is the code I have tried, but it replaces the whole path. Please help me solve this problem:
$html = preg_replace('/<img([^>]+)src="([^"]+)"/i','<img\\1src="newfolder"',$slonodes[0]->SLO_content);
Another thing is that $slonodes[0]->SLO_content returns HTML content in which I have to find the image and replace its path, so the path will not always be the same.
Thanks in advance.
Alternatively, you could use an HTML Parser for this task, DOMDocument in particular:
$html = '<img src="http://www.ByPasspublishing.com/uploadedImages/TinyUploadedImage/SOC_Aggression_Define_Fig Territorial Aggression.jpg" />';
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$img = $dom->getElementsByTagName('img')->item(0);
$new_src = 'newfolder/' . basename($img->getAttribute('src'));
$img->setAttribute('src', $new_src);
echo $dom->saveHTML($img);
Why use regex? O.o
You can do something like this:
$path = "http://www.ByPasspublishing.com/uploadedImages/TinyUploadedImage/SOC_Aggression_Define_Fig Territorial Aggression.jpg";
$pathNew = "newfolder/".substr(strrchr($path, "/"), 1);
print $pathNew;
I cut at the last "/" character and then concatenate the strings to obtain your desired output.
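If the image sits inside a larger chunk of HTML and you still want preg_replace, one possible fix for the pattern in the question is to capture the file name separately and keep it. A rough sketch, assuming the src is double-quoted and contains at least one "/":

```php
<?php
$html = '<img src="http://www.ByPasspublishing.com/uploadedImages/TinyUploadedImage/SOC_Aggression_Define_Fig Territorial Aggression.jpg" />';

// Group 1: everything up to and including src="
// [^"]*/ : the old directory part, which is discarded
// Group 2: the file name plus the closing quote
$result = preg_replace('#(<img[^>]+src=")[^"]*/([^"/]+")#i', '$1newfolder/$2', $html);

echo $result;
```

An img whose src has no "/" in it is left unchanged by this pattern.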
I'm trying to find ALL images in my blog posts with regex. The code below returns images IF the code is clean and the src attribute comes right after the img tag name. However, I also have images with other attributes such as height and width. The regex I have does not pick those up... Any ideas?
The following code returns images that look like this:
<img src="blah_blah_blah.jpg">
But not images that look like this:
<img width="290" height="290" src="blah_blah_blah.jpg">
Here is my code
$pattern = '/<img\s+src="([^"]+)"[^>]+>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
$html = <<<DATA
<img width="290" height="290" src="blah.jpg">
<img src="blah_blah_blah.jpg">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
echo $img->getAttribute('src') . "\n";
}
Output
blah.jpg
blah_blah_blah.jpg
Ever think of using the DOM object instead of regex?
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
You'd be better off using a parser, but here is a way to do it with regex:
$pattern = '/<img\s.*?src="([^"]+)"/i';
The problem is that you only accept \s+ after <img. Try this instead:
$pattern = '/<img\s+[^>]*?src="([^"]+)"[^>]*>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Try this:
$pattern = '/<img\s.*?src=["\']([^"\']+)["\']/i';
It accepts single or double quotes and a src attribute in any position.
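Since the question asks for ALL images, note that preg_match stops at the first hit; preg_match_all collects every one. A small sketch using the two sample tags from the question, with [^>]*? in place of .*? so the match cannot run past the end of the tag (that substitution is my own assumption, not part of the answers above):

```php
<?php
$data = '<img src="blah_blah_blah.jpg"> <img width="290" height="290" src="blah.jpg">';

// [^>]*? skips any attributes before src without leaving the tag.
preg_match_all('/<img\s[^>]*?src=["\']([^"\']+)["\']/i', $data, $matches);

// $matches[1] holds one captured src value per image found.
print_r($matches[1]);
```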
I am using preg_replace to delete from $content certain <img>:
$content=preg_replace('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/','',$content);
When I now display the content using the WordPress the_content function, the <img>s have indeed been removed from $content:
Beforehand, I'd like to grab these images so I can place them elsewhere in the template. I am using the same regex pattern with preg_match_all:
preg_match_all('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/', $content, $matches);
But I can't get my imgs?
preg_match_all('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/', $content, $matches);
print_r($matches);
Array ( [0] => Array ( ) )
Assuming (hopefully) that you are using PHP 5, this is a task for DOMDocument and XPath. Regex on HTML elements will mostly work, but check the following example from
<img alt=">" src="/path.jpg" />
Regex will fail on it. Since there aren't many guarantees in programming, take the guarantee that XPath will find EXACTLY what you want, at a performance cost. To code it:
$doc = new DOMDocument();
$doc->loadHTML('<span><img src="com.png" /><img src="com2.png" /></span>');
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//span/img');
$html = '';
foreach($imgs as $img){
$html .= $doc->saveXML($img);
}
Now you have all the img elements in $html. Use str_replace() to remove them from $content, and from there you can have a drink and be pleased that XPath with HTML elements is painless, just a little slower.
PS: I couldn't be bothered to understand your regex; I just think XPath is better in your situation.
In the end I used preg_replace_callback:
$content2 = get_the_content();
$removed_imgs = array();
$content2 = preg_replace_callback('#(?!<img.+?id="featured_img".*?\/>)(<img.+? />)#',function($r) {
global $removed_imgs;
$removed_imgs[] = $r[1];
return '';
},$content2);
foreach($removed_imgs as $img){
echo $img;
}
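For comparison, here is roughly how the same "collect, then remove" step could look with DOMDocument and XPath, keeping the two ids from the question. This is a sketch only: the sample $content is invented, and the images come back re-serialized by saveHTML, so they may not be byte-for-byte identical to the original markup.

```php
<?php
// Hypothetical post content for illustration.
$content = '<p>Intro <img id="featured_img" src="a.png" /> text <img src="b.png" /> more <img class="x" src="c.png" /></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();

// Select every img except the two ids that must stay in place.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//img[not(@id="img_menu") and not(@id="featured_img")]');

$removed_imgs = array();
// Copy the node list to a plain array before modifying the tree.
foreach (iterator_to_array($nodes) as $img) {
    $removed_imgs[] = $dom->saveHTML($img);
    $img->parentNode->removeChild($img);
}

// loadHTML wraps the fragment in html/body, so serialize only the body's children.
$body = $dom->getElementsByTagName('body')->item(0);
$remaining = '';
foreach ($body->childNodes as $child) {
    $remaining .= $dom->saveHTML($child);
}

print_r($removed_imgs);
echo $remaining;
```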
I've got a large number of webpages stored in an MySQL database.
Most of these pages contain at least one (and occasionally two) entries like this...
<a href="http://first-url-which-always-ends-with-a-slash/">
<img src="http://second-different-url-which-always-ends-with.jpg" />
</a>
I'd like to just set up a little PHP loop to go through all the entries, replacing the first URL with a copy of the second URL for that entry.
How can I use preg to:
find the second url from the image tag
replace the first url in the a tag, with a copy of the second url
Is this possible?
see this url
PHP preg match / replace?
see also:- http://php.net/manual/en/function.preg-replace.php
$qp = qp($html);
foreach ($qp->find("img") as $img) {
$img->attr("title", $img->attr("alt"));
}
print $qp->writeHTML();
Though it might be feasible in this simple case to resort to a regex:
preg_replace('#(<img\s[^>]*)(\balt=)("[^"]+")#', '$1$2$3 title=$3', $h);
(It would make more sense to use preg_replace_callback to ensure no title= attribute is present yet.)
You can do the following:
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$source = "<a href=\"http://first-url-which-always-ends-with-a-slash/\">
<img src=\"http://second-different-url-which-always-ends-with.jpg\" />
</a>";
$dom->loadHTML($source);
$tags = $dom->getElementsByTagName('a');
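foreach ($tags as $tag) {
// Look only at the images nested inside this link
$imgTags = $tag->getElementsByTagName('img');
foreach ($imgTags as $img) {
// Copy the image URL into the link's href, as the question asks
$tag->setAttribute('href', $img->getAttribute('src'));
echo $tag->getAttribute('href');
}
}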
Thanks for the suggestions; I can see how they are better than using preg.
Even so, I finally solved my own question like this...
$result = mysql_query($select);
while ($frow = mysql_fetch_array($result)) {
$page_content = $frow['page_content'];
preg_match("#<img\s+src\s*=\s*([\"']+http://[^\"']*\.jpg[\"']+)#i", $page_content, $matches1);
print_r($matches1);
$imageURL = $matches1[1] ;
preg_match("#<a\s+(?:[^\"'>]+|\"[^\"]*\"|'[^']*')*href\s*=\s*(\"http://[^\"]+/\"|'http://[^']+/')#i", $page_content, $matches2);
print_r( $matches2 );
$linkURL = $matches2[1] ;
$finalpage=str_replace($linkURL, $imageURL, $page_content) ;
}
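Since a page can contain two such entries, a single preg_replace that rewrites each link/image pair on its own may be a bit safer than one preg_match followed by str_replace. A minimal sketch, assuming double-quoted attributes and an img that directly follows its opening a tag, as in the question's sample:

```php
<?php
$page_content = "<a href=\"http://first-url-which-always-ends-with-a-slash/\">\n"
              . "<img src=\"http://second-different-url-which-always-ends-with.jpg\" />\n"
              . "</a>";

// Group 1: <a ... href="
// Group 2: from the closing quote of href through the img's src="..."
// Group 3: the image URL itself (nested inside group 2)
$finalpage = preg_replace(
    '#(<a\s[^>]*href=")[^"]*("[^>]*>\s*<img\s[^>]*src="([^"]+)")#i',
    '$1$3$2',
    $page_content
);

echo $finalpage;
```

The replacement drops the old href value and puts a copy of the image URL (group 3) in its place; everything else in the pair is written back unchanged.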
I have an array of HTML in different formats:
[amp;src]=>image, anotherone [posthtml]=>image2, anothertwo [nbsp;image3
How can I extract the img and text with a common preg_match() call, so that I get the correct image src and text from the HTML? If that is not possible with preg_match(), is there another way to do it?
If anyone knows how to fix this, please reply.
I need a hand.
The recommended way is to use DOM:
$dom = new DOMDocument;
$dom->loadHTML($HTML);
$images = $dom->getElementsByTagName('img');
foreach($images as $im){
// attributes is a property (a DOMNamedNodeMap), not a method
$attrs = $im->attributes;
$src = $attrs->getNamedItem('src')->nodeValue;
}
Using Regular expression:
preg_match_all("/<img .*?(?=src)src=\"([^\"]+)\"/si", $html, $m);
print_r($m);