I'm trying to get the first image from each of my posts. The code below works great if I only have one image, but if I have more than one, it gives me an image but not always the first.
I really only want the first image. A lot of times the second image is a "next" button.
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);
$first_img = $matches [1] [0];
Now I can take this $first_img and stick it in front of the short description:
<img alt="Sara" title="Sara" src="<?php echo $first_img;?>"/>
If you only need the first src attribute, preg_match should do instead of preg_match_all. Does this work for you?
<?php
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
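// [^>]+ keeps the match inside the first <img> tag; a greedy .+ could run on to a later src= on the same line
if (preg_match('/<img[^>]+src=[\'"](?P<src>[^\'"]+)[\'"]/i', $texthtml, $image)) {
    echo $image['src'];
}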
?>
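The reason the original pattern doesn't always return the first image is the greedy .+ before src=: when two <img> tags sit on the same line, it runs past the first tag and backtracks to the last src= on that line. Here is a minimal sketch contrasting that pattern with one that cannot leave the first tag (the sample HTML is made up for illustration):

```php
<?php
// Both images on a single line, as often happens in real post bodies.
$html = '<p><img src="first.jpg"/> text <img src="second.jpg"/></p>';

// Greedy: ".+" runs to the end of the line, then backtracks to the LAST src=.
preg_match('/<img.+src=[\'"]([^\'"]+)[\'"]/i', $html, $greedy);

// Restricted: "[^>]+" cannot cross a ">", so it stays inside the first tag.
preg_match('/<img[^>]+src=[\'"]([^\'"]+)[\'"]/i', $html, $restricted);

echo $greedy[1], "\n";     // second.jpg
echo $restricted[1], "\n"; // first.jpg
```

With every image on its own line (as in the question's sample) both patterns happen to agree, which is why the bug only shows up on some posts.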
Don't use regex to parse HTML.
Use an HTML-parsing library such as phpQuery:
require 'phpQuery-onefile.php';
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sarahehe" title="Saraxd" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
$pq = phpQuery::newDocumentHTML($texthtml);
$img = $pq->find('img:first');
$src = $img->attr('src');
echo "<img alt='foo' title='baa' src='{$src}'>";
Download: http://code.google.com/p/phpquery/
After testing an answer from "Using regular expressions to extract the first image source from html codes?", I got better results, with fewer broken image links, than with the answer provided here.
While regular expressions are good for a large variety of tasks, I find they usually fall short when parsing the HTML DOM. The problem with HTML is that document structure varies so much that it is hard to extract a tag accurately (and by accurately I mean a 100% success rate with no false positives).
For more consistent results, use Simple HTML DOM (http://simplehtmldom.sourceforge.net/), a library that lets you parse and manipulate HTML.
An example is provided in the answer to the first question I linked.
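function get_first_image($html){
    require_once('SimpleHTML.class.php');
    $post_html = str_get_html($html);
    if ($post_html === false) {
        return null; // empty input (or input over the library's size limit)
    }
    // find('img', 0) returns the first <img> element, or null if there is none
    $first_img = $post_html->find('img', 0);
    if($first_img !== null) {
        return $first_img->src;
    }
    return null;
}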
Enjoy
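If you'd rather not pull in a third-party library, PHP's bundled DOM extension can do the same job. A minimal sketch, assuming the dom extension is enabled (it is in most builds); first_image_src() is a hypothetical name:

```php
<?php
// Returns the src of the first <img> in an HTML fragment, or null if none.
function first_image_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Collect libxml warnings about tag soup (unclosed <br>, etc.) instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // item(0) is the first <img> in document order, or null if there are none.
    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }
    return $img->getAttribute('src');
}

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

echo first_image_src($texthtml); // 475993565.jpg
```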
Related
I have a string that contains all kinds of HTML elements, and I have to remove everything except images.
Currently I am using this code:
$e->outertext = "<p class='images'>".str_replace(' ', ' ', str_replace('Â','',preg_replace('/#.*?(<img.+?>).*?#is', '',$e)))."</p>";
It serves my purpose but is very slow to execute. Any other way to do the same would be appreciated.
The code you provided doesn't seem to work as it should, and the regex itself is malformed: you should remove the initial slash /, like this: #.*?(<img.+?>).*?#is
Your approach is to remove everything and leave just the image tags, which is not a good way to do it. A better way is to capture all the image tags and then use the matches to construct the output. First, let's capture the image tags. That can be done with this regex:
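/<img.*>/U
The U (ungreedy) modifier inverts the greediness of quantifiers, so .* becomes lazy and each match ends at the first > it encounters. (Online regex testers often display this as /<img.*>/Ug, but PHP has no g modifier; preg_match_all already returns every match.)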
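To see what the U modifier changes, here is a small comparison on made-up HTML with two images on one line; /<img.*?>/ without U behaves the same as /<img.*>/U:

```php
<?php
$html = '<p><img src="a.jpg"> text <img src="b.jpg"></p>';

// Greedy: a single match from the first <img to the last > on the line.
preg_match_all('/<img.*>/', $html, $greedy);

// U modifier: quantifiers become lazy, so each match stops at the first >.
preg_match_all('/<img.*>/U', $html, $ungreedy);

// Same effect without U, using an explicitly lazy quantifier.
preg_match_all('/<img.*?>/', $html, $lazy);

print_r($greedy[0]);   // 1 element, running through "</p>"
print_r($ungreedy[0]); // 2 elements: <img src="a.jpg"> and <img src="b.jpg">
```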
Now, to construct the output, let's use preg_match_all and concatenate the matches into a string. That can be done with the following code:
<?php
// defining the input
$e =
'<div class="topbar-links"><div class="gravatar-wrapper-24">
<img src="https://www.gravatar.com/avatar" alt="" width="24" height="24" class="avatar-me js-avatar-me">
</div>
</div> <img test2> <img test3> <img test4>';
// defining the regex
$re = "/<img.*>/U";
// put all matches into $matches
preg_match_all($re, $e, $matches);
// start creating the result
$result = "<p class='images'>";
// loop to get all the images
for($i=0; $i<count($matches[0]); $i++) {
$result .= $matches[0][$i];
}
// print the final result
echo $result."</p>";
A further way to improve that code is to use functional programming (array_reduce, for example), but I'll leave that as homework.
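The array_reduce "homework" might look something like this sketch (the input is a shortened version of the one above). For plain concatenation, implode('', $matches[0]) does the same job more simply:

```php
<?php
$e = '<div class="gravatar-wrapper-24"><img src="avatar.png" alt=""></div> <img test2> <img test3>';

preg_match_all('/<img.*>/U', $e, $matches);

// array_reduce folds the list of matched tags into one string,
// starting from the empty string given as the initial carry.
$result = array_reduce(
    $matches[0],
    function (string $carry, string $tag): string {
        return $carry . $tag;
    },
    ''
);

echo "<p class='images'>" . $result . "</p>";
```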
Note: There is another way to accomplish this which is parsing the html document and using XPath to find the elements. Check out this answer for more information.
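A rough sketch of the DOM/XPath approach, assuming PHP's bundled dom extension; only_images() is a hypothetical helper that wraps the result in the same <p class='images'> as above:

```php
<?php
// Keep only the <img> elements from an HTML string, using DOM + XPath.
function only_images(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $out = '';
    // "//img" selects every img element, however deeply it is nested.
    foreach ($xpath->query('//img') as $img) {
        $out .= $dom->saveHTML($img); // serialize just this node
    }
    return "<p class='images'>" . $out . "</p>";
}

echo only_images('<div><img src="a.png" alt="x"></div> text <span><img src="b.png"></span>');
```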
I want to remove a specific image from a string.
I need to remove images with a specific width and height.
I have tried this, but it only removes the first image:
$description = preg_replace('/<img.*?>/', '123', $description, 1);
I want to remove any/all image(s) with specific width and height.
E.g. Remove this image <img width="1" height="1" ..../>
I suggest that you move away from using regular expressions to parse (or manipulate) HTML, because it's not a good idea; here's a great SO answer on why.
For example, by using Peter's approach (preg_match_all('~<img src="(.+?)" width="(.+?)">~is', $content, $return);), you are assuming that all your images start with <img, are followed by the src, and then contain the width=, all typed exactly like that and with those exact whitespace separations, and those particular quotes. That means that you will not capture any of these perfectly valid HTML images that you want to remove:
<img src='asd' width="123">
<img src="asd" width="123">
<img src="asd" class='abc' width="123">
<img src="asd" width = "123">
While it's of course perfectly possible to catch all these cases, do you really want to go through all that effort? Why reinvent the wheel when you can parse the HTML with existing tools? Take a look at this other question.
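As an illustration of the parser-based route, here is a sketch using PHP's built-in DOMDocument; remove_images_by_size() is a hypothetical helper. It compares attribute values rather than raw text, so attribute order, quote style and spacing around = don't matter:

```php
<?php
// Remove every <img> whose width and height attributes equal the given values.
function remove_images_by_size(string $html, string $width, string $height): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment in one root element so its contents can be returned afterwards.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Copy the live node list first: removing nodes while iterating it skips items.
    $images = iterator_to_array($dom->getElementsByTagName('img'));
    foreach ($images as $img) {
        if ($img->getAttribute('width') === $width && $img->getAttribute('height') === $height) {
            $img->parentNode->removeChild($img);
        }
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$description = 'a <img src="x.gif" width="1" height="1"> b '
             . '<img height=\'1\' src="y.gif" width = "1"> c '
             . '<img src="z.jpg" width="100" height="1">';
echo remove_images_by_size($description, '1', '1');
```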
I made a little example for you:
<?php
$string = 'something something <img src="test.jpg" width="10" height="10" /> and something .. and <img src="test.jpg" width="10" height="10" /> and more and more and more';
preg_match_all('~<img(.+?)width="10"(.+?)height="10"(.+?)/>~is', $string, $return);
foreach ($return[0] as $image) {
$string = str_replace($image, '', $string);
}
echo $string;
I got the solution:
$description = preg_replace('!<img.*?width="1".*?/>!i', '', $description);
I need to get the src values of all the images inside a container, and I'm having some difficulty with this.
Allow me to explain the process.
All the data comes from a database. In the back office, the user enters all the text and the image in a textarea. To separate the text from the image, the user must insert a page break.
Here's the code:
while ($rowClients = mysql_fetch_array($rsClients)) {
$result = $rowClients['content'];
$resultExplode = explode('<!-- pagebreak -->', $result);
// $resultExplode[0] holds the text and $resultExplode[1] holds the image
// Now I want to get only the src value from $resultExplode[1]
I already tried strip_tags:
$imageSrc = strip_tags($resultExplode[1]);
but it doesn't print anything.
I found this post but had no success; I got stuck at the first print_r.
Can anyone help me?
Thanks
Try a foreach loop if you can't print it out (if that's the problem):
foreach($resultExplode as $key => $value){
echo "[".$key."]".$value;
}
I found a solution:
Continuing with the previous code, I used the split function.
First I strip the tags, keeping only <img>. This way I get the img isolated from the rest:
$image = strip_tags($resultExplode[1],"<img>");
Because all the img tags have the same structure, like this: <img width="111" height="28" alt="alternative text" src="/path/to/the/file.png" title="title of the image">
I split the string using " as the delimiter:
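// split() was removed in PHP 7; explode() does the same for a plain-string delimiter
$imgSplit = explode('"', $image);
// With the attribute order shown above, the src value is the 8th piece (index 7)
$src = $imgSplit[7];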
Voilà. It's working.
What do you think of this procedure?
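A sketch that avoids depending on attribute order (the split approach breaks as soon as an image has its attributes in a different order or an extra attribute). It assumes PHP's bundled dom extension; image_sources() and the sample row are hypothetical:

```php
<?php
// Collect the src of every <img> in an HTML fragment.
function image_sources(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Hypothetical content shaped like the database rows described in the question.
$result = 'Some text<!-- pagebreak --><img width="111" height="28" alt="alternative text" src="/path/to/the/file.png" title="title of the image">';
$resultExplode = explode('<!-- pagebreak -->', $result);
print_r(image_sources($resultExplode[1]));
```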
I have some text with images within it. I want to replace specific images within the text with something else.
For example, the text contains a YouTube image URL that I want to replace with the actual video embed:
<img class="mceItem" src="http://img.youtube.com/vi/1MsVzAkmds0/default.jpg" alt="1MsVzAkmds0">
and replace it with the YouTube iframe code:
<iframe title="'.$id.'" class="youtube-player" type="text/html" width="576" height="400" src="http://www.youtube.com/embed/'.$id.'" frameborder="0"></iframe>
My function looks like this:
function replacelink($link) {
$find= ("/<img src=[^>]+\>/i");
$replace = youtube("\\2");
return preg_replace($find,$replace);
}
What do I need to change in the regex to do the above?
Your regex is looking for <img src=, but there is a class attribute between img and src. Using $find= '/<img.*src=[^>]+>/i'; corrects the problem; however, this illustrates why you shouldn’t use regex to parse HTML.
You wrote:
I have some text with images within it.
If the text you’re referring to is actually HTML, then there are better alternatives to using regex for this.
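One such alternative is to let DOMDocument find the images and swap each YouTube thumbnail for an iframe. A minimal sketch; replace_youtube_thumbs() is a hypothetical helper, and it assumes thumbnail URLs of the form http://img.youtube.com/vi/VIDEO_ID/... as in the question (a small regex is still used, but only on the src URL, not on the HTML):

```php
<?php
// Replace YouTube thumbnail <img> tags with an embed <iframe>.
function replace_youtube_thumbs(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment in one root element so its contents can be returned afterwards.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Copy the live node list first, because replaceChild() changes it.
    foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
        $src = $img->getAttribute('src');
        if (!preg_match('~^https?://img\.youtube\.com/vi/([\w-]+)/~', $src, $m)) {
            continue; // not a YouTube thumbnail; leave it alone
        }
        $id = $m[1];
        $iframe = $dom->createElement('iframe');
        $iframe->setAttribute('title', $id);
        $iframe->setAttribute('class', 'youtube-player');
        $iframe->setAttribute('type', 'text/html');
        $iframe->setAttribute('width', '576');
        $iframe->setAttribute('height', '400');
        $iframe->setAttribute('src', 'http://www.youtube.com/embed/' . $id);
        $iframe->setAttribute('frameborder', '0');
        $img->parentNode->replaceChild($iframe, $img);
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo replace_youtube_thumbs('<p>Watch: <img class="mceItem" src="http://img.youtube.com/vi/1MsVzAkmds0/default.jpg" alt="1MsVzAkmds0"> <img src="other.jpg"></p>');
```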
Update
I believe this is what you’re looking for.
<?php
function replacelink($text) {
$replace = '<iframe title="$2" class="youtube-player" type="text/html" width="576" height="400" src="http://www.youtube.com/embed/$2" frameborder="0"></iframe>';
// [\w-] also allows the "_" and "-" characters that YouTube video IDs can contain
$find = '/(<img.*?alt="([\w-]+)".*?>)/i';
return preg_replace($find, $replace, $text);
}
$imagestr = '<img class="mceItem" src="http://img.youtube.com/vi/1MsVzAkmds0/default.jpg" alt="1MsVzAkmds0">';
echo replacelink($imagestr);
?>
There’s no need for a separate youtube() function.
You don't need a different function to replace more than one image: preg_replace() already replaces every match by default (its $limit parameter defaults to -1). PHP has no preg_replace_all().
The following regex would get all the images with a specific URL. I'm not sure if this is what you wanted.
<img [^>]*?src="url"[^>]*?>
The previous answer would fail if there were more than one image.
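Here's a sketch of how that pattern might be applied, with preg_quote() escaping the URL so that characters like . and / are matched literally. Like the pattern itself, it assumes a double-quoted src; images_with_src() and the example.com URLs are hypothetical:

```php
<?php
// Return every <img> tag whose src is exactly $url, wherever src sits in the tag.
function images_with_src(string $html, string $url): array
{
    // [^>]*? cannot cross a ">", so each match stays inside a single tag.
    $pattern = '/<img [^>]*?src="' . preg_quote($url, '/') . '"[^>]*?>/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}

$html = '<img class="a" src="http://example.com/x.jpg"><img src="http://example.com/y.jpg">'
      . '<img alt="again" src="http://example.com/x.jpg" />';
print_r(images_with_src($html, 'http://example.com/x.jpg'));
```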
I'm trying to use preg_replace to filter member comments, specifically script and img tags. If the src is from my site, the tag should be allowed; if it's from another site, only the src URL should be shown.
Regular expression:
<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?
Using:
preg_replace($pattern, $2, $comment);
Comment:
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
<img src="http://www.mysite.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://www.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://fakesite.com/blah/blah/image.jpg"></img>
Which one is your favorite?
Wanted Outcome:
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
http://www.mysite.fakesite.com/blah/blah/image.jpg (notice that it's just url, because it's not from my site)
http://www.fakesite.com/blah/blah/image.jpg
http://fakesite.com/blah/blah/image.jpg
Which one is your favorite?
Anyone see anything wrong?
I'm trying to use preg_replace to filter member comments. To filter script and img tags.
HTML Purifier is going to be the best tool for this purpose, though you want a whitelist of acceptable tags and attributes, not a blacklist of specific harmful tags.
The biggest thing wrong I can see is trying to use regex to modify HTML.
You should use DOMDocument.
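$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true); // tolerate malformed comment HTML
$dom->loadHTML($content);
libxml_clear_errors();
$thisHost = $_SERVER['SERVER_NAME'];
// Copy the live node list first; removing nodes while iterating it skips elements.
foreach (iterator_to_array($dom->getElementsByTagName('img')) as $element) {
    if ( ! $element->hasAttribute('src')) {
        continue;
    }
    $src = $element->getAttribute('src');
    $elementHost = parse_url($src, PHP_URL_HOST);
    if ($elementHost != $thisHost) {
        // Replace the foreign image with its URL as plain text.
        $element->parentNode->insertBefore($dom->createTextNode($src), $element);
        $element->parentNode->removeChild($element);
    }
}
echo $dom->saveHTML();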
You should use the i and m modifiers:
#<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?#im