php regex to change paths - php

I am new to PHP and regex, so I need help using them to change some image paths.
My CMS generates image paths like this:
<img src="http://localhost/test/images/normal/ima1.jpg" width="100" height="100" alt="ima1">
I am using PHP and I have a variable $size. If $size = 'small', the path should be
<img src="http://localhost/test/images/small/ima1.jpg" width="100" height="100" alt="ima1">
and if $size = 'medium', the path should be
<img src="http://localhost/test/images/medium/ima1.jpg" width="100" height="100" alt="ima1">
These links are dynamically generated by my CMS, so I am looking for PHP code that will replace the folder name in these links after the page is rendered.
All I want to replace is the word between images/ and the / after the replacing word.

Try $blabla = preg_replace( "/images\/[a-zA-Z]+\//" , "images/" . $size . "/" , $sourceCode );
Now, $blabla is an arbitrary name. You can change it to whatever you want.
$sourceCode is also just a name. Replace it with the string you want to run the replacement on.
E.g. $sourceCode = "<img src=\"http://localhost/test/images/normal/ima1.jpg\" width=\"100\" height=\"100\" alt=\"ima1\">".
The syntax of the function preg_replace is as follows: preg_replace ( mixed $pattern , mixed $replacement , mixed $subject ).
It means: $pattern - the pattern you would like to replace in your string (in our case $sourceCode), like "/images\/[a-zA-Z]+\//". The + after the character class matters: without it the pattern only matches a one-letter folder name. You can read more about regexp syntax in the PHP manual.
$replacement - the text you want to put in place of the match. Since we are looking for everything that looks like "images/SOME_TEXT/", we are replacing the whole match. To fill the src attribute correctly, we build the replacement as "images/" . $size . "/". The slashes need no backslashes here: the replacement is not a pattern, so a backslash would end up in the output.
If we used $size alone as the replacement, $blabla would be "<img src=\"http://localhost/test/smallima1.jpg\" width=\"100\" height=\"100\" alt=\"ima1\">".
Notice the smallima1.jpg (that's in case $size = "small").
P.S. Notice the backslashes before every ". They prevent the PHP parser from treating the quote as the end of the string. E.g. $name = "The "Batman""; causes a parse error, while $name = "The \"Batman\""; will assign The "Batman" to the variable $name.
They are necessary whenever you assign a double-quoted string that itself contains double quotes to a variable.
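For reference, here is a minimal runnable sketch of that replacement. It assumes the folder name consists only of ASCII letters; the function name swapImageSize is hypothetical and chosen purely for illustration.

```php
<?php
// Hypothetical helper: swaps the folder that follows "images/" for $size.
// Assumes the folder name contains only ASCII letters.
function swapImageSize(string $html, string $size): string
{
    // "~" as the delimiter avoids escaping slashes; "+" matches the whole folder name.
    return preg_replace('~images/[a-zA-Z]+/~', 'images/' . $size . '/', $html);
}

$sourceCode = '<img src="http://localhost/test/images/normal/ima1.jpg" width="100" height="100" alt="ima1">';
echo swapImageSize($sourceCode, 'small'), PHP_EOL;
```

If $size could ever come from user input, it would be worth checking it against a list of known sizes first, since "$" and "\" have special meaning in a preg_replace replacement string.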

My question for you is: why do you want to replace at all? If your site is properly structured, you don't need to replace anything.
I would expect something like this:
$sizes = array("normal"=>array(500,500),
"small"=>array(100,100),
"medium"=>array(300,200));
$selected = "small" ;
$imageHost= "http://localhost/test/images" ;
$imagePath = "/public_html/test/images" ;
$imageName = "ima1.jpg" ;
$tag = "<img src=\"{$imageHost}/%s/{$imageName}\" width=\"%d\" height=\"%d\" alt=\"ima1\">";
Simple Demo to output all sizes
echo "<pre>" ;
foreach($sizes as $size => $dim){
if(!file_exists($imagePath . DIRECTORY_SEPARATOR . $size . DIRECTORY_SEPARATOR . $imageName))
{
// I am sure you want to either create the image or copy the thumbnail here
}
// printf() prints the tag itself; wrapping it in echo would also print its return value (the length)
printf($tag,$size,$dim[0],$dim[1]);
echo PHP_EOL;
}
Output
<img src="http://localhost/test/images/normal/ima1.jpg" width="500" height="500" alt="ima1">
<img src="http://localhost/test/images/small/ima1.jpg" width="100" height="100" alt="ima1">
<img src="http://localhost/test/images/medium/ima1.jpg" width="300" height="200" alt="ima1">

Providing that all your images are under the same path, I recommend you leave regular expressions alone (this time), and opt for a much easier explode() method.
Using the PHP explode() function you can split a string into an array using a delimiter.
$str = 'http://localhost/test/images/small/ima1.jpg';
$arr = explode('/',$str);
This should give you something like this -
Array
(
[0] => http:
[1] =>
[2] => localhost
[3] => test
[4] => images
[5] => small
[6] => ima1.jpg
)
// remove the protocol specification and the empty element.
// You'll see that the `explode()` function actually removes the slashes
// (including the two at the beginning of the URL in the protocol specification),
// you'll have to return them once you have finished.
array_shift($arr);
array_shift($arr);
Now you are left with -
Array
(
[0] => localhost
[1] => test
[2] => images
[3] => small
[4] => ima1.jpg
)
Provided the URLs have the same structure for all images, you can simply replace the fourth element ($arr[3]) with the relevant size and then reassemble your URL using the implode() function.
$arr[3] = $size; // e.g. 'medium'
array_unshift($arr, 'http:', ''); // put back the protocol and the empty element, so implode() restores the "//"
$finalURL = implode('/',$arr);
Relevant documentation -
explode() - http://www.php.net/manual/en/function.explode.php
implode() - http://php.net/manual/en/function.implode.php
array_shift() - http://php.net/manual/en/function.array-shift.php
array_unshift() - http://www.php.net/manual/en/function.array-unshift.php
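Put together, the explode()/implode() idea can be sketched as below. This variant keeps the protocol elements in the array and indexes the size folder directly, so no shifting is needed. It assumes every URL follows the layout http://host/test/images/SIZE/FILE; the function name replaceSizeSegment is hypothetical.

```php
<?php
// Hypothetical helper: replaces the size folder of a URL by position.
// Assumes the fixed layout http://host/test/images/<size>/<file>.
function replaceSizeSegment(string $url, string $size): string
{
    // ['http:', '', 'localhost', 'test', 'images', 'normal', 'ima1.jpg']
    $parts = explode('/', $url);
    $parts[5] = $size;            // index 5 holds the size folder in this layout
    // The empty element at index 1 restores the "//" after "http:".
    return implode('/', $parts);
}

echo replaceSizeSegment('http://localhost/test/images/normal/ima1.jpg', 'medium'), PHP_EOL;
```

The fixed index is the weak point of this approach: if the number of path segments ever changes, the wrong segment is replaced.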

Related

Converting text to smiley if multiple smileys are combined together not working

I'm trying to convert text ($icon) to smiley image ($image). I used to do it with str_replace(), but that seems to perform the replace sequentially and as such it also replaces items in previously converted results (for example inside the <img> tag).
I am now using the following code:
foreach($smiliearray as $image => $icon){
$pattern[]="/(?<!\S)" . preg_quote($icon, '/') . "(?!\S)/u";
$replacement[]=" <img src='$image' border='0' alt=''> ";
}
$text = preg_replace($pattern,$replacement,$text);
This code works, but only if the smiley code is surrounded by whitespace. So basically if someone types ":);)", it won't catch it as two separate smileys, but it does for ":) ;)".
How can I fix it so that also a string of smileys (not separated by space) are converted?
Note that there can be unlimited kinds of smiley codes and smiley images. I do not know beforehand which ones, because other people can submit codes and smileys, so it is not just ":)" and ";)", but can also be "rofl", ":eh", ":-{", etc.
I can partially fix it by adding a \W non-word to the end of the 2nd capturegroup: (?!\S\W), and further by adding a 2nd $pattern and $replacement with a \W to the first capturegroup. But I don't think that is the way it should be done, and it only partially solves it.
"I used to do it with str_replace(), but that seems to perform the replace sequentially and as such it also replaces items in previously converted results..."
A good and true reason to use strtr(). You don't even need Regular Expressions:
<?php
// I assume your original array looks like this
$origSmileys = [
"/1.png" => ':)',
"/2.png" => ':(',
"/3.png" => ':P',
"/4.png" => '>:('
];
// sample input string
$str = " I'm :) but :(>:(:( now :P";
// iterating over smileys to add html tag
$newSmileys = array_map(function($value) {
return "<img src='$value' border='0' alt=''>";
}, array_flip($origSmileys));
// replace
echo strtr($str, $newSmileys);
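The difference between the two functions shows up as soon as one smiley code contains another. The sketch below uses made-up file names: str_replace() works through the pairs one after another, while strtr() with an array scans the input once, tries the longest key first at each position, and never rescans text it has already inserted.

```php
<?php
// Smiley code => replacement markup (file names are placeholders).
$map = [
    ':('  => "<img src='/2.png' alt=''>",
    '>:(' => "<img src='/4.png' alt=''>",
];

// Sequential: ':(' is replaced first, so the '>' of '>:(' is left behind.
$sequential = str_replace(array_keys($map), array_values($map), '>:(');

// Single pass, longest key first: '>:(' wins over ':('.
$singlePass = strtr('>:(', $map);

echo $sequential, PHP_EOL;
echo $singlePass, PHP_EOL;
```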

PHP Using str_replace along with preg_replace

I have a string of comma separated values that comes from a database, which actually are image paths. Like so:
/images/us/US01021422717777-m.jpg,/images/us/US01021422717780-m.jpg,/images/us/US01021422717782-m.jpg,/images/us/US01021422718486-m.jpg
I then do like below, to split them at the , and convert them into paths for the web page.
preg_replace('~\s?([^\s,]+)\s?(?:,|$)~','<img class="gallery" src="$1">', $a)
Works well, but in one place further in my page, I need to change the -m to -l (which means large)
When I do like below (put a str_replace inside the preg_replace), nothing happens. How can I do something like this?
preg_replace('~\s?([^\s,]+)\s?(?:,|$)~','<img class="gallery" src="$1" data-slide="'.str_replace('-m','-l','$1').'">', $a)
You're putting the str_replace() call inside the replacement argument of the preg_replace() call. PHP evaluates it before preg_replace() runs, so it operates on the literal string '$1', finds no -m in it, and returns '$1' unchanged.
What you want is something like this:
$imgtag = preg_replace('~\s?([^\s,]+)\s?(?:,|$)~', '<img class="gallery" src="$1">', $a);
$imgtag = str_replace('-m','-l',$imgtag);
But, in my opinion it would be safer and easier to debug this stuff if you changed the order of your replacement operations, something like this:
foreach (explode(",", $a) as $path) {
$path = str_replace('-m','-l',trim($path));
$imgtag = sprintf('<img class="gallery" src="%s">', $path);
/* do something with the $imgtag */
}
That way you don't have to whistle into your modem :-) to program that regexp.
Use str_replace on preg_replace return
$large = str_replace('-m','-l', preg_replace('~\s?([^\s,]+)\s?(?:,|$)~','<img class="gallery" src="$1">', $a));
Output will be
<img class="gallery" src="/images/us/US01021422717777-l.jpg">
<img class="gallery" src="/images/us/US01021422717780-l.jpg">
<img class="gallery" src="/images/us/US01021422717782-l.jpg">
<img class="gallery" src="/images/us/US01021422718486-l.jpg">
Use preg_replace_callback():
preg_replace_callback(
'~\s?([^\s,]+)\s?(?:,|$)~',
function (array $matches) {
$src = $matches[1]; // this is "$1"
$slide = str_replace('-m', '-l', $matches[1]);
return '<img class="gallery" src="'.$src.'" data-slide="'.$slide.'">';
},
$a
);
Instead of the replace expression, preg_replace_callback() gets as its second argument a function that receives the list of matched expressions and returns the replacement string.
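A runnable sketch of the callback approach follows, using shortened, made-up paths. As a variation on the plain str_replace(), the large path is derived here with a pattern anchored to the end of the file name; that is an assumption about the naming scheme (the suffix always sits directly before the extension), made so that a "-m" elsewhere in the path is left alone.

```php
<?php
// Made-up sample data; note the "-m" inside the folder name "de-mo".
$a = '/images/de-mo/US01-m.jpg,/images/us/US02-m.jpg';

$html = preg_replace_callback(
    '~\s?([^\s,]+)\s?(?:,|$)~',
    function (array $m): string {
        // Only the "-m" directly before the extension becomes "-l".
        $large = preg_replace('~-m(\.[^./]+)$~', '-l$1', $m[1]);
        return '<img class="gallery" src="' . $m[1] . '" data-slide="' . $large . '">';
    },
    $a
);

echo $html, PHP_EOL;
```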
Actually your str_replace is simply invoked before preg_replace is invoked. Result of str_replace is then passed as argument to preg_replace.
What I could suggest is using preg_replace_callback:
function replace_img($match)
{
return '<img class="gallery" src="' .
$match[1] .
'" data-slide="' .
str_replace('-m','-l',$match[1]) .
'">';
}
preg_replace_callback('~\s?([^\s,]+)\s?(?:,|$)~','replace_img', $a);
If you need two separate outputs from each comma-separated value, I would write a pattern that stores the fullstring match and the substrings on either side of the m in each file.
*note: I match the trailing - in the first capture group and the leading . in the second capture group for minimal assurance of accuracy. This is somewhat weak validation; you can firm it up if your project requires it by adding literal or more restrictive pattern components in the capture groups.
Code:
$csv='/images/us/US01021422717777-m.jpg,/images/us/US01021422717780-m.jpg,/images/us/US01021422717782-m.jpg,/images/us/US01021422718486-m.jpg';
if(preg_match_all('~([^,]+-)m(\.[^,]+)~',$csv,$out,PREG_SET_ORDER)){
foreach($out as $m){
$mediums[]="<img class=\"gallery\" src=\"{$m[0]}\">";
$larges[]="<img class=\"gallery\" src=\"{$m[0]}\" data-slide=\"{$m[1]}l{$m[2]}\">";
}
}
var_export($mediums);
echo "\n\n";
var_export($larges);
Output:
array (
0 => '<img class="gallery" src="/images/us/US01021422717777-m.jpg">',
1 => '<img class="gallery" src="/images/us/US01021422717780-m.jpg">',
2 => '<img class="gallery" src="/images/us/US01021422717782-m.jpg">',
3 => '<img class="gallery" src="/images/us/US01021422718486-m.jpg">',
)
array (
0 => '<img class="gallery" src="/images/us/US01021422717777-m.jpg" data-slide="/images/us/US01021422717777-l.jpg">',
1 => '<img class="gallery" src="/images/us/US01021422717780-m.jpg" data-slide="/images/us/US01021422717780-l.jpg">',
2 => '<img class="gallery" src="/images/us/US01021422717782-m.jpg" data-slide="/images/us/US01021422717782-l.jpg">',
3 => '<img class="gallery" src="/images/us/US01021422718486-m.jpg" data-slide="/images/us/US01021422718486-l.jpg">',
)

How would one use PHP preg_match_all to differentiate anchor elements identified by attribute of inner HTML element?

I have sets of HTML anchor elements enclosing image elements. For each set, using PHP-CLI, I want to pull the URLs and classify them according to their types. The type of anchor can only be determined by an attribute of its child image element. It would be easy if there was only one of each type per set. My problem is when two anchor elements of one type are separated by one or more of the other types. My non-greedy parenthesized sub-pattern seems to become greedy and expands to find the second relevant child attribute. In my test script I'm trying to pull the 'Userlink' URLs from amongst the other types. Using a simple pattern like:
#<a href="(.*?)" custattr="value1"><img alt="Userlink"#
On a set like:
<li><img alt="Userlink" class="common_link_class" height="123" src="pic0.png" width="123" style="width: 123px;"></li><li><img alt="Socnet1" class="common_link_class" height="123" src="pic1.png" width="123" style="width: 123px;"></li><li><img alt="Socnet2" class="common_link_class" height="123" src="pic2.png" width="123" style="width: 123px;"></li><li><img alt="Usermail" class="common_link_class" height="123" src="pic3.png" width="123" style="width: 123px;"></li><li><img alt="Userlink" class="common_link_class" height="123" src="pic4.png" width="123" style="width: 123px;"></li>
(sorry, but the actual html is on one line like that)
My sub-pattern captures from the beginning of the first "Userlink" URL to the end of the last one.
I've tried many variations of look-aheads, not sure I should list them all here. So far they've either returned no match at all or the same as described above.
Here's my test script (running in a Bash shell):
#!/usr/bin/php
<?php
$lines = 0;
$input = "";
$matches = array();
while ($line = fgets(STDIN)){
$input .= $line;
$lines++;
}
fwrite(STDERR, "Processing $lines\n");
$pcre = '#<a href="(.*?)" custattr="value1"><img alt="Userlink"#';
if (preg_match_all($pcre,$input,$matches)){
fwrite(STDERR, "\$matches has " . count($matches) . " elements\n");
foreach ($matches[1] as $match){
fwrite(STDOUT, $match . "\n");
}
}
?>
What PCRE pattern for PHP's preg_match_all() would return the two "Userlink" URLs in the above example?
I have taken the liberty of changing your variable names:
$pattern = '~<a href="([^"]++)" custattr="value1"><img alt="Userlink"~';
if ($nb = preg_match_all($pattern, $input, $matches)) {
fwrite(STDERR, "\$matches has " . $nb . " elements\n");
fwrite(STDOUT, implode("\n", $matches[1]) . "\n");
}
Note that the preg_match_all function returns the number of matches.
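Why the lazy (.*?) still captures too much: laziness only means "try the shortest capture first". When the rest of the pattern fails (the next alt is not "Userlink"), the engine extends the capture past the closing quote until the whole pattern can succeed. A negated character class cannot leave the attribute value. The sketch below shows both on made-up input; the URLs are placeholders.

```php
<?php
// Made-up input: three anchors on one line; only the first and last are "Userlink".
$input = '<li><a href="http://u1.example/" custattr="value1"><img alt="Userlink"></a></li>'
       . '<li><a href="http://s1.example/" custattr="value1"><img alt="Socnet1"></a></li>'
       . '<li><a href="http://u2.example/" custattr="value1"><img alt="Userlink"></a></li>';

// Lazy dot: may run past the closing quote to make the overall match succeed.
preg_match_all('#<a href="(.*?)" custattr="value1"><img alt="Userlink"#', $input, $lazy);

// Negated class: the capture can never contain a double quote.
preg_match_all('~<a href="([^"]++)" custattr="value1"><img alt="Userlink"~', $input, $strict);

print_r($strict[1]);
```

With the lazy pattern, the second capture starts at the Socnet1 URL and runs through to the end of the last Userlink URL.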
This regex should work -
<a href="([^"]*?)"[^>]*\><img alt="Userlink"
You can see how it works here.
Testing it -
$pcre = '/<a href="([^"]*?)"[^>]*\><img alt="Userlink"/';
if (preg_match_all($pcre,$input,$matches)){
var_dump($matches);
//$matches[1] will be the array containing the urls.
}
/*
OUTPUT-
array
0 =>
array
0 => string '<a href="http://www.userlink1.com/my/page.html" custattr="value1"><img alt="Userlink"' (length=85)
1 => string '<a href="http://www.userlink2.com/my/page.html" custattr="value1"><img alt="Userlink"' (length=85)
1 =>
array
0 => string 'http://www.userlink1.com/my/page.html' (length=37)
1 => string 'http://www.userlink2.com/my/page.html' (length=37)
*/

Smiley Replace within CDATA of an HTML-String

I have got a simple problem :( I need to replace text smilies with the corresponding smiley image. OK, that's not really complex, but I have to replace only smiley appearances outside of HTML tags. Short example:
Text:
Thats a good example :/ .. with a link inside.
I want to replace ":/" with the image of this smiley...
OK, what is the best way to do that?
I won't try to create some super script but think about it.... smilies are just about always surrounded by spaces. So str replace ' :/ ' with the smiley. You could be saying "what about a smiley at the end of a sentence(where it would be used the most)". Well just check for at least one space on either the left or the right of a potential smiley.
Using the above scripts:
$smiley_array = array(
":) " => "<a href...>",
" :)" => "<a href...>",
":/ " => "<a href...>",
" :/" => "<a href...>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
If you would rather not type everything twice, you can generate the array from a single smiley array.
Why don't you just try to use some special chars around your smiley text, like this maybe: -:/-
This will make your smiley text unique and easy to recognize.
Use preg_replace with a lookahead assertion. Example:
$smileys = array(
':/' => '<img src="..." alt=":/">'
);
foreach ($smileys as $smile => $img) {
$text = preg_replace('#' . preg_quote($smile, '#') . '(?![^<>]*>)#',
$img, $text);
}
The regex matches only smileys that are not inside angle brackets: the lookahead rejects a smiley when a ">" follows it with no "<" in between. (A lookbehind such as (?<!<[^<>]*) looks more natural, but PCRE rejects lookbehinds of unbounded length.) This might be slow if you have a lot of false positives.
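One way to skip smileys that sit inside a tag is a negative lookahead placed after the smiley. The sketch below assumes reasonably well-formed markup; the function name replaceSmileysOutsideTags and the image file name are hypothetical.

```php
<?php
// Hypothetical helper; replaces smiley codes that are not inside an HTML tag.
function replaceSmileysOutsideTags(string $text, array $smileys): string
{
    foreach ($smileys as $code => $img) {
        // (?![^<>]*>) fails when a ">" follows before any "<",
        // which is the case whenever the match sits inside a tag.
        $pattern = '#' . preg_quote($code, '#') . '(?![^<>]*>)#';
        $text = preg_replace($pattern, $img, $text);
    }
    return $text;
}

$text = 'A <a href="http://example.com/x">link</a> :/ ok';
echo replaceSmileysOutsideTags($text, [':/' => '<img src="frown.gif" alt=":/">']), PHP_EOL;
```

The ":/" in "http://" survives because it is followed by the ">" that closes the anchor tag. A known limitation: a literal ">" in plain text after a smiley (as in ":/ 3 > 2") would also suppress the replacement.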
I wouldn't know about the best way, only the way I would do it.
Build an array having the smiley codes as the keys and the link as the value. Then use str_replace. Pass as "needle" an array of the keys (the smiley codes) and as "replace" an array of the values.
For instance, suppose you have something like this:
$smiley_array = array(":)" => "<a href...>",
":(" => "<a href=....>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
EDIT: In case this could accidentally replace other instances with smiley-links you should consider using regexes with preg_replace. Obviously preg_replace is slower than str_replace.
You can use regex, or the extra sloppy version of the above:
$smiley_array = array(":)" => "<a href...>",
":(" => "<a href=....>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace("://", "%%QF%%", $str);
$str = str_replace($codes, $links, $str);
$str = str_replace("%%QF%%", "://", $str);
Actually, assuming str_replace follows the array sorting...
this should work:
$smiley_array = array("://" => "%%QF%%", ":)" => "<a href...>",
":(" => "<a href=....>", "%%QF%%" => "://");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
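str_replace() does process array pairs in order, each one on the result of the previous replacement, so the masking trick can be checked with a short sketch. The file name and the sample sentence are made up, and the approach assumes the input never contains the marker %%QF%% itself.

```php
<?php
// Order matters: mask "://" first, replace smileys, then restore the mask.
$map = [
    '://'    => '%%QF%%',
    ':/'     => '<img src="frown.gif" alt="">',   // placeholder file name
    '%%QF%%' => '://',
];

$str = 'See http://example.com :/';
$out = str_replace(array_keys($map), array_values($map), $str);

echo $out, PHP_EOL;
```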
Possible overkill (increased cpu/load), but 99.99999999% safe:
<?php
$n = new DOMDocument();
$n->loadHTML('<p>Thats a good example :/ .. with a link inside.</p>');
$x = new DOMXPath($n);
$instances = $x->query('//text()[contains(.,\':/\')]');//or use '//*[child::text()]' for all textnodes
foreach($instances as $node){
if($node instanceof DOMText && preg_match_all('/:\//',$node->wholeText,$matches,PREG_OFFSET_CAPTURE|PREG_SET_ORDER)){
foreach($matches[0] as $match){
$newnode = $node->splitText($match[1]);
$newnode->replaceData(0,strlen($match[0]),'');
$img = $n->createElement('img');
$img->setAttribute('src','smily.gif');
$img = $newnode->parentNode->insertBefore($img,$newnode);
//var_dump($match);
}
}
}
var_dump($n->saveHTML());
?>
But in reality you do not want to do this all that often: save once, show many. If you are letting users edit the HTML (be it in a WYSIWYG editor or otherwise), the 'return' transformation (img to text) is a whole lot lighter. It is up to you to expand this for different smilies: one monster regex to match them all, or several smaller ones / strstr() calls for readability, plus an array mapping smiley to src (e.g. array(':/'=>'frown.gif')), would be the way to go.

How to extract img src, title and alt from html using php? [duplicate]

I would like to create a page where all images which reside on my website are listed with title and alternative representation.
I already wrote me a little program to find and load all HTML files, but now I am stuck at how to extract src, title and alt from this HTML:
<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny" />
I guess this should be done with some regex, but since the order of the tags may vary, and I need all of them, I don't really know how to parse this in an elegant way (I could do it the hard char by char way, but that's painful).
$url="http://example.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed HTML
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
EDIT : now that I know better
Using regexp to solve this kind of problem is a bad idea and will likely lead to unmaintainable and unreliable code. Better use an HTML parser.
Solution with regexp
In that case it's better to split the process into two parts:
get all the img tags
extract their metadata
I will assume your doc is not strict XHTML, so you can't use an XML parser. E.g. with this web page's source code:
/* preg_match_all match the regexp in all the $html string and output everything as
an array in $result. "i" option is used to make it case insensitive */
preg_match_all('/<img[^>]+>/i',$html, $result);
print_r($result);
Array
(
[0] => Array
(
[0] => <img src="/Content/Img/stackoverflow-logo-250.png" width="250" height="70" alt="logo link to homepage" />
[1] => <img class="vote-up" src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />
[2] => <img class="vote-down" src="/content/img/vote-arrow-down.png" alt="vote down" title="This was not helpful (click again to undo)" />
[3] => <img src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG" height=32 width=32 alt="gravatar image" />
[4] => <img class="vote-up" src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />
[...]
)
)
Then we get all the img tag attributes with a loop :
$img = array();
foreach( $result[0] as $img_tag)
{
preg_match_all('/(alt|title|src)=("[^"]*")/i',$img_tag, $img[$img_tag]);
}
print_r($img);
Array
(
[<img src="/Content/Img/stackoverflow-logo-250.png" width="250" height="70" alt="logo link to homepage" />] => Array
(
[0] => Array
(
[0] => src="/Content/Img/stackoverflow-logo-250.png"
[1] => alt="logo link to homepage"
)
[1] => Array
(
[0] => src
[1] => alt
)
[2] => Array
(
[0] => "/Content/Img/stackoverflow-logo-250.png"
[1] => "logo link to homepage"
)
)
[<img class="vote-up" src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />] => Array
(
[0] => Array
(
[0] => src="/content/img/vote-arrow-up.png"
[1] => alt="vote up"
[2] => title="This was helpful (click again to undo)"
)
[1] => Array
(
[0] => src
[1] => alt
[2] => title
)
[2] => Array
(
[0] => "/content/img/vote-arrow-up.png"
[1] => "vote up"
[2] => "This was helpful (click again to undo)"
)
)
[<img class="vote-down" src="/content/img/vote-arrow-down.png" alt="vote down" title="This was not helpful (click again to undo)" />] => Array
(
[0] => Array
(
[0] => src="/content/img/vote-arrow-down.png"
[1] => alt="vote down"
[2] => title="This was not helpful (click again to undo)"
)
[1] => Array
(
[0] => src
[1] => alt
[2] => title
)
[2] => Array
(
[0] => "/content/img/vote-arrow-down.png"
[1] => "vote down"
[2] => "This was not helpful (click again to undo)"
)
)
[<img src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG" height=32 width=32 alt="gravatar image" />] => Array
(
[0] => Array
(
[0] => src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"
[1] => alt="gravatar image"
)
[1] => Array
(
[0] => src
[1] => alt
)
[2] => Array
(
[0] => "http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"
[1] => "gravatar image"
)
)
[..]
)
)
Regexps are CPU intensive so you may want to cache this page. If you have no cache system, you can tweak your own by using ob_start and loading / saving from a text file.
How does this stuff work ?
First, we use preg_match_all, a function that finds every string matching the pattern and outputs the matches in its third parameter.
The regexps :
<img[^>]+>
We apply it to the whole HTML page. It can be read as: every string that starts with "<img", contains one or more non-">" characters, and ends with a ">".
(alt|title|src)=("[^"]*")
We apply it successively to each img tag. It can be read as: every string starting with "alt", "title" or "src", then a "=", then a ' " ', a run of characters that are not ' " ', and a closing ' " '. The parentheses isolate the sub-strings.
Finally, every time you want to deal with regexps, it is handy to have good tools to quickly test them. Check this online regexp tester.
EDIT : answer to the first comment.
It's true that I did not think about the (hopefully few) people using single quotes.
Well, if you use only ', just replace all the " by '.
If you mix both, first you should slap yourself :-), then try to use ("|') instead of " and [^ø] to replace [^"].
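Another way to cope with mixed quoting is a backreference: capture the opening quote and require the same character to close the value. This is a sketch on a made-up tag, not part of the answer above, and it still ignores unquoted attribute values.

```php
<?php
// Made-up tag mixing single and double quotes.
$tag = "<img src='/image/a.jpg' title=\"Harvey's bunny\" alt=\"cute\">";

// (["']) captures the opening quote; \2 requires the same character to close the value.
preg_match_all('/(alt|title|src)=(["\'])(.*?)\2/i', $tag, $m, PREG_SET_ORDER);

foreach ($m as $set) {
    // $set[1] is the attribute name, $set[3] its value without the quotes.
    echo $set[1], ' => ', $set[3], PHP_EOL;
}
```

Because the closing quote must equal the opening one, the apostrophe inside the double-quoted title does not end the value early.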
Just to give a small example of using PHP's XML functionality for the task:
$doc=new DOMDocument();
$doc->loadHTML("<html><body>Test<br><img src=\"myimage.jpg\" title=\"title\" alt=\"alt\"></body></html>");
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$images=$xml->xpath('//img');
foreach ($images as $img) {
echo $img['src'] . ' ' . $img['alt'] . ' ' . $img['title'];
}
I did use the DOMDocument::loadHTML() method because this method can cope with HTML-syntax and does not force the input document to be XHTML. Strictly speaking the conversion to a SimpleXMLElement is not necessary - it just makes using xpath and the xpath results more simple.
If it's XHTML, as your example is, you only need SimpleXML.
<?php
$input = '<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny"/>';
$sx = simplexml_load_string($input);
var_dump($sx);
?>
Output:
object(SimpleXMLElement)#1 (1) {
["#attributes"]=>
array(3) {
["src"]=>
string(22) "/image/fluffybunny.jpg"
["title"]=>
string(16) "Harvey the bunny"
["alt"]=>
string(26) "a cute little fluffy bunny"
}
}
I used preg_match to do it.
In my case, I had a string containing exactly one <img> tag (and no other markup) that I got from Wordpress and I was trying to get the src attribute so I could run it through timthumb.
// get the featured image
$image = get_the_post_thumbnail($photos[$i]->ID);
// get the src for that image
$pattern = '/src="([^"]*)"/';
preg_match($pattern, $image, $matches);
$src = $matches[1];
unset($matches);
To grab the title or the alt instead, you could simply use $pattern = '/title="([^"]*)"/'; for the title or $pattern = '/alt="([^"]*)"/'; for the alt. Sadly, my regex isn't good enough to grab all three (alt/title/src) in one pass though.
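Grabbing all three in one pass is possible with preg_match_all() and an alternation for the attribute name. The sketch below assumes double-quoted values, as in the pattern above, and runs on a made-up tag.

```php
<?php
// Made-up tag; attribute order does not matter.
$image = '<img width="50" alt="logo" src="/img/logo.png" title="Home">';

// One pass: group 1 is the attribute name, group 2 its double-quoted value.
preg_match_all('/\b(src|title|alt)="([^"]*)"/', $image, $matches);

// Pair names with values; attributes that are absent simply do not appear.
$attrs = array_combine($matches[1], $matches[2]);

print_r($attrs);
```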
You may use simplehtmldom. Most of the jQuery selectors are supported in simplehtmldom. An example is given below
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
In the regex answer, the second loop must iterate over $result[0]:
foreach( $result[0] as $img_tag)
because preg_match_all returns an array of arrays.
I have read the many comments on this page that complain that using a dom parser is unnecessary overhead. Well, it may be more expensive than a mere regex call, but the OP has stated that there is no control over the order of the attributes in the img tags. This fact leads to unnecessary regex pattern convolution. Beyond that, using a dom parser provides the additional benefits of readability, maintainability, and dom-awareness (regex is not dom-aware).
I love regex and I answer lots of regex questions, but when dealing with valid HTML there is seldom a good reason to regex over a parser.
In the demonstration below, see how easily and cleanly DOMDocument handles img tag attributes in any order with a mixture of quoting (and no quoting at all). Also notice that tags without a targeted attribute are not disruptive at all -- an empty string is provided as a value.
Code:
$test = <<<HTML
<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny" />
<img src='/image/pricklycactus.jpg' title='Roger the cactus' alt='a big green prickly cactus' />
<p>This is irrelevant text.</p>
<img alt="an annoying white cockatoo" title="Polly the cockatoo" src="/image/noisycockatoo.jpg">
<img title=something src=somethingelse>
HTML;
libxml_use_internal_errors(true); // silences/forgives complaints from the parser (remove to see what is generated)
$dom = new DOMDocument();
$dom->loadHTML($test);
foreach ($dom->getElementsByTagName('img') as $i => $img) {
echo "IMG#{$i}:\n";
echo "\tsrc = " , $img->getAttribute('src') , "\n";
echo "\ttitle = " , $img->getAttribute('title') , "\n";
echo "\talt = " , $img->getAttribute('alt') , "\n";
echo "---\n";
}
Output:
IMG#0:
src = /image/fluffybunny.jpg
title = Harvey the bunny
alt = a cute little fluffy bunny
---
IMG#1:
src = /image/pricklycactus.jpg
title = Roger the cactus
alt = a big green prickly cactus
---
IMG#2:
src = /image/noisycockatoo.jpg
title = Polly the cockatoo
alt = an annoying white cockatoo
---
IMG#3:
src = somethingelse
title = something
alt =
---
Using this technique in professional code will leave you with a clean script, fewer hiccups to contend with, and fewer colleagues that wish you worked somewhere else.
Here's a PHP function I cobbled together from all of the above info for a similar purpose, namely adjusting image tag width and height properties on the fly ... a bit clunky, perhaps, but it seems to work dependably:
function ReSizeImagesInHTML($HTMLContent,$MaximumWidth,$MaximumHeight) {
// find image tags
preg_match_all('/<img[^>]+>/i',$HTMLContent, $rawimagearray,PREG_SET_ORDER);
// put image tags in a simpler array
$imagearray = array();
for ($i = 0; $i < count($rawimagearray); $i++) {
array_push($imagearray, $rawimagearray[$i][0]);
}
// put image attributes in another array
$imageinfo = array();
foreach($imagearray as $img_tag) {
preg_match_all('/(src|width|height)=("[^"]*")/i',$img_tag, $imageinfo[$img_tag]);
}
// combine everything into one array
$AllImageInfo = array();
foreach($imagearray as $img_tag) {
$ImageSource = str_replace('"', '', $imageinfo[$img_tag][2][0]);
$OrignialWidth = str_replace('"', '', $imageinfo[$img_tag][2][1]);
$OrignialHeight = str_replace('"', '', $imageinfo[$img_tag][2][2]);
$NewWidth = $OrignialWidth;
$NewHeight = $OrignialHeight;
$AdjustDimensions = "F";
if($OrignialWidth > $MaximumWidth) {
$diff = $OrignialWidth-$MaximumWidth;
$percnt_reduced = (($diff/$OrignialWidth)*100);
$NewHeight = floor($OrignialHeight-(($percnt_reduced*$OrignialHeight)/100));
$NewWidth = floor($OrignialWidth-$diff);
$AdjustDimensions = "T";
}
if($OrignialHeight > $MaximumHeight) {
$diff = $OrignialHeight-$MaximumHeight;
$percnt_reduced = (($diff/$OrignialHeight)*100);
$NewWidth = floor($OrignialWidth-(($percnt_reduced*$OrignialWidth)/100));
$NewHeight= floor($OrignialHeight-$diff);
$AdjustDimensions = "T";
}
$thisImageInfo = array('OriginalImageTag' => $img_tag , 'ImageSource' => $ImageSource , 'OrignialWidth' => $OrignialWidth , 'OrignialHeight' => $OrignialHeight , 'NewWidth' => $NewWidth , 'NewHeight' => $NewHeight, 'AdjustDimensions' => $AdjustDimensions);
array_push($AllImageInfo, $thisImageInfo);
}
// build array of before and after tags
$ImageBeforeAndAfter = array();
for ($i = 0; $i < count($AllImageInfo); $i++) {
if($AllImageInfo[$i]['AdjustDimensions'] == "T") {
$NewImageTag = str_ireplace('width="' . $AllImageInfo[$i]['OrignialWidth'] . '"', 'width="' . $AllImageInfo[$i]['NewWidth'] . '"', $AllImageInfo[$i]['OriginalImageTag']);
$NewImageTag = str_ireplace('height="' . $AllImageInfo[$i]['OrignialHeight'] . '"', 'height="' . $AllImageInfo[$i]['NewHeight'] . '"', $NewImageTag);
$thisImageBeforeAndAfter = array('OriginalImageTag' => $AllImageInfo[$i]['OriginalImageTag'] , 'NewImageTag' => $NewImageTag);
array_push($ImageBeforeAndAfter, $thisImageBeforeAndAfter);
}
}
// execute search and replace
for ($i = 0; $i < count($ImageBeforeAndAfter); $i++) {
$HTMLContent = str_ireplace($ImageBeforeAndAfter[$i]['OriginalImageTag'],$ImageBeforeAndAfter[$i]['NewImageTag'], $HTMLContent);
}
return $HTMLContent;
}
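The resize arithmetic in the function above can also be expressed as a single proportional fit. The sketch below is an alternative formulation, not the author's code: it picks whichever dimension is the binding constraint and scales the other one with integer arithmetic. The function name fitWithin is hypothetical, and all four arguments are assumed to be positive integers.

```php
<?php
// Hypothetical helper: shrinks (never enlarges) a box to fit inside the maximum size,
// preserving the aspect ratio. Returns [newWidth, newHeight].
function fitWithin(int $w, int $h, int $maxW, int $maxH): array
{
    if ($w <= $maxW && $h <= $maxH) {
        return [$w, $h];                        // already fits
    }
    // Compare maxW/w with maxH/h by cross-multiplication to avoid floats.
    if ($w * $maxH >= $h * $maxW) {
        return [$maxW, intdiv($h * $maxW, $w)]; // width is the binding constraint
    }
    return [intdiv($w * $maxH, $h), $maxH];     // height is the binding constraint
}

print_r(fitWithin(800, 600, 400, 400));
```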
Here is THE solution, in PHP:
Just download QueryPath, and then do as follows:
$doc= qp($myHtmlDoc);
foreach($doc->xpath('//img') as $img) {
$src= $img->attr('src');
$title= $img->attr('title');
$alt= $img->attr('alt');
}
That's it, you're done !
