simple html dom get in-line background-image URL from Attribute


I'm trying to get the CSS background-image URL from an HTML style attribute using Simple HTML DOM.
This is the code:
$String = '<div class="Wrapper"><i style="background-image: url(https://example.com/backgroud-1035.jpg);" class="uiMediaThumbImg"></i></div>';
$html = new simple_html_dom();
$html->load($String);
$src = array();
foreach ($html->find('i') as $a0) {
    $src[] = $a0->style;
}
foreach ($src as $css) {
    print($css);
}
The output looks like this:
background-image: url(https://example.com/backgroud-1035.jpg);
All I want is to strip the background URL from the rest of the CSS declaration, like this:
https://example.com/backgroud-1035.jpg

You can use a regex to extract the text between the parentheses.
$src = array();
foreach ($html->find('i') as $a0) {
    $style = $a0->style;
    // Capture everything between the first "(" and the next ")".
    if (preg_match('/\(([^)]+)\)/', $style, $match)) {
        $src[] = $match[1];
        // echo $match[1];
    }
}
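The pattern above captures whatever sits between the first pair of parentheses in the style string, which is fine when background-image is the only declaration that uses parentheses. A slightly stricter sketch follows, assuming the URL may optionally be wrapped in single or double quotes. The helper name extractBackgroundUrl is hypothetical and not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper: returns the URL of a background/background-image
// declaration inside an inline style string, or null when there is none.
function extractBackgroundUrl(string $style): ?string
{
    // Match "background" or "background-image", then url( ... ) with optional quotes.
    $pattern = '/background(?:-image)?\s*:[^;]*?url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/i';
    if (preg_match($pattern, $style, $match)) {
        return trim($match[1]);
    }
    return null;
}

echo extractBackgroundUrl('background-image: url(https://example.com/backgroud-1035.jpg);'), "\n";
```

With Simple HTML DOM, the string passed in would be $a0->style from the loop above.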

In case you have not found an answer yet, here is one for anyone else who needs it.
I use the explode() function to break the string into an array.
First I split the $css variable on "url(". The string
background-image: url(https://example.com/backgroud-1035.jpg);
becomes
Array ( [0] => background-image:  [1] => https://example.com/backgroud-1035.jpg); )
Then I split element [1] on ")", which gives
Array ( [0] => https://example.com/backgroud-1035.jpg [1] => ; )
Finally, print element [0]:
// it will output https://example.com/backgroud-1035.jpg
Full code:
$String = '<div class="Wrapper"><i style="background-image: url(https://example.com/backgroud-1035.jpg);" class="uiMediaThumbImg"></i></div>';
$html = new simple_html_dom();
$html->load($String);
$src = array();
foreach ($html->find('i') as $a0) {
    $src[] = $a0->style;
}
foreach ($src as $css) {
    $explode1 = explode("url(", $css);
    $explode2 = explode(")", $explode1[1]);
    print_r($explode2[0]);
}

Related

Get string between two strings [PHP]

This is probably all over the internet, but I have been searching and trying different approaches and can't find a solution.
The main approach I've tried so far is the following.
String:
<div data-image-id="344231" style="height: 399.333px; background-image: url("/website/view_image/344231/medium"); background-size: contain;"></div>
Code:
preg_match_all('/(style)=("[^"]*")/i', $value, $match);
preg_match('/background-image: url("(.*?)");/', $match[2][0], $match);
print_r($match);
I'm guessing I can't use background-image: url(" and "); as the start and end delimiters in preg_match.
Could someone give me some guidance on how to extract:
"/website/view_image/344231/medium"

If you use single quotes for the background image URL instead of double quotes, you can use DOMDocument and get the style attribute from the div.
Then use explode("; "), which returns an array in which one item is "background-image: url('/website/view_image/344231/medium')".
Loop through the array and use preg_match with a regex such as background-image: url\(([^)]+)\), which captures in a group whatever is between the parentheses.
If the regex matches, store the value from the group.
$html = <<<HTML
<div data-image-id="344231" style="height: 399.333px; background-image: url('/website/view_image/344231/medium'); background-size: contain;"></div>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$result = array();
$style = $doc->getElementsByTagName("div")->item(0)->getAttribute("style");
foreach (explode("; ", $style) as $str) {
    if (preg_match('/background-image: url\(([^)]+)\)/', $str, $matches)) {
        $result[] = $matches[1];
    }
}
echo $result[0];
That will give you:
'/website/view_image/344231/medium'
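The explode("; ") step above can be generalized into a small parser that turns the style string into a property => value map, so the background-image value can be looked up by name and its quotes dropped. This is only a sketch: parseInlineStyle is a hypothetical helper, and it assumes that no value contains a semicolon (a data: URI would break it).

```php
<?php
// Hypothetical helper: split an inline style string into property => value pairs.
function parseInlineStyle(string $style): array
{
    $declarations = [];
    foreach (explode(';', $style) as $declaration) {
        // Split on the first colon only, so values such as https://... stay intact.
        $parts = explode(':', $declaration, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $declarations[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $declarations;
}

$style = "height: 399.333px; background-image: url('/website/view_image/344231/medium'); background-size: contain;";
$css = parseInlineStyle($style);

// Pull the path out of url(...), ignoring optional surrounding quotes.
if (isset($css['background-image'])
    && preg_match('/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/', $css['background-image'], $m)) {
    echo $m[1], "\n";
}
```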

Regex to match placeholders that contain HTML within them

I have placeholders that users can insert into a WYSIWYG editor (which contains HTML code). Sometimes when they paste from apps like Word etc it injects HTML within them.
Eg: It pastes %<span>firstname</span>% instead of %firstname%.
Here is an example of my regex code:
$html = '
<p>%firstname%</p>
<p>%<span>firstname</span>%</p>
<p>%<span class="blah">firstname</span>%</p>
<p>%<span><span>firstname</span></span>%</p>
<p>%<span><span><span>firstname</span></span></span>%</p>
<p>%<span class="blah"><span>firstname</span></span>%</p>
<div>other random <strong>HTML</strong> that needs to be preserved.</div>
';
preg_match_all(
'/\%(?![0-9])((?:<[^<]+?>)?[a-zA-z0-9_-]+(?:[\s]?<[^<]+?>)?)\%/U',
$html,
$matches
);
echo '<pre>';
print_r($matches);
echo '</pre>';
Which outputs the following:
Array
(
[0] => Array
(
[0] => %firstname%
[1] => %firstname%
[2] => %firstname%
)
[1] => Array
(
[0] => firstname
[1] => firstname
[2] => firstname
)
)
As soon as there is more than one span inside the placeholder it doesn't work. I'm not quite sure what to adjust in my regex.
/\%(?![0-9])((?:<[^<]+?>)?[a-zA-z0-9_-]+(?:[\s]?<[^<]+?>)?)\%/U
How would I achieve this?

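Try this regex. It allows any number of opening tags before the name and any number of closing tags after it. Note that the character class should be A-Z rather than A-z, because A-z also matches characters such as [ and ^.
/\%(?![0-9])(?:<[^<]+?>)*([a-zA-Z0-9_-]+)(?:[\s]?<\/[^<]+?>)*\%/U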
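If the goal is to clean up the stored HTML rather than just detect the placeholders, the same idea can be applied with preg_replace. The sketch below is one possible way to do it, not a definitive one: normalizePlaceholders is a hypothetical helper, and it assumes placeholder names start with a letter or underscore and that the only things between the % signs are tags wrapped around the name.

```php
<?php
// Hypothetical helper: rewrite %<span>name</span>% (any nesting depth) to %name%.
function normalizePlaceholders(string $html): string
{
    // %  opening tags*  name  closing tags*  %
    $pattern = '/%(?:<(?!\/)[^<>]+>)*([A-Za-z_][A-Za-z0-9_-]*)(?:\s?<\/[^<>]+>)*%/';
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '%$1%', $html) ?? $html;
}

echo normalizePlaceholders('<p>%<span class="blah"><span>firstname</span></span>%</p>'), "\n";
```

HTML outside the placeholders is left untouched, because the pattern only matches between a pair of % signs.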

You could use a parser and the textContent property if it is a WYSIWYG editor anyway:
<?php
$html = '
<p>%firstname%</p>
<p>%<span>firstname</span>%</p>
<p>%<span class="blah">firstname</span>%</p>
<p>%<span><span>firstname</span></span>%</p>
<p>%<span><span><span>firstname</span></span></span>%</p>
<p>%<span class="blah"><span>firstname</span></span>%</p>
<div>A cool div with %firstname%</div>
<span>And a very neat span with %firstname%</span>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
# query only root elements here
$containers = $xpath->query("/*");
foreach ($containers as $container) {
echo $container->textContent . "\n";
}
?>
This outputs %firstname% a couple of times, see a demo on ideone.com.

Do you really need a regex for this? You could have simply used strip_tags() here.
Try this:
echo strip_tags($html);

php get css class and assign it to rel attribute

How do I take this:
<img class="classone twoclass alignLEFT" src="xxxx" />
search for the word "align" in the class array, take the remainder of the word "align" (in this case "left") and assign it to an actual align property?
<img class="classone twoclass alignLEFT" align="LEFT" src="xxxx" />
I know I need
$needle = "align";
$haystack = '<img class="classone twoclass alignLEFT" src="xxxx" />';
and what I'm looking for is
$pincushion = {{the rest of the word from $needle}}
so basically I'm doing a preg_match for $needle. If found, how do I get the rest of that word (i.e., $pincushion) ?
I tried preg_split but it wouldn't allow me to use "align" as a delimiter. (makes sense)
This can NOT be jquery / javascript - it must take place in the rendered html code.
Any thoughts? I've spent 10 hours now searching for an answer with no real luck.
I did come across DomDocument but couldn't make that find what I needed either.

preg_match_all('#align(.*?)(" )#si', '<img class="classone twoclass alignLEFT" src="xxxx" />', $arr, PREG_PATTERN_ORDER);
Result:
Array
(
[0] => Array
(
[0] => alignLEFT"
)
[1] => Array
(
[0] => LEFT
)
[2] => Array
(
[0] => "
)
)
Explanation:
#align(.*?)(" )#si looks for "align" followed by any characters (captured lazily in group 1), up to a double quote followed by a space (group 2).

Since you're working with HTML, you can get DOMDocument to work. It's a bit more drawn out, but probably easier to read and change than a complicated regex.
There are libraries out there that are better at dealing with HTML fragments than DOMDocument, but if you want to use built-in functions, DOMDocument is the way to go.
<?php
$html = '<img class="" src="xxxx" />adasdasdasd<img class="classone twoclass alignLEFT" src="xxxx" />';
$domdoc = new DOMDocument('');
$domdoc->loadHTML($html);
$xpath = new DOMXpath($domdoc);
$imgs = $xpath->query('//img');
foreach ($imgs as $element) {
$class = $element->getAttribute('class');
if (strpos($class, 'alignLEFT') !== false) {
$element->setAttribute('align', 'left');
}
}
// DOM Document works with full html documents, so now we have to isolate our fragment
$bodyelement = $xpath->query('/html/body');
$bodyhtml = $domdoc->saveXML($bodyelement->item(0));
echo str_replace(array('<body>', '</body>'), '', $bodyhtml);
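The example above hard-codes alignLEFT. A variation that reads whatever follows "align" in the class list and copies it into the align attribute could look like the sketch below. The function name addAlignFromClass is hypothetical, and the code assumes the suffix consists of letters only.

```php
<?php
// Hypothetical helper: for every <img> with a class token like "alignLEFT",
// set align="LEFT" and return the modified fragment.
function addAlignFromClass(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about fragments
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        // Look for a whole class token that starts with "align".
        if (preg_match('/(?:^|\s)align([A-Za-z]+)(?:\s|$)/', $img->getAttribute('class'), $m)) {
            $img->setAttribute('align', $m[1]);
        }
    }

    // loadHTML wraps the fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

echo addAlignFromClass('<img class="classone twoclass alignLEFT" src="xxxx" />'), "\n";
```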

PHP preg_match_all regex to extract only number in string

I can't seem to figure out the proper regular expression for extracting just specific numbers from a string. I have an HTML string that has various img tags in it. There are a bunch of img tags in the HTML that I want to extract a portion of the value from. They follow this format:
<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />
So, varying lengths of numbers before what 'usually' is a .jpg (it may be a .gif, .png, or something else too). I want to only extract the number from that string.
The 2nd part of this is that I want to use that number to look up an entry in a database and grab the alt/title tag for that specific id of image. Lastly, I want to add that returned database value into the string and throw it back into the HTML string.
Any thoughts on how to proceed with it would be great...
Thus far, I've tried:
$pattern = '/img src="http://domain.com/images/[0-9]+\/.jpg';
preg_match_all($pattern, $body, $matches);
var_dump($matches);

I think this is the best approach:
1. Use an HTML parser to extract the image tags.
2. Use a regular expression (or perhaps string manipulation) to extract the ID.
3. Query for the data.
4. Use the HTML parser to insert the returned data.
Here is an example. There are improvements I can think of, such as using string manipulation instead of a regex.
$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />';
$doc = new DOMDocument;
$doc->loadHtml( $html);
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    // Skip images whose src does not contain a numeric ID.
    if (!preg_match('#/images/([0-9]+)\.#i', $src, $matches)) {
        continue;
    }
    $id = $matches[1];
    echo 'Fetching info for image ID ' . $id . "\n";
    // Query stuff here
    $result = 'Got this from the DB';
    $img->setAttribute('title', $result);
    $img->setAttribute('alt', $result);
}
$newHTML = $doc->saveHtml();

Using regular expressions, you can get the number really easily. The third argument for preg_match_all is a by-reference array that will be populated with the matches that were found.
preg_match_all('/<img src="http:\/\/domain\.com\/images\/(\d+)\.[a-zA-Z]+"/', $html, $matches);
print_r($matches);
$matches[0] will contain the full matched strings and $matches[1] the captured numbers.

Consider using preg_replace_callback.
Use this regex: (images/([0-9]+)[^"]+")
Then, as the callback argument, use an anonymous function. Result:
$output = preg_replace_callback(
"(images/([0-9]+)[^\"]+\")",
function($m) {
// $m[1] is the number.
$t = getTitleFromDatabase($m[1]); // do whatever you have to do to get the title
return $m[0]." title=\"".$t."\"";
},
$input
);
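A self-contained version of the callback approach might look like the sketch below. The $titles array and the addImageTitles function are hypothetical stand-ins for the database lookup, and the values are escaped with htmlspecialchars so that a title containing quotes cannot break the attribute.

```php
<?php
// Hypothetical lookup table standing in for the database query.
$titles = [59 => 'Sunset', 549 => 'Harbor "view"'];

// Hypothetical helper: append title and alt attributes after each numbered image src.
function addImageTitles(string $html, array $titles): string
{
    $result = preg_replace_callback(
        '#images/([0-9]+)\.[a-z]+"#i',
        function (array $m) use ($titles): string {
            $id = (int) $m[1];
            if (!isset($titles[$id])) {
                return $m[0]; // unknown ID: leave the match untouched
            }
            $title = htmlspecialchars($titles[$id], ENT_QUOTES);
            return $m[0] . ' title="' . $title . '" alt="' . $title . '"';
        },
        $html
    );
    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$input = '<img src="http://domain.com/images/59.jpg" class="something" />';
echo addImageTitles($input, $titles), "\n";
```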

use preg_match_all:
preg_match_all('#<img.*?/(\d+)\.#', $str, $m);
print_r($m);
output:
Array
(
[0] => Array
(
[0] => <img src="http://domain.com/images/59.
[1] => <img src="http://domain.com/images/549.
[2] => <img src="http://domain.com/images/1249.
[3] => <img src="http://domain.com/images/6.
)
[1] => Array
(
[0] => 59
[1] => 549
[2] => 1249
[3] => 6
)
)

This regex should match the number parts:
\/images\/(?P<digits>[0-9]+)\.[a-z]+
Your $matches['digits'] should have all of the digits you want as an array.
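As a rough illustration of how the named group is read back, here is a short sketch using two of the sample tags from the question (with one extension changed to .gif to show that the pattern is not tied to .jpg). The cast to integers is an assumption about how the IDs would be used afterwards.

```php
<?php
$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.gif" class="something" />';

preg_match_all('/\/images\/(?P<digits>[0-9]+)\.[a-z]+/i', $html, $matches);

// Cast the captured strings to integers before using them as lookup keys.
$ids = array_map('intval', $matches['digits']);
print_r($ids);
```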

Regular expressions alone are on shaky ground when it comes to parsing messy HTML. DOMDocument's HTML handling copes well with tag soup, so you can use XPath to select the image src attributes and a simple sscanf to extract the number:
$ids = array();
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach (simplexml_import_dom($doc)->xpath('//img/@src[contains(., "/images/")]') as $src) {
    if (sscanf($src, '%*[^0-9]%d', $number)) {
        $ids[] = $number;
    }
}
Because that only gives you an array, why not encapsulate it?
$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />';
$imageNumbers = new ImageNumbers($html);
var_dump((array) $imageNumbers);
Which gives you:
array(4) {
[0]=>
int(59)
[1]=>
int(549)
[2]=>
int(1249)
[3]=>
int(6)
}
This works by wrapping the loop above into an ArrayObject:
class ImageNumbers extends ArrayObject
{
    public function __construct($html) {
        parent::__construct($this->extractFromHTML($html));
    }
    private function extractFromHTML($html) {
        $numbers = array();
        $doc = new DOMDocument();
        // Collect libxml parse warnings internally instead of emitting them.
        $preserve = libxml_use_internal_errors(TRUE);
        $doc->loadHTML($html);
        foreach (simplexml_import_dom($doc)->xpath('//img/@src[contains(., "/images/")]') as $src) {
            if (sscanf($src, '%*[^0-9]%d', $number)) {
                $numbers[] = $number;
            }
        }
        libxml_use_internal_errors($preserve);
        return $numbers;
    }
}
If your HTML is so malformed that even DOMDocument::loadHTML() can't handle it, you only need to deal with that inside the ImageNumbers class.

$matches = array();
preg_match_all('/[[:digit:]]+/', $htmlString, $matches);
Then loop through the matches array both to reconstruct the HTML and to do your lookup in the database.

Extract all images from a Joomla article

I have this code that extracts the first image from an article in joomla:
<?php preg_match('/<img (.*?)>/', $this->article->text, $match); ?>
<?php echo $match[0]; ?>
Is there a way to extract all the images that are available in the article and not only one?

I'd suggest first of all not using regular expressions to parse HTML. You should use an appropriate parser such as DOMDocument::loadHTML, which uses libxml.
Then you may query for the desired tags you want. Something like this may work (untested):
$doc = new DOMDocument;
$doc->loadHTML($htmlSource);
$xpath = new DOMXPath($doc);
$query = '//img';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
// $entry->getAttribute('src')
}
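Filled in, the loop above could collect every image source into an array. The following is a minimal sketch: extractImageSources is a hypothetical helper name, and in the Joomla template the string passed in would be $this->article->text from the question.

```php
<?php
// Hypothetical helper: return the src attribute of every <img> in an HTML string.
function extractImageSources(string $html): array
{
    if (trim($html) === '') {
        return []; // avoid calling loadHTML() with empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy article markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

print_r(extractImageSources('<p>Intro <img src="a.jpg"> text <img src="b.png" alt="B" /></p>'));
```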

Use preg_match_all. You'll also want to modify the pattern as follows to account for the trailing '/' inside the img tag.
$str = '<img src="asdf" />stuff more stuff <img src="qwerty" />';
preg_match_all('/<img (.*?)\/>/', $str, $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => <img src="asdf" />
[1] => <img src="qwerty" />
)
[1] => Array
(
[0] => src="asdf"
[1] => src="qwerty"
)
)
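The pattern above only matches self-closed tags ending in "/>". If the article may also contain plain <img ...> tags, a looser pattern can cover both forms. This is a sketch that assumes no attribute value contains a ">" character.

```php
<?php
$str = '<img src="asdf" />stuff <IMG src="qwerty"> more <img src="zxcv"/>';

// \b keeps the pattern from matching other tag names that merely start with "img";
// [^>]* consumes the attributes up to the closing ">".
preg_match_all('/<img\b[^>]*>/i', $str, $matches);
print_r($matches[0]);
```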
