Get string between two strings [PHP] - php

Okay, this is probably all over the internet, but I can't find a solution, and I've been searching and trying different approaches.
The main approach I've tried so far is as follows:
string:
<div data-image-id="344231" style="height: 399.333px; background-image: url("/website/view_image/344231/medium"); background-size: contain;"></div>
code:
preg_match_all('/(style)=("[^"]*")/i', $value, $match);
preg_match('/background-image: url("(.*?)");/', $match[2][0], $match);
print_r($match);
I'm guessing I can't use
background-image: url(" and ");
as the delimiters in preg_match.
Could someone give me some guidance on how to get:
"/website/view_image/344231/medium"

If you use single quotes for the background image URL instead of double quotes, you can use DOMDocument to get the style attribute from the div.
Then use explode("; "), which returns an array in which one item is "background-image: url('/website/view_image/344231/medium')".
Loop through the array and use preg_match with a regex such as background-image: url\(([^)]+)\), which captures whatever is between the parentheses in a group.
If the regex matches, store the value of the group.
$html = <<<HTML
<div data-image-id="344231" style="height: 399.333px; background-image: url('/website/view_image/344231/medium'); background-size: contain;"></div>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$result = array();
$style = $doc->getElementsByTagName("div")->item(0)->getAttribute("style");
foreach (explode("; ", $style) as $str) {
    if (preg_match('/background-image: url\(([^)]+)\)/', $str, $matches)) {
        $result[] = $matches[1];
    }
}
echo $result[0];
That will give you:
'/website/view_image/344231/medium'
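If changing the quotes in the markup is not an option, a single regex run against the raw string may be enough. The attempt in the question fails for two reasons: "[^"]*" stops at the first inner double quote, and the parentheses in url("(.*?)") are not escaped, so they act as a group instead of matching literal parentheses. Below is a minimal sketch that escapes them and accepts double, single, or no quotes. The function name extract_background_url is made up for this example.

```php
<?php
// Hypothetical helper (not from the answer above): returns the URL inside
// background-image: url(...), or null when no such declaration is found.
function extract_background_url(string $html): ?string
{
    // (["\']?) captures an optional opening quote; \1 requires the same
    // character (or nothing) right before the closing parenthesis.
    $pattern = '/background-image:\s*url\(\s*(["\']?)(.*?)\1\s*\)/i';
    if (preg_match($pattern, $html, $m)) {
        return $m[2];
    }
    return null;
}

// The string from the question, inner double quotes included.
$value = '<div data-image-id="344231" style="height: 399.333px; background-image: url("/website/view_image/344231/medium"); background-size: contain;"></div>';
echo extract_background_url($value), "\n";
```

For the string in the question this prints /website/view_image/344231/medium, without the surrounding quotes.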

Related

Extract css from html

I'm looking for a clean way to grab and remove all the css between the <style></style> tags.
For example:
<style>
foo
</style>
content
<style>
bar
</style>
here
By the end of the process I should have a string with content\nhere (everything not between the style tags) and an array of the matches between the style tags, ['foo', 'bar'] or something similar. I've tried a lot of different regex approaches and none of them seemed to work for me. (I'm no regex pro though..)
You should not access HTML with regular expressions. Try DomDocument instead. It's cleaner and easier to work with.
$dom = new DomDocument();
$dom->loadHTML('Your HTML here');
$elements = $dom->getElementsByTagName('style');
for ($i = $elements->length; --$i >= 0;) {
    $elements->item($i)->parentNode->removeChild($elements->item($i));
}
echo $dom->saveHTML();
This code is an example and not tested.
I found the answer:
$html = "<style>css css css</style>my html<style>more css</style>";
preg_match_all("/<style>(.*?)<\/style>/is", $html, $matches);
$html = str_replace($matches[0], '', $html);
$css = implode("\n", $matches[1]);
echo $html; // my html
echo $css; // css css css \n more css
Originally I was looking for a pure regex solution, but this is fine for me.
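The two steps above (collect the CSS, then strip the tags) can also be done in one pass with preg_replace_callback. This is only a sketch: the name split_css is made up, and the pattern additionally allows attributes on the opening tag (for example type="text/css"), which the snippet above does not.

```php
<?php
// Hypothetical helper: returns [HTML without <style> blocks, list of CSS strings].
function split_css(string $html): array
{
    $css = [];
    $rest = preg_replace_callback(
        '~<style\b[^>]*>(.*?)</style>~is',
        function (array $m) use (&$css): string {
            $css[] = trim($m[1]); // keep the CSS, minus surrounding whitespace
            return '';            // drop the whole <style> element from the HTML
        },
        $html
    );
    return [$rest, $css];
}

[$rest, $css] = split_css("<style>css css css</style>my html<style>more css</style>");
echo $rest, "\n";
echo implode("\n", $css), "\n";
```

Whitespace that surrounded the removed blocks stays in the remaining HTML, so a trim() or a blank-line cleanup may still be needed afterwards.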

Regex preg_replace find image in string WITH img attributes

I'm trying to find ALL images in my blog posts with regex. The code below returns images IF the markup is clean and the src attribute comes right after the img tag name. However, I also have images with other attributes such as height and width, and my regex does not pick those up. Any ideas?
The following code returns images that look like this:
<img src="blah_blah_blah.jpg">
But not images that look like this:
<img width="290" height="290" src="blah_blah_blah.jpg">
Here is my code:
$pattern = '/<img\s+src="([^"]+)"[^>]+>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
$html = <<<DATA
<img width="290" height="290" src="blah.jpg">
<img src="blah_blah_blah.jpg">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
    echo $img->getAttribute('src') . "\n";
}
Output
blah.jpg
blah_blah_blah.jpg
Ever think of using the DOM object instead of regex?
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach ($imageTags as $tag) {
    echo $tag->getAttribute('src');
}
You'd be better off using a parser, but here is a way to do it with a regex:
$pattern = '/<img\s.*?src="([^"]+)"/i';
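The problem is that you only accept whitespace (\s+) between <img and src, so any attribute placed before src breaks the match. Also note that a trailing [^>]+ requires at least one character between the closing quote and >, so it fails when src is the last attribute; [^>]* avoids that. Try this instead:
$pattern = '/<img\s+[^>]*?src="([^"]+)"[^>]*>/i';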
preg_match($pattern, $data, $matches);
echo $matches[1];
Try this:
$pattern = '/<img\s.*?src=["\']([^"\']+)["\']/i';
This accepts single or double quotes and a src attribute at any position in the tag.
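Since the question asks for ALL images, note that preg_match stops at the first match. Here is a hedged sketch using preg_match_all; the function name find_image_sources is made up, and like any regex approach it can still be fooled by unusual markup such as a > inside an attribute value.

```php
<?php
// Hypothetical helper: returns the src value of every <img> tag in $html.
function find_image_sources(string $html): array
{
    // [^>]*? skips attributes before src; \s keeps "data-src" from matching;
    // (["\']) and \1 make sure the opening and closing quotes are the same.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

$data = '<img src="blah_blah_blah.jpg">'
      . '<img width="290" height="290" src="blah.jpg">'
      . "<img alt='x' src='single.jpg' />";
print_r(find_image_sources($data));
```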

preg_replace and preg_match_all to move img from wordpress $content

I am using preg_replace to delete certain <img> tags from $content:
$content=preg_replace('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/','',$content);
When I now display the content with the WordPress the_content function, the <img> tags have indeed been removed from $content.
Before removing them, I'd like to capture these images so that I can place them elsewhere in the template. I am using the same regex pattern with preg_match_all:
preg_match_all('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/', $content, $matches);
But I can't get my images:
preg_match_all('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/', $content, $matches);
print_r($matches);
Array ( [0] => Array ( ) )
Assuming (and hoping) you are using PHP 5, this is a task for DOMDocument and XPath. A regex will work on most HTML elements, but consider the following example:
<img alt=">" src="/path.jpg" />
Here the regex will fail. There aren't many guarantees in programming, so take the one XPath offers: it finds EXACTLY the elements you ask for, at a performance cost. To code it:
$doc = new DOMDocument();
$doc->loadHTML('<span><img src="com.png" /><img src="com2.png" /></span>');
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//span/img');
$html = '';
foreach ($imgs as $img) {
    $html .= $doc->saveXML($img);
}
Now you have all the img elements in $html. Use str_replace() to remove them from $content (this only works when the serialized markup matches the original text exactly). From there you can have a drink and be pleased that XPath with HTML elements is painless, just a little slower.
P.S. I couldn't be bothered to work through your regex; I just think XPath is better in your situation.
In the end I used preg_replace_callback:
$content2 = get_the_content();
$removed_imgs = array();
$content2 = preg_replace_callback('#(?!<img.+?id="featured_img".*?\/>)(<img.+? />)#', function ($r) use (&$removed_imgs) {
    // Capture the array by reference instead of relying on a global variable
    $removed_imgs[] = $r[1];
    return '';
}, $content2);
foreach ($removed_imgs as $img) {
    echo $img;
}
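One thing to watch with the lookahead patterns above: .+? is not limited to a single tag, so (?!<img.+?id="featured_img".*?\/>) can look past the current <img> and find the id in a later tag on the same line, which rejects images that should have matched. A possible alternative, sketched below, matches every <img> tag and decides inside the callback whether to keep it. The function name pull_images and its $keepIds parameter are made up for this example.

```php
<?php
// Hypothetical helper: removes <img> tags from $content unless their id is in
// $keepIds. Returns [remaining content, list of removed tags].
function pull_images(string $content, array $keepIds): array
{
    $removed = [];
    $remaining = preg_replace_callback(
        '#<img\b[^>]*>#i',
        function (array $m) use ($keepIds, &$removed): string {
            // Leave the tag in place when it carries one of the protected ids.
            if (preg_match('#\sid=(["\'])(.*?)\1#i', $m[0], $id)
                && in_array($id[2], $keepIds, true)) {
                return $m[0];
            }
            $removed[] = $m[0];
            return '';
        },
        $content
    );
    return [$remaining, $removed];
}

$content = '<p>a<img id="img_menu" src="m.png" /><img src="x.png" />b'
         . '<img class="c" id="featured_img" src="f.png" /></p>';
[$content, $removed] = pull_images($content, ['img_menu', 'featured_img']);
```

Because the id check runs on the matched tag only, it cannot leak into neighbouring tags. The removed tags end up in $removed, ready to be echoed elsewhere in the template.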

preg_match_all to scrape found word between html tags

I have the following piece of code, which should match the provided pattern against $contents. The $contents variable holds the contents of a web page fetched with file_get_contents():
if (preg_match('~<p style="margin-top: 40px; " class="head">GENE:<b>(.*?)</b>~iU', $contents, $match)){
$found_match = $match[1];
}
The original string on the said webpage looks like this:
<p style="margin-top: 40px; " class="head">GENE:<b>TSPAN6</b>
I would like to match and store the string 'TSPAN6' found on the web page through (.*?) into $match[1]. However, the matching does not seem to work. Any ideas?
Unfortunately, your suggestion did not work.
After some hours of looking through the HTML code, I realized that the page source simply had a blank space right after the colon, which the regex did not account for. As such, the code snippet now looks like this:
$pattern = '#GENE: <b>(.*)</b>#i';
preg_match($pattern, $contents, $match);
if (isset($match[1]))
{
    $found_flag = $match[1];
}
Try this:
preg_match( '#GENE:<b>([^<]+)</b>#si', $contents, $match ); // modifiers go after the closing delimiter
$found_match = ( isset($match[1]) ? $match[1] : false );
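Since the match hinged on a single space, a pattern that tolerates optional whitespace may be more robust against small changes in the page. A minimal sketch, with the made-up function name find_gene:

```php
<?php
// Hypothetical helper: returns the text inside <b>...</b> after "GENE:",
// or null when the page does not contain it.
function find_gene(string $contents): ?string
{
    // \s* allows any amount of whitespace (including none) after the colon
    // and around the name; [^<]+? stops at the next tag.
    if (preg_match('#GENE:\s*<b>\s*([^<]+?)\s*</b>#i', $contents, $m)) {
        return $m[1];
    }
    return null;
}

$contents = '<p style="margin-top: 40px; " class="head">GENE: <b>TSPAN6</b>';
var_dump(find_gene($contents));
```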

Unable to use regex to search in PHP?

I'm trying to get the contents of specific tags in an HTML document.
My method works for some tags, but not all, and it does not work for the tag whose content I want.
Here is my code:
<html>
<head></head>
<body>
<?php
$url = "http://sf.backpage.com/MusicInstruction/";
$data = file_get_contents($url);
$pattern = "/<div class=\"cat\">(.*)<\/div>/";
preg_match_all($pattern, $data, $adsLinks, PREG_SET_ORDER);
var_dump($adsLinks);
foreach ($adsLinks as $i) {
echo "<div class='ads'>".$i[0]."</div>";
}
?>
</body>
</html>
The above code doesn't work, but it works when I change the $pattern into:
$pattern = "/<div class=\"date\">(.*)<\/div>/";
or
$pattern = "/<div class=\"sponsorBoxPlusImages\">(.*)<\/div>/";
I can't see any difference between these patterns. Please help me find the error.
Thanks.
Use PHP DOM to parse HTML instead of regex.
For example in your case (code updated to show HTML):
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents("http://sf.backpage.com/MusicInstruction/")); // @ suppresses warnings about malformed HTML
$nodes = $doc->getElementsByTagName('div');
for ($i = 0; $i < $nodes->length; $i++)
{
    $x = $nodes->item($i);
    if ($x->getAttribute('class') == 'cat') { // no semicolon after the condition, or the echo runs for every div
        echo htmlspecialchars($x->nodeValue) . "<hr/>"; // this is the element that you want
    }
}
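The loop above compares the whole class attribute, so it misses a div such as class="cat featured". An XPath query can do the filtering instead. The sketch below is one way to write it; the function name div_texts_by_class is made up, and $class is assumed to be a plain, trusted class name because it is inserted into the query as-is.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <div> whose class
// attribute contains $class as one of its space-separated names.
function div_texts_by_class(string $html, string $class): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Padding with spaces keeps "cat" from matching "category".
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

$html = '<div class="cat">A</div><div class="date">B</div>'
      . "<div class=\"cat big\">\nC\n</div><div class=\"category\">D</div>";
print_r(div_texts_by_class($html, 'cat'));
```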
The reason your regex fails is that you are expecting . to match newlines, and it won't unless you use the s modifier, so try
$pattern = "/<div class=\"cat\">(.*)<\/div>/s";
When you do this, you might find the pattern a little too greedy, as it will capture everything up to the last closing div element. To make it non-greedy, so that it matches only up to the very next closing div, add a ? after the *
$pattern = "/<div class=\"cat\">(.*?)<\/div>/s";
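The effect of the s modifier and of the non-greedy quantifier can be checked on a small made-up string ($data below is only an illustration, not the real page):

```php
<?php
$data = "<div class=\"cat\">\nfirst\n</div>\n<div class=\"cat\">\nsecond\n</div>";

// Without /s the dot stops at newlines, so nothing matches.
$noFlag = preg_match_all('/<div class="cat">(.*)<\/div>/', $data, $a);

// With /s and a greedy .*, a single match runs to the last </div>.
$greedy = preg_match_all('/<div class="cat">(.*)<\/div>/s', $data, $b);

// With /s and a lazy .*?, each div is matched on its own.
$lazy = preg_match_all('/<div class="cat">(.*?)<\/div>/s', $data, $c);

printf("%d %d %d\n", $noFlag, $greedy, $lazy); // prints "0 1 2"
```

Even the lazy version stops at the first closing div it meets, so nested divs inside the target element would still cut the match short.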
This just serves to illustrate that for all but the simplest cases, parsing HTML with regexes is the road to madness. So try using DOM functions for parsing HTML.
