Unfortunately, I really cannot get my head around regular expressions, so my last resort is to ask for the help of you fine people.
I have this existing code:
<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 2
Link 3
</div>
</li>
For a number of reasons, I have to use preg_replace to inject an additional piece of code:
Link 1
I think you can guess where it should go, but for clarity, I want the resulting string to look like this:
<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 1
Link 2
Link 3
</div>
</li>
Can anyone help me with the appropriate regular expression to achieve this?
Try this:
$html = '<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 2
Link 3
</div>
</li>';
$eleName = 'a';
$eleAttr = 'href';
$eleAttrValue = 'link2';
$addBefore = 'Link 1';
$result = regexAddBefore($html, $eleName, $eleAttr, $eleAttrValue, $addBefore);
var_dump($result);
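function regexAddBefore($subject, $eleName, $eleAttr, $eleAttrValue, $addBefore){
    // Escape the search values so characters such as "." or "/" are matched literally
    $eleName = preg_quote($eleName, '/');
    $eleAttr = preg_quote($eleAttr, '/');
    $eleAttrValue = preg_quote($eleAttrValue, '/');
    // Match the whole opening tag, e.g. <a ... href="link2" ...>; the quotes are optional
    $regex = "/(<\s*".$eleName."[^>]*".$eleAttr."\s*=\s*(\"|\')?\s*".$eleAttrValue."\s*(\"|\')?[^>]*>)/s";
    // Re-emit the matched tag ($1) with the new markup on the line before it
    $replace = $addBefore."\r\n$1";
    return preg_replace($regex, $replace, $subject);
}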
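If the goal is simply to put the new link first inside the controls div, a sketch that anchors on `<div class="controls faint">` instead of on a particular link might look like this. It assumes the links are `<a>` elements (as the answer's `$eleName = 'a'` suggests); the `href` values below are made up for illustration.

```php
<?php
// Hypothetical markup: the <a href="linkN"> anchors are an assumption,
// since the question's original link markup is not shown in full.
$html = '<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
<a href="link2">Link 2</a>
<a href="link3">Link 3</a>
</div>
</li>';

$newLink = '<a href="link1">Link 1</a>';

// Match the opening tag of the controls div plus the whitespace after it,
// then re-emit it followed by the new link on its own line. A callback is
// used so "$" or "\" inside $newLink are never read as backreferences.
$result = preg_replace_callback(
    '~<div class="controls faint">\s*~',
    function (array $m) use ($newLink) {
        return $m[0] . $newLink . "\n";
    },
    $html,
    1 // only touch the first controls div
);

echo $result;
```

Anchoring on the container means the insert still works if "Link 2" is ever removed or renamed.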
I can suggest two things, although I didn't fully understand your problem.
$newStr = preg_replace ('/<[^>]*>/', ' ', $htmlText);
This replaces every HTML tag in the string with a space. I don't know whether that will be useful for you.
Another option is the strip_tags() function. Its optional second parameter lets you list the tags you want to keep.
$str = '<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 2
Link 3
</div>
</li>';
echo strip_tags ($str,'<a>');
This outputs only the links plus whatever plain text is in the HTML string.
Sorry if this doesn't help either.
Related
I'm trying to scrape a webpage using PHP Simple HTML DOM Parser.
$html = '<div class="namepageheader">
<div class="u">Name: Noor Shaad
<div class="u">Age: </div>
</div> ';
$name=$html->find('div[class="u"]', 0)->innertext;
$age=$html->find('div[class="u"]', 1)->innertext;
I tried my best to get the text from each class="u" element, but it didn't work because the first <div class="u"> is missing its closing </div> tag. Can anyone help me out with that?
You can find an element near the point where the tag should have been closed and then normalize the HTML by replacing it.
For example, you can replace the </a> tag with </a></div>:
$html = str_replace('</a>', '</a></div>', $html);
Or, if the document contains many other </a> tags, replace </a><div class="u"> with </a></div><div class="u"> instead:
$html = str_replace('</a><div class="u">', '</a></div><div class="u">', $html);
There may be another problem: if there is whitespace between the tags, the replacement will not match. To solve this, first remove the whitespace between the tags and then do the replacement.
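$html = '<div class="namepageheader">
<div class="u">Name: Noor Shaad
<div class="u">Age: </div>
</div> ';
// Remove the whitespace between tags so the replacement below can match
$html = preg_replace('~>\s+<~', '><', $html);
// Close the first div.u right before the second one opens
$html = str_replace('</a><div class="u">', '</a></div><div class="u">', $html);
// Parse the repaired markup with Simple HTML DOM before calling find()
$dom = str_get_html($html);
$name = $dom->find('div[class="u"]', 0)->innertext;
$age = $dom->find('div[class="u"]', 1)->innertext;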
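If you only need the text that follows each `<div class="u">`, another option is to skip repairing the markup and read it with a regex. This is a sketch under the assumption that the value you want is the first run of plain text after each opening tag, using the markup exactly as shown in the question (no `<a>` inside):

```php
<?php
$html = '<div class="namepageheader">
<div class="u">Name: Noor Shaad
<div class="u">Age: </div>
</div> ';

// Capture the text right after each opening <div class="u">, up to the
// next "<", so the missing </div> does not matter.
preg_match_all('~<div class="u">\s*([^<]*)~', $html, $m);
$values = array_map('trim', $m[1]);

$name = $values[0] ?? '';
$age = $values[1] ?? '';
echo $name, "\n", $age, "\n";
```

If the real page wraps the name in a link, the `[^<]*` capture would stop at that link's opening tag, so this only fits flat text like the sample.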
I am having trouble exploding this title: "Song Artist – Song Name". I am using the following code, without much luck:
$title2 = $html2->find('header.section-header h2',0);
$links = $title2->plaintext;
$str = explode ("–", $links);
$artist = preg_replace('#\[[a-zA-Z].*\]#','',$str[0]);
$song = preg_replace('#\[[a-zA-Z].*\]#','',$str[1]);
print '<div class="song"> <div class="options"> <a class="play" href="'.$url.'" data-url="'.$url.'" data-title="'.$artist.'"> </a> <a class="download" href="'.$url.'"> </a> </div> <div class="info"> <a class="direct" href="'.$url.'"> <div class="artist">'.$artist.'</div> <div class="title">A Rainy Night In Harlem (Freestyle)</div> </a> </div> </div>';
It should look like this when I display it:
But instead, it returns something that looks like this:
I found something that might interest you:
In your original code, add echo htmlentities($title)."<br>"; right after $title2 = $title->plaintext;, like this:
$title2 = $title->plaintext;
echo htmlentities($title)."<br>";
This gives me, for example:
<h2 itemprop="name">Chris Brown – You Make Me This Way (I Got You) (LQ)</h2>
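So the separator is not a literal dash character but the HTML entity for the en dash (–).
That is why the explode didn't work. You might get away with exploding on the entity text instead.
I checked it here, and it seems to work.
Sorry for all the edits; I had a hard time getting the entity to display :-)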
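A variation on that idea, sketched below: instead of matching the entity text, decode the entities first and then explode on the real en dash. The sample string is an assumption about what `->plaintext` returns, with the dash still encoded as `&ndash;`; a numeric form such as `&#8211;` would be decoded the same way.

```php
<?php
// Assumed plaintext from Simple HTML DOM: the dash is still an HTML entity.
$links = 'Chris Brown &ndash; You Make Me This Way (I Got You) (LQ)';

// Turn entities such as &ndash; into real UTF-8 characters.
$decoded = html_entity_decode($links, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Split once on the literal en dash (U+2013) and trim both halves.
$parts = array_map('trim', explode("\u{2013}", $decoded, 2));
$artist = $parts[0];
$song = $parts[1] ?? '';

echo $artist, "\n", $song, "\n";
```

Decoding first means the code keeps working whether the site sends the entity or the raw character.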
You can use trim() to remove any unwanted whitespace:
<?php
$title2 = $html2->find('header.section-header h2',0);
$links = $title2->plaintext;
$str = explode ("–", $links);
$artist = trim($str[0]);
$song = trim($str[1]);
?>
<div class="song">
<div class="options">
<a class="play" href="" data-url="" data-title=""></a>
<a class="download" href=""></a>
</div>
<div class="info">
<a class="direct" href="">
<div class="artist"><?php echo $artist;?></div>
<div class="title"><?php echo $song;?></div>
</a>
</div>
</div>
The HTML content is:
<div id="sns-availability" class="a-section a-spacing-none">
<div class="a-section a-spacing-mini">
<span class="a-size-medium a-color-success">
In Stock.
</span>
<span class="a-size-base">
Ships Soon.
</span>
</div>
</div>
and with my code below, the output is:
In Stock. Ships soon.
I'm wondering how to extract only :
In Stock.
Can someone help?
include_once('simple_html_dom.php');
$url = "xxx";
$html = file_get_html($url);
$output = $html->find('div[id=sns-availability]');
$output = $output[0]->first_child();
echo $output;
That would simply be:
echo $html->find('#sns-availability span', 0)->plaintext;
You can probably add another first_child() call:
$output = $output[0]->first_child()->first_child();
You only navigate to the div that groups the two spans whose content is echoed. You need to get to the first of those two children, as illustrated in my simplification here:
<div>
<div> <-- you are here
<span>In stock</span> <-- need to get here
<span>Ships soon</span>
</div>
</div>
According to the documentation:
// Find all <span> with class=gb1
$result = $dom->find('span.gb1');
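Try:
// A space in a selector means "descendant", so match on one class; index 0 returns one element instead of an array
$result = $dom->find('span.a-color-success', 0);
echo $result->plaintext;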
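If Simple HTML DOM keeps getting in the way, PHP's built-in DOM extension can do the same lookup with XPath. This swaps in `DOMDocument`/`DOMXPath` in place of the library used in the question; a minimal sketch using the HTML from the question:

```php
<?php
$html = '<div id="sns-availability" class="a-section a-spacing-none">
<div class="a-section a-spacing-mini">
<span class="a-size-medium a-color-success">
In Stock.
</span>
<span class="a-size-base">
Ships Soon.
</span>
</div>
</div>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// First <span> anywhere below the #sns-availability div.
$span = $xpath->query('(//div[@id="sns-availability"]//span)[1]')->item(0);
$stock = $span !== null ? trim($span->textContent) : '';

echo $stock;
```

For a live page, the string would come from something like `file_get_contents()` instead of the literal above.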
I am trying to extract text that lies outside of any tag, between two sets of HTML tags.
The HTML is set up like so:
<div class="col-md-4 col-sm-6 col-lg-3">
<small class="text-muted pull-right">4.4</small>
<i class="custom-icon"></i>
desired content to retrieve
<span class="text-muted">some other text here</span>
</div>
I need to retrieve the content "desired content to retrieve" which lies after the </i> and before the <span class="text-muted">.
I've tried:
$custom_regex= '#</i>(.*?)<span class="text-muted">#';
$text_scan = preg_match_all( $custom_regex, $content_to_scan, $text_array );
with no success: the $text_array variable comes back empty.
I'm not that great with regex, so maybe my expression is incorrect for what I'm after.
Wouldn't using lookarounds be better?
(?<=<\/i>)\s*(.*?)\n.*(?=<span)
Demo: https://regex101.com/r/zK2wD8/8
If you insist on regex, try this.
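// PHP has no "g" modifier; preg_match_all() already returns every match
preg_match_all('/<\/i>\s*(.*?)\n.*<span class="text-muted"/', $content_to_scan, $text_array);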
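For what it's worth, the original pattern most likely fails because the text sits on its own line: without the `s` modifier, `.` does not match newlines, so `(.*?)` cannot get from `</i>` to `<span`. A sketch of the original attempt with `s` added and the captures trimmed:

```php
<?php
$content_to_scan = '<div class="col-md-4 col-sm-6 col-lg-3">
<small class="text-muted pull-right">4.4</small>
<i class="custom-icon"></i>
desired content to retrieve
<span class="text-muted">some other text here</span>
</div>';

// The "s" modifier lets "." also match newlines.
$custom_regex = '#</i>(.*?)<span class="text-muted">#s';
preg_match_all($custom_regex, $content_to_scan, $text_array);

// $text_array[1] holds the first capture group of every match.
$texts = array_map('trim', $text_array[1]);
print_r($texts);
```

The lazy `(.*?)` still stops at the nearest `<span class="text-muted">`, so several blocks on one page each produce their own entry.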
Good day.
I want to use file_get_contents() to load a webpage and strip_tags() to get the string Category:Apple.
<div class="category">
<span style="font-size:11px; font-weight:bold;">Category:</span>
<a href="/listing/A/new/yp/search.do?applicationInd=A"
class="category">Apple
</a>
</div>
For example:
$text = '<div class="category">
<span style="font-size:11px; font-weight:bold;">
Category:</span><a href="/listing/A/new/yp/search.do?applicationInd=A"
class="category">Apple</a></div>';
echo strip_tags($text); //Category:Apple
What PHP statement do I need to load the page into $text for that?
Use preg_match?
Try this:
// \s+ allows for the line break between the href and class attributes
preg_match('~<a href="([^"]*)"\s+class="category">\s*([^<]*?)\s*</a>~i', $text, $matches);
$cat = "Category:" . $matches[2];
echo $cat;
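Putting the question's pieces together, here is a sketch that isolates the category block, strips the tags, and removes the line breaks left behind. The function name `extractCategory` is made up, and the `file_get_contents()` call appears only in a comment because the real URL isn't given.

```php
<?php
// Hypothetical helper: returns e.g. "Category:Apple", or null if no block is found.
function extractCategory(string $html): ?string
{
    // Isolate the category block; "s" lets ".*?" run across line breaks.
    if (!preg_match('~<div class="category">(.*?)</div>~s', $html, $m)) {
        return null;
    }
    // Remove the tags, then trim every line and join the pieces.
    $lines = explode("\n", strip_tags($m[1]));
    return implode('', array_map('trim', $lines));
}

// In practice: $text = file_get_contents($url); where $url is your page.
$text = '<div class="category">
<span style="font-size:11px; font-weight:bold;">Category:</span>
<a href="/listing/A/new/yp/search.do?applicationInd=A"
class="category">Apple
</a>
</div>';

echo extractCategory($text);
```

Trimming line by line keeps spaces inside a line, so a category such as "Big Apple" would survive intact.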