$string = '<ul>
<li id="34334" class="some_class">Text</li>
<li id="someid" class="more_class">Text</li>
</ul>';
What I need (actually don't know how to do, please help):
We must check <a> inside each <li>
If href of <a> == /stack/, then add extra class current for parent <li>
Like, if we are searching for /stack/, we should get this:
$string = '<ul>
<li id="34334" class="some_class current">Text</li>
<li id="someid" class="more_class">Text</li>
<li id="34334" class="some_class current">Text</li>
</ul>';
$string = '<ul>
<li class="some_class">Text</li>
<li class="more_class">Text</li>
</ul>';
$new_string = preg_replace('/(<li( +id="[^"]*")? +class=")([^"]*)(" *>)( *<a +href="\/stack\/">)/', '$1$3 current$4$5', $string);
Note: Updated
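For comparison, the same change can be sketched with PHP's DOM extension instead of a regular expression. This is only a sketch under assumptions: the sample markup below adds hypothetical <a href="..."> links inside each <li> (the snippets above show only the text), and it assumes the markup is well-formed enough for loadXML.

```php
<?php
// Assumed sample input: the <a> elements are hypothetical, added for illustration.
$string = '<ul>'
    . '<li id="34334" class="some_class"><a href="/stack/">Text</a></li>'
    . '<li id="someid" class="more_class"><a href="/other/">Text</a></li>'
    . '</ul>';

$dom = new DOMDocument();
$dom->loadXML($string);
$xpath = new DOMXPath($dom);

// Select every <li> that has a direct <a> child whose href is exactly /stack/.
foreach ($xpath->query('//li[a/@href="/stack/"]') as $li) {
    // Append "current" to whatever classes are already present.
    $class = trim($li->getAttribute('class') . ' current');
    $li->setAttribute('class', $class);
}

$new_string = $dom->saveXML($dom->documentElement);
echo $new_string, PHP_EOL;
```

Unlike the regex, this does not depend on attribute order or spacing inside the tags.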
Use jQuery:
$(document).ready(function () {
    $('li > a[href*="stack"]').parent().addClass('current');
});
I am new to PHP and trying to write a scraper for a website.
I am trying to get an element with the class name categories. I have used:
$showPage = '<li class="categories">Categories<ul> <li class="cat-item cat-item-940"><a href="http://www.desitvbox.me/category/star-plus/amul-taste-of-india/" >Amul Taste of India</a>
</li>
<li class="cat-item cat-item-942"><a href="http://www.desitvbox.me/category/star-plus/dance-plus/" >Dance Plus</a>
</li>
<li class="cat-item cat-item-239"><a href="http://www.desitvbox.me/category/star-plus/diya-aur-baati-hum-star/" >Diya Aur Baati Hum</a>
</li>
<li class="cat-item cat-item-745"><a href="http://www.desitvbox.me/category/star-plus/suhani-si-ek-ladki/" >Suhani Si Ek Ladki</a>
</li>
<li class="cat-item cat-item-147"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/" >Star Plus Completed Shows</a>
<ul class="children">
<li class="cat-item cat-item-772"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/airlines/" >Airlines</a>
</li>
<li class="cat-item cat-item-518"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/arjun/" >Arjun</a>
</li>
<li class="cat-item cat-item-237"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/chef-pankaj-ka-zayka/" >Chef Pankaj Ka Zayka</a>
</li>
</ul>
</li>
</ul></li>';
$dom = new DOMDocument();
$dom->validateOnParse = true;
$dom->loadHTML($showPage);
$dom->preserveWhiteSpace = false;
$allShowsList = new DOMXPath($dom);
$allShowsTableHTML = $allShowsList->query('//li[contains(@class, "categories")]');
However, I now want to read the href values of all the links in $allShowsTableHTML.
Can you please advise how I can do that?
As you can see, one of the records also contains a ul with class 'children', which I also want to read.
I need to get the href and the title.
I have tried the code below, but got no result.
$allShowTableDom = new DOMDocument();
foreach ($allShowTableHTML as $showLink)
{
$allShowTableDom->appendChild($allShowTableDom->importNode($showLink,true));
}
$showsArray = $allShowsTableHTML->getElementsByTagName('a');
I think it is not going in foreach loop.
To get all href attributes of the hyperlinks, add a few more axis steps to the query, then loop over the result list; each node's ->value property contains the URI.
If you can simply dump all href attributes inside the whole <li> element, extend your query with //a/@href:
$document = new DOMXPath($dom);
$hrefs = $document->query('//li[contains(@class, "categories")]//a/@href');
foreach ($hrefs as $href) {
echo $href->value;
}
If this returns nodes you don't want, you could also descend into the contained unordered list and select with a more specific query:
//li[contains(@class, "categories")]/ul/li/a/@href
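Since the question asks for both the href and the title, a variation is to select the <a> elements themselves and read both values from each one. The following is a minimal sketch, assuming the link text is what is meant by "title"; the shortened sample markup and the example.com URLs are placeholders, not the real page.

```php
<?php
// Shortened, hypothetical sample with the same structure as the question's markup.
$showPage = '<ul><li class="categories">Categories<ul>'
    . '<li class="cat-item"><a href="http://example.com/a/">Show A</a>'
    . '<ul class="children">'
    . '<li class="cat-item"><a href="http://example.com/a/b/">Show B</a></li>'
    . '</ul></li>'
    . '</ul></li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($showPage);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = [];

// "//a" also reaches the links nested inside <ul class="children">.
foreach ($xpath->query('//li[contains(@class, "categories")]//a') as $a) {
    $links[] = [
        'href'  => $a->getAttribute('href'),
        'title' => trim($a->textContent),
    ];
}

print_r($links);
```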
This is my HTML part of code:
<ul>
<li> something,,,,... </li>
<li> something,,,,... </li>
<li> something,,,,... </li>
<li> something,,,,... </li>
<li>
<h5>Price</h5>
<span>100$</span>
</li>
</ul>
In my php I am using php-simple-dom for finding tags. So php part looks something like this:
foreach($html->find("li") as $li)
{
if(strpos($li->plaintext,"<h5>Price</h5>") !== false)
{
var_dump($li->plaintext); // result: string("<h5>Price</h5><span>100$</span>")
}
}
I have some other idea:
foreach($html->find("h5") as $h5)
{
if(strpos($h5->plaintext,"Price") !== false)
{
// finding some way to read next tag
}
}
What I need:
I need to get the <span> value. This is just an example; in the real code there are more tags and multiple spans in one <li>. The point is that the tag following the <h5> contains the wanted information.
I'm not sure how many tags there could be in one <li>, but I believe the <span> you are looking for always comes right after the <h5>. You can use the $e->next_sibling() method as follows:
foreach ($html->find('li h5') as $h5) {
$price = $h5->next_sibling();
echo $price->plaintext;
}
If you want to get the value of a specific tag, you may find DOMDocument::getElementsByTagName useful.
Return Values
A new DOMNodeList object containing all the matched elements.
Here is how you would use it:
$html = <<< HTML
<ul>
<li> something,,,,... </li>
<li> something,,,,... </li>
<li> something,,,,... </li>
<li> something,,,,... </li>
<li>
<h5>Price</h5>
<span>100$</span>
</li>
</ul>
HTML;
$dom = new DOMDocument;
$dom->loadXML($html);
$prices = $dom->getElementsByTagName('span');
foreach ($prices as $price) {
echo $price->nodeValue, PHP_EOL;
}
The above example will output: 100$
Go ahead and try it with several prices. It works as expected.
You might also find the DOM documentation useful.
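When a <li> holds several spans, getElementsByTagName('span') returns all of them. One way to pick only the span that follows the "Price" heading is an XPath query with the following-sibling axis. This is a sketch using the built-in DOM extension; the second span in the sample is a hypothetical addition to show that it is ignored.

```php
<?php
// Sample based on the question; the second <span> is hypothetical.
$html = '<ul>'
    . '<li> something </li>'
    . '<li><h5>Price</h5><span>100$</span><span>other</span></li>'
    . '</ul>';

$dom = new DOMDocument();
$dom->loadXML($html);   // assumes well-formed markup
$xpath = new DOMXPath($dom);

// First <span> sibling after an <h5> whose text is exactly "Price".
$query = '//li/h5[normalize-space(.) = "Price"]/following-sibling::span[1]';

$prices = [];
foreach ($xpath->query($query) as $span) {
    $prices[] = $span->nodeValue;
}

echo implode(PHP_EOL, $prices), PHP_EOL;
```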
echo $nav gives code like this:
<ul>
<li class="someclass">sometext
<ul>
<li class="someclass">sometext</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
</ul>
</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
</ul>
There are list items with class spacer inside each child ul, after each normal list item.
How do I remove the spacer list items which are grandchildren of the main list, using PHP?
Example: <ul> <li> <ul> <li class="spacer">
I'm searching for a regular expression, which should erase <li class="spacer"></li> only in a child <ul> element.
If you don't have access to the $nav variable to remove it (which you likely do) then I'd just use CSS to hide it, something like this should work:
li ul li.spacer {
display:none;
}
If however you have access to $nav - delete that spacer li from the code. Simples.
Also, on a side note: having empty elements like that on the page as "spacers" is semantically bad. Spacing should be handled via CSS by adding margins or padding to other elements on the page. Don't use a class of spacer; if you do, you may as well go back to using stray <br /> tags everywhere to create spaces.
$xml = new SimpleXMLElement($nav);
$spacers = $xml->xpath('li//li[@class="spacer"]');
foreach($spacers as $i => $n) {
unset($spacers[$i][0]);
}
echo $xml->asXML();
This converts the markup to XML (use a recent PHP 5.3 version and DOMDocument to export to HTML). Output:
<?xml version="1.0"?>
<ul>
<li class="someclass">sometext
<ul>
<li class="someclass">sometext</li>
<li class="someclass">sometext</li>
<li class="someclass">sometext</li>
<li class="someclass">sometext</li>
</ul>
</li>
<li class="spacer"/>
<li class="someclass">sometext</li>
<li class="spacer"/>
</ul>
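The DOMDocument variant mentioned above could look roughly like this. It is a sketch, assuming $nav is well-formed enough for loadXML and has a single root <ul>; the shortened $nav value is a stand-in for the real menu.

```php
<?php
// Shortened stand-in for the real $nav markup.
$nav = '<ul>'
    . '<li class="someclass">sometext<ul>'
    . '<li class="someclass">sometext</li><li class="spacer"></li>'
    . '<li class="someclass">sometext</li><li class="spacer"></li>'
    . '</ul></li>'
    . '<li class="spacer"></li>'
    . '</ul>';

$dom = new DOMDocument();
$dom->loadXML($nav);
$xpath = new DOMXPath($dom);

// Only spacers that are grandchildren of the root list: ul > li > ul > li.spacer
$spacers = iterator_to_array($xpath->query('/ul/li/ul/li[@class="spacer"]'));

foreach ($spacers as $spacer) {
    $spacer->parentNode->removeChild($spacer);
}

// Serialize the root element as HTML (no XML declaration, no self-closing <li/>).
$result = $dom->saveHTML($dom->documentElement);
echo $result, PHP_EOL;
```

The top-level spacer is left in place because the query path only matches list items inside a nested <ul>.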
How about str_replace?
$nav = str_replace('<li class="spacer"></li>','',$nav);
edited code below
Based on the new requirement this code works. I know it's hacky and sloppy, but it works:
$lines = explode("\n", $nav);
$depth = 0;      // current <ul> nesting depth
$new_nav = '';
foreach ($lines as $line) {
    if (strpos($line, '<ul') !== false) {
        $depth++;
    }
    if (strpos($line, '</ul>') !== false) {
        $depth--;
    }
    // Drop spacer items only inside a nested <ul> (depth 2 or deeper).
    if ($depth >= 2 && strpos($line, 'spacer') !== false) {
        continue;
    }
    $new_nav .= $line . "\n";
}
echo $new_nav;
"Easily" is relative. It depends on a few things. If you want, modify where the $nav is getting generated from.
Use preg_replace to replace the li tags (note the # delimiter, so the / in </li> does not end the pattern):
$new_nav = preg_replace('#<li class="spacer"></li>#', '', $nav);
echo $new_nav;
There are multiple ways:
Do not create it. It will be easier if you do not create something you do not want. It will be easier to maintain. So if you have any control over what is generated into $var string, just change it.
Simply replace it like this: str_replace('<li class="spacer"></li>', '', $var).
Use some HTML parser and remove the nodes.
Use JavaScript to remove <li class="spacer"></li> on client side.
Use substr_replace and strpos instead of str_replace, and specify an offset just after the first spacer.
http://www.php.net/manual/en/function.substr-replace.php
http://www.php.net/manual/en/function.strpos.php
Add the following CSS
ul ul li.spacer { display: none; }
Try this:
$nav = str_replace('<li class="spacer"></li>', '', $nav);
I'm trying to split some HTML content using PHP's preg_match_all function:
<li class="cat-item"><a title="blabla" href="#">parent 1</a>
<ul class="children">
<li class="cat-item"><a title="" href="#">child 1</a></li>
</ul>
</li>
<li class="cat-item cat-item-4"><a title="blabla" href="#">father 2</a>
<ul class="children">
<li class="cat-item"><a title="" href="#">child 1</a></li>
<li class="cat-item"><a title="bla" href="#">child 2</a></li>
</ul>
</li>
I want to be able to change the link description, for example;
<a title="" href="#">child 1</a>
to
<a title="" href="#">I changed that</a>
while keeping the structure of the original html.
so far, I succeeded to split the links using :
$results = preg_match_all('/<a\s[^>]*href\s*=\s*(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU', $html, $tokens);
foreach ( $tokens[0] as $category)
{
echo $category.'<br>';
}
the drawback of this is that it discards child lists, and outputs all the list items in the same level; no distinction between parent and child.
any idea to keep original hierarchy?
thanx :)
use preg_replace to replace strings! something like this here:
$output = preg_replace("/^([123]0|[012][1-9]|31)(\\.|-|\/|,)(0[1-9]|1[012])(\\.|-|\/)(19[0-9]{2}|2[0-9]{3})$/","$1",$in_nn_date);
where $1 or $2 refers to a group captured by the regex.
It is best to try the pattern in an online regex tester first. Hope it helps.
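To change a link's text while keeping the parent/child hierarchy intact, a DOM parser avoids flattening the list the way preg_match_all does. The following is a sketch under assumptions: the fragment is wrapped in a <ul> so it has a single root element, and the link to change is identified by its current text.

```php
<?php
// Fragment from the question, wrapped in <ul> (an assumption) to give it one root.
$html = '<ul>'
    . '<li class="cat-item"><a title="blabla" href="#">parent 1</a>'
    . '<ul class="children">'
    . '<li class="cat-item"><a title="" href="#">child 1</a></li>'
    . '</ul></li>'
    . '</ul>';

$dom = new DOMDocument();
$dom->loadXML($html);

foreach ($dom->getElementsByTagName('a') as $a) {
    // Replace only the text of the matching link; the tree around it is untouched.
    if (trim($a->textContent) === 'child 1') {
        $a->nodeValue = 'I changed that';
    }
}

$result = $dom->saveXML($dom->documentElement);
echo $result, PHP_EOL;
```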
We have a variable with HTML code inside.
<?php echo $list; ?>
This will give something like:
<li><a href='http://site.com/2010/' title='2010'>2010</a></li>
<li><a href='http://site.com/2009/' title='2009'>2009</a></li>
<li><a href='http://site.com/2008/' title='2008'>2008</a></li>
We want to add a class to each <li>, taken from the title attribute:
<li class="y2010"><a href='http://site.com/2010/' title='2010'>2010</a></li>
<li class="y2009"><a href='http://site.com/2009/' title='2009'>2009</a></li>
<li class="y2008"><a href='http://site.com/2008/' title='2008'>2008</a></li>
We should work with variable $list.
Tentative scheme:
search for title attribute in each
<li>....</li>
throw its value to the class, which we add for opening <li>
PHP solution wanted.
Thanks.
Parsing the DOM sounds like overkill to me, if I understand the problem you're facing. Assuming that you know for sure that the entire contents of the $list variable will be structured as <li><a href='foo' title='bar'>bar</a></li> then you can do what you're asking pretty easily by combining regular expressions with a loop:
$list = "<li><a href='http://site.com/2010/' title='2010'>2010</a></li>
<li><a href='http://site.com/2009/' title='2009'>2009</a></li>
<li><a href='http://site.com/2008/' title='2008'>2008</a></li>";
preg_match_all("/title='([^']*)'/s",$list,$matches); //this gets all titles
$output=$list;
foreach($matches[1] as $match) { //this applies the titles to the li elements
$location = strpos($output,"<li>");
$output = substr($output,0,$location)."<li class=\"y".$match."\">".substr($output,$location+4);
}
If you echo $output:
<li class="y2010"><a href='http://site.com/2010/' title='2010'>2010</a></li>
<li class="y2009"><a href='http://site.com/2009/' title='2009'>2009</a></li>
<li class="y2008"><a href='http://site.com/2008/' title='2008'>2008</a></li>
I accomplished this by splitting the text into an array, and performing a search/replace once the year is obtained.
$carrReturn="\r\n"; //Set the Newline and Return string to search for
$arr = explode($carrReturn, $list); //Break the text into an array
$list=""; //clear $list
for ($x=0; $x<count($arr); $x++){
$current=$arr[$x];
$year= strip_tags($current); //Get the year by stripping the HTML tags.
$list.=str_replace("<li", "<li class=\"y".$year."\"",$current)."\r\n";
//Reconstruct $list
}
Output
<li class="y2010"><a href='http://site.com/2010/' title='2010'>2010</a></li>
<li class="y2009"><a href='http://site.com/2009/' title='2009'>2009</a></li>
<li class="y2008"><a href='http://site.com/2008/' title='2008'>2008</a></li>
I don't know why you guys are so obsessed with regex. DOM is clean and readable:
$dom = new DOMDocument;
$dom->loadXML("<ul>$list</ul>");
$xPath = new DOMXPath($dom);
foreach($xPath->query('//li/a/@title') as $node) {
$node->parentNode->parentNode->setAttribute('class', $node->nodeValue);
}
echo $dom->saveXML($dom->documentElement);
Outputs:
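<ul>
<li class="2010"><a href="http://site.com/2010/" title="2010">2010</a></li>
<li class="2009"><a href="http://site.com/2009/" title="2009">2009</a></li>
<li class="2008"><a href="http://site.com/2008/" title="2008">2008</a></li>
</ul>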
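The question asks for a y prefix on the class and no wrapping <ul> in the result. A small variation on the DOM approach covers both. It is a sketch, assuming $list is well-formed once wrapped in a temporary <ul>; the two-item $list is a shortened sample.

```php
<?php
// Shortened sample of $list from the question.
$list = "<li><a href='http://site.com/2010/' title='2010'>2010</a></li>\n"
      . "<li><a href='http://site.com/2009/' title='2009'>2009</a></li>";

$dom = new DOMDocument();
$dom->loadXML("<ul>$list</ul>");   // temporary wrapper so there is one root element
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//li/a/@title') as $attr) {
    // ownerElement is the <a>; its parent is the <li> that receives the class.
    $li = $attr->ownerElement->parentNode;
    $li->setAttribute('class', 'y' . $attr->nodeValue);
}

// Serialize only the children of the wrapper, so the <ul> itself is dropped again.
$out = '';
foreach ($dom->documentElement->childNodes as $child) {
    $out .= $dom->saveXML($child);
}

echo $out, PHP_EOL;
```

Note that the serializer writes attribute values with double quotes, so the single quotes from the input are not preserved.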
RegEx:
$list = preg_replace("/<li>(<a .+? title=')(\d{4})'/", "<li class=\"y$2\">$1$2'", $list);
This really depends on every li and anchor being formatted the same exact way each time though.