PHP string search and replace - possible use of DOM needed - php

I can't seem to figure out how to achieve my goal.
I want to find and replace the link of an anchor with a specific class in the markup generated from an RSS feed (I need the option to replace it later, no matter what link is there).
Example HTML:
<a class="epclean1" href="#">
What it should look like:
<a class="epclean1" href="google.com">
I may need to get the element using DOM, since the full PHP already creates a DOMDocument. If that is the case, I would need to know how to find the element by class and set its href URL that way.
Full PHP:
<?php
$rss = new DOMDocument();
$feed = array();
$urlArray = array(
    array('url' => 'https://feeds.megaphone.fm')
);
foreach ($urlArray as $url) {
    $rss->load($url['url']);
    foreach ($rss->getElementsByTagName('item') as $node) {
        $item = array(
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue
        );
        array_push($feed, $item);
    }
}
usort($feed, function ($a, $b) {
    return strcmp($a['title'], $b['title']);
});
$limit = sizeof($feed);
$previous = null;
$count_firstletters = 0;
$i = 1; // counter for the epclean1, epclean2, ... class names
for ($x = 0; $x < $limit; $x++) {
    $firstLetter = substr($feed[$x]['title'], 0, 1); // Getting the first letter from the Title you're going to print
    if ($previous !== $firstLetter) { // If the first letter is different from the previous one then output the letter and start the UL
        if ($count_firstletters != 0) {
            echo '</ul>'; // Closing the previously open UL only if it's not the first time
            echo '</div>';
        }
        echo '<button class="glanvillecleancollapsible">'.$firstLetter.'</button>';
        echo '<div class="glanvillecleancontent">';
        echo '<ul style="list-style-type: none">';
        $previous = $firstLetter;
        $count_firstletters++;
    }
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    echo '<li>';
    echo '<a class="epclean'.$i++.'" href="#" target="_blank">'.$title.'</a>';
    echo '</li>';
}
echo '</ul>'; // Close the last UL
echo '</div>';
?>
</div>
</div>
The full PHP above renders on the site like this (shortened, since there are 200+ entries):
<div class="modal-glanvillecleancontent">
<span class="glanvillecleanclose">×</span>
<p id="glanvillecleaninstruct">Select the first letter of the episode that you wish to get clean version for:</p>
<br>
<button class="glanvillecleancollapsible">8</button>
<div class="glanvillecleancontent">
<ul style="list-style-type: none">
<li><a class="epclean1" href="#" target="_blank">80's Video Vixen Tawny Kitaen 044</a></li>
</ul>
</div>
<button class="glanvillecleancollapsible">A</button>
<div class="glanvillecleancontent">
<ul style="list-style-type: none">
<li><a class="epclean2" href="#" target="_blank">Abby Stern</a></li>
<li><a class="epclean3" href="#" target="_blank">Actor Nick Hounslow 104</a></li>
<li><a class="epclean4" href="#" target="_blank">Adam Carolla</a></li>
<li><a class="epclean5" href="#" target="_blank">Adrienne Janic</a></li>
</ul>
</div>

You're not very clear about how your question relates to the code shown, but I don't see any attempt to replace the attribute within the DOM code. You'd want to look at XPath to find the desired elements:
function change_clean($content) {
    $dom = new DOMDocument;
    $dom->loadXML($content);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//a[@class='epclean1']");
    foreach ($nodes as $node) {
        if ($node->getAttribute("href") === "#") {
            $node->setAttribute("href", "https://google.com/");
        }
    }
    return $dom->saveXML();
}
$xml = '<?xml version="1.0"?><foo><bar><a class="epclean1" href="#">test1</a></bar><bar><a class="epclean1" href="https://example.com">test2</a></bar></foo>';
echo change_clean($xml);
Output:
<?xml version="1.0"?>
<foo><bar><a class="epclean1" href="https://google.com/">test1</a></bar><bar><a class="epclean1" href="https://example.com">test2</a></bar></foo>
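The markup in the question is an HTML fragment rather than well-formed XML (for example, the unclosed <br> in the rendered output), so loadXML() will refuse it. Below is a minimal sketch of the same XPath idea using loadHTML() instead, assuming the fragment is available as a string. The function name setHrefByClass and the wrapper id fragment-root are made up for illustration, and the class name is assumed to be a plain token without quotes. The class test pads the attribute with spaces so that epclean1 does not also match epclean10, and the href is overwritten whatever its current value is.

```php
<?php
// Hypothetical helper: point every <a> carrying $class at $url.
function setHrefByClass(string $html, string $class, string $url): string
{
    $dom = new DOMDocument();
    // Suppress libxml warnings about HTML it does not fully understand.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a marker div (made-up id) so we can serialize just the fragment back out.
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Whole-token class match: " epclean1 " does not occur inside " epclean10 ".
    $query = sprintf('//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
    foreach ($xpath->query($query) as $a) {
        $a->setAttribute('href', $url); // overwrites "#" or any existing link
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<ul><li><a class="epclean1" href="#" target="_blank">80\'s Video Vixen Tawny Kitaen 044</a></li>'
    . '<li><a class="epclean10" href="#" target="_blank">Other episode</a></li></ul>';
echo setHrefByClass($html, 'epclean1', 'https://google.com/');
```

Because the href is replaced unconditionally, calling the function again later with a different URL swaps the link, whatever it was set to before.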

Hmm. I think your pattern and replacement might be your problem.
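What you have:
$pattern = 'class="epclean1 href="(.*?)"';
$replacement = 'class="epclean1 href="google.com"';
Fix (add the missing closing quote after epclean1 and the regex delimiters, and stop the match at the closing quote of the href value):
$pattern = '/class="epclean1" href="[^"]*"/';
$replacement = 'class="epclean1" href="google.com"';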
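A small runnable check of why the href value is matched with [^"]* rather than .*, on a made-up line with two links: a greedy .* runs on to the last double quote on the line and swallows the following link, while [^"]* stops at the closing quote of the href. Either way, the regex relies on class and href appearing in exactly this order, which holds for the generated markup but not for HTML in general.

```php
<?php
$html = '<a class="epclean1" href="#" target="_blank">A</a> <a class="epclean2" href="#" target="_blank">B</a>';

// [^"]* cannot cross a double quote, so the match ends at the href's closing quote.
$pattern = '/class="epclean1" href="[^"]*"/';
$replacement = 'class="epclean1" href="google.com"';
$result = preg_replace($pattern, $replacement, $html);
echo $result, PHP_EOL;

// For comparison: the greedy .* version also consumes the second link up to its last quote.
$greedy = preg_replace('/class="epclean1" href=".*"/', $replacement, $html);
echo $greedy, PHP_EOL;
```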

Related

Problem to get content from html string variable in php

I have this code:
$doc = new \DOMDocument();
$doc->loadHTML($content);
$links = [];
$container = $doc->getElementById("content");
$arr = $container->getElementsByTagName("a");
foreach ($arr as $item) {
    $href = $item->getAttribute("href");
    $title = $item->getAttribute("title");
    $links[] = [
        'href' => $href,
        'title' => $title
    ];
}
for ($i = 0, $l = count($links); $i < $l; ++$i) {
    echo $links[$i]['title'].' '.$links[$i]['href'].'<br />';
}
The HTML structure looks like this:
<div class="post-content right-col">
<a title="" href="https://www.swisscars.pl/samochody/516321/">
<img src="https://swisscars.pl/uploads2/180843_0.jpg" alt="" class="thumb alignleft" height="75" width="75"/>
</a>
<h2 style="line-height:150%;">
<a href="https://www.swisscars.pl/samochody/516321/" rel="bookmark" title="Renault Kangoo II (96’011 km)">
Renault Kangoo II (96’011 km) </a>
</h2>
Do końca aukcji: <span id="countdown100">2018-10-23 14:00:00 GMT+02:00</span><p>DATA ZAKONCZENIA AUKCJI: 2018-10-23 14:00</p>
</div>
</div>
I want to get only the values from the a tags with the attribute rel="bookmark". Please help me with this. I tried to use the hasAttribute function, but it isn't working. How can I get only the content of an a tag with the rel="bookmark" attribute? Does PHP have a hasAttribute() function or something similar?
Thanks for the help
XPath might be an option, you could do this:
$doc = new \DOMDocument();
$doc->loadHTML($content);
$xpath = new \DOMXPath($doc);
$links = $xpath->query('//a[@rel="bookmark"]');
That should return a DOMNodeList you can loop through.
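A fuller sketch that also covers the hasAttribute() part of the question: DOMElement::hasAttribute() does exist, but it only tells you that a rel attribute is present, so it has to be combined with getAttribute() to check the value. The markup below is a shortened, ASCII-only stand-in for the one in the question; both loops should collect the same links.

```php
<?php
// Shortened, ASCII-only stand-in for the markup in the question.
$content = '<div class="post-content right-col">'
    . '<a title="" href="https://www.swisscars.pl/samochody/516321/"><img src="thumb.jpg" alt=""/></a>'
    . '<h2><a href="https://www.swisscars.pl/samochody/516321/" rel="bookmark" title="Renault Kangoo II">Renault Kangoo II</a></h2>'
    . '</div>';

$doc = new DOMDocument();
$state = libxml_use_internal_errors(true); // hide warnings about imperfect HTML
$doc->loadHTML($content);
libxml_clear_errors();
libxml_use_internal_errors($state);

// Variant 1: XPath, exact match on the rel value.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//a[@rel="bookmark"]') as $a) {
    $links[] = ['title' => $a->getAttribute('title'), 'href' => $a->getAttribute('href')];
}

// Variant 2: plain DOM. hasAttribute() only checks presence, so compare the value as well.
$links2 = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('rel') && $a->getAttribute('rel') === 'bookmark') {
        $links2[] = ['title' => $a->getAttribute('title'), 'href' => $a->getAttribute('href')];
    }
}

foreach ($links as $link) {
    echo $link['title'] . ' ' . $link['href'] . '<br />' . PHP_EOL;
}
```

If rel can hold several space-separated values (for example "bookmark nofollow"), the exact comparison in both variants will miss those links.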

parse the html data to array data in php

I am trying to parse HTML data into arrays using the classes of the a tags, but I was not able to get the desired format. Below is my data:
$text ='<div class="result results_links results_links_deep web-result ">
<div class="links_main links_deep result__body">
<h2 class="result__title">
<a rel="nofollow" class="result__a" href="">Text1</a>
</h2>
<a class="result__snippet" href="">Text1</a>
<a class="result__url" href="">
example.com
</a>
</div>
</div>
<div class="result results_links results_links_deep web-result ">
<div class="links_main links_deep result__body">
<h2 class="result__title">
<a rel="nofollow" class="result__a" href="">text3</a>
</h2>
<a class="result__snippet" href="">text23</a>
<a class="result__url" href="">
text.com
</a>
</div>
</div>';
I tried to get the result using the code below:
$lines = explode("\n", $text);
$out = array();
foreach ($lines as $line) {
    $parts = explode(" > ", $line);
    $ref = &$out;
    while (count($parts) > 0) {
        if (isset($ref[$parts[0]]) === false) {
            $ref[$parts[0]] = array();
        }
        $ref = &$ref[$parts[0]];
        array_shift($parts);
    }
}
print_r($out);
But I need the result to be exactly like this:
array:2 [
0 => array:3 [
0 => "Text1"
1 => "Text1"
2 => "example.com"
]
1 => array:3 [
0 => "text3"
1 => "text23"
2 => "text.com"
]
]
Demo : https://eval.in/746170
I also tried DOM in Laravel, like this:
$dom = new DOMDocument;
$dom->loadHTML($text);
foreach ($dom->getElementsByTagName('a') as $node)
{
    $array[] = $dom->saveHTML($node);
}
print_r($array);
So how can I use the classes to separate the data the way I want? Any suggestions, please. Thank you.
Here you go, try this and tell me if you need any more help:
<?php
$test = <<<EOS
<div class="result results_links results_links_deep web-result ">
<div class="links_main links_deep result__body">
<h2 class="result__title">
<a rel="nofollow" class="result__a" href="">Text1</a>
</h2>
<a class="result__snippet" href="">Text1</a>
<a class="result__url" href="">
example.com
</a>
</div>
</div>
<div class="result results_links results_links_deep web-result ">
<div class="links_main links_deep result__body">
<h2 class="result__title">
<a rel="nofollow" class="result__a" href="">text3</a>
</h2>
<a class="result__snippet" href="">text23</a>
<a class="result__url" href="">
text.com
</a>
</div>
</div>
EOS;
$document = new DOMDocument();
$document->loadHTML($test);
// first extract all the divs with the links_main class
$divs = [];
foreach ($document->getElementsByTagName('div') as $div) {
    $classes = $div->getAttribute('class'); // "" if the div has no class attribute
    if (!$classes) continue;
    $classes = explode(' ', $classes);
    if (in_array('links_main', $classes)) {
        $divs[] = $div;
    }
}
// now iterate through them and retrieve all the links in order
$results = [];
foreach ($divs as $div) {
    $temp = [];
    foreach ($div->getElementsByTagName('a') as $link) {
        $temp[] = trim($link->nodeValue); // trim the whitespace around e.g. "example.com"
    }
    $results[] = $temp;
}
var_dump($results);
Working version - http://sandbox.onlinephpfunctions.com/code/e7ed2615ea32c5b9f0a89e3460da28a2702343f1
I will do it using DOMDocument and DOMXPath to target interesting parts more easily. In order to be more precise, I register a function that checks if a class attribute contains a set of classes:
function hasClasses($attrValue, $requiredClasses) {
    $requiredClasses = explode(' ', $requiredClasses);
    $classes = preg_split('~\s+~', $attrValue, -1, PREG_SPLIT_NO_EMPTY);
    return array_diff($requiredClasses, $classes) ? false : true;
}
$dom = new DOMDocument;
$state = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($state);
$xp = new DOMXPath($dom);
$xp->registerNamespace('php', 'http://php.net/xpath');
$xp->registerPhpFunctions('hasClasses');
$mainDivClasses = 'result results_links results_links_deep web-result';
$childDivClasses = 'links_main links_deep result__body';
$divNodeList = $xp->query('//div[php:functionString("hasClasses", @class, "' . $mainDivClasses . '")]
    /div[php:functionString("hasClasses", @class, "' . $childDivClasses . '")]');
$results = [];
foreach ($divNodeList as $divNode) {
    $results[] = [
        trim($xp->evaluate('string(./h2/a[@class="result__a"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__snippet"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__url"])', $divNode))
    ];
}
print_r($results);
Without registering a function, you can also use the XPath function contains() in your predicates. It's less precise, since it only checks whether a substring occurs in a larger string (not whether the class attribute contains a specific class, as the hasClasses function does), but it should be enough here:
$dom = new DOMDocument;
$state = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($state);
$xp = new DOMXPath($dom);
$divNodeList = $xp->query('//div[contains(@class, "results_links_deep")]
                               [contains(@class, "web-result")]
                          /div[contains(@class, "links_main")]
                               [contains(@class, "links_deep")]
                               [contains(@class, "result__body")]');
$results = [];
foreach ($divNodeList as $divNode) {
    $results[] = [
        trim($xp->evaluate('string(./h2/a[@class="result__a"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__snippet"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__url"])', $divNode))
    ];
}
print_r($results);
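A middle ground between the two variants above: XPath 1.0 can test for a whole class token without a registered PHP function, by padding the class attribute with spaces and searching for the padded class name. This is a common XPath idiom rather than anything PHP-specific. The helper name classPredicate() is made up for illustration, and the input is the markup from the question.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 test for a whole class token.
function classPredicate(string $class): string
{
    // Padding with spaces means " links_main " only matches the complete token.
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

$html = <<<EOS
<div class="result results_links results_links_deep web-result ">
 <div class="links_main links_deep result__body">
  <h2 class="result__title"><a rel="nofollow" class="result__a" href="">Text1</a></h2>
  <a class="result__snippet" href="">Text1</a>
  <a class="result__url" href="">
   example.com
  </a>
 </div>
</div>
<div class="result results_links results_links_deep web-result ">
 <div class="links_main links_deep result__body">
  <h2 class="result__title"><a rel="nofollow" class="result__a" href="">text3</a></h2>
  <a class="result__snippet" href="">text23</a>
  <a class="result__url" href="">
   text.com
  </a>
 </div>
</div>
EOS;

$dom = new DOMDocument();
$state = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($state);

$xp = new DOMXPath($dom);
$results = [];
$query = '//div[' . classPredicate('web-result') . ']/div[' . classPredicate('links_main') . ']';
foreach ($xp->query($query) as $div) {
    $row = [];
    foreach (['result__a', 'result__snippet', 'result__url'] as $class) {
        // string(...) gives the text of the first matching link, or "" if there is none.
        $row[] = trim($xp->evaluate('string(.//a[' . classPredicate($class) . '])', $div));
    }
    $results[] = $row;
}
print_r($results);
```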

Want to get specific data from a webpage

I am trying hard to get data from the following portion of a webpage:
<div id="menu_pannel">
<ul class="sf-menu" id="nav">
<li class="current"><a href="/" class="current" >Home</a></li>
<li class="">Schedule</li>
<li class="">All Channels</li>
<li class="">Sports Channels
<ul id="submenu">
<li>Sky Sports 1</li>
<li>Sky Sports 2</li>
<li><a href="http://www.time4tv.com/2011/03/sky-sports-3.php">Sky Sports
I want to get the data from this menu, and for that I am using:
$pattern = '|<ul id="nav" class="sf-menu">(.*?)</ul>|';
preg_match($pattern, $html, $data);
but I'm getting an empty array.
If strip_tags($html) doesn't return what you want, you can use this function to get an array of the text inside a given tag:
function getTextBetweenTags($string, $tagname) {
    preg_match_all("#<$tagname.*?>([^<]+)</$tagname>#", $string, $matches);
    return $matches[1];
}
$values = getTextBetweenTags($html, 'a');
foreach ($values as $value) {
    echo $value . '<br>';
}
where $html is a variable containing your HTML.
If you decide to use a DOM parser:
$doc = new DOMDocument();
$doc->loadHTML($str);
$x = new DOMXPath($doc);
$ul = $x->query('//ul[@id="nav"]'); // 'id' is a unique identifier!
// Echo outerHTML of ul[@id="nav"]
echo $doc->saveHTML($ul->item(0));
Use the DOMDocument class to manipulate HTML content:
// $html_str is your html fragment
$doc = new DOMDocument();
$doc->loadHTML($html_str);
$ul_content = "";
$ul = $doc->getElementsByTagName("ul")->item(0);
if ($ul && $ul->getAttribute('class') == 'sf-menu') {
    foreach ($ul->childNodes as $n) {
        $ul_content .= $doc->saveHTML($n);
    }
}
echo $ul_content;
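As for why the regex in the question returns nothing: the pattern expects id="nav" before class="sf-menu", but the markup has them the other way round; . does not match newlines without the s modifier; and even with both fixed, the lazy (.*?)</ul> would stop at the closing tag of the nested #submenu list. A DOM sketch that reads the menu labels instead, run against a hypothetical, completed version of the truncated snippet from the question:

```php
<?php
// Hypothetical, completed version of the truncated menu markup from the question.
$html = '<div id="menu_pannel"><ul class="sf-menu" id="nav">'
    . '<li class="current"><a href="/" class="current">Home</a></li>'
    . '<li class="">Schedule</li>'
    . '<li class="">Sports Channels<ul id="submenu"><li>Sky Sports 1</li><li>Sky Sports 2</li></ul></li>'
    . '</ul></div>';

$doc = new DOMDocument();
$state = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($state);
$xpath = new DOMXPath($doc);

// Top-level entries: direct <li> children of ul#nav, ignoring the text of any nested <ul>.
$labels = [];
foreach ($xpath->query('//ul[@id="nav"]/li') as $li) {
    $text = '';
    foreach ($li->childNodes as $child) {
        if ($child->nodeName !== 'ul') {
            $text .= $child->textContent;
        }
    }
    $labels[] = trim($text);
}

// Entries of the nested submenu.
$submenu = [];
foreach ($xpath->query('//ul[@id="submenu"]/li') as $li) {
    $submenu[] = trim($li->textContent);
}

print_r($labels);
print_r($submenu);
```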

Getting element inside other element by class php DOMDocument

Hi guys, I have this HTML code:
<div class="post-thumbnail2">
<a href="http://example.com" title="Title">
<img src="http://linkimgexample/image.png" alt="Title"/>
</a>
</div>
I want to get the value of the image src (http://linkimgexample/image.png) and the value of the link href (http://example.com) using PHP DOMDocument.
To get the link, I did something like this:
$divs = $dom->getElementsByTagName("div");
foreach ($divs as $div) {
    $cl = $div->getAttribute("class");
    if ($cl == "post-thumbnail2") {
        $links = $div->getElementsByTagName("a");
        foreach ($links as $link)
            echo $link->getAttribute("href")."<br/>";
    }
}
I could do the same for the img src:
$imgs = $div->getElementsByTagName("img");
foreach ($imgs as $img)
    echo $img->getAttribute("src")."<br/>";
but sometimes there is no image on the website, and the HTML looks like this:
<div class="post-thumbnail2">
</div>
So my question is: how can I get both values at the same time, and show a message when there is no image?
To make it clearer, here is an example:
<div class="post-thumbnail2">
<a href="http://example1.com" title="Title">
<img src="http://linkimgexample/image1.png" alt="Title"/>
</a>
</div>
<div class="post-thumbnail2">
</div>
<div class="post-thumbnail2">
<a href="http://example3.com" title="Title">
<img src="http://linkimgexample/image2.png" alt="Title"/>
</a>
</div>
I want the result to be:
http://example1.com - http://linkimgexample/image1.png
http://example2.com - there is no image here !
http://example3.com - http://linkimgexample/image2.png
DOMElement::getElementsByTagName returns a DOMNodeList, so you can find out whether an img element was found by checking its length property.
$imgs = $div->getElementsByTagName("img");
if ($imgs->length > 0) {
    foreach ($imgs as $img)
        echo $img->getAttribute("src")."<br/>";
} else {
    echo "there is no image here!<br/>";
}
You should think about using XPath - it makes your life traversing the DOM a bit easier:
$doc = new DOMDocument();
if ($doc->loadHTML($xmlData)) {
    $xpath = new DOMXPath($doc);
    $postThumbLinks = $xpath->query("//div[@class='post-thumbnail2']/a");
    foreach ($postThumbLinks as $link) {
        $imgList = $xpath->query("./img", $link);
        $imageLink = "there is no image here!";
        if ($imgList->length > 0) {
            $imageLink = $imgList->item(0)->getAttribute('src');
        }
        echo $link->getAttribute('href'), " - ", $link->getAttribute('title'),
            " - ", $imageLink, "<br/>", PHP_EOL;
    }
} else {
    echo "can't load HTML document!", PHP_EOL;
}
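A sketch that produces one line per div.post-thumbnail2, including the empty ones, by querying the divs rather than the links and reading each part with the XPath string() function, which evaluates to an empty string when the node is missing. Note that the empty div in the question contains no link at all, so there is no http://example2.com to print; the sketch prints a placeholder instead (both fallback messages are made up).

```php
<?php
$html = '<div class="post-thumbnail2"><a href="http://example1.com" title="Title"><img src="http://linkimgexample/image1.png" alt="Title"/></a></div>'
    . '<div class="post-thumbnail2"></div>'
    . '<div class="post-thumbnail2"><a href="http://example3.com" title="Title"><img src="http://linkimgexample/image2.png" alt="Title"/></a></div>';

$doc = new DOMDocument();
$state = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($state);
$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query('//div[@class="post-thumbnail2"]') as $div) {
    // string() yields "" when the attribute is absent, so missing parts are easy to detect.
    $href = $xpath->evaluate('string(a/@href)', $div);
    $src = $xpath->evaluate('string(a/img/@src)', $div);
    $lines[] = ($href !== '' ? $href : 'there is no link here!') . ' - '
        . ($src !== '' ? $src : 'there is no image here!');
}
echo implode('<br/>' . PHP_EOL, $lines);
```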

Catch first and last <li> inside php variable

echo $ul; // gives this code:
<ul id="menu">
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class">...</li>
</ul>
How to add some class for the first and the last <li>?
Need a regex solution.
echo $ul; should give (if we add class my_class for the last <li>):
<ul id="menu">
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class my_class">...</li>
</ul>
The DOM solution
$dom = new DOMDocument;
$dom->loadHTML( $ul );
$xPath = new DOMXPath( $dom );
$xPath->query( '/html/body/ul/li[last()]/@class' )
      ->item( 0 )
      ->value .= ' myClass';
echo $dom->saveXml( $dom->getElementById( 'menu' ) );
If you know the HTML is valid, you can also use loadXML instead. That way DOM won't add the HTML skeleton (the html and body elements). Note that you then have to change the XPath to '/ul/li[last()]/@class'.
In case you are not familiar with XPath queries, you can also use the regular DOM interface, e.g.
$dom = new DOMDocument;
$dom->loadHTML( $ul );
$liElements = $dom->getElementsByTagName( 'li' );
$lastLi = $liElements->item( $liElements->length-1 );
$classes = $lastLi->getAttribute( 'class' ) . ' myClass';
$lastLi->setAttribute( 'class', $classes );
echo $dom->saveXml( $dom->getElementById( 'menu' ) );
EDIT: Since you changed the question to ask for classes on both the first and the last li, here is how to do that using XPath. This assumes your markup is valid XHTML. If it isn't, switch back to loadHTML (see the code above):
$dom = new DOMDocument;
$dom->loadXML( $ul );
$xpath = new DOMXPath( $dom );
$first = $xpath->query( '/ul/li[1]/@class' )->item( 0 );
$last = $xpath->query( '/ul/li[last()]/@class' )->item( 0 );
$last->value .= ' last';
$first->value .= ' first';
echo $dom->saveXML( $dom->documentElement );
Alternatively, you could use "#menu li:last-child" in your CSS instead of a class name, that way you don't have to modify your PHP code.
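If the markup is not valid XHTML, the first-and-last version also works with loadHTML(). The sketch below (the function name addFirstLastClasses is hypothetical) selects the li children of #menu with XPath, appends a class to the first and the last one, and serializes just the ul with saveHTML(). With a single li, both classes end up on the same element.

```php
<?php
// Hypothetical helper: append classes to the first and last <li> of ul#menu.
function addFirstLastClasses(string $ul, string $firstClass, string $lastClass): string
{
    $dom = new DOMDocument();
    $state = libxml_use_internal_errors(true);
    $dom->loadHTML($ul);
    libxml_clear_errors();
    libxml_use_internal_errors($state);
    $xpath = new DOMXPath($dom);

    $items = $xpath->query('//ul[@id="menu"]/li');
    if ($items->length === 0) {
        return $ul; // nothing to change
    }
    $targets = [[$items->item(0), $firstClass], [$items->item($items->length - 1), $lastClass]];
    foreach ($targets as [$li, $class]) {
        // trim() avoids a leading space when the li had no class yet
        $li->setAttribute('class', trim($li->getAttribute('class') . ' ' . $class));
    }
    return $dom->saveHTML($xpath->query('//ul[@id="menu"]')->item(0));
}

echo addFirstLastClasses(
    "<ul id=\"menu\">\n<li id=\"a\" class=\"some_class\">1</li>\n<li id=\"b\" class=\"some_class\">2</li>\n<li id=\"c\" class=\"some_class\">3</li>\n</ul>",
    'first',
    'last'
);
```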
If you MUST use regex for this (not really advisable):
I think this should work...
// add class_last to the <li> that is directly followed by </ul>
$pattern1 = '/(<li\b[^>]*\bclass="[^"]*)("[^>]*>(?:(?!<li\b).)*?<\/li>\s*<\/ul>)/s';
$ul = preg_replace($pattern1, '$1 class_last$2', $ul);
// add class_first to the <li> that directly follows the opening <ul>
$pattern2 = '/(<ul\b[^>]*>\s*<li\b[^>]*\bclass="[^"]*)"/';
$ul = preg_replace($pattern2, '$1 class_first"', $ul);
If you really want to do this with regex, you could try:
$ul = '
<ul id="menu">
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class">...</li>
</ul>
';
// explode input string to an array
$lines = explode("\n", $ul);
$found_first = 0; // has the first li been found?
$found_last = 0; // index of the last li
$class_first = "class_first"; // class for the first li
$class_last = "class_last"; // class for the last li
// loop over all lines
for ($i = 0; $i < count($lines); $i++) {
    $line = $lines[$i];
    // the line begins with <li
    if (preg_match("/^<li/", $line)) {
        // is it the first one?
        if (!$found_first) {
            // add the class
            $lines[$i] = preg_replace('/ class="([^"]+?)"/', " class=\"$1 $class_first\"", $lines[$i]);
            // the first li has been found
            $found_first = 1;
        }
        // remember the last line processed
        $found_last = $i;
    }
}
// this will add class_last even if the last li
// is also the first one (ie: only one li)
if ($found_last) {
    $lines[$found_last] = preg_replace('/ class="([^"]+?)"/', " class=\"$1 $class_last\"", $lines[$found_last]);
}
$ul = implode("\n", $lines);
echo $ul;
Output:
<ul id="menu">
<li id="some_id" class="some_class class_first">...</li>
<li id="some_id" class="some_class">...</li>
<li id="some_id" class="some_class class_last">...</li>
</ul>
You can use counters:
<?php
$list = array('aaa', 'bbb', 'ccc', 'ddd');
$items = count($list); // count items in list
$i = 1; // set counter to one, because first item in list will be item number: 1
echo '<ul>';
// create loop
foreach ($list as $value) {
    // first item
    if ($i == 1) {
        $class = 'some_class first_class';
    // last item
    } elseif ($i == $items) {
        $class = 'some_class last_class';
    // not first / not last item
    } else {
        $class = 'some_class';
    }
    echo '<li id="some_id" class="'. $class .'">' . $value . '</li>';
    $i++; // raise $i by one
}
echo '</ul>';
?>
Will output:
<ul>
<li id="some_id" class="some_class first_class">aaa</li>
<li id="some_id" class="some_class">bbb</li>
<li id="some_id" class="some_class">ccc</li>
<li id="some_id" class="some_class last_class">ddd</li>
</ul>
However, my suggestion would be:
<ul id="menu">
<li>aaa</li>
<li>bbb</li>
<li>ccc</li>
<li>ddd</li>
</ul>
In your CSS:
#menu {
}
#menu li:first-child {
}
#menu li {
}
#menu li:last-child {
}
