substr and mb_substr return nothing - php

I do not know what is wrong in the code below:
<?php
$html = file_get_contents('https://www.ibar.az/en/');
$doc = new domDocument();
$doc->loadHTML($html);
$doc->preserveWhiteSpace = false;
$ExchangePart = $doc->getElementsByTagName('li');
/*for ($i=0; $i<=$ExchangePart->length; $i++) {
echo $i . $ExchangePart->Item($i)->nodeValue . "<br>";
}*/
$C=$ExchangePart->Item(91)->nodeValue;
var_dump ($C);
$fff=mb_substr($C, 6, 2, 'UTF-8');
echo $fff;
?>
I have tried both substr and mb_substr but in both cases echo $fff; returns nothing.
Could anybody suggest what I am doing wrong?

This is the item 91 node:
<ul>
<li>USD</li>
<li>1.5072</li>
<li>1.462</li>
<li>1.5494</li>
<li class="down"> </li>
</ul>
This is node value:
¶
····························USD¶
································1.5072¶
································1.462¶
································1.5494¶
································•¶
····························
( · = space; • = nbsp )
substr( $C, 6, 2 ) is a string of two spaces.
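To retrieve all the values correctly, loop over the nested <li> elements:
foreach( $ExchangePart->item(91)->getElementsByTagName('li') as $node )
{
if( trim($node->nodeValue) ) echo $node->nodeValue . '<br>';
}
Otherwise, you can strip the spaces from the node value (newlines and the non-breaking space remain):
$C = str_replace( ' ', '', $C );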
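As a minimal sketch of why the substring looks empty, the snippet below rebuilds the node value shown above by hand (the exact indentation widths are an assumption taken from that dump) and then splits it on any run of whitespace, including the non-breaking space, instead of relying on fixed offsets. It assumes the mbstring extension is available.

```php
<?php
// Hand-built copy of the node value from the dump above (widths are assumed).
$C = "\n"
    . str_repeat(' ', 28) . "USD\n"
    . str_repeat(' ', 32) . "1.5072\n"
    . str_repeat(' ', 32) . "1.462\n"
    . str_repeat(' ', 32) . "1.5494\n"
    . str_repeat(' ', 32) . "\u{00A0}\n"
    . str_repeat(' ', 28);

// Offset 6 falls inside the leading indentation, so this is two spaces.
$fff = mb_substr($C, 6, 2, 'UTF-8');
var_dump($fff);

// Split on runs of whitespace or non-breaking spaces and drop empty pieces.
$values = preg_split('/[\s\x{00A0}]+/u', $C, -1, PREG_SPLIT_NO_EMPTY);
print_r($values);
```

Splitting this way does not depend on how deeply the markup is indented, which is the part of the page most likely to change.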

Related

How to search in XML? (PHP)

I am working on a word application. I'm trying to get values from XML. My goal is getting the first and last letter of a word. Could you help me, please?
<?xml version='1.0'?>
<Letters>
<Letter category='A'>
<FirstLetter>
<Property>First letter is A.</Property>
</FirstLetter>
<LastLetter>
<Property>Last letter is A.</Property>
</LastLetter>
</Letter>
<Letter category='B'>
<FirstLetter>
<Property>First letter is B.</Property>
</FirstLetter>
<LastLetter>
<Property>Last letter is B.</Property>
</LastLetter>
</Letter>
<Letter category='E'>
<FirstLetter>
<Property>First letter is E.</Property>
</FirstLetter>
<LastLetter>
<Property>Last letter is E.</Property>
</LastLetter>
</Letter>
</Letters>
PHP code:
<?php
$word = "APPLE";
$alphabet = "ABCÇDEFGĞHIİJKLMNOÖPQRSŞTUÜVWXYZ";
$index = strpos($alphabet, $word);
$string = $xml->xpath("//Letters/Letter[contains(text(), " . $alfabe[$rakam] . ")]/FirstLetter");
echo "<pre>" . print_r($string, true) . "</pre>";
The letter is in an attribute named 'category'.
$word = "APPLE";
// bootstrap DOM
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
// get first and last letter
$firstLetter = substr($word, 0, 1);
$lastLetter = substr($word, -1);
// fetch text from property elements
var_dump(
$xpath->evaluate(
"string(/Letters/Letter[@category = '$firstLetter']/FirstLetter/Property)"
),
$xpath->evaluate(
"string(/Letters/Letter[@category = '$lastLetter']/LastLetter/Property)"
)
);
Or in SimpleXML
$word = "APPLE";
$letters = new SimpleXMLElement($xml);
$firstLetter = substr($word, 0, 1);
$lastLetter = substr($word, -1);
// SimpleXML does not allow for xpath expression with type casts
// So validation and cast has to be done in PHP
var_dump(
(string)($letters->xpath(
"/Letters/Letter[@category = '$firstLetter']/FirstLetter/Property"
)[0] ?? ''),
(string)($letters->xpath(
"/Letters/Letter[@category = '$lastLetter']/LastLetter/Property"
)[0] ?? '')
);
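For a self-contained run of the attribute lookup, here is a sketch that inlines a shortened copy of the XML from the question. The helper name letterProperties is hypothetical, and mb_substr is swapped in for substr on the assumption that words may start or end with multi-byte letters such as Ç or Ş from the question's alphabet. It also assumes the letters never contain a quote character, since they are interpolated into the XPath expression.

```php
<?php
// Shortened copy of the XML from the question (two letters only).
$xml = <<<'XML'
<?xml version='1.0'?>
<Letters>
  <Letter category='A'>
    <FirstLetter><Property>First letter is A.</Property></FirstLetter>
    <LastLetter><Property>Last letter is A.</Property></LastLetter>
  </Letter>
  <Letter category='E'>
    <FirstLetter><Property>First letter is E.</Property></FirstLetter>
    <LastLetter><Property>Last letter is E.</Property></LastLetter>
  </Letter>
</Letters>
XML;

// Hypothetical helper: returns [first-letter text, last-letter text].
function letterProperties(string $xml, string $word): array
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);

    // mb_substr keeps multi-byte letters intact (requires mbstring).
    $first = mb_substr($word, 0, 1, 'UTF-8');
    $last  = mb_substr($word, -1, 1, 'UTF-8');

    return [
        $xpath->evaluate("string(/Letters/Letter[@category = '$first']/FirstLetter/Property)"),
        $xpath->evaluate("string(/Letters/Letter[@category = '$last']/LastLetter/Property)"),
    ];
}

$result = letterProperties($xml, 'APPLE');
print_r($result);
```

If a letter has no matching element, string() yields an empty string, so the caller can test for '' rather than catching an error.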

How to concatenate string continuously in php?

<?php
$test = ' /clothing/men/tees';
$req_url = explode('/', $test);
$c = count($req_url);
$ex_url = 'http://www.test.com/';
for($i=1; $c > $i; $i++){
echo '/'.'<a href="'.$ex_url.'/'.$req_url[$i].'">
<span>'.ucfirst($req_url[$i]).'</span>
</a>';
//echo '<br/>'.$ex_url;....//last line
}
?>
OUTPUT - 1 //when comment last line
/ Clothing / Men / Tees
OUTPUT - 2 //when un-comment last line $ex_url shows
/ Clothing
http://www.test.com// Men
http://www.test.com// Tees
http://www.test.com/
1. Required output:
In spans: / Clothing / Men / Tees, where the last element should not be clickable
and the links should be built like this:
http://www.test.com/clothing/Men/tees -- when clicking on Tees
http://www.test.com/clothing/Men -- when clicking on Men
...and so on.
2. Why does OUTPUT 2 come out like that?
Try this:
<?php
$test = '/clothing/men/tees';
$url = 'http://www.test.com';
foreach(preg_split('!/!', $test, -1, PREG_SPLIT_NO_EMPTY) as $e) {
$url .= '/'.$e;
echo '/<span>'.ucfirst($e).'</span>';
}
?>
Output:
/Clothing/Men/Tees
HTML output:
/<span>Clothing</span>/<span>Men</span>/<span>Tees</span>
Try using foreach() to iterate the array and you'll have to keep track of the path after the url. Try it like so (tested and working code):
<?php
$test = '/clothing/men/tees';
$ex_url = 'http://www.test.com';
$items = explode('/', $test);
array_shift($items);
$path = '';
foreach($items as $item) {
$path .= '/' . $item;
echo '/ <span>' . ucfirst($item) . '</span>';
}
Try this.
<?php
$test = '/clothing/men/tees';
$req_url = explode('/', ltrim($test, '/'));
$ex_url = 'http://www.test.com/';
$stack = array();
$result = array_map(function($part) use($ex_url, &$stack) {
$stack[] = $part;
return sprintf('<a href="%s%s">%s</a>', $ex_url, implode('/', $stack), ucfirst($part));
}, $req_url);
print_r($result);
<?php
$sTest= '/clothing/men/tees';
$aUri= explode( '/', trim( $sTest, '/' ) ); // Trim so there is no empty first segment
$sBase= 'http://www.test.com'; // No trailing slash
$sPath= $sBase; // Will grow per loop iteration
foreach( $aUri as $sDir ) {
$sPath.= '/'. $sDir;
echo ' / <a href="'. $sPath. '">'. ucfirst( $sDir ). '</a>'; // Unnecessary <span>
}
?>
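On the second part of the question: in the original loop $ex_url is never changed, so every link points at the base URL plus a single segment, and the extra echo simply prints that unchanged base URL after each link. A sketch that accumulates the path and leaves the last element unlinked might look like this; the function name buildBreadcrumb is hypothetical, and escaping with rawurlencode and htmlspecialchars is an addition the original answers do not include.

```php
<?php
// Hypothetical helper: builds " / Clothing / Men / Tees" with cumulative links.
function buildBreadcrumb(string $path, string $baseUrl): string
{
    // Drop empty segments so a leading or trailing slash is harmless.
    $parts = preg_split('!/!', trim($path), -1, PREG_SPLIT_NO_EMPTY);
    $last  = count($parts) - 1;
    $url   = rtrim($baseUrl, '/');
    $html  = '';

    foreach ($parts as $i => $part) {
        // Grow the URL by one segment per iteration.
        $url  .= '/' . rawurlencode($part);
        $label = '<span>' . htmlspecialchars(ucfirst($part)) . '</span>';

        // The last element is plain text, all others are links.
        $html .= ' / ' . ($i === $last
            ? $label
            : '<a href="' . htmlspecialchars($url) . '">' . $label . '</a>');
    }

    return $html;
}

$breadcrumb = buildBreadcrumb(' /clothing/men/tees', 'http://www.test.com/');
echo $breadcrumb;
```

Returning the string instead of echoing inside the loop keeps the function reusable in templates.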

php String Replace Regardless of Capitalization or Quotes

Is there a way to write a string replacement that ignores capitalization and quotes, instead of listing every possible variant in an array?
str_replace(array('type="text/css"','type=text/css','TYPE="TEXT/CSS"','TYPE=TEXT/CSS'),'',$string);
In this case you could do a case-insensitive regular expression replacement:
preg_replace('/\s?type=["\']?text\/css["\']?/i', '', $string);
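To see the pattern at work, this small sketch runs it over the variants from the question plus one single-quoted variant added here as an assumption. It uses ~ as the delimiter so the slash in text/css needs no escaping; otherwise the pattern is the same.

```php
<?php
// Variants of the attribute; the single-quoted one is an added test case.
$variants = [
    '<style type="text/css"></style>',
    '<style type=text/css></style>',
    '<style TYPE="TEXT/CSS"></style>',
    "<style TYPE='TEXT/CSS'></style>",
];

// The i modifier ignores case; ["\']? makes either kind of quote optional.
$cleaned = preg_replace('~\s?type=["\']?text/css["\']?~i', '', $variants);
print_r($cleaned);
```

preg_replace accepts an array as the subject and returns an array, so no explicit loop is needed.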
You can use DOMDocument for this kind of thing (thanks to @AlexQuintero for the array of styles):
<?php
$doc = new DOMDocument();
$str[] = '<style type="text/css"></style>';
$str[] = '<style type=text/css></style>';
$str[] = '<style TYPE="TEXT/CSS"></style>';
$str[] = '<style TYPE=TEXT/CSS></style>';
foreach ($str as $myHtml) {
echo "before ", $myHtml, PHP_EOL;
$doc->loadHTML($myHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
removeAttr("style", "type", $doc);
echo "after: ", $doc->saveHtml(), PHP_EOL;
}
function removeAttr($tag, $attr, $doc) {
$nodeList = $doc->getElementsByTagName($tag);
for ($nodeIdx = $nodeList->length; --$nodeIdx >= 0; ) {
$node = $nodeList->item($nodeIdx);
$node->removeAttribute($attr);
}
}

Find an element in HTML and explode it to store the result

I want to retrieve an HTML element in a page.
<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>
I need to get the total number of results so I can test it in my PHP code.
For now, I take everything between the h2 tags and explode it on spaces.
Then I explode the count on the comma and concatenate the parts to strip the thousands separator. Once that is done, I test the resulting number.
define("MAX_RESULT_ALL_PAGES", 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$htmlResultCountPage = file_get_html($queryUrl);
$htmlResultCount = $htmlResultCountPage->find("h2[id=resultCount]");
$resultCountArray = explode(" ", $htmlResultCount[0]);
$explodeCount = explode(',', $resultCountArray[5]);
$europeFormatCount = '';
foreach ($explodeCount as $val) {
$europeFormatCount .= $val;
}
if ($europeFormatCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
At the moment the total number of results is not retrieved correctly, and the condition is not met even when it should be.
Does anyone have a solution to this problem, or another way to do it?
I would simply fetch the page as a string (not html) and use a regular expression to get the total number of results. The code would look something like this:
define('MAX_RESULT_ALL_PAGES', 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
$queryResult = file_get_contents($queryUrl);
if (preg_match('/of\s+([0-9,]+)\s+Results/', $queryResult, $matches)) {
$totalResults = (int) str_replace(',', '', $matches[1]);
} else {
throw new \RuntimeException('Total number of results not found');
}
if ($totalResults > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
// ...
}
A regex would do it:
...
preg_match("/of ([0-9,]+) Results/", $htmlResultCount[0], $matches);
$europeFormatCount = intval(str_replace(",", "", $matches[1]));
...
Please try this code.
define("MAX_RESULT_ALL_PAGES", 1200);
// new dom object
$dom = new DOMDocument();
// HTML string
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$html_string = file_get_contents($queryUrl);
//discard white space (must be set before loading to take effect)
$dom->preserveWhiteSpace = false;
//load the html
$dom->loadHTML($html_string);
//Get all h2 tags
$nodes = $dom->getElementsByTagName('h2');
// Store total result count
$totalCount = 0;
// loop over the all h2 tags and print result
foreach ($nodes as $node) {
if ($node->hasAttributes()) {
foreach ($node->attributes as $attribute) {
if ($attribute->name === 'class' && $attribute->value == 'resultCount') {
$inner_html = str_replace(',', '', trim($node->nodeValue));
$inner_html_array = explode(' ', $inner_html);
// Add this heading's count to the running total
$totalCount += (int) $inner_html_array[5];
}
}
}
}
// If the result count is greater than 1200, do this
if ($totalCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
Give this a try:
$match = array();
preg_match('/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/', $htmlResultCount[0], $match);
$europeFormatCount = str_replace(',','',$match[0]);
The regex reads the number between "of " and " Results"; it matches numbers with a ',' separator.
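The regex approach from the answers above can be tried without any network access. This sketch runs it against the h2 snippet from the question; the function name extractTotalResults is hypothetical, and returning null when nothing matches is a design choice made here rather than something the answers specify.

```php
<?php
// Hypothetical helper: pulls the total out of "... of 40,923 Results".
function extractTotalResults(string $html): ?int
{
    if (preg_match('/of\s+([0-9,]+)\s+Results/', $html, $matches)) {
        // Strip the thousands separators before casting.
        return (int) str_replace(',', '', $matches[1]);
    }

    return null; // No count found in this markup.
}

// Markup copied from the question.
$html = '<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>';

$total = extractTotalResults($html);
var_dump($total);
```

Casting to int matters for the later comparison against MAX_RESULT_ALL_PAGES, since it guarantees a numeric comparison rather than relying on how a string happens to be converted.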

Strip tag with class in PHP

So I need to strip the span tags of class tip.
So that would be <span class="tip"> and the corresponding </span>, and everything inside it...
I suspect a regular expression is needed but I terribly suck at this.
Laugh...
<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>
Gives no error... But
<?php
$str = preg_replace('<span class="tip">.+</span>', "", '<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class=\"rss-newpost\"></span>');
echo $str;
?>
Gives me the error:
Warning: preg_replace() [function.preg-replace]: Unknown modifier '.' in <A FILE> on line 4
previously, the error was at the ); in the 2nd line, but now.... >.>
This is the "proper" method (adapted from this answer).
Input:
<?php
$str = '<div>lol wut <span class="tip">remove!</span><span>don\'t remove!</span></div>';
?>
Code:
<?php
function recurse(&$doc, &$parent) {
if (!$parent->hasChildNodes())
return;
for ($i = 0; $i < $parent->childNodes->length; ) {
$elm = $parent->childNodes->item($i);
if ($elm->nodeName == "span") {
$class = $elm->getAttribute("class"); // Empty string when the attribute is absent
if ($class == "tip") {
$parent->removeChild($elm);
continue;
}
}
recurse($doc, $elm);
$i++;
}
}
// Load in the DOM (remembering that XML requires one root node)
$doc = new DOMDocument();
$doc->loadXML("<document>" . $str . "</document>");
// Iterate the DOM
recurse($doc, $doc->documentElement);
// Output the result
foreach ($doc->childNodes->item(0)->childNodes as $node) {
echo $doc->saveXML($node);
}
?>
Output:
<div>lol wut <span>don't remove!</span></div>
A simple regular expression like:
<span class="tip">.+</span>
Won't work. If another span is opened and closed inside the tip span, the regex cannot tell which </span> belongs to the tip span: a lazy match stops at the inner span's closing tag, and a greedy one runs on to the last </span> in the string. DOM-based tools like the one linked in the comments will provide a more reliable answer.
As per my comment below, you need to add pattern delimiters when working with regular expressions in PHP.
<?php
$str = preg_replace('#<span class="tip">.+</span>#', "", '<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class=\"rss-newpost\"></span>');
echo $str;
?>
may be moderately more successful. Please take a look at the documentation page for the function in question.
Now without regexp, and without heavy XML parsing:
$html = ' ... <span class="tip"> hello <span id="x"> man </span> </span> ... ';
$tag = '<span class="tip">';
$tag_close = '</span>';
$tag_familly = '<span';
$tag_len = strlen($tag);
$p1 = -1;
$p2 = 0;
while ( ($p2!==false) && (($p1=strpos($html, $tag, $p1+1))!==false) ) {
// the tag is found, now we will search for its corresponding closing tag
$level = 1;
$p2 = $p1;
$continue = true;
while ($continue) {
$p2 = strpos($html, $tag_close, $p2+1);
if ($p2===false) {
// error in the html contents, the analysis cannot continue
echo "ERROR in html contents";
$continue = false;
$p2 = false; // will stop the loop
} else {
$level = $level -1;
$x = substr($html, $p1+$tag_len, $p2-$p1-$tag_len);
$n = substr_count($x, $tag_familly);
if ($level+$n<=0) $continue = false;
}
}
if ($p2!==false) {
// delete the pair of tags, the farthest one first
$html = substr_replace($html, '', $p2, strlen($tag_close));
$html = substr_replace($html, '', $p1, $tag_len);
}
}
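As another DOM-based variant, the following sketch removes the tip spans, including everything inside them, with an XPath query instead of a hand-written recursion. The function name stripTipSpans and the wrapper element are hypothetical, and it loads the fragment with loadHTML rather than loadXML on the assumption that the input may not be well-formed XML.

```php
<?php
// Hypothetical helper: removes every <span class="tip"> and its contents.
function stripTipSpans(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    // Wrap the fragment so it has exactly one root element.
    $doc->loadHTML(
        '<div id="wrapper">' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "tip" as a whole word inside the class attribute.
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' tip ')]";

    // Take a snapshot of the matches, then detach each one from its parent.
    foreach (iterator_to_array($xpath->query($query)) as $span) {
        $span->parentNode->removeChild($span);
    }

    // Serialize the wrapper's children, leaving the wrapper itself out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

$clean = stripTipSpans('<div>lol wut <span class="tip">remove! <span>nested</span></span><span>don\'t remove!</span></div>');
echo $clean;
```

Because the whole element is detached from the tree, nested spans inside a tip span go with it, which is the case the plain regex cannot handle.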
