I'm very new to PHP. I understand that echo is how you output text, but I'm not sure how to apply it in the scenario below, where data is scraped and written out. I'm wondering if there's a way with file_put_contents to add text to the output; the text I'm trying to add is a "%". The reason is that the output of the code below is a number that changes daily, and it's in fact a percentage, so I'd like to add the sign to the end of the output every time.
Thanks so much for any assistance.
// get japanchange
function getJapanchange(){
$doc = new DOMDocument;
// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;
// Most HTML Developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('http://________________//global-indices/');
$xpath = new DOMXPath($doc);
$query = "//div[@class='MT10']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$result = trim($entry->textContent);
$ret_ = explode(' ', $result);
//make sure every element in the array don't start or end with blank
foreach ($ret_ as $key=>$val){
$ret_[$key]=trim($val);
}
//delete the empty element and the element is blank "\n" "\r" "\t"
//I modify this line
// the callback name must be passed as a string; a bare constant is an error in PHP 8
$ret_ = array_values(array_filter($ret_, 'deleteBlankInArray'));
//echo the last element
file_put_contents(globalVars::$_cache_dir . "japanchange",
$ret_[56]);
}
}
If you just want to add a % to the end of the output written to the file you're already using, you could simply do:
file_put_contents(globalVars::$_cache_dir . "japanchange",
$ret_[56].'%');
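The same idea as a self-contained sketch. Here $value and the temporary file are hypothetical stand-ins for $ret_[56] and the cache path, which depend on code not shown in the question:

```php
<?php
// Hypothetical stand-ins: $value plays the role of $ret_[56], and a temp
// file replaces globalVars::$_cache_dir . "japanchange".
$value = '1.25';
$cacheFile = tempnam(sys_get_temp_dir(), 'japanchange');

// The . operator concatenates strings, so the percent sign is written
// to the file right after the number.
file_put_contents($cacheFile, $value . '%');

// Read the file back to check what was stored, then clean up.
$written = file_get_contents($cacheFile);
echo $written, "\n";
unlink($cacheFile);
```

The suffix is added at write time, so anything that later reads the cache file gets the value with the sign already attached.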
I need to get the image src based on the class of the image.
This is the code I wrote.
It works but it is extremely slow.
$url='https://' . $_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
$html= file_get_contents($url);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//img[@class='imgbanner']");
if ($nodes->length > 0) {
$src = $nodes->item(0)->getAttribute('src');
}
else {
$src = null;
}
Any clues on how to improve speed?
Assuming you're trying to parse a somewhat convoluted HTML document, and especially considering your rather limited use case, you might be better off resorting to regular expressions and some string parsing (again, only in these specific circumstances; cf. this post's closing remarks).
For testing purposes, let's set up an HTML document with 10,000 image tags, each of them looking like this one:
<img class="imgbanner" src="a49851fb74.jpg">
To benchmark both approaches more easily (XPath vs. regular expression + string parsing), let's wrap them in two functions (the first one is pretty much the same as the sample code you've provided):
function xpath(string $html): array {
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//img[@class='imgbanner']");
$src = [];
if ($nodes) {
foreach ($nodes as $node) {
$src[] = $node->getAttribute('src');
}
}
libxml_clear_errors(); // Free up memory
return $src;
}
function regex(string $html): array {
preg_match_all("/<img[^>]+src=[\"']([^\"']+)[\"'][^>]*>/i", $html, $matches);
$matches = array_combine($matches[0], $matches[1]);
$filtered = [];
foreach ($matches as $key => $value) {
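// strpos() returns 0 for a match at the start of the string, so compare against false explicitly
if (strpos($key, 'class="imgbanner"') !== false || strpos($key, "class='imgbanner'") !== false) {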
$filtered[] = $value;
}
}
return $filtered;
}
Since the HTML document doesn't contain much else but the image tags, XPath is pretty fast (~0.06 seconds over the course of ten runs):
$start = microtime(true);
$html = file_get_contents('pics.html'); // 10,000 random image tags
$src = xpath($html);
$time_elapsed_secs = (microtime(true) - $start);
echo "Total execution time: {$time_elapsed_secs}\n"; // ~0.06 sec
Nevertheless, the second approach turned out to be about ten times faster (~0.005 seconds over the course of ten runs):
$start = microtime(true);
$html = file_get_contents('pics.html'); // 10,000 random image tags
$src = regex($html);
$time_elapsed_secs = (microtime(true) - $start);
echo "Total execution time: {$time_elapsed_secs}\n"; // ~0.005 sec
While the second approach is obviously faster for this very limited use case, bear in mind it's usually a bad idea to parse HTML using regular expressions:
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
If your parsing needs grow in complexity (i.e., anything above the case at hand), you should consider factoring out the parsing into a dedicated command line script and cache its results.
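A minimal sketch of the caching idea, under stated assumptions: cachedParse(), the JSON cache format, and the five-minute lifetime are all hypothetical choices for illustration, and the closure stands in for whichever parsing function you settle on:

```php
<?php
// Hypothetical helper: returns the cached result if the cache file is
// younger than $ttl seconds, otherwise runs $parser and stores its result.
function cachedParse(string $cacheFile, int $ttl, callable $parser): array {
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        $cached = json_decode(file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached;
        }
    }
    $result = $parser();
    file_put_contents($cacheFile, json_encode($result), LOCK_EX);
    return $result;
}

// Stand-in for the expensive parsing step; $calls counts how often it runs.
$calls = 0;
$parser = function () use (&$calls): array {
    $calls++;
    return ['a49851fb74.jpg'];
};

$cacheFile = sys_get_temp_dir() . '/imgbanner-cache-' . uniqid() . '.json';
$first = cachedParse($cacheFile, 300, $parser);  // parses and writes the cache
$second = cachedParse($cacheFile, 300, $parser); // served from the cache file
unlink($cacheFile);
```

With something like this in place, the page request only pays the parsing cost when the cache has expired.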
I want to append some text to all divs that have the same class.
$dom = new DOMDocument();
$dom->formatOutput = true;
@$dom->loadHTMLFile('first.html');
$xpath = new DOMXPath($dom);
$after = new DOMText('Newly appended text');
$elements = $xpath->query('//div[@class="mix"]');
foreach($elements as $element)
{
$element->appendChild($after);
//echo $dom->saveHTML();
}
$dom->saveHTMLFile('first.html');
But when I open first.html, the text is only appended to the last div of that class.
If I uncomment saveHTML(), it shows the correct result. The problem only appears after saving.
You cannot append the same DOM node to multiple points in the tree, which is what you are doing here. You need to create a separate (but identical) node each time:
foreach($elements as $element)
{
$after = new DOMText('Newly appended text'); // moved this inside the loop
$element->appendChild($after);
}
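A self-contained sketch of that fix. The two-div markup is made up for illustration, and createTextNode() is used here in place of new DOMText() to create a fresh node on each iteration:

```php
<?php
$dom = new DOMDocument();
// Hypothetical sample markup with two divs sharing the same class.
$dom->loadHTML('<div class="mix">a</div><div class="mix">b</div>');
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//div[@class="mix"]') as $element) {
    // A fresh text node per div: appending one shared node would move it
    // from div to div, leaving it only in the last one.
    $element->appendChild($dom->createTextNode(' Newly appended text'));
}

// Collect the text of each div to confirm both received the new text.
$texts = [];
foreach ($xpath->query('//div[@class="mix"]') as $element) {
    $texts[] = $element->textContent;
}
print_r($texts);
```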
I want to check whether an <img> tag has alt="" text or not, and I also need to find the line number at which that img tag appears in the DOM. At the moment I have the following code, but I'm stuck on finding the line number.
for example:
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.google.com');
$htmlElement = $doc->getElementsByTagName('html');
$tags = $doc->getElementsByTagName('img');
echo $tags->item(0)->getLineNo();
foreach ($tags as $image) {
// Get sizes of elements via width and height attributes
$alt = $image->getAttribute('alt');
if($alt == ""){
$src = $image->getAttribute('src');
echo "No alt text ";
echo '<img src="http://google.com/'.$src.'" alt=""/>'. '<br>';
}
else{
$src = $image->getAttribute('src');
echo '<img src="http://google.com/'.$src.'" alt=""/>'. '<br>';
}
}
With the above code I currently get the images and the text "No alt text" beside each image, but I also want the line number on which that img tag appears.
For example, here the line number is 57:
56. <div class="work_item">
57. <p class="pich"><img src="images/works/1.jpg" alt=""></p>
58. </div>
Use DOMNode::getLineNo(), e.g. $line = $image->getLineNo().
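A short sketch of how that could fit the question's loop. The sample markup and file names are hypothetical, and treating a missing alt attribute the same as an empty one is an assumption about what should be reported:

```php
<?php
// Hypothetical sample markup; the second image has an empty alt attribute.
$html = "<html>\n<body>\n"
      . "<p><img src=\"a.jpg\" alt=\"A photo\"></p>\n"
      . "<p><img src=\"b.jpg\" alt=\"\"></p>\n"
      . "</body>\n</html>";

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$missing = [];
foreach ($doc->getElementsByTagName('img') as $image) {
    // getAttribute() returns '' both for alt="" and for a missing alt.
    if (trim($image->getAttribute('alt')) === '') {
        $missing[] = [
            'src'  => $image->getAttribute('src'),
            'line' => $image->getLineNo(),
        ];
    }
}

foreach ($missing as $item) {
    echo "No alt text: {$item['src']} (line {$item['line']})\n";
}
```

The reported number is the line in the source as the parser read it, so it is only meaningful if you show the user that same source.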
HTML has no real concept of line numbers, since they are just whitespace.
With that in mind, you might be able to count how many newlines there are in all the text nodes preceding the target node. You might be able to do this with DOMXPath:
$xpath = new DOMXPath($doc);
$node = /* your target node */;
$textnodes = $xpath->query("./preceding::*[contains(text(),'\n')]",$node);
$line = 1;
foreach($textnodes as $textnode) $line += substr_count($textnode->textContent,"\n");
// $line is now the line number of the node.
Please note that I have not tested this, nor have I ever used axes in xpath.
I think I have figured out what I was trying to achieve, but I'm not sure it is the right way. It does the job. Please leave comments or any other ideas on how I can improve it.
If you go to the following site and type in any URL, it will produce a report of the accessibility issues in that web page. It is an accessibility checker tool.
http://valet.webthing.com/page/
All I am trying to do is achieve that kind of layout. The code below produces the DOM of the supplied URL and finds any image tag that does not have alternative text.
<html>
<body>
<?php
$dom = new domDocument;
// load the html into the object
// pass the variable itself; inside single quotes '$yourURLAddress' is a literal string
$dom->loadHTMLFile($yourURLAddress);
// keep white space
$dom->preserveWhiteSpace = true;
// nicely format output
$dom->formatOutput = true;
$new = htmlspecialchars($dom->saveHTML(), ENT_QUOTES);
$lines = preg_split('/\r\n|\r|\n/', $new); //split the string on new lines
echo "<pre>";
//find 'alt=""' and print the line number and html tag
foreach ($lines as $lineNumber => $line) {
if (strpos($line, htmlspecialchars('alt=""')) !== false) {
// $lineNumber is a zero-based array index, so add 1 to get the line number
echo "\r\n" . ($lineNumber + 1) . ". " . $line;
}
}
echo "\n\n\nBelow is the whole DOM\n\n\n";
//print out the whole DOM including line numbers
foreach ($lines as $lineNumber => $line) {
echo "\r\n" . ($lineNumber + 1) . ". " . $line;
}
echo "</pre>";
?>
</body>
</html>
I'd like to thank everyone who helped, especially "chwagssd" and Mike Johnson.
I'm creating a tool that works with file strings and I need to get the line number where a node is found. That is, I have this:
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//text()") as $q) {
// $line = WHAT???
$strings[trim($q->nodeValue)] = $line;
}
and I need to know on which line each string I'm storing in the $strings array begins. Is it possible to get it?
Each DOMNode object has a getLineNo() function that returns this. In your case it's a DOMText object that extends from DOMNode:
foreach ($xpath->query("//text()") as $q) {
$line = $q->getLineNo();
$strings[trim($q->nodeValue)] = $line;
}
getLineNo() requires PHP 5.3 or later, so you might need to upgrade if you have not done so yet.
I want to take data returned by a database query and put it into an XML file.
For example, I have a table_persons table with name and age columns. I run a MySQL query to get each person's name and age, then simply want to put that data into an XML file.
How would you do that? Is it even possible?
I suggest you use DomDocument and file_put_contents to create your XML file.
Something like this:
// Create XML document
$doc = new DomDocument('1.0', 'UTF-8');
// Create root node
$root = $doc->createElement('persons');
$root = $doc->appendChild($root);
// the old mysql_* functions were removed in PHP 7; $result is assumed to be a mysqli_result
while ($row = mysqli_fetch_assoc($result)) {
// add node for each row
$node = $doc->createElement('person');
$node = $root->appendChild($node);
foreach ($row as $column => $value) {
$columnElement = $doc->createElement($column);
$columnElement = $node->appendChild($columnElement);
$columnValue = $doc->createTextNode($value);
$columnValue = $columnElement->appendChild($columnValue);
}
}
// Complete XML document
$doc->formatOutput = true;
$xmlContent = $doc->saveXML();
// Save to file
file_put_contents('persons.xml', $xmlContent);
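A self-contained sketch of the same approach that runs without a database. The $rows array is a hypothetical stand-in for the query result, and the final file_put_contents() call is left out so the snippet has no side effects:

```php
<?php
// Hypothetical rows standing in for the database result set.
$rows = [
    ['name' => 'Tom & Jerry', 'age' => '30'],
    ['name' => 'Ann', 'age' => '25'],
];

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('persons'));

foreach ($rows as $row) {
    $person = $root->appendChild($doc->createElement('person'));
    foreach ($row as $column => $value) {
        $element = $person->appendChild($doc->createElement($column));
        // Text nodes are escaped on output, so "&" and "<" in the data
        // cannot break the XML.
        $element->appendChild($doc->createTextNode($value));
    }
}

$xmlContent = $doc->saveXML();
echo $xmlContent;
```

Note that column names become element names here, so they must be valid XML names.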
<?php
// [snip] database code here
$f = fopen('myxml.xml', 'a+');
while ($row = mysqli_fetch_assoc($resultFromQuery))
{
$str = "<person>
<name>{$row['name']}</name>
<age>{$row['age']}</age>
</person>\n";
fwrite($f, $str);
}
fclose($f);
?>
Assuming you use mysqli, this code works. If not, adapt it to fit. In the fopen function call, the a+ mode opens the file for reading and writing, placing the pointer at the end of the file.
Best of luck.
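One caveat with building XML from strings is that values containing characters such as & or < would produce invalid XML. A minimal sketch of escaping them first, where personXml() is a hypothetical helper and not part of the answer above:

```php
<?php
// Hypothetical helper: builds one <person> fragment with escaped values.
function personXml(array $row): string {
    // ENT_XML1 escapes &, <, > and quotes using XML entities.
    $name = htmlspecialchars((string) $row['name'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $age  = htmlspecialchars((string) $row['age'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return "<person>\n<name>{$name}</name>\n<age>{$age}</age>\n</person>\n";
}

// Made-up row containing characters that need escaping.
$fragment = personXml(['name' => 'Tom & Jerry <jr>', 'age' => 30]);
echo $fragment;
```

The file also needs a single root element around the person entries to be well-formed XML.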