variable echoing out the last of 3 numbers - php

Hi, I currently have an output echoing 176 8 58 from a web scraping script. I want to store this output in a variable and echo it out in other places on the website.
I've packed this up by doing this:
ob_start();
echo $node->nodeValue. "\n";
$thenumbers = ob_get_contents();
ob_end_clean();
but when I echo it out like this
<?php echo $thenumbers ?>
my output is then 176 8 58
Now, on the website the numbers are in spans and are split up by "/". Do I need to do anything fancy? I'm kind of new to PHP, so let me know if it's something stupid!
Would really appreciate a bit of help
(This is the web scraping script I'm using. I had to hide the website I'm scraping as it's in development.)
<?php
$teamlink = rwmb_meta( 'WEBSITE_HIDDEN' );
$arr = array( $teamlink );
foreach ($arr as &$value) {
$file = $DOCUMENT_ROOT. $value;
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*[contains(@class, 'table')]/tr[3]/td[3]/span");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
ob_start();
echo $node->nodeValue. "\n";
$win_loss = ob_get_contents();
ob_end_clean();
}
}
}
}
?>
P.S. I know the script works, as it's currently outputting standard text fine.

My apologies if I have completely misunderstood your question.
If you want to put a "/" between the numbers where the spaces are, you could:
echo str_replace(' ','/',$thenumbers);
If you just want to show the last 3 digits (after removing the spaces from the string), you could:
echo substr(str_replace(' ','',$thenumbers),-3);
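To make the difference between the two suggestions concrete, here is a minimal sketch. It assumes $thenumbers holds exactly the string "176 8 58" with no trailing newline (in the question a "\n" is appended, so a trim() may be needed first). The explode()/end() variant is an extra illustration that is not part of the answer, for the case where "the last of 3 numbers" means the final number rather than the final three digits.

```php
<?php
// Assumption: the scraped text is exactly "176 8 58".
$thenumbers = "176 8 58";

// Replace every space with a slash.
$slashed = str_replace(' ', '/', $thenumbers);

// Remove the spaces, then keep the last three characters of "176858".
$lastThree = substr(str_replace(' ', '', $thenumbers), -3);

// Alternative (not from the answer): split on spaces and take the final number.
$parts = explode(' ', trim($thenumbers));
$lastNumber = end($parts);

echo $slashed, "\n";     // 176/8/58
echo $lastThree, "\n";   // 858
echo $lastNumber, "\n";  // 58
```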

Related

Scraping specific text from a webpage using xpath

I've searched and tried multiple ways to get this but I'm not sure why it won't find most of the information on the webpage.
Page to scrape:
https://m.safeguardproperties.com/
Info needed:
Version number for PhotoDirect for Apple (currently 4.4.0)
Xpath to text needed (I think) : /html/body/div[1]/div[2]/div[1]/div[4]/div[3]/a
Attempts:
<?php
$file = "https://m.safeguardproperties.com/";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("/html/body/div[1]/div[2]/div[1]/div[4]/div[3]/a");
echo "<PRE>";
if (!is_null($elements)) {
foreach ($elements as $element) {
var_dump ($element);
echo "<br/>[". $element->nodeName. "]";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
echo "</PRE>";
?>
Second Attempt:
<?PHP
$file = "https://m.safeguardproperties.com/";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
echo '<pre>';
// trying to find all links in document to see if I can see the correct one
$links = [];
$arr = $doc->getElementsByTagName("a");
foreach($arr as $item) {
$href = $item->getAttribute("href");
$text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue));
$links[] = [
'href' => $href,
'text' => $text
];
}
var_dump($links);
echo '</pre>';
?>
For that particular website, the versions are loaded client side from JSON data, so you won't find them in the base document.
http://m.safeguardproperties.com/js/photodirect.json
This was located by comparing the original document source to the finished DOM and inspecting the network activity in the developer console.
$url = 'https://m.safeguardproperties.com/js/photodirect.json';
$json = file_get_contents( $url );
$object = json_decode( $json );
echo $object->ios->version; //4.4.0
Please respect other websites and cache your GET request.
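One way to follow that advice is a small file-based cache. The sketch below is only an illustration: the function name cached_get, the cache file location and the one-hour lifetime are assumptions, not something the site or the answer prescribes. The fetcher is injectable so the function can be tried without touching the network.

```php
<?php
/**
 * Return the body for $url, reusing a local copy while it is younger
 * than $ttl seconds. $fetch defaults to file_get_contents.
 * (Hypothetical helper; name and defaults are assumptions.)
 */
function cached_get(string $url, string $cacheFile, int $ttl = 3600, ?callable $fetch = null): string
{
    $fetch = $fetch ?? 'file_get_contents';

    // Make sure we look at the file's current state, not a cached stat.
    clearstatcache(true, $cacheFile);

    // Serve the cached copy if it exists and is still fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    // Otherwise fetch once and store the result for later calls.
    $body = $fetch($url);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    file_put_contents($cacheFile, $body);

    return $body;
}

// Possible usage with the URL from the answer (not executed here):
// $json = cached_get(
//     'https://m.safeguardproperties.com/js/photodirect.json',
//     sys_get_temp_dir() . '/photodirect.json'
// );
// echo json_decode($json)->ios->version;
```

With this in place, repeated page loads within the hour read the local file instead of issuing a new request.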

php echo "%" displaying more than once

I'm testing data scraping. The output I'm scraping is a percentage, so I basically slapped on a
echo "%<br>";
at the end of the actual number output, which is
echo $ret_[66];
However there's an issue where the percent is actually appearing before the number as well, which is not desirable. This is the output:
%
-0.02%
Whereas what I'm trying to get is just -0.02%
Clearly I'm doing something wrong with the PHP. I'd really appreciate any feedback/solutions. Thank you!
Full code:
<?php
error_reporting(E_ALL^E_NOTICE^E_WARNING);
include_once "global.php";
$doc = new DOMDocument;
// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('http://www.moneycontrol.com/markets/global-indices/');
$xpath = new DOMXPath($doc);
$query = "//div[@class='MT10']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$result = trim($entry->textContent);
$ret_ = explode(' ', $result);
//make sure no element in the array starts or ends with whitespace
foreach ($ret_ as $key => $val){
$ret_[$key] = trim($val);
}
//delete the empty elements and the elements that are only "\n" "\r" "\t"
//I modified this line (deleteBlankInArray is expected to come from global.php)
$ret_ = array_values(array_filter($ret_, 'deleteBlankInArray'));
//echo the last element
echo $ret_[66];
echo "%<br>";
}
<?php
echo "%<br>";
?>
Putting the echo in a separate PHP block after it does the same thing.
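Going by the output shown in the question, one possible explanation (an assumption, since the page markup is not shown) is that more than one div matches the query, and the echo "%<br>" runs for every match, including one where $ret_[66] does not exist. A hedged sketch of a guard, using a hypothetical helper name and simulated data in place of the scraped arrays:

```php
<?php
// Hypothetical helper: format one scraped value as a percentage, or return
// an empty string when the requested index is missing or blank.
function percent_or_empty(array $parts, int $index): string
{
    if (!isset($parts[$index]) || trim($parts[$index]) === '') {
        return '';
    }
    return trim($parts[$index]) . '%<br>';
}

// Two simulated matches: the first has nothing at index 1, the second does.
$matches = [['x'], ['x', '-0.02']];
foreach ($matches as $ret_) {
    // Prints nothing for the first match and "-0.02%<br>" for the second.
    echo percent_or_empty($ret_, 1);
}
```

In the question's loop the same check would be applied to index 66 before echoing the percent sign.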

How to get an exact value from a website using php DOM and save it in a database?

I want to get the value of the span with the id "CPH1_lblCurrent" from the URL and save it in the database.
Here is the code that I tried, based on some examples.
<?php
// The URL is absolute, so no document root should be prepended.
$file = "http://www.mypetrolprice.com/2/Petrol-price-in-Delhi";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query('//span[@id="CPH1_lblCurrent"]');
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
This shows me the following.
Current Delhi Petrol Price = 67.12 Rs/Ltr
But I want only the value 67.12.
Can somebody help me?
Try using this simple regex with preg_match to get the number:
.*= ([\d.]+) .*
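As a minimal sketch of how that pattern could be applied, assuming the scraped text has exactly the form shown in the question (the function name extract_price is made up for illustration):

```php
<?php
// Hypothetical helper: pull the number that follows "= " out of the text.
// Returns null when the text does not match the expected shape.
function extract_price(string $text): ?string
{
    if (preg_match('/.*= ([\d.]+) .*/', $text, $m)) {
        return $m[1]; // first capture group: the digits and dots
    }
    return null;
}

echo extract_price('Current Delhi Petrol Price = 67.12 Rs/Ltr'), "\n"; // 67.12
```

The returned string can then be cast with (float) before it is stored in the database.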

Website Scraping Using Regex trying to extract integers

I'm having trouble extracting the integers between the brackets from this website.
Part of markup from the website:
<span class="b-label b-link-number" data-num="(322206)">Music & Video</span>
<span class="b-label b-link-number" data-num="(954218)">Toys, Hobbies & Games</span>
<span class="b-label b-link-number" data-num="(502981)">Kids, Baby & Maternity</span>
How do I extract the integers between the brackets?
Desired output:
322206
954218
502981
Should I use regex, since they all have the same class name? (A regex that simply grabs whatever is between brackets won't do, since the source code contains other unwanted elements inside brackets as well.)
Normally, this is how I extract information:
<?php
//header('Content-Type: text/html; charset=utf-8');
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://global.rakuten.com/en/search/?tl=&k=");
$finder = new DomXPath($grep);
$class = "b-list-item";
$nodes = $finder->query("//*[contains(@class, '$class')]");
foreach ($nodes as $node) {
$span = $node->childNodes;
$search = array(0,1,2,3,4,5,6,7,8,9,'(',')');
$categories = str_replace($search, '', $span->item(0)->nodeValue);
echo '<br>' . '<font color="green">' . $categories . ' ' . '</font>' ;
}
?>
But since the data I want is inside the tag, how do I extract it?
Building on your current code, it's straightforward: change $class to the class you want and use ->getAttribute() to get the data-num values:
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://global.rakuten.com/en/search/?tl=&k=");
$finder = new DomXPath($grep);
$class = "b-link-number"; // change the span class
$nodes = $finder->query("//*[contains(@class, '$class')]"); // target those
$numbers = array();
foreach ($nodes as $node) { // for every element found
$link_num = $node->getAttribute('data-num'); // get the attribute `data-num`
$link_num = str_replace(['(', ')'], '', $link_num); // simply remove the parentheses
$numbers[] = $link_num; // push it inside the container
}
echo '<pre>';
print_r($numbers);
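For trying the getAttribute() approach without hitting the live site, here is a self-contained sketch that loads the three spans from the question as a string. The only changes are assumptions made for the demo: the ampersands are written as &amp; so the parser does not complain, and trim() with a character list stands in for str_replace().

```php
<?php
// Sample markup taken from the question (ampersands escaped for the parser).
$html = '<span class="b-label b-link-number" data-num="(322206)">Music &amp; Video</span>'
      . '<span class="b-label b-link-number" data-num="(954218)">Toys, Hobbies &amp; Games</span>'
      . '<span class="b-label b-link-number" data-num="(502981)">Kids, Baby &amp; Maternity</span>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$numbers = [];
foreach ($xpath->query("//span[contains(@class, 'b-link-number')]") as $span) {
    // Read the attribute and strip the surrounding parentheses.
    $numbers[] = trim($span->getAttribute('data-num'), '()');
}

print_r($numbers);
```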
<span[^>)()]*\((\d+)\)[^>]*>
Try this and grab the capture group. See the demo:
http://regex101.com/r/iM2wF9/10
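A short sketch of that regex in PHP, run against the markup from the question with preg_match_all. The ~ delimiter is chosen here only so the pattern needs no extra escaping.

```php
<?php
// Markup copied from the question.
$markup = <<<'HTML'
<span class="b-label b-link-number" data-num="(322206)">Music & Video</span>
<span class="b-label b-link-number" data-num="(954218)">Toys, Hobbies & Games</span>
<span class="b-label b-link-number" data-num="(502981)">Kids, Baby & Maternity</span>
HTML;

// $found[1] holds the first capture group of every match: the digits
// between the parentheses inside each opening span tag.
preg_match_all('~<span[^>)()]*\((\d+)\)[^>]*>~', $markup, $found);
$numbers = $found[1];

echo implode("\n", $numbers), "\n";
```

For the three spans above this prints 322206, 954218 and 502981, one per line.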

Website Scraping from DoMDocument using php

I have PHP code that can extract the categories and display them. However,
I still can't extract the numbers that go along with them (without the brackets).
The categories and the numbers need to be kept separate (not extracted together).
Maybe I need another for loop using regex, etc.
This is the code:
<?php
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
$finder = new DomXPath($grep);
$class = "CatLevel1";
$nodes = $finder->query("//*[contains(@class, '$class')]");
foreach ($nodes as $node) {
$span = $node->childNodes;
echo $span->item(0)->nodeValue."<br>";
}
?>
Is there any way I could do that? Thanks!
This is my desired output:
Arts, Antiques & Collectibles : 9768<br>
B2B & Industrial Products : 2342<br>
Baby : 3453<br>
etc...
Just add the other sibling as well. Example:
foreach ($nodes as $node) {
$span = $node->childNodes;
echo $span->item(0)->nodeValue . ': ' . str_replace(array('(', ')'), '', $span->item(1)->nodeValue);
echo '<br/>';
}
EDIT: Just use str_replace for the simple purpose of removing the parentheses.
Sidenote: Always set the UTF-8 encoding in your PHP file.
header('Content-Type: text/html; charset=utf-8');
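The sibling approach can be tried offline with a sketch like the following. The markup here is hypothetical: the real page structure is not shown in the question, so this assumes each CatLevel1 element contains the category name as its first child and the bracketed count as its second, with no whitespace between them.

```php
<?php
// Hypothetical markup; the real page may be structured differently.
$html = '<div class="CatLevel1"><a href="#">Arts, Antiques &amp; Collectibles</a><span>(9768)</span></div>'
      . '<div class="CatLevel1"><a href="#">Baby</a><span>(3453)</span></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$finder = new DOMXPath($doc);

$rows = [];
foreach ($finder->query("//*[contains(@class, 'CatLevel1')]") as $node) {
    $children = $node->childNodes;
    $name  = trim($children->item(0)->nodeValue);                               // category
    $count = str_replace(['(', ')'], '', $children->item(1)->nodeValue);        // number only
    $rows[] = $name . ' : ' . $count;
}

echo implode("<br>\n", $rows), "<br>\n";
```

If the real markup has whitespace text nodes between the children, the item() indexes would need adjusting.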
