get DHL tracking statuses - php

Please help me figure out which PHP API or script I should use to get shipment statuses from DHL when all I have are the DHL tracking codes provided by the logistics company that fulfills shipping for the orders from our e-commerce website. My task is to write a PHP cron job that checks and records the DHL shipment status for each tracking code so it can be used in back-end reports.
I would much appreciate any suggestion that points me in the right direction.

I am still looking for the right way to do this. So far I see no option other than parsing the DHL tracking web page, because a tracking number alone is not enough to use the DHL API, which requires login credentials, secret keys, and so on. Still, my current parsing code may be useful to anyone looking for a similar solution. Fill in your tracking codes and run the code on localhost or on http://phpfiddle.org/:
$tracking_array=Array('000000000000', '1111111111111'); // Tracking Codes
function create_track_url($track)
{
$separator = '%2C+';
$count = count($track);
$url = '';
if ($count < 2 && $count > 0){
$url = $track[0];
}else if ($count >1){
foreach ($track as $k => $v)
{
$sep = ($count-2);
if ($k > $sep){
$separator ='';
}
$url .= $v.$separator;
}
}
return $url;
}
//load the html
$dom = new DOMDocument();
$html = $dom->loadHTMLFile("https://nolp.dhl.de/nextt-online-public/en/search?piececode=".create_track_url($tracking_array));
//discard white space
$dom->preserveWhiteSpace = false;
//the table by its tag name
$xpath = new DOMXpath($dom);
$expression = './/h2[contains(@class, "panel-title")]';
$track_codes =array();
foreach ($xpath->evaluate($expression) as $div) {
$track_codes[]= preg_replace( '/[^0-9]/', '', $div->nodeValue );
}
$tables = $dom->getElementsByTagName('table');
$table = array();
foreach($track_codes as $key => $val)
{
//get all rows from the table
$rows = $tables->item($key)->getElementsByTagName('tr');
// get each column by tag name
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = NULL;
foreach ($cols as $node) {
//print $node->nodeValue."\n";
$row_headers[] = $node->nodeValue;
}
//get all rows from the table
$rows = $tables->item($key)->getElementsByTagName('tr');
foreach ($rows as $row)
{
// get each column by tag name
$cols = $row->getElementsByTagName('td');
$row = array();
$i=0;
foreach ($cols as $node) {
# code...
//print $node->nodeValue."\n";
if($row_headers==NULL)
$row[] = $node->nodeValue;
else
$row[$row_headers[$i]] = $node->nodeValue;
$i++;
}
$table[$val][] = $row;
}
}
print '<pre>';
print_r($table);
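The create_track_url() helper above joins the tracking codes with `%2C+`, the URL-encoded form of a comma followed by a space. A shorter sketch of the same joining step, assuming the codes contain only digits and so need no further encoding, can rely on PHP's built-in implode(). The name build_piececode_param is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: joins tracking codes the same way create_track_url() does.
function build_piececode_param(array $codes): string
{
    // '%2C+' is the URL-encoded form of ", " (comma, then space).
    // implode() returns '' for an empty array and the bare code for a single one.
    return implode('%2C+', $codes);
}

$tracking_array = array('000000000000', '1111111111111'); // placeholder codes
echo build_piececode_param($tracking_array), "\n";
```

The result can be appended to the search URL in place of the create_track_url($tracking_array) call.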

Related

How to scrape data from HTML Table in PHP

Hey, I've been trying to scrape data from an HTML table and I'm not having much luck.
Website: https://www.dnr.state.mn.us/hunting/seasons.html
What I'm trying to do: I want to grab the contents of the table and encode it as JSON, with entries like
['event_title' 'Waterfowl'] and ['event_date' '09/25/21']
but I don't know how to do this. I've tried a couple of different things, but I can't get any of them to work.
Code example (the closest I got):
<?php
$dom = new DOMDocument;
$page = file_get_contents('https://www.dnr.state.mn.us/hunting/seasons.html');
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//tbody/tr') as $tr) {
$tmp = []; // reset the temporary array so previous entries are removed
foreach ($xpath->query("td[@class]", $tr) as $td) {
$key = preg_match('~[a-z]+$~', $td->getAttribute('class'), $out) ? $out[0] : 'no_class';
if ($key === "event-title") {
$tmp['event_title'] = $xpath->query("a", $td);
}
$tmp[$key] = trim($td->textContent);
}
//$tmp['event_date'] = date("M. dS 'y", strtotime(preg_replace('~\.|\d+[ap]m *~', '', $tmp['date'])));
//$result[] = $tmp;
$marray[] = array_unique($tmp);
print_r($marray);
}
//$array2 = var_export($result);
//print_r($array2[1]);
//var_export($result);
//echo "\n----\n";
//echo json_encode($result);
?>
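One possible direction, shown as a minimal sketch rather than a tested answer for the live page: build one associative array per table row, keyed by each cell's class name, and pass the collected rows to json_encode(). The inline markup and the class names event-title and event-date are assumptions made for illustration; the real page may use different ones.

```php
<?php
// Assumed sample markup; the class names on the real page may differ.
$page = <<<HTML
<table><tbody>
<tr><td class="event-title"><a href="/waterfowl">Waterfowl</a></td><td class="event-date">09/25/21</td></tr>
<tr><td class="event-title"><a href="/deer">Deer</a></td><td class="event-date">11/06/21</td></tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings about imperfect HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//tbody/tr') as $tr) {
    $row = [];
    foreach ($xpath->query('td[@class]', $tr) as $td) {
        // Turn a class such as "event-title" into the key "event_title".
        $key = str_replace('-', '_', $td->getAttribute('class'));
        // textContent already includes the text of a nested <a> element.
        $row[$key] = trim($td->textContent);
    }
    if ($row) {
        $result[] = $row; // skip rows that had no classed cells
    }
}
echo json_encode($result), "\n";
```

Storing plain strings in each row, instead of the DOMNodeList that $xpath->query("a", $td) returns, is what makes the rows safe to pass to json_encode().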

Displays a specific number of elements in foreach PHP

I have to process a huge XML file. I used DOMDocument to process it, but the amount of data returned is huge, so how can I display only a specific number of elements?
For example, I want to display 5 elements.
My code:
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->load('IPCCPC-epoxif-201905.xml'); //IPCCPC-epoxif-201905
$xpath = new DOMXPath($doc);
if(empty($_POST['search'])){
$txtSearch = 'A01B1/00';
}
else{
$txtSearch = $_POST['search'];
}
$titles = $xpath->query("Doc/Fld[@name='IC']/Prg/Sen[contains(text(),\"$txtSearch\")]");
foreach ($titles as $title)
{
// I want to display 5 results here.
}
Add an index to the loop, and break out when it hits the limit.
$limit = 5;
foreach ($titles as $i => $title) {
if ($i >= $limit) {
break;
}
// rest of code
}
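An alternative to breaking out of the loop is to let XPath apply the limit, by wrapping the expression in parentheses and adding a position() predicate. The sketch below uses a small inline XML document with an assumed, simplified structure, not the real IPCCPC file.

```php
<?php
// Assumed, simplified XML; the real file has a deeper Doc/Fld/Prg/Sen structure.
$xml = '<Doc><Sen>A01B1/00 first</Sen><Sen>A01B1/00 second</Sen><Sen>A01B1/00 third</Sen></Doc>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$limit = 2;
$txtSearch = 'A01B1/00';
// The parentheses make position() count over the whole result set,
// so only the first $limit matches are returned.
$titles = $xpath->query("(/Doc/Sen[contains(text(), '{$txtSearch}')])[position() <= {$limit}]");

foreach ($titles as $title) {
    echo $title->nodeValue, "\n";
}
```

With this form the node list itself never holds more than $limit nodes, so no counter is needed inside the loop.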

Parse html table using file_get_contents to php array

I am trying to parse the table shown here into a multi-dimensional PHP array. I am using the following code, but for some reason it's returning an empty array. After searching the web, I found this site, which is where I got the parseTable() function from. According to the comments on that website, the function works perfectly, so I'm assuming something is wrong with the way I'm getting the HTML from file_get_contents(). Any thoughts on what I'm doing wrong?
<?php
$data = file_get_contents('http://flow935.com/playlist/flowhis.HTM');
function parseTable($html)
{
// Find the table
preg_match("/<table.*?>.*?<\/[\s]*table>/s", $html, $table_html);
// Get title for each row
preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/", $table_html[0], $matches);
$row_headers = $matches[1];
// Iterate each row
preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s", $table_html[0], $matches);
$table = array();
foreach($matches[1] as $row_html)
{
preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row_html, $td_matches);
$row = array();
for($i=0; $i<count($td_matches[1]); $i++)
{
$td = strip_tags(html_entity_decode($td_matches[1][$i]));
$row[$row_headers[$i]] = $td;
}
if(count($row) > 0)
$table[] = $row;
}
return $table;
}
$output = parseTable($data);
print_r($output);
?>
I want my output array to look something like this:
1
--> 11:33AM
--> DEV
--> IN THE DARK
2
--> 11:29AM
--> LIL' WAYNE
--> SHE WILL
3
--> 11:26AM
--> KARDINAL OFFISHALL
--> NUMBA 1 (TIDE IS HIGH)
Don't cripple yourself parsing HTML with regexps! Instead, let an HTML parser library worry about the structure of the markup for you.
I suggest you check out Simple HTML DOM (http://simplehtmldom.sourceforge.net/). It is a library written specifically to help with this kind of web scraping problem in PHP. With such a library, you can write your scraper in far fewer lines of code without having to craft working regexps.
In principle, with Simple HTML DOM you just write something like:
$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
// Parse table row here
}
This can be then extended to capture your data in some format, for instance to create an array of artists and corresponding titles as:
<?php
require('simple_html_dom.php');
$table = array();
$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
$time = $row->find('td',0)->plaintext;
$artist = $row->find('td',1)->plaintext;
$title = $row->find('td',2)->plaintext;
$table[$artist][$title] = true;
}
echo '<pre>';
print_r($table);
echo '</pre>';
?>
We can see that this code can be (trivially) changed to reformat the data in any other way as well.
I tried simple_html_dom, but on larger files and on repeated calls to the function I get zend_mm_heap_corrupted on PHP 5.3. I also tried preg_match_all, but it fails on a larger file (5000 lines of HTML, which is only about 400 rows of my HTML table).
I am using the following instead; it works fast and does not produce errors.
$dom = new DOMDocument();
//load the html
$html = $dom->loadHTMLFile("htmltable.html");
//discard white space
$dom->preserveWhiteSpace = false;
//the table by its tag name
$tables = $dom->getElementsByTagName('table');
//get all rows from the table
$rows = $tables->item(0)->getElementsByTagName('tr');
// get each column by tag name
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = NULL;
foreach ($cols as $node) {
//print $node->nodeValue."\n";
$row_headers[] = $node->nodeValue;
}
$table = array();
//get all rows from the table
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
// get each column by tag name
$cols = $row->getElementsByTagName('td');
$row = array();
$i=0;
foreach ($cols as $node) {
# code...
//print $node->nodeValue."\n";
if($row_headers==NULL)
$row[] = $node->nodeValue;
else
$row[$row_headers[$i]] = $node->nodeValue;
$i++;
}
$table[] = $row;
}
var_dump($table);
This code worked well for me.
Example of original code is here.
http://techgossipz.blogspot.co.nz/2010/02/how-to-parse-html-using-dom-with-php.html
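The same DOMDocument approach can be wrapped in a function that takes an HTML string, for example the result of file_get_contents(). This is a minimal sketch under a few assumptions: the first table on the page is the one wanted, its header cells are th elements, and the name parse_first_table is hypothetical. Unlike the code above, it drops the empty row produced by the header line.

```php
<?php
// Hypothetical helper: parse the first <table> in an HTML string into rows.
function parse_first_table(string $html): array
{
    libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $tables = $dom->getElementsByTagName('table');
    if ($tables->length === 0) {
        return [];
    }
    $first = $tables->item(0);

    // Collect header labels from all <th> cells of the table.
    $headers = [];
    foreach ($first->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->nodeValue);
    }

    $table = [];
    foreach ($first->getElementsByTagName('tr') as $tr) {
        $row = [];
        foreach ($tr->getElementsByTagName('td') as $i => $td) {
            // Fall back to the numeric index when no header exists for the column.
            $key = $headers[$i] ?? $i;
            $row[$key] = trim($td->nodeValue);
        }
        if ($row) {
            $table[] = $row; // rows without <td> cells (the header row) are skipped
        }
    }
    return $table;
}

$sample = '<table><tr><th>Time</th><th>Artist</th><th>Title</th></tr>'
        . '<tr><td>11:33AM</td><td>DEV</td><td>IN THE DARK</td></tr></table>';
print_r(parse_first_table($sample));
```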

PHP Simple DOM Parser to Scrape From Multiple URLs

Is it possible to use a foreach loop to scrape multiple URLs from an array? I've been trying, but for some reason it only pulls from the first URL in the array and then shows the results.
include_once('../../simple_html_dom.php');
$link = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($link as $links) {
function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);
$values = array();
foreach($html->find('input') as $element) {
$values[$element->id=='ASIN'] = $element->value; }
// get title
$ret['ASIN'] = end($values);
// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] =$html->find('b[class="priceLarge"]', 0)->innertext;
// clean up memory
//$html->clear();
// unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$ret = scraping_IMDB($links);
foreach($ret as $k=>$v)
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
Here is the code since the comment part didn't work. :) It's very dirty because I just edited one of the examples to play with it to see if I could get it to do what I wanted.
include_once('../../simple_html_dom.php');
function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);
// What is this spaghetti code good for?
/*
$values = array();
foreach($html->find('input') as $element) {
$values[$element->id=='ASIN'] = $element->value;
}
// get title
$ret['ASIN'] = end($values);
*/
foreach($html->find('input') as $element) {
if($element->id == 'ASIN') {
$ret['ASIN'] = $element->value;
}
}
// Or you could use the following instead of the whole foreach loop above
//
// $ret['ASIN'] = $html->find('input[id="ASIN"]', 0)->value;
//
// if the 0 means, return first found or something similar,
// I just had a look at Amazons source code, and it contains
// 2 HTML tags with id='ASIN'. If they were following html-regulations
// then there should only be ONE element with a specific id.
// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;
// clean up memory
//$html->clear();
// unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
This should do the trick.
I have renamed the array to 'links' instead of 'link'. It is an array of links, so foreach($link as $links) read backwards, and I changed it to foreach($links as $link).
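The other change in the code above is that the function is declared once, outside the loop. A named function declared inside a loop body is declared again on the second pass, and PHP stops with a "Cannot redeclare" fatal error, which fits the symptom of only the first URL being processed. A minimal sketch of the pattern, with a hypothetical scrape_stub() standing in for the real scraper so that it runs without network access:

```php
<?php
// Hypothetical stand-in for the real scraper: it only extracts the
// product ID from the URL instead of downloading the page.
function scrape_stub(string $url): array
{
    return ['ASIN' => basename(rtrim($url, '/'))];
}

$links = [
    'http://www.amazon.com/dp/B0038JDEOO/',
    'http://www.amazon.com/dp/B0038JDEM6/',
    'http://www.amazon.com/dp/B004CYX17O/',
];

// The function is declared once above; the loop only calls it.
$results = [];
foreach ($links as $link) {
    $results[] = scrape_stub($link);
}
print_r($results);
```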
I really need to ask this question, as the answer will help many other readers of this thread. What if you used articles, as in the examples on the Simple HTML DOM site?
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;
return $ret;
}
$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
what if its $articles?
$articles[] = $item;
}
//print_r($articles);
$links = array (
'http://link1.com',
'http://link2.com',
'http://link3.com'
);
what would this area look like?
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
I have seen this multiple-links question all over Stack Overflow for the past 2 years, and I still cannot figure it out. It would be great to get a basic handle on it, in the style of the Simple HTML DOM examples.
Thanks.
This is my first post, so I am sure I broke a bunch of rules and did not format the code section right. I just badly wanted to ask this question.

How to parse the attribute value of a <a> tag in PHP

I am trying to parse an HTML page listing universities and colleges in the US for a database. The code I wrote fetches the names of the universities, but I am unable to fetch their respective URLs.
public function fetch_universities()
{
$url = "http://www.utexas.edu/world/univ/alpha/";
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$tr = $tables->item(1)->getElementsByTagName('tr');
$td = $tr->item(7)->getElementsByTagName('td');
$rows = $td->item(0)->getElementsByTagName('li');
$count = 0;
foreach ($rows as $row)
{
$count++;
$cols = $row->getElementsByTagName('a');
echo "$count:".$cols->item(0)->nodeValue. "\n";
}
}
This is my code that I have currently.
Please tell me how to fetch the attribute values as well.
Thank you
If you have a reference to an element, you just have to use getAttribute(), so probably:
echo "$count:".$cols->item(0)->getAttribute('href') . "\n";
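To collect both the link text and the href in one pass, something like the following sketch could work. The inline markup and the example.edu URLs are made up for illustration and do not reflect the structure of the real page.

```php
<?php
// Made-up sample markup standing in for the real list of universities.
$html = '<ul>'
      . '<li><a href="http://example.edu/a">Alpha University</a></li>'
      . '<li><a href="http://example.edu/b">Beta College</a></li>'
      . '<li>No link here</li>'
      . '</ul>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$universities = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    $a = $li->getElementsByTagName('a')->item(0);
    if ($a === null) {
        continue; // list item without a link
    }
    $universities[] = [
        'name' => trim($a->nodeValue),
        'url'  => $a->getAttribute('href'), // empty string if the attribute is missing
    ];
}
print_r($universities);
```

Checking item(0) for null before calling getAttribute() avoids a fatal error on list items that contain no anchor.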
