Display a specific number of elements in a foreach loop in PHP

I have to process a huge XML file, and I used DOMDocument to parse it, but the amount of data returned is huge. How can I choose a specific number of elements to display?
For example, I want to display 5 elements.
My code:
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->load('IPCCPC-epoxif-201905.xml');
$xpath = new DOMXPath($doc);
if (empty($_POST['search'])) {
    $txtSearch = 'A01B1/00';
} else {
    $txtSearch = $_POST['search'];
}
// Attribute tests in XPath use "@", not "#".
$titles = $xpath->query("Doc/Fld[@name='IC']/Prg/Sen[contains(text(),\"$txtSearch\")]");
foreach ($titles as $title) {
    // I want to display 5 results here.
}

Add an index to the loop, and break out when it hits the limit.
$limit = 5;
foreach ($titles as $i => $title) {
    if ($i >= $limit) {
        break;
    }
    // rest of code
}
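Alternatively, the limit can live in the XPath expression itself via position(). A minimal self-contained sketch, using a small inline XML document rather than the file from the question (the parentheses matter, so the predicate applies to the whole result set):

```php
<?php
// Limit the node set directly in XPath with position().
$doc = new DOMDocument();
$doc->loadXML('<r><s>a</s><s>b</s><s>c</s><s>d</s><s>e</s><s>f</s><s>g</s></r>');
$xpath = new DOMXPath($doc);
$first5 = $xpath->query('(//s)[position() <= 5]');
foreach ($first5 as $node) {
    echo $node->nodeValue, "\n";
}
```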

Related

How to use "PHP Parallel" with DOMDocument in my following PHP Code?

I would like to use the PHP 8 parallel extension (https://github.com/krakjoe/parallel, also https://php.net/parallel) in my function follow_links:
function get_details($url) {
    libxml_use_internal_errors(true);
    $parser = new DOMDocument();
    $parser->loadHTMLFile($url);
    // Get the list of <title> tags.
    $title = $parser->getElementsByTagName("title");
    // There should only be one <title> on each page, so take item 0.
    $title = $title->item(0)->nodeValue;
    // Give $description and $keywords no value initially, to prevent errors.
    $description = "";
    $keywords = "";
    // Get all of the page's <meta> tags. There will probably be lots of these.
    $metas = $parser->getElementsByTagName("meta");
    // Loop through all of the <meta> tags we find.
    for ($i = 0; $i < $metas->length; $i++) {
        $meta = $metas->item($i);
        // Get the description
        if (strtolower($meta->getAttribute("name")) == "description")
            $description = $meta->getAttribute("content");
    }
}
function follow_links($url) {
    global $already_crawled;
    global $crawling;
    libxml_use_internal_errors(true);
    $parser = new DOMDocument();
    // The page has to be loaded before its links can be read.
    $parser->loadHTMLFile($url);
    $linklist = $parser->getElementsByTagName("a");
    foreach ($linklist as $link) {
        $l = $link->getAttribute("href");
        if (!in_array($l, $already_crawled)) {
            $already_crawled[] = $l;
            $crawling[] = $l;
            // Output the page title, description, keywords and URL. This output is
            // piped off to an external file using the command line.
            echo get_details($l)."\n";
        }
    }
    // Remove an item from the array after we have crawled it.
    // This prevents infinitely crawling the same page.
    array_shift($crawling);
    foreach ($crawling as $site) {
        follow_links($site);
    }
}
$starts = ["https://website1.dn", "https://website2.dn", "https://website3.dn", "https://website4.dn"];
follow_links($starts);
I would therefore like to process all the URLs stored in $starts at the same time, in parallel, inside follow_links, given that get_details (which uses DOMDocument) is the function that retrieves the data for each URL.
Thank you for helping me.
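One way to approach this is to hand each start URL to its own parallel\Runtime and collect the results through futures. A minimal sketch, assuming the krakjoe/parallel extension from the question; the $fetch closure is a stand-in for get_details so the sketch runs without network access, and it falls back to a plain sequential loop when the extension is not loaded:

```php
<?php
// Sketch: fan the start URLs out to one parallel\Runtime each.
// $fetch stands in for get_details($url) so no network is needed.
$starts = ["https://website1.dn", "https://website2.dn"];

$fetch = function (string $url): string {
    return "crawled: " . $url;
};

$results = [];
if (extension_loaded('parallel')) {
    $futures = [];
    foreach ($starts as $url) {
        $runtime = new \parallel\Runtime();
        // run() returns a parallel\Future immediately; the work
        // proceeds concurrently in the runtime's thread.
        $futures[] = $runtime->run($fetch, [$url]);
    }
    foreach ($futures as $future) {
        $results[] = $future->value(); // blocks until that task finishes
    }
} else {
    // Fallback when the parallel extension is unavailable.
    foreach ($starts as $url) {
        $results[] = $fetch($url);
    }
}
print_r($results);
```

Note that parallel requires a ZTS build of PHP, which is why the sketch guards on extension_loaded().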

Extract content from multiple pages of same website

I have this script to extract data from multiple pages of the same website. There are some 120 pages.
Here is the code I'm using to get for a single page.
$html = file_get_contents('https://www.example.com/product?page=1');
$dom = new DOMDocument;
@$dom->loadHTML($html); // "@" suppresses warnings about malformed HTML
$links = $dom->getElementsByTagName('div');
foreach ($links as $link) {
    file_put_contents('products.txt', $link->getAttribute('data-product-name') . PHP_EOL, FILE_APPEND);
}
How can I do it for multiple pages? The links for that specific pages are incremental like the next page will be https://www.example.com/product?page=2 and so on. How can I do it without creating different files for each link?
What about this:
function extractContent($page)
{
    $html = file_get_contents('https://www.example.com/product?page=' . $page);
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('div');
    foreach ($links as $link) {
        // skip empty attributes
        if (empty($link->getAttribute('data-product-name'))) {
            continue;
        }
        file_put_contents('products.txt', $link->getAttribute('data-product-name') . PHP_EOL, FILE_APPEND);
    }
}
for ($i = 1; $i <= 120; $i++) {
    extractContent($i);
}
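For reference, the same attribute extraction can be exercised without any network access by loading an inline HTML string. A minimal sketch (the markup and product names are made up):

```php
<?php
// Same extraction as above, against an inline fragment; the names are
// collected in memory and could then be written out in a single call.
$html = '<div data-product-name="Widget A"></div>'
      . '<div></div>'
      . '<div data-product-name="Widget B"></div>';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$names = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    $name = $div->getAttribute('data-product-name');
    if ($name !== '') { // skip divs without the attribute
        $names[] = $name;
    }
}
print_r($names);
```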

get DHL tracking statuses

Please help me figure out which PHP API or script I should use to get shipment statuses from DHL, having only the DHL tracking codes provided by the logistics company that handles shipping for our e-commerce orders. My task is to create a PHP cron job that checks and records the status of each DHL tracked shipment for use in back-end reports.
I would much appreciate any suggestion that may help me find the right direction.
I am still looking for the right way to achieve my task. So far I see no other way than parsing the DHL tracking web page, since having only the tracking number seems insufficient for most APIs; the DHL API requires login credentials, secret keys and so on. However, my current parsing code might be useful for someone looking for a similar solution. Just include your tracking codes and run the code on your localhost or even on http://phpfiddle.org/:
$tracking_array = array('000000000000', '1111111111111'); // Tracking Codes
function create_track_url($track)
{
    $separator = '%2C+';
    $count = count($track);
    $url = '';
    if ($count < 2 && $count > 0) {
        $url = $track[0];
    } else if ($count > 1) {
        foreach ($track as $k => $v) {
            $sep = ($count - 2);
            if ($k > $sep) {
                $separator = '';
            }
            $url .= $v . $separator;
        }
    }
    return $url;
}
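As an aside, the joining above is equivalent to a single implode() over the code list ('%2C+' is just the URL-encoded form of ", "); a quick check:

```php
<?php
// implode() reproduces create_track_url(): the separator goes between
// codes but not after the last one, and a single code is returned as-is.
$codes = ['000000000000', '1111111111111'];
$url = implode('%2C+', $codes);
echo $url, "\n";
```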
//load the html
$dom = new DOMDocument();
$html = $dom->loadHTMLFile("https://nolp.dhl.de/nextt-online-public/en/search?piececode=" . create_track_url($tracking_array));
//discard white space
$dom->preserveWhiteSpace = false;
//find the shipment headings by class (note "@class", not "#class")
$xpath = new DOMXpath($dom);
$expression = './/h2[contains(@class, "panel-title")]';
$track_codes = array();
foreach ($xpath->evaluate($expression) as $div) {
    $track_codes[] = preg_replace('/[^0-9]/', '', $div->nodeValue);
}
$tables = $dom->getElementsByTagName('table');
$table = array();
foreach ($track_codes as $key => $val) {
    //get the header cells from the first row of this shipment's table
    $rows = $tables->item($key)->getElementsByTagName('tr');
    $cols = $rows->item(0)->getElementsByTagName('th');
    $row_headers = NULL;
    foreach ($cols as $node) {
        $row_headers[] = $node->nodeValue;
    }
    //get all rows from this shipment's table
    $rows = $tables->item($key)->getElementsByTagName('tr');
    foreach ($rows as $row) {
        // get each column by tag name
        $cols = $row->getElementsByTagName('td');
        $data = array();
        $i = 0;
        foreach ($cols as $node) {
            if ($row_headers == NULL) {
                $data[] = $node->nodeValue;
            } else {
                $data[$row_headers[$i]] = $node->nodeValue;
            }
            $i++;
        }
        $table[$val][] = $data;
    }
}
print '<pre>';
print_r($table);

PHP Simple Dom HTML - Trouble parsing list of a hrefs

I'm trying to scrape all the a hrefs with an id starting with 'system' from this webpage: http://www.myfxbook.com/systems
Here is my code which I just can't seem to get to work. I've been fiddling around for hours now, looking at countless answered questions here.
include_once('simple_html_dom.php');
$url2process = 'http://www.myfxbook.com/systems';
$html = file_get_html($url2process);
$cnt = 0;
$parent_mark = $html->find('a[id^=system]');
$cntr = 0;
foreach ($parent_mark as $element) {
    if ($cntr > 3) continue;
    $cntr++;
    $single_html = file_get_html($element->href);
}
UPDATE 1: OK, this is kind of working now, but it only seems to be using the very last a href on the page with the correct id. I need to process ALL of these a hrefs with this id. What am I missing here?
You could do it using DOMDocument like this:
$html = file_get_contents('http://www.myfxbook.com/systems');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_use_internal_errors(false);
$links = $doc->getElementsByTagName('a');
$cnt = 0;
$cntr = 0;
foreach ($links as $link) {
    if (preg_match('~^system~', $link->getAttribute('id'))) {
        if ($cntr > 3) {
            break; // stop once the first four matches are processed
        }
        $cntr++;
        $single_html = file_get_contents($link->getAttribute('href'));
        if (empty($single_html)) {
            echo 'EMPTY';
        }
    }
}
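The preg_match filter can also be pushed into an XPath query with starts-with() on the id attribute. A self-contained sketch on an inline fragment (the ids and hrefs here are made up for illustration):

```php
<?php
// Select only the anchors whose id starts with "system", directly in XPath.
$html = '<a id="system1" href="/one">1</a>'
      . '<a id="other" href="/x">x</a>'
      . '<a id="system2" href="/two">2</a>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a[starts-with(@id, "system")]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```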

How can I retrieve infos from PHP DOMElement?

I'm working on a function that gets the whole content of the style.css file and returns only the CSS rules needed by the currently viewed page (it will be cached too, so the function only runs when the page has changed).
My problem is with parsing the DOM (I've never done it before with PHP DOM). I have the following function, but $element->tagname returns NULL. I also want to check the element's "class" attribute, but I'm stuck here.
function get_rules($html) {
    $arr = array();
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    foreach ($dom->getElementsByTagName('*') as $element) {
        $arr[sizeof($arr)] = $element->tagname;
    }
    return array_unique($arr);
}
What can I do? How can I get all of the DOM elements tag name, and class from HTML?
tagname returns NULL because the property name is camel cased: it is tagName.
function get_rules($html) {
    $arr = array();
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    foreach ($dom->getElementsByTagName('*') as $element) {
        $e = array();
        $e['tagName'] = $element->tagName; // tagName, not tagname
        // get all of the element's attributes
        foreach ($element->attributes as $attr) {
            $attrs = array();
            $attrs['name'] = $attr->nodeName;
            $attrs['value'] = $attr->nodeValue;
            $e['attributes'][] = $attrs;
        }
        $arr[] = $e;
    }
    return $arr;
}
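A quick self-contained check of the fixed property name, reading tagName and the class attribute from a small inline document:

```php
<?php
// Verify that tagName (camel case) and getAttribute('class') behave as
// expected; loadHTML wraps the fragment in html/body automatically.
$dom = new DOMDocument();
$dom->loadHTML('<div class="box"><p class="note">hi</p></div>');
$seen = [];
foreach ($dom->getElementsByTagName('*') as $el) {
    $seen[] = $el->tagName . ':' . $el->getAttribute('class');
}
print_r($seen);
```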
