Is it possible to query the first 5 images with DOMDocument? - php

Is it possible to query the first 5 images with DOMDocument?
$dom = new DOMDocument;
$list = $dom->query('img');

With XPath you can fetch all images like this:
$xpath = new DOMXPath($dom);
$list = $xpath->query('//img');
Then you limit the results by only iterating over the first five.
for ($i = 0, $n = min(5, $list->length); $i < $n; ++$i) {
$node = $list->item($i); // use the loop index, not a fixed 0
}
XPath is very versatile thanks to its expression language. However, in this particular case, you may not need all that power and a simple $list = $dom->getElementsByTagName('img') would yield the same result set.
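The five-item cap can also be expressed in the XPath query itself. This is a minimal sketch, assuming the markup is available as a string (the sample HTML is made up). Note the parentheses: `//img[position() <= 5]` would count positions per parent element, while `(//img)[position() <= 5]` counts across the whole document.

```php
<?php
// Hypothetical sample markup; any HTML string works here.
$html = '<div><img src="a.png"><img src="b.png"><img src="c.png"></div>'
      . '<p><img src="d.png"><img src="e.png"><img src="f.png"><img src="g.png"></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Parentheses make position() apply to the full document-order node set.
$firstFive = $xpath->query('(//img)[position() <= 5]');

$sources = [];
foreach ($firstFive as $img) {
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
```

If the document holds fewer than five images, the query simply returns fewer nodes, so no extra bounds check is needed.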

You can use getElementsByTagName to build an array of images:
$dom = new DOMDocument();
$dom->loadHTML($string);
$images = $dom->getElementsByTagName('img');
$result = array();
for ($i=0; $i<5; $i++){
$node = $images->item($i);
if (is_object( $node)){
$result[] = $node->ownerDocument->saveXML($node);
}
}

Related

php XPath reverse array

Hi there, I am trying to reverse a PHP object, but without success.
Sample HTML is here:
sample_html
$html = file_get_contents($file);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
//get the element you want to append to
$classname = 'pam _3-95 _2pi0 _2lej uiBoxWhite noborder';
$divs = $xpath->query("//*[contains(@class, '$classname')]");
var_dump($divs[0]);
$count = count($divs);
$divs2 = array();
for ($i = $count-1; $i >= 0; $i--) {
$divs = $divs[$i];
}
var_dump($divs[0]);
It fails with the error "Cannot use object of type DOMElement as array".
Once the divs are reversed, I would like to append them back to the original HTML with something like
$doc->saveHTML();
TL;DR: I can get the divs but cannot reverse them.
Thanks for answering, and best regards.
I don't know whether this fits your HTML structure, but if all your divs are immediate siblings (like a list) that share the same parent, you can use the following.
$divs = $xpath->query("//*[contains(@class, '$classname')]");
for( $x = 0; $x < $divs->length ; $x++ )
{
if( $divs[ $x ] !== $divs[ $x ]->parentNode->firstChild )
{
$divs[ $x ]->parentNode->insertBefore( $divs[ $x ], $divs[ $x ]->parentNode->firstChild );
}
}
Alternative solution leveraging iterator_to_array and cloneNode:
$divs = iterator_to_array( $xpath->query("//*[contains(@class, '$classname')]") );
$keys = array_reverse( array_keys( $divs ) );
foreach( $divs as $x => $div )
{
$div->parentNode->replaceChild( $divs[ $keys[ $x ] ]->cloneNode(true), $div );
}
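If the matched elements share one parent and have no other siblings, another option is to copy the node list into an array, reverse it, and re-append each node, because `appendChild` moves a node that is already in the tree. This is a sketch under that assumption; the sample markup and the `box` class are invented.

```php
<?php
// Hypothetical markup: three sibling divs sharing one class.
$html = '<div id="wrap"><div class="box">1</div><div class="box">2</div><div class="box">3</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Copy the node list into a plain array so it can be reversed.
$divs = iterator_to_array($xpath->query("//div[contains(@class, 'box')]"));

foreach (array_reverse($divs) as $div) {
    // appendChild() moves an attached node to the end of its parent.
    $div->parentNode->appendChild($div);
}

$wrap = $xpath->query("//div[@id='wrap']")->item(0);
echo $doc->saveHTML($wrap), "\n";
```

When the parent also contains unrelated children, the reversed divs end up after them, so the insertBefore variant above may be the better fit.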

How to make models from external table data PHP and Laravel?

I am currently trying to pull some data from an external website in order to create models for my website. I am able to get the data from the table that I want, but have not been able to figure out exactly how to manipulate the data to structure my models. For each row in the table, there is certain data that I want to extract. As it currently stands, I am creating a player model with all of the player names, but now need to figure out how to attach the data that I want from each row to each player. Here is my code so far:
$dom = new DOMDocument();
$html = file_get_contents('https://www.baseball-reference.com/register/team.cgi?id=41270199');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$table = $dom->getElementByID('team_batting');
$stats = $table->getElementsByTagName("td");
for ($i = 0; $i < $stats->length; $i++) {
// get player name
$attr = $stats->item($i)->getAttribute('data-stat');
if ($attr != 'player') {
continue;
}
$names[] = $stats->item($i)->textContent;
}
foreach($names as $name) {
$player = new Player(['name' => $name]);
echo $player;
echo '<br>';
}
So how would I get certain table data, like age, H, HR, and other things, and appropriately attach this data to the correct player model?
EDIT/UPDATE:
$dom = new DOMDocument();
$html = file_get_contents('https://www.baseball-reference.com/register/team.cgi?id=41270199');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$table = $dom->getElementByID('team_batting');
$rows = $table->getElementsByTagName("tr");
for($i = 0; $i < $rows->length; $i++) {
$stats = $table->getElementsByTagName("td");
for($i = 0; $i < $stats->length; $i++) {
$name = $stats->item($i)->getAttribute('player');
$age = $stats->item($i)->getAttribute('age');
$names[] = $stats->item($i)->textContent;
$ages[] = $stats->item($i)->textContent;
dd($ages);
}
}
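One way to keep each stat attached to the right player is to walk the table row by row and collect the cells of that row into an array keyed by their data-stat attribute. The sketch below assumes every cell carries a data-stat attribute such as player, age, H, or HR; the inline table is a made-up excerpt, and the real page may be structured differently (for example, names in th cells).

```php
<?php
// Hypothetical excerpt of the stats table; the real page may differ.
$html = '<table id="team_batting"><tbody>'
      . '<tr><td data-stat="player">Ann Lee</td><td data-stat="age">24</td>'
      . '<td data-stat="H">51</td><td data-stat="HR">7</td></tr>'
      . '<tr><td data-stat="player">Bo Cruz</td><td data-stat="age">27</td>'
      . '<td data-stat="H">40</td><td data-stat="HR">3</td></tr>'
      . '</tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$players = [];
foreach ($xpath->query('//table[@id="team_batting"]//tr') as $row) {
    $stats = [];
    // Only look at the cells that belong to this row.
    foreach ($row->getElementsByTagName('td') as $cell) {
        $stats[$cell->getAttribute('data-stat')] = trim($cell->textContent);
    }
    if (isset($stats['player'])) {
        $players[] = $stats;   // one array per player, keyed by stat name
    }
}
print_r($players);
```

Each entry in `$players` could then be mapped onto a model, for example by passing the fields you need (name, age, H, HR) to the Player constructor.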

How to change a php code to merge any number of XML files

Here is working code (which I found here) that merges two XML files into one. It works properly, but only with two XML parts.
<?php
$numparts=2;
$filename1='TEST_1.xml';
$filename2='TEST_2.xml';
$doc1 = new DOMDocument();
$doc1->load($filename1);
$doc2 = new DOMDocument();
$doc2->load($filename2);
// get 'res' element of document 1
$res1 = $doc1->getElementsByTagName('items')->item(0); //edited res - items
// iterate over 'item' elements of document 2
$items2 = $doc2->getElementsByTagName('item');
for ($i = 0; $i < $items2->length; $i ++) {
$item2 = $items2->item($i);
// import/copy item from document 2 to document 1
$item1 = $doc1->importNode($item2, true);
// append imported item to document 1 'res' element
$res1->appendChild($item1);
}
$doc1->save('merged.xml'); //edited -added saving into xml file
?>
Please help me change the code so that it works with any number of parts (stored in the variable $numparts).
First, using loops, create arrays of the filenames, and their corresponding documents that you wish to merge.
$numparts = 3;
// Create an array of $numparts filenames of the xml files you wish to merge
$filenames = array();
for ($i = 1; $i <= $numparts; $i++) {
$filenames[$i] = 'TEST_' . $i . '.xml';
}
// Create an array of DOM Document objects, one for each xml file
$docs = array();
for ($i = 1; $i <= count($filenames); $i++) {
$docs[$i] = new DOMDocument();
$docs[$i]->load($filenames[$i]);
}
Then create a loop that iterates over all but the first document, and nest inside it the loop that iterates over the items in a document (the one in your code sample).
// get 'res' element of document 1
$doc1 = $docs[1];
$res1 = $doc1->getElementsByTagName('items')->item(0); //edited res - items
// iterate over all the rest of the documents
for ($i = 2; $i <= count($docs); $i++) {
$doci = $docs[$i];
// iterate over 'item' elements of document i
$itemsi = $doci->getElementsByTagName('item');
for ($j = 0; $j < $itemsi->length; $j++) {
$itemi = $itemsi->item($j);
// import/copy item from document i to document 1
$item1 = $doc1->importNode($itemi, true);
// append imported item to document 1 'res' element
$res1->appendChild($item1);
}
}
$doc1->save('merged.xml'); //edited -added saving into xml file
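The same idea can be wrapped in a single function that takes any number of inputs. This sketch works on XML strings so it runs without files on disk; the function name `mergeItems` and the sample documents are illustrative, and the `items`/`item` element names are taken from the question.

```php
<?php
/**
 * Merge the <item> elements of several XML documents into the first one.
 * Function name and sample data are illustrative only.
 */
function mergeItems(array $xmlStrings): DOMDocument
{
    $target = new DOMDocument();
    $target->loadXML(array_shift($xmlStrings));
    $container = $target->getElementsByTagName('items')->item(0);

    foreach ($xmlStrings as $xml) {
        $source = new DOMDocument();
        $source->loadXML($xml);
        foreach ($source->getElementsByTagName('item') as $item) {
            // importNode() copies the node into the target document.
            $container->appendChild($target->importNode($item, true));
        }
    }
    return $target;
}

$merged = mergeItems([
    '<res><items><item>a</item></items></res>',
    '<res><items><item>b</item><item>c</item></items></res>',
    '<res><items><item>d</item></items></res>',
]);
echo $merged->saveXML();
```

To work with files instead, replace `loadXML` with `load` and pass the generated filenames.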

PHP DOMDocument generates namespace declarations on wrong elements

Here's the PHP code I'm using, from http://pastebin.com/7FBysx2X
$doc = new DOMDocument('1.0', 'UTF-8');
$xns = 'http://www.w3.org/2000/xmlns/';
$mns = 'http://example.com/aBc/2/';
$ons = 'http://example.com/test/2005/something';
$ns = 'http://example.com/main/';
$firstChild = $doc->createElement('firstChild');
$firstChild->setAttributeNS($xns, 'xmlns:cns1', $mns);
$firstChild->setAttributeNS($xns, 'xmlns:i', $ons);
$elements = $doc->createElementNS($mns, 'cns1:elements');
for($i = 0; $i < 3; $i++) {
$e = $doc->createElementNS($mns, 'cns1:element');
for($k = 0; $k < 2; $k++) {
$r = rand(100, 999);
$value = round(($r*rand(1,9))/rand(1,9), 2);
$ce = $doc->createElementNS($mns, "cns1:elementValue$r", $value);
$e->appendChild($ce);
}
$elements->appendChild($e);
}
$firstChild->appendChild($elements);
$otherTag = $doc->createElementNS($mns, 'cns1:otherTag', 'some_value');
$emptyTag = $doc->createElementNS($mns, 'cns1:emptyTag');
$emptyTag->setAttributeNS($ons, 'i:nil', 'true');
$firstChild->appendChild($otherTag);
$firstChild->appendChild($emptyTag);
$main = $doc->createElementNS($ns, 'main');
$main->appendChild($firstChild);
$doc->appendChild($main);
header('Content-Type: text/xml');
echo $doc->saveXML();
The above code generates XML like this:
<?xml version="1.0" encoding="UTF-8"?>
<main xmlns:cns1="http://example.com/aBc/2/" xmlns:i="http://example.com/test/2005/something" xmlns="http://example.com/main/">
<firstChild xmlns:cns1="http://example.com/aBc/2/" xmlns:i="http://example.com/test/2005/something">
<cns1:elements>
<cns1:element>
<cns1:elementValue303>101</cns1:elementValue303>
<cns1:elementValue608>304</cns1:elementValue608>
</cns1:element>
<cns1:element>
<cns1:elementValue735>147</cns1:elementValue735>
<cns1:elementValue901>4505</cns1:elementValue901>
</cns1:element>
</cns1:elements>
<cns1:otherTag>some_value</cns1:otherTag>
<cns1:emptyTag i:nil="true"/>
</firstChild>
</main>
Document is expected to look like this:
<?xml version="1.0" encoding="UTF-8"?>
<main xmlns="http://example.com/main/">
<firstChild xmlns:cns1="http://example.com/aBc/2/" xmlns:i="http://example.com/test/2005/something">
<cns1:elements>
<cns1:element>
<cns1:elementValue303>101</cns1:elementValue303>
<cns1:elementValue608>304</cns1:elementValue608>
</cns1:element>
<cns1:element>
<cns1:elementValue735>147</cns1:elementValue735>
<cns1:elementValue901>4505</cns1:elementValue901>
</cns1:element>
</cns1:elements>
<cns1:otherTag>some_value</cns1:otherTag>
<cns1:emptyTag i:nil="true"/>
</firstChild>
</main>
The problem is the <main> tag. Why does it have the cns1 and i namespace declarations? They should appear only on the firstChild element. What do I need to change to get the expected structure?
This happens when child nodes are added to a node that has not yet been added to the document.
Changing the code to this:
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$xns = 'http://www.w3.org/2000/xmlns/';
$mns = 'http://example.com/aBc/2/';
$ons = 'http://example.com/test/2005/something';
$ns = 'http://example.com/main/';
$main = $doc->createElementNS($ns, 'main');
$doc->appendChild($main);
$firstChild = $doc->createElement('firstChild');
$firstChild->setAttributeNS($xns, 'xmlns:cns1', $mns);
$firstChild->setAttributeNS($xns, 'xmlns:i', $ons);
$doc->getElementsByTagName('main')->item(0)->appendChild($firstChild);
$elements = $doc->createElementNS($mns, 'cns1:elements');
$doc->getElementsByTagName('firstChild')->item(0)->appendChild($elements);
for($i = 0; $i < 3; $i++) {
$e = $doc->createElementNS($mns, 'cns1:element');
$doc->getElementsByTagName('elements')->item(0)->appendChild($e);
for($k = 0; $k < 2; $k++) {
$r = rand(100, 999);
$value = round(($r*rand(1,9))/rand(1,9), 2);
$ce = $doc->createElementNS($mns, "cns1:elementValue$r", $value);
$doc->getElementsByTagName('element')->item($i)->appendChild($ce);
}
}
$otherTag = $doc->createElementNS($mns, 'cns1:otherTag', 'some_value');
$emptyTag = $doc->createElementNS($mns, 'cns1:emptyTag');
$emptyTag->setAttributeNS($ons, 'i:nil', 'true');
$doc->getElementsByTagName('firstChild')->item(0)->appendChild($otherTag);
$doc->getElementsByTagName('firstChild')->item(0)->appendChild($emptyTag);
echo $doc->saveXML();
produces XML that looks exactly like your expected output. There may be a prettier or more proper way to do this, but this one works.
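The same top-down ordering can be written without the repeated getElementsByTagName lookups by keeping each element in a variable and attaching it to its parent right after creating it. This is only a sketch with a trimmed-down element set; where the namespace declarations end up can depend on the PHP and libxml version, so the serialized output should be checked rather than assumed.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$xns = 'http://www.w3.org/2000/xmlns/';
$mns = 'http://example.com/aBc/2/';
$ns  = 'http://example.com/main/';

// Attach each element to its parent right after creating it, top-down.
// appendChild() returns the appended node, so it can be kept in a variable.
$main = $doc->appendChild($doc->createElementNS($ns, 'main'));

$firstChild = $doc->createElement('firstChild');
$firstChild->setAttributeNS($xns, 'xmlns:cns1', $mns);
$main->appendChild($firstChild);

$elements = $firstChild->appendChild($doc->createElementNS($mns, 'cns1:elements'));
for ($i = 0; $i < 3; $i++) {
    $e = $elements->appendChild($doc->createElementNS($mns, 'cns1:element'));
    // Simplified child: a fixed name instead of the random suffix in the question.
    $e->appendChild($doc->createElementNS($mns, 'cns1:elementValue', (string) ($i + 1)));
}

$xml = $doc->saveXML();
echo $xml;
```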

how to read only part of an xml file with php xmlreader

I have an RSS xml file that is pretty large, with more than 700 nodes.
I am using XMLReader Iterator library to parse it and display the results as 10 per page.
This is my sample code for parsing xml:
<?php
require('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string)$xml->title,
'link' => (string)$xml->link
);
}
// Logic for displaying the array values, based on the current page.
// page = 1 means $items[0] to $items[9]
for($i = 0; $i <= 9; $i++)
{
echo ''.$items[$i]['title'].'<br>';
}
?>
But the problem is that, for every page, I am parsing the entire XML file and then displaying only the corresponding page of results: if the page is 1, nodes 1 to 10, and if the page is 5, nodes 41 to 50.
This delays displaying the data. Is it possible to read just the nodes corresponding to the requested page? For the first page, I would read only nodes 1 to 10 instead of parsing the whole XML file and then displaying the first 10 nodes. In other words, can I apply a limit while parsing an XML file?
I came across this answer by Gordon that addresses a similar question, but it uses SimpleXML, which is not recommended for parsing large XML files.
Use array_slice to extract the relevant portion of the array:
require ('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
// Default to page 1 when the parameter is missing or not a positive number
$curr_page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$pages = 0;
$max = 10;
foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string) $xml->title,
'link' => (string) $xml->link
);
}
// Take the length of the array
$len = count($items);
// Get the number of pages
$pages = ceil($len / $max);
// Calculate the starting point
$start = ceil(($curr_page - 1) * $max);
// return the portion of results
$arrayItem = array_slice($items, $start, $max);
// Iterate over what array_slice returned; the last page may hold fewer than $max items
foreach ($arrayItem as $entry) {
echo '' . $entry['title'] . '<br>';
}
// paging links
$str = array();
for ($i = 1; $i <= $pages; $i ++) {
if ($i === (int) $curr_page) {
// current page
$str[] = sprintf('<span style="color:red">%d</span>', $i);
} else {
$str[] = sprintf('%d', $i, $i);
}
}
echo implode('', $str);
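The loop above still reads every item before slicing. With plain XMLReader it is possible to skip the items of earlier pages and stop reading as soon as the requested page is full. This is a sketch using only the built-in XMLReader and SimpleXMLElement classes rather than the iterator library from the question; the function name `readPage` and the sample feed are made up.

```php
<?php
/**
 * Read one page of <item> elements and stop as soon as the page is full.
 * Function name and sample feed are illustrative.
 */
function readPage(string $xml, int $page, int $perPage): array
{
    $reader = new XMLReader();
    $reader->XML($xml);              // use $reader->open($url) for a file or URL

    $offset = ($page - 1) * $perPage;
    $index  = 0;
    $items  = [];

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'item') {
            continue;
        }
        if ($index++ < $offset) {
            continue;                // item belongs to an earlier page
        }
        $node = new SimpleXMLElement($reader->readOuterXml());
        $items[] = ['title' => (string) $node->title, 'link' => (string) $node->link];
        if (count($items) === $perPage) {
            break;                   // page is full: do not read the rest of the feed
        }
    }
    $reader->close();
    return $items;
}

// Build a small sample feed with 25 items.
$feed = '<rss><channel>';
for ($i = 1; $i <= 25; $i++) {
    $feed .= "<item><title>Item $i</title><link>http://www.example.com/$i</link></item>";
}
$feed .= '</channel></rss>';

$pageTwo = readPage($feed, 2, 10);
echo $pageTwo[0]['title'], ' .. ', $pageTwo[9]['title'], "\n";
```

Early pages benefit most, since later pages still have to read past everything before them. The total page count is not known without one full pass, so it may be worth counting once and caching the result.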
Use a cache in this case, since you cannot partially parse an XML file.
Check this:
<?php
if(isset($_GET['page']) && $_GET['page']!=""){
$startPagenew = (int) $_GET['page'];
$startPage = $startPagenew-1;
}
else{
$startPage = 0;
}
$perPage = 10;
$currentRecord = 0;
$xml = new SimpleXMLElement('http://sports.yahoo.com/mlb/teams/bos/rss.xml', 0, true);
echo $startPage * $perPage;
foreach($xml->channel->item as $key => $value)
{
$currentRecord += 1;
if($currentRecord > ($startPage * $perPage) && $currentRecord <= ($startPage * $perPage + $perPage)){
echo "$value->title";
echo "<br>";
}
}
//and the pagination:
//echo $currentRecord;
for ($i = 1; $i <= ceil($currentRecord / $perPage); $i++) {
echo("<a href='xmlpagination.php?page=".$i."'>".$i."</a>");
} ?>
Updated
Check this Link
http://www.phpclasses.org/package/5667-PHP-Parse-XML-documents-and-return-arrays-of-elements.html
You can use DOM and XPath. It should be faster, since XPath lets you select nodes by their position in a list.
<?php
$string = file_get_contents("http://oar.icrisat.org/cgi/exportview/subjects/s1=2E2/RSS2/s1=2E2.xml");
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML($string);
$string = "";
$xpath = new DOMXPath($dom);
$channel = $dom->getElementsByTagName('channel')->item(0);
$numItems = $xpath->evaluate("count(item)", $channel);
// get your paging logic (positions are 1-based and both bounds below are inclusive)
$start = 10;
$end = 20;
$items = $xpath->evaluate("item[position() >= $start and not(position() > $end)]", $channel);
$count = $start;
foreach($items as $item) {
print_r("\r\n_____Node number $count ");
print_r( $item->nodeName);
$childNodes = $item->childNodes;
foreach($childNodes as $childNode) {
print_r($childNode->nodeValue);
}
$count ++;
}
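The placeholder bounds can be derived from a page number. This is a sketch assuming 10 items per page and a 1-based page number, with a generated sample feed standing in for the RSS file; the variable names are illustrative.

```php
<?php
// Sample feed with 25 items; real code would load the RSS file instead.
$feed = '<rss><channel>';
for ($i = 1; $i <= 25; $i++) {
    $feed .= "<item><title>Item $i</title></item>";
}
$feed .= '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($feed);
$xpath = new DOMXPath($dom);

$page    = 3;     // hypothetical: would normally come from the request
$perPage = 10;
$offset  = ($page - 1) * $perPage;
$last    = $offset + $perPage;

// count() returns a float in XPath, so cast it for page arithmetic.
$total = (int) $xpath->evaluate('count(/rss/channel/item)');
// XPath positions are 1-based: page 3 selects positions 21 to 30.
$items = $xpath->query("/rss/channel/item[position() > $offset and position() <= $last]");

$titles = [];
foreach ($items as $item) {
    $titles[] = $item->getElementsByTagName('title')->item(0)->textContent;
}
printf("page %d of %d: %s\n", $page, (int) ceil($total / $perPage), implode(', ', $titles));
```

DOMDocument still loads the whole document into memory before the XPath query runs, so this trims the selection work rather than the parsing cost.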
