I have a problem when looping through data and creating XML files using DOMDocument. It all worked fine until I decided to run this script in batches. Now I have multiple '<?xml version="1.0"?>' starting tags in my XML files, looks like one for each batch. There are also more products nodes being generated than there are products. Can anyone help.
//get products
$productsObj = new Products($db,$shopKeeperID);
//find out how many products
$countProds = $productsObj->countProducts();
$productBatchLimit = 3; //keep low for testing
//create new file
$file = 'products/'. $products . '.xml';
$fh = fopen($file, 'a');
//create XML document object model (DOM)
$doc = new DOMDocument();
$doc->formatOutput = true;
$counter = 1;
$r = $doc->createElement( "products" );
$doc->appendChild( $r );
for ($i = 0; $i < $countProds; $i += $productBatchLimit) {
$limit = $productBatchLimit*$counter;
$products = $productsObj->getShopKeeperProducts($i, $limit);
$prod = '';
//loop through each product to create well formed XML
foreach( $products as $product ){
$prod = $doc->createElement( "offer" );
$offerID = $doc->createElement( "offerID" );
$offerID->appendChild($doc->createTextNode( $product['prod_id'] ));
$prod->appendChild( $offerID );
$productName = $doc->createElement( "name" );
$productName->appendChild($doc->createTextNode( $product['productName'] ));
$prod->appendChild( $productName );
$r->appendChild( $prod );
$strxml = $doc->saveXML();
}
fwrite($fh, $strxml);
$counter++;
}
fclose($fh);
I am using this command that erases all strings like <?something?>.
$text = preg_replace( '/<\?[^\?>]*\?>/', ' ', $text);
First erase them all and then put one at the beginning.
I did just this a while ago, I can't see exactly whats going wrong. But I can provide the function I built for a site for you to look at.
This function worked 100% fine and as expected. It created a perfect XML document and formatted it perfectly. I hope this helps you find your problem.
function create_xml_file()
{
/* create a dom document with encoding utf8 */
$domtree = new DOMDocument('1.0', 'utf-8');
/* create the root element of the xml tree */
/* Data Node */
$xmlRoot = $domtree->createElement("data");
/* append it to the document created */
$xmlRoot = $domtree->appendChild($xmlRoot);
/* Set our Prices in our <data> <config> node */
$config_node = $domtree->createElement("config");
$config_node = $xmlRoot->appendChild($config_node);
// Add - node to config
$config_node->appendChild($domtree->createElement('config_node', '123456'));
$config_node->appendChild($domtree->createElement('some_other_data', '123456'));
/* Create prices Node */
$price_node = $domtree->createElement('price');
$price_node = $config_node->appendChild($price_node);
/* Black Price Node */
$black_node = $price_node->appendChild($domtree->createElement('black'));
foreach ($p->List_all() as $item):
if ($item['color'] == 'black'):
$black_node->appendChild($domtree->createElement($item['type'], $item['price']));
endif;
endforeach;
/* Create galleries Node */
$galleries_node = $domtree->createElement("galleries");
$galleries_node = $xmlRoot->appendChild($galleries_node);
foreach ($i->List_all() as $image):
/* Our Individual Gallery Node */
$gallery_node = $domtree->createElement("gallery");
$gallery_node = $galleries_node->appendChild($gallery_node);
$gallery_node->appendChild($domtree->createElement('name', $image['name']));
$gallery_node->appendChild($domtree->createElement('filepath', $image['filepath']));
$gallery_node->appendChild($domtree->createElement('thumb', $image['thumb']));
endforeach;
/* Format it so it is human readable */
$domtree->preserveWhiteSpace = false;
$domtree->formatOutput = true;
/* get the xml printed */
//echo $domtree->saveXML();
$file = 'xml/config.xml';
$domtree->save($file);
}
I hope this helps you find your answer. I commented it well for easy understanding.
I simply changed
$strxml = $doc->saveXML();
and
fwrite($fh, $strxml);
To
$doc->save($file);
and it works perfectly.
Related
I am trying to scrape this webpage. In this webpage I have to get the job title and its location. Which I am able to get from my code. But the problem is coming that when I am sending it in XML, then only one detail is going from the array list.
I am using goutte CSS selector library and also please tell me how to scrap pagination in goutte CSS selector library.
here is my code:
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', 'https://www.simplyhired.com/search?q=pharmacy+technician&l=American+Canyon%2C+CA&job=X5clbvspTaqzIHlgOPNXJARu8o4ejpaOtgTprLm2CpPuoeOFjioGdQ');
$job_posting_location = [];
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location')
->each(function ($node) use (&$job_posting_location) {
$job_posting_location[] = $node->text() . PHP_EOL;
});
$joblocation = 0;
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-title-container h3 a')
->each( function ($node) use ($job_posting_location, &$joblocation, $httpClient) {
$job_title = $node->text() . PHP_EOL; //job title
$job_posting_location = $job_posting_location[$joblocation]; //job posting location
// display the result
$items = "{$job_title} # {$job_posting_location}\n\n";
global $results;
$result = explode('#', $items);
$results['job_title'] = $result[0];
$results['job_posting_location'] = $result[1];
$joblocation++;
});
function convertToXML($results, &$xml_user_info){
foreach($results as $key => $value){
if(is_array($value)){
$subnode = $xml_user_info->addChild($key);
foreach ($value as $k=>$v) {
$xml_user_info->addChild("$k",htmlspecialchars("$v"));
}
}else{
$xml_user_info->addChild("$key",htmlspecialchars("$value"));
}
}
return $xml_user_info->asXML();
}
$xml_user_info = new SimpleXMLElement('<root/>');
$xml_content = convertToXML($results,$xml_user_info);
$xmlFile = 'details.xml';
$handle = fopen($xmlFile, 'w') or die('Unable to open the file: '.$xmlFile);
if(fwrite($handle, $xml_content)) {
echo 'Successfully written to an XML file.';
}
else{
echo 'Error in file generating';
}
what i got in xml file --
<?xml version="1.0"?>
<root><job_title>Pharmacy Technician
</job_title><job_posting_location> Vallejo, CA
</job_posting_location></root>
what i want in xml file --
<?xml version="1.0"?>
<root>
<job_title>Pharmacy Technician</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician 1</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician New</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
and so on...
</root>
You overwrite the values in the $results variable. You're would need to do something like this to append:
$results[] = [
'job_title' => $result[0];
'job_posting_location' => $result[1]
];
However here is no need to put the data into an array at all, just create the
XML directly with DOM.
Both your selectors share the same start. Iterate the card and then fetch
related data.
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', $url);
$document = new DOMDocument();
// append document element node
$postings = $document->appendChild($document->createElement('jobs'));
// iterate job posting cards
$response->filter('.LeftPane article .SerpJob-jobCard.card')->each(
function($jobCard) use ($document, $postings) {
// fetch data
$location = $jobCard
->filter(
'.jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location'
)
->text();
$title = $jobCard->filter('.jobposting-title-container h3 a')->text();
// append 'job' node to group data in result
$job = $postings->appendChild($document->createElement('job'));
// append data nodes
$job->appendChild($document->createElement('job_title'))->textContent = $title;
$job->appendChild($document->createElement('job_posting_location'))->textContent = $location;
}
);
echo $document->saveXML();
I wrote a database seeder last year that scrapes a stats website. Upon revisiting my code, it no longer seems to be working and I am a bit stumped as to the reason. $html->find() is supposed to return an array of elements found, however it seems to only be finding the first table when used.
As per the documentation, I instead tried using find() and specifying each table's ID, however this also seems to fail.
$table_passing = $html->find('table[id=passing]');
Can anyone help me figure out what is wrong here? I am at a loss as to why neither of these methods are working, where the page source clearly shows multiple tables and the IDs, where both approaches should work.
private function getTeamStats()
{
$url = 'http://www.pro-football-reference.com/years/2016/opp.htm';
$html = file_get_html($url);
$tables = $html->find('table');
$table_defense = $tables[0];
$table_passing = $tables[1];
$table_rushing = $tables[2];
//$table_passing = $html->find('table[id=passing]');
$teams = array();
# OVERALL DEFENSIVE STATISTICS #
foreach ($table_defense->find('tr') as $row)
{
$stats = $row->find('td');
// Ignore the lines that don't have ranks, these aren't teams
if (isset($stats[0]) && !empty($stats[0]->plaintext))
{
$name = $stats[1]->plaintext;
$rank = $stats[0]->plaintext;
$games = $stats[2]->plaintext;
$yards = $stats[4]->plaintext;
// Calculate the Yards Allowed per Game by dividing Total / Games
$tydag = $yards / $games;
$teams[$name]['rank'] = $rank;
$teams[$name]['games'] = $games;
$teams[$name]['tydag'] = $tydag;
}
}
# PASSING DEFENSIVE STATISTICS #
foreach ($table_passing->find('tr') as $row)
{
$stats = $row->find('td');
// Ignore the lines that don't have ranks, these aren't teams
if (isset($stats[0]) && !empty($stats[0]->plaintext))
{
$name = $stats[1]->plaintext;
$pass_rank = $stats[0]->plaintext;
$pass_yards = $stats[14]->plaintext;
$teams[$name]['pass_rank'] = $pass_rank;
$teams[$name]['paydag'] = $pass_yards;
}
}
# RUSHING DEFENSIVE STATISTICS #
foreach ($table_rushing->find('tr') as $row)
{
$stats = $row->find('td');
// Ignore the lines that don't have ranks, these aren't teams
if (isset($stats[0]) && !empty($stats[0]->plaintext))
{
$name = $stats[1]->plaintext;
$rush_rank = $stats[0]->plaintext;
$rush_yards = $stats[7]->plaintext;
$teams[$name]['rush_rank'] = $rush_rank;
$teams[$name]['ruydag'] = $rush_yards;
}
}
I never use simplexml or other derivatives but when using an XPath query to find an attribute such as ID usually one would prefix with # and the value should be quoted - so for your case it might be
$table_passing = $html->find('table[#id="passing"]');
Using a standard DOMDocument & DOMXPath approach the issue was that the actual table was "commented out" in source code - so a simple string replacement of the html comments enabled the following to work - this could easily be adapted to the original code.
$url='http://www.pro-football-reference.com/years/2016/opp.htm';
$html=file_get_contents( $url );
/* remove the html comments */
$html=str_replace( array('<!--','-->'), '', $html );
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( $html );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$tbl=$xp->query( '//table[#id="passing"]' );
foreach( $tbl as $n )echo $n->tagName.' > '.$n->getAttribute('id');
/* outputs */
table > passing
I'm stuck on something extremely simple.
Here is my xml feed:
http://xml.betfred.com/Horse-Racing-Daily.xml
Here is my code
<?php
function HRList5($viewbets) {
$xmlData = 'http://xml.betfred.com/Horse-Racing-Daily.xml';
$xml = simplexml_load_file($xmlData);
$curdate = date('d/m/Y');
$new_array = array();
foreach ($xml->event as $event) {
if($event->bettype->attributes()->bettypeid == $viewbets){//$_GET['evid']){
// $eventid = $_GET['eventid'];
// if ($limit == $c) {
// break;
// }
// $c++;
$eventd = substr($event->attributes()->{'date'},6,2);
$eventm = substr($event->attributes()->{'date'},4,2);
$eventy = substr($event->attributes()->{'date'},0,4);
$eventt = $event->attributes()->{'time'};
$eventid = $event->attributes()->{'eventid'};
$betname = $event->bettype->bet->attributes()->{'name'};
$bettypeid = $event->bettype->attributes()->{'bettypeid'};
$betprice = $event->bettype->bet->attributes()->{'price'};
$betid = $event->bettype->bet->attributes()->{'id'};
$new_array[$betname.$betid] = array(
'betname' => $betname,
'viewbets' => $viewbets,
'betid' => $betid,
'betname' => $betname,
'betprice' => $betprice,
'betpriceid' => $event->bettype->attributes()->{'betid'},
);
}
ksort($new_array);
$limit = 10;
$c = 0;
foreach ($new_array as $event_time => $event_data) {
// $racedate = $event_data['eventy'].$event_data['eventm'].$event_data['eventd'];
$today = date('Ymd');
//if($today == $racedate){
// if ($limit == $c) {
// break;
//}
//$c++;
$replace = array("/"," ");
// $eventname = str_replace($replace,'-', $event_data['eventname']);
//$venue = str_replace($replace,'-', $event_data['venue']);
echo "<div class=\"units-row unit-100\">
<div class=\"unit-20\" style=\"margin-left:0px;\">
".$event_data['betprice']."
</div>
<div class=\"unit-50\">
".$event_data['betname'].' - '.$event_data['betprice']."
</div>
<div class=\"unit-20\">
<img src=\"betnow.gif\" ><br />
</div>
</div>";
}
}//echo "<strong>View ALL Horse Races</strong> <strong>>></strong>";
//var_dump($event_data);
}
?>
Now basically the XML file contains a list of horse races that are happening today.
The page I call the function on also declares
<?php $viewbets = $_GET['EVID'];?>
Then where the function is called I have
<?php HRList5($viewbets);?>
I've just had a play around and now it displays the data in the first <bet> node
but the issue is it's not displaying them ALL, its just repeating the 1st one down the page.
I basically need the xml feed queried & if the event->bettype->attributes()->{'bettypeid'} == $viewbets I want the bet nodes repeated down the page.
I don't use simplexml so can offer no guidance with that - I would say however that to find the elements and attributes you need within the xml feed that you ought to use an XPath query. The following code will hopefully be of use in that respect, it probably has an easy translation into simplexml methods.
Edit: Rather than targeting each bet as the original xpath did which then caused issues, the following should be more useful. It targets the bettype and then processes the childnodes.
/* The `eid` to search for in the DOM document */
$eid=25573360.20;
/* create the DOM object & load the xml */
$dom=new DOMDocument;
$dom->load( 'http://xml.betfred.com/Horse-Racing-Daily.xml' );
/* Create a new XPath object */
$xp=new DOMXPath( $dom );
/* Search the DOM for nodes with particular attribute - bettypeid - use number function from XSLT to test */
$oCol=$xp->query('//event/bettype[ number( #bettypeid )="'.$eid.'" ]');
/* If the query was successful there should be a nodelist object to work with */
if( $oCol ){
foreach( $oCol as $node ) {
echo '
<h1>'.$node->parentNode->getAttribute('name').'</h1>
<h2>'.date('D, j F, Y',strtotime($node->getAttribute('bet-start-date'))).'</h2>';
foreach( $node->childNodes as $bet ){
echo "<div>Name: {$bet->getAttribute('name')} ID: {$bet->getAttribute('id')} Price: {$bet->getAttribute('price')}</div>";
}
}
} else {
echo 'XPath query failed';
}
$dom = $xp = $col = null;
I've made this program that updates an xml file based on entries in an array.
I've used FILE_APPEND because the entries are more than one and otherwise file gets overwritten. But the problem is the xml version tag prints out as many times as many entries are there.
So i want to remove this tag.
Here's my program:-
<?php
include 'array.php';
$xmlW = new XMLWriter();
$file = 'entry-'. date('M-D-Y') .'.xml';
/*$setting = new XMLWriterSettings();
$setting->OmitXmlDeclaration = true;*/
foreach($data as $d) {
if(in_array ($d['Mode'], array('ccAV','MB','Paypal','E2P'))) {
$recordType = 'receipt';
$xml_object = simplexml_load_file ('receipt.xml');
} else {
$xml_object = simplexml_load_file ('journal.xml');
$recordType = 'journal';
}
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER")[0]->DATE = $d['InvoiceDate'];
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER")[0]->NARRATION = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER")[0]->EFFECTIVEDATE = $d['InvoiceDate'];
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[0]->LEDGERNAME = $d['Mode'];
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[0]->AMOUNT = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[1]->AMOUNT = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[2]->AMOUNT = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST/BANKALLOCATIONS.LIST")[0]->DATE = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST/BANKALLOCATIONS.LIST")[0]->INSTRUMENTDATE = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST/BANKALLOCATIONS.LIST")[0]->AMOUNT = 'Rahul';
$xml = $xml_object->asXML();
file_put_contents($file, $xml, FILE_APPEND);
}
?>
Thanks for the help.
I'm using simplexml to read a xml file. So far i'm unable to get the attribute value i'm looking for. this is my code.
if(file_exists($xmlfile)){
$doc = new DOMDocument();
$doc->load($xmlfile);
$usergroup = $doc->getElementsByTagName( "preset" );
foreach($usergroup as $group){
$pname = $group->getElementsByTagName( "name" );
$att = 'code';
$name = $pname->attributes()->$att; //not working
$name = $pname->getAttribute('code'); //not working
if($name==$preset_name){
echo($name);
$group->parentNode->removeChild($group);
}
}
}
and my xml file looks like
<presets>
<preset>
<name code="default">Default</name>
<createdBy>named</createdBy>
<icons>somethignhere</icons>
</preset>
</presets>
Try this :
function getByPattern($pattern, $source)
{
$dom = new DOMDocument();
#$dom->loadHTML($source);
$xpath = new DOMXPath($dom);
$result = $xpath->evaluate($pattern);
return $result;
}
And you may use it like (using XPath) :
$data = getByPattern("/regions/testclass1/presets/preset",$xml);
UPDATE
Code :
<?php
$xmlstr = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><presets><preset><name code=\"default\">Default</name><createdBy>named</createdBy><icons>somethignhere</icons></preset></presets>";
$xml = new SimpleXMLElement($xmlstr);
$result = $xml->xpath("/presets/preset/name");
foreach($result[0]->attributes() as $a => $b) {
echo $a,'="',$b,"\"\n";
}
?>
Output :
code="default"
P.S. And also try accepting answers as #TJHeuvel mentioned; it's an indication that you respect the community (and the community will be more than happy to help you more, next time...)
Actually question in my head includes deleting a node as well , mistakenly i could not add it. So in my point of view this is the complete answer, i a case if someone else find this useful.
This answer doesn't include SimpleXMLElement class because how hard i tried it didn't delete the node with unset(); . So back to where i was , i finally found an answer. This is my code.
and its Simple!!!
if(file_exists($xmlfile)){
$doc = new DOMDocument();
$doc->load($xmlfile);
$presetgroup = $doc->getElementsByTagName( "preset" );
foreach($presetgroup as $group){
$pname = $group->getElementsByTagName( "name" );
$pcode = $pname->item(0)->getAttribute('code');
if($pcode==$preset_name){
echo($preset_name);
$group->parentNode->removeChild($group);
}
}
}
$doc->save($xmlfile);