Here is my code. It is complete, so you can copy and paste it and run it as a test:
<?php
$url = "http://www.sportsdirect.com/ladies/ladies-underwear";
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$n = $xpath->query('//div[@class="s-producttext-top-wrapper"]');
$l = $xpath->query('//div[@class="s-producttext-top-wrapper"]/a');
$p = $xpath->query('//div[@class="s-largered"]');
$nl = $xpath->query('//a[@class="swipeNextClick NextLink"]');
$NextLink = $nl->item(0)->getAttribute("data-dcp");
$item = 0;
foreach ($n as $entry) {
$Name = $entry->nodeValue;
$Link = $l->item($item)->getAttribute("href");
$Price = $p->item($item)->nodeValue;
$Find = array('£');
$Replace = array('');
$Price = str_replace($Find, $Replace, $Price);
echo "Name: $Name - Link: $Link - Price: $Price - $NextLink<br>";
$item++;
}
?>
This parses all the products on the FIRST page of http://www.sportsdirect.com/ladies/ladies-underwear.
Here is the link for the second page: http://www.sportsdirect.com/ladies/ladies-underwear#dcp=2&dppp=100&OrderBy=rank
When I execute this code to get all the products from the SECOND page:
<?php
$url = "http://www.sportsdirect.com/ladies/ladies-underwear#dcp=2&dppp=100&OrderBy=rank";
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$n = $xpath->query('//div[@class="s-producttext-top-wrapper"]');
$l = $xpath->query('//div[@class="s-producttext-top-wrapper"]/a');
$p = $xpath->query('//div[@class="s-largered"]');
$nl = $xpath->query('//a[@class="swipeNextClick NextLink"]');
$NextLink = $nl->item(0)->getAttribute("data-dcp");
$item = 0;
foreach ($n as $entry) {
$Name = $entry->nodeValue;
$Link = $l->item($item)->getAttribute("href");
$Price = $p->item($item)->nodeValue;
$Find = array('£');
$Replace = array('');
$Price = str_replace($Find, $Replace, $Price);
echo "Name: $Name - Link: $Link - Price: $Price - $NextLink<br>";
$item++;
}
?>
I still get the products from the FIRST page. Why?
How can I parse all the products from page 2? Where is my mistake?
Can you please help me out?
Thanks in advance!
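One thing worth checking here: everything after the # in a URL is a fragment, and HTTP clients (including the one behind loadHTMLFile) do not send the fragment to the server, so both URLs above request the same document. The sketch below only shows how to split the fragment off and move its parameters into a query string. Whether the site actually honours dcp, dppp and OrderBy as query parameters is an assumption that has not been verified; the page may load further results with JavaScript instead.

```php
<?php
// The second-page URL from the question; the part after '#' is the fragment.
$url = "http://www.sportsdirect.com/ladies/ladies-underwear#dcp=2&dppp=100&OrderBy=rank";

$parts = parse_url($url);

// Parse the fragment as if it were a query string.
parse_str($parts['fragment'] ?? '', $params);

// ASSUMPTION: the server accepts the same parameters in the query string.
$candidate = $parts['scheme'] . '://' . $parts['host'] . $parts['path']
           . '?' . http_build_query($params);

echo $candidate, "\n";
```

If the candidate URL still returns the first page, the paging is done client-side and the request the browser makes for page 2 has to be found in the browser's network tools instead.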
I am pulling data from a page, and I know this is a long process depending on the date being pulled. After 132 seconds of pulling data, the page times out.
I have set set_time_limit(0); and ignore_user_abort(true); and I am not sure what else to do to keep the script alive and pull all the data.
I have added the code below in case there is something I can do to speed it up.
set_time_limit(0);
ignore_user_abort(true);
error_reporting(-1);
ini_set('display_errors', 'On');
include "../include/class.php";
include "../include/db.php";
//the below will get the list of id's for each race that day
function curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,true);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$url = "http://form.timeform.betfair.com/daypage?date=20150516"; //WILL NEED TO PULL TOMORROWS DATE AS DD-MM-YYY
$html = curl($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
//pull the individual cards for the day
//li class="rac-cardsclass="ix ixc"
$getdropdown = '//div[contains(@data-location, "RACING_COUNTRY_GB_IE")]//div[contains(@class, "course")]';
$getdropdown2 = $xpath->query($getdropdown);
//loop through each individual card
foreach($getdropdown2 as $dropresults) {
//loop through and get all the a tags
$arr = $dropresults->getElementsByTagName("a");
foreach($arr as $item) {
//only grab the links which point to the results page
if(strpos($item->getAttribute('href'), 'raceresult') !== false) {
//grab the code
$code = explode("=", $item->getAttribute('href'));
$code = end($code);
$url = "http://form.timeform.betfair.com/raceresult?raceId=" . $code; //WILL NEED TO PULL TOMORROWS DATE AS DD-MM-YYY
$html = curl($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$spanTexts = array();
//get the place name
$getplacename = '//span[contains(@class, "locality")]';
$getplacename2 = $xpath->query($getplacename);
//loop through each individual card
foreach($getplacename2 as $getplacename22) {
echo "Venue: " . $venue = $getplacename22->textContent;
} //$getplacename2 as $getplacename22
$gettime = '//abbr[contains(@class, "dtstart")]';
//get the Date and the Time
$gettime2 = $xpath->query($gettime);
foreach($gettime2 as $gettime22) {
echo "Date : " . $Dateandtime = date(trim($gettime22->getAttribute('title')), strtotime('+5 hours'));
} //$gettime2 as $gettime22
//pull the data for the race e.g going money ect
$getdropdown22 = '//div[contains(@class, "content")]/p';
$getdropdown222 = $xpath->query($getdropdown22);
foreach($getdropdown222 as $dropresults2) {
$racename = trim($dropresults2->childNodes->item(0)->textContent);
//foreach ($dropresults2->childNodes as $node) { if(is_object($node)) { echo $node->nodeType; } else { echo $node; } }
foreach($dropresults2->childNodes as $node) {
if(is_object($node) && $node->nodeType === XML_ELEMENT_NODE && strtolower($node->tagName) === 'span') {
$spanTexts[] = (string) $node->textContent;
} //is_object($node) && $node->nodeType === XML_ELEMENT_NODE && strtolower($node->tagName) === 'span'
} //$dropresults2->childNodes as $node
if(count($spanTexts) < 6)
continue;
list($going, $distance, $age, $prizemoney, $runners, $racetype) = $spanTexts;
$going = str_replace(array(
'Â',
'Going:',
'|'
), '', $going);
$distance = miletofurlong($distance = trim(GetBetween($distance, ':', 'Â')));
$age = trim(GetBetween($age, ':', 'Â'));
$prizemoney = trim(GetBetween($prizemoney, '£', 'Â'));
$runners = trim(GetBetween($runners, ':', 'Â'));
$racetype = trim(GetBetween($racetype, ':', 'Â'));
} //$getdropdown222 as $dropresults2
//pull the individual horse data
$getdropdown = '//div[contains(@class, "table-container")]//tbody//tr';
$getdropdown2 = $xpath->query($getdropdown);
//loop through each individual card
foreach($getdropdown2 as $dropresults) {
$position = $dropresults->childNodes->item(0)->childNodes->item(1)->textContent;
$draw = str_replace(array('(',')'), '', $dropresults->childNodes->item(0)->childNodes->item(3)->textContent);
$losingdist = str_replace('Â', '', trim($dropresults->childNodes->item(2)->textContent));
if(strpos($losingdist, '¾') !== false) {
$losingdist = str_replace('¾', '.75', $losingdist);
} //strpos($losingdist, '¾') !== false
if(strpos($losingdist, '½') !== false) {
$losingdist = str_replace('½', '.5', $losingdist);
} //strpos($losingdist, '½') !== false
if(strpos($losingdist, '¼') !== false) {
$losingdist = str_replace('¼', '.25', $losingdist);
} //strpos($losingdist, '¼') !== false
$losingdist;
$horse = trim(preg_replace("/\([^\)]+\)/","",str_replace("'","",trim($dropresults->childNodes->item(4)->textContent))));
$horseage = trim($dropresults->childNodes->item(6)->textContent);
$weight = trim($dropresults->childNodes->item(8)->childNodes->item(1)->textContent);
$or = str_replace(array('(',')'), '', trim($dropresults->childNodes->item(8)->childNodes->item(3)->textContent));
$eq = str_replace('-', '', trim($dropresults->childNodes->item(10)->textContent)); // keep the result of str_replace
$jockey = trim($dropresults->childNodes->item(12)->childNodes->item(1)->textContent);
$trainer = trim($dropresults->childNodes->item(12)->childNodes->item(4)->textContent);
$highandlowinrunning = trim($dropresults->childNodes->item(14)->childNodes->item(1)->textContent);
$highandlow = explode("/", $highandlowinrunning);
$lowodds = str_replace('-', '', trim($highandlow['1'])); // keep the result of str_replace
$highodds = str_replace('-', '', trim($highandlow['0']));
$bfsp = trim($dropresults->childNodes->item(16)->childNodes->item(1)->textContent);
$isp = trim(str_replace('/', '', $dropresults->childNodes->item(16)->childNodes->item(3)->textContent));
$placeodds = trim($dropresults->childNodes->item(18)->textContent);
$venue = mysqli_real_escape_string($db, $venue);
$Dateandtime = mysqli_real_escape_string($db,$Dateandtime);
$going = mysqli_real_escape_string($db, $going);
$distance = mysqli_real_escape_string($db,$distance);
$age = mysqli_real_escape_string($db,$age);
$prizemoney = mysqli_real_escape_string($db,$prizemoney);
$runners = mysqli_real_escape_string($db,$runners );
$racetype = mysqli_real_escape_string($db,$racetype);
$position = mysqli_real_escape_string($db,$position );
$draw = mysqli_real_escape_string($db,$draw);
$losingdist = mysqli_real_escape_string($db,$losingdist);
$horse = mysqli_real_escape_string($db,$horse );
$horseage = mysqli_real_escape_string($db,$horseage); // escape the horse's age, not the race age a second time
$weight = mysqli_real_escape_string($db,$weight);
$or = mysqli_real_escape_string($db,$or );
$eq = mysqli_real_escape_string($db,$eq );
$jockey = mysqli_real_escape_string($db,$jockey);
$trainer = mysqli_real_escape_string($db,$trainer);
$lowodds = mysqli_real_escape_string($db,$lowodds);
$highodds = mysqli_real_escape_string($db,$highodds);
$bfsp = mysqli_real_escape_string($db,$bfsp);
$isp = mysqli_real_escape_string($db,$isp);
$placeodds = mysqli_real_escape_string($db,$placeodds);
$sql = "
INSERT INTO `Race_Records`
(
`Venue`,
`DateandTime`,
`Going`,
`Distance`,
`Age`,
`PrizeMoney`,
`Runners`,
`RaceType`,
`Position`,
`Draw`,
`LosingDist`,
`Horse`,
`HorseAge`,
`Weight`,
`OR`,
`EQ`,
`Jockey`,
`Trainer`,
`InRunningLow`,
`InRunningHigh`,
`BFSP`,
`ISP`,
`PlaceOdds`,
`RaceName`
)
VALUES
(
'$venue',
'$Dateandtime',
'$going',
'$distance',
'$age',
'$prizemoney',
'$runners',
'$racetype',
'$position',
'$draw',
'$losingdist',
'$horse',
'$horseage',
'$weight',
'$or',
'$eq',
'$jockey',
'$trainer',
'$lowodds',
'$highodds',
'$bfsp',
'$isp',
'$placeodds',
'$racename'
)
";
$res = mysqli_query($db, $sql);
if (!$res) {
echo PHP_EOL . "FAIL: $sql";
trigger_error(mysqli_error($db), E_USER_ERROR);
}
}
}
}
}
$id = date_create($id);
$theid2 = date_format($id,"d-m-Y");
$url = "www.sportinglife.com/racing/results/".$theid2; //WILL NEED TO PULL TOMORROWS DATE AS DD-MM-YYY
$html = curl($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$getdropdown = '//li[contains(@class, "rac-cards")]//div[contains(@class, "ix ixv")]';
$getdropdown2 = $xpath->query($getdropdown);
//loop through each individual card
foreach($getdropdown2 as $dropresults) {
//loop through and get all the a tags
$arr = $dropresults->getElementsByTagName("a");
foreach($arr as $item) {
//only grab the links which point to the results page
//grab the code
$getcomments = $item->getAttribute('href');
foreach ($listofcorses as $bad) {
if (strstr( strtolower($getcomments),strtolower($bad)) !== false) {
$url = "http://www.sportinglife.com/".$getcomments; //WILL NEED TO PULL TOMORROWS DATE AS DD-MM-YYY
$html = curl($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$spanTexts = array();
//get the place name
$getplacename = '//table';
$getplacename2 = $xpath->query($getplacename);
//loop through each individual card
$loopnumber = 0;
foreach($getplacename2 as $getplacename22) {
// get how many child nodes are in the loop
$count = 0;
foreach($getplacename22 ->childNodes->item(11)->childNodes as $node)
if(!($node instanceof \DomText))
$count++;
//loop through and get the horses name and the comment
for ($i = 0; $i < $count; $i++) {
if ($i % 2 == 0)
{
if ($getplacename22 ->childNodes->item(11)->childNodes->item($i)->childNodes->item(4) != null)
{
$horse = mysqli_real_escape_string($db,trim(preg_replace("/[^A-Za-z ]+/", "", preg_replace("/\([^\)]+\)/","",trim($getplacename22 ->childNodes->item(11)->childNodes->item($i)->childNodes->item(4)->textContent)))));
$check = "ok";
}
else
{
$check = "no";
}
}
else
{
if ($check == "ok") {
$comments = mysqli_real_escape_string($db,trim($getplacename22 ->childNodes->item(11)->childNodes->item($i)->textContent));
//update the database
$results = $db->query("UPDATE Race_Records SET comments= '$comments' WHERE Horse='$horse'");
}
}
}
}
}
}
}
}
?>
You could try setting curl's timeout:
curl_setopt($ch,CURLOPT_TIMEOUT,1000);
You might also want to check whether the services you are accessing in the loop are rate-limited. If so, add an appropriate sleep in the loop so that you do not make too many requests to the service in consecutive cycles. It could well be that the code runs fine but times out after a number of HTTP requests to the remote service.
Set the max execution time:
// Begin your PHP code with this
ini_set('max_execution_time',300); // 60s * 5 = 300s (5 minutes)
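To make the two suggestions above concrete, here is a rough sketch rather than a drop-in fix. fetchWithTimeout() and secondsToWait() are hypothetical names, and the 10 second connect timeout, 30 second transfer timeout and 1 second gap between requests are arbitrary values picked for illustration; the remote service's real limits are not known.

```php
<?php
// Hypothetical variant of the question's curl() helper with explicit timeouts.
function fetchWithTimeout($url, $timeoutSeconds = 30) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);        // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);  // seconds allowed for the whole transfer
    $data = curl_exec($ch);
    if ($data === false) {
        // Log the failure instead of silently parsing an empty document.
        error_log("Request to $url failed: " . curl_error($ch));
    }
    curl_close($ch);
    return $data;
}

// Seconds still to wait so consecutive requests are at least $minInterval seconds apart.
function secondsToWait($lastRequestAt, $now, $minInterval) {
    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// Usage inside the scraping loop (the 1.0 second interval is an assumption):
// $wait = secondsToWait($lastRequestAt, microtime(true), 1.0);
// usleep((int) round($wait * 1000000));
// $html = fetchWithTimeout($url);
// $lastRequestAt = microtime(true);

echo secondsToWait(10.0, 10.25, 1.0), "\n";
```

Keeping the wait calculation in its own function makes it easy to check without touching the network.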
I'm trying to display the latest additions to this NVD XML file:
http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-recent.xml
I can get all of them to list using the following code, but I'm only interested in displaying the most recent ten (from 2013 for the time being) and the XML file lists them in chronological order (starting in 2011).
<?php
$file= 'http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-recent.xml';
$xml = file_get_contents($file);
$sxe = new SimpleXMLElement($xml);
$ns = $sxe->getNamespaces(true);
echo "<b>Latest Vulnerabilities:</b><p>";
foreach($sxe->entry as $entry)
{
$vuln = $entry->children($ns['vuln']);
$href = $vuln->references->reference->attributes()->href;
echo "" . $vuln->{'cve-id'} . "<br>";
}
?>
Since you cannot manipulate the XML arrays directly, something like this should work for your needs:
$file= 'http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-recent.xml';
$xml = file_get_contents($file);
$sxe = new SimpleXMLElement($xml);
$ns = $sxe->getNamespaces(true);
echo "<b>Latest Vulnerabilities:</b><p>";
$all = $sxe->entry;
$length = count($all);
$offset_start = $length - 10;
for($i = 0; $i < $length; $i++)
{
if($i >= $offset_start)
{
$entry = $all[$i];
$vuln = $entry->children($ns['vuln']);
$href = $vuln->references->reference->attributes()->href;
echo "" . $vuln->{'cve-id'} . "<br>";
}
}
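A possible alternative to the index check above, sketched on a small made-up document instead of the real feed: copy the entries into a plain PHP array and take the tail with array_slice. The XML below, its namespace URI and the CVE IDs are placeholders invented for the example, and it keeps the last two entries rather than ten so the result is easy to follow.

```php
<?php
// Placeholder XML standing in for the NVD feed (namespace URI and IDs are made up).
$xml = <<<XML
<nvd xmlns:vuln="http://example.test/vuln">
  <entry><vuln:cve-id>CVE-0000-0001</vuln:cve-id></entry>
  <entry><vuln:cve-id>CVE-0000-0002</vuln:cve-id></entry>
  <entry><vuln:cve-id>CVE-0000-0003</vuln:cve-id></entry>
</nvd>
XML;

$sxe = new SimpleXMLElement($xml);
$ns = $sxe->getNamespaces(true);

// Copy the entries into an ordinary array so array functions can be used on them.
$entries = array();
foreach ($sxe->entry as $entry) {
    $entries[] = $entry;
}

// A negative offset takes items from the end; use -10 for the ten most recent.
$latest = array_slice($entries, -2);

$ids = array();
foreach ($latest as $entry) {
    $vuln = $entry->children($ns['vuln']);
    $ids[] = (string) $vuln->{'cve-id'};
}

echo implode("\n", $ids), "\n";
```

array_slice also copes with feeds that hold fewer entries than requested: it simply returns everything there is.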
I have a PHP script that writes data to files in batches of 5000 table rows. When all the rows have been written to the file(s), it should output the time taken to run the script. Instead, the script appears to run continually in the browser and the output never appears, but the file(s) exist, meaning the script has run. This only happens with large amounts of data. Any suggestions?
$startTime = time();
$ID = '123';
$productBatchLimit = 5000;
$products = new Products();
$countProds = $products->countShopKeeperProducts();
//limit the amount of products
//if ($countProds > $productBatchLimit){$countProds = $productBatchLimit; }
$counter = 1;
for ($i = 0; $i < $countProds; $i += $productBatchLimit) {
$xml_file = 'xml/products/'. $ID . '_'. $counter .'.xml';
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$xml .= "<products>\n";
//create new file
$fh = fopen($xml_file, 'a');
fwrite($fh, $xml);
$limit = $productBatchLimit*$counter;
$prodList = $products->getProducts($i, $limit);
foreach ($prodList as $prod ){
$xml = "<product>\n";
foreach ($prod as $key => $value){
$value = functions::xml_entities($value);
$xml .= "<{$key}>{$value}</{$key}>\n";
}
$xml .= "</product>\n";
fwrite($fh, $xml);
}
$counter++;
fwrite($fh, '</products>');
fclose($fh);
}
//check to see when XML is fully formed
$validxml = XMLReader::open($xml_file);
$validxml->setParserProperty(XMLReader::VALIDATE, true);
if ($validxml->isValid()==true){
$endTime = time();
echo "Total time to generate results: ".($endTime - $startTime)." seconds. \n";
} else {
echo "Problem saving Products XML.\n";
}
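For narrowing this down, it may help to see progress while the script runs. The sketch below is only an illustration of the batching described above: batchRanges() is a hypothetical helper, the 12000 row total is a made-up number, and the real fetch-and-write step is left as a comment. It prints one line per batch and calls flush(); note that output buffering in PHP or in the web server can still hold the output back, so this is not guaranteed to reach the browser immediately.

```php
<?php
// Hypothetical helper: split $total rows into (offset, count) pairs of at most $batchSize rows.
function batchRanges($total, $batchSize) {
    $ranges = array();
    for ($offset = 0; $offset < $total; $offset += $batchSize) {
        $ranges[] = array($offset, min($batchSize, $total - $offset));
    }
    return $ranges;
}

$startTime = microtime(true);

// 12000 is a made-up total standing in for countShopKeeperProducts().
foreach (batchRanges(12000, 5000) as $n => $range) {
    list($offset, $count) = $range;
    // A real script would fetch rows $offset .. $offset + $count - 1 and write them to a file here.
    echo "Batch " . ($n + 1) . ": offset $offset, $count rows\n";
    flush(); // try to push the progress line to the client
}

printf("Total time: %.3f seconds\n", microtime(true) - $startTime);
```

If the per-batch lines show up but the final line does not, the script is most likely being cut off between the last batch and the end, which points at a timeout rather than at the file writing.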