I use this method to extract data from the HTML of a website, but sometimes it gets stuck. How do I prevent it from timing out? Rather than giving some weird error, it should simply say: can't fetch result right now.
$html = file_get_contents('https://homeshopping.pk/search.php?category%5B%5D=&search_query='.$homeshoppingSearch);
$pk_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pk_doc->loadHTML($html);
libxml_clear_errors();
$pk_xpath = new DOMXPath($pk_doc);
//Homeshopping
$pk_row = $pk_xpath->query('//a[@class=""]');
$pk_row2 = $pk_xpath->query('//a[@class="price"]');
$pk_row3 = $pk_xpath->query('(//a[@class="price"])//@href');
//HomeShopping
if($pk_row->length > 0){
$rowarray = array();
foreach($pk_row as $row){
$rowarray[]= $row->nodeValue;
// echo $row->nodeValue . "<br/>";
}
}
if($pk_row2->length > 0){
$row2array = array();
foreach($pk_row2 as $row2){
$row2array[]=$row2->nodeValue;
// echo $row2->nodeValue . "<br/>";
}
}
if($pk_row3->length > 0){
$row3array = array();
foreach($pk_row3 as $row3){
$row3array[]=$row3->nodeValue;
//echo $row3->nodeValue . "<br/>";
}
}
}
You can try the set_time_limit() function. It sets the maximum execution time of the script, in seconds.
Another way is to set the time limit (max_execution_time) directly in php.ini, if you have the rights to edit it.
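One caveat: as far as I know, on non-Windows systems the execution time limit does not count time spent waiting on stream operations, so a stalled download may not be cut off by it. A different approach, sketched below, is to give file_get_contents() its own timeout through a stream context and to treat a failed fetch as "no result". The helper name fetchHtml and the 5 second default are assumptions of mine, not part of the question.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with a read
// timeout and return null instead of emitting warnings when the fetch fails.
function fetchHtml(string $url, float $timeoutSeconds = 5.0): ?string
{
    // The "http" context options are also used for https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // @ silences the warning PHP raises on failure; we check the result instead.
    $html = @file_get_contents($url, false, $context);

    // file_get_contents() returns false on failure; treat empty bodies the same way.
    return ($html === false || $html === '') ? null : $html;
}
```

In the question's code you would call $html = fetchHtml($url); and echo "can't fetch result right now" when the result is null, before any DOMDocument work is attempted.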
I have been playing around with a simple PHP web scraper I've built for a small project of mine. The scraper runs through job posts on a website and stores all relevant information in a nested array, which I then store in an XML file. However, whenever I run the code it only stores the first 79 job posts, and I can't seem to find the problem (I know there are more job posts with the class I'm searching for).
If anyone can point me in the right direction, or has tried something similar, it would be nice to get a solution :)
I'm running the server locally via MAMP. I don't know if that could be the problem.
include('simple_html_dom.php');
$Pages = array();
$JobOffers = array();
$html = file_get_html("https://www.jobindex.dk/jobsoegning?q=studiejob");
$NumPage = $html->find('li.page-item');
foreach ($NumPage as $page){
$res = preg_replace("/[^0-9]/", "", $page->plaintext);
$PageNumber = trim($res);
$PageNumToInt = (int)$PageNumber;
array_push($Pages, $PageNumToInt);
}
$HighestValue = max($Pages);
for($i = 8; $i <= $HighestValue; $i++){
$Newhtml = file_get_html("https://www.jobindex.dk/jobsoegning?page=".$i."&q=studiejob");
$items = $Newhtml->find('div.PaidJob');
foreach ($items as $job){
$RareTitle = $job->find("a", 0)->plaintext;
$CommonTitle = $job->find("a", 1)->plaintext;
$Virksomhed = $job->find("a", 2)->plaintext;
$LinkHref = $job->find("a", 1)->href;
$DisP1 = $job->find("p", 1)->plaintext;
$DisP2 = $job->find("p", 2)->plaintext;
$Dis = $DisP1 . " " . $DisP2;
$date = date("d/m/Y");
$prefix = "JoIn";
echo $RareTitle;
echo $CommonTitle;
echo $Virksomhed;
echo $LinkHref;
echo $Dis;
echo $date;
echo $prefix;
$SingleJob = array($CommonTitle, $RareTitle, $Virksomhed, $Dis, $LinkHref, $date, $prefix);
array_push($JobOffers,$SingleJob);
}}
This code saves the job offers in a local XML file:
function SaveJobs($JobInfo){
if(file_exists("./xml/JobOffers.xml")){
$i = 1;
foreach ($JobInfo as $jobs){
$xml = new DOMDocument("1.0", "utf-8");
$xml->load("./xml/JobOffers.xml");
// Creating textnode with line break
$textNode = $xml->createTextNode("\n");
// root Element
$root = $xml->getElementsByTagName("job")->item(0);
$root->appendChild($textNode);
// Create Singlejob Element
$SingleJob = $xml->createElement("Jobitem");
//ID Attribute
$DomAtt1 = $xml->createAttribute('ID');
$DomAtt1->value = $i.$jobs[6];
$SingleJob->appendChild($DomAtt1);
//Date Attribute
$DomAtt2 = $xml->createAttribute('Date');
$DomAtt2->value = $jobs[5];
$SingleJob->appendChild($DomAtt2);
// Creating Elements
$TitleElement = $xml->createElement("Title", $jobs[0]);
$SecTitle = $xml->createElement("SecTitle", $jobs[1]);
$Firm = $xml->createElement("Firm", $jobs[2]);
$dis = $xml->createElement("Description", $jobs[3]);
$Linkhref = $xml->createElement("Linkhref", $jobs[4]);
// Append data to SingleJob Element
$SingleJob->appendChild($TitleElement);
$SingleJob->appendChild($SecTitle);
$SingleJob->appendChild($Firm);
$SingleJob->appendChild($dis);
$SingleJob->appendChild($Linkhref);
// Append Singlejob to root and save the changes
$root->appendChild($SingleJob);
$xml->save("./xml/JobOffers.xml");
$i++;
}
}}
I am fetching a list of pagination URLs from another website using file_get_contents, but the while loop won't work: it fetches the data for the first URL and that's it. It won't work on the second URL of the array fetched from my database of URLs.
include('simple_html_dom.php');
ini_set('max_execution_time', 0);
$con=mysqli_connect("localhost","root","","mydb");
$crawl_query = "SELECT url from list LIMIT 10";
$crawl_url = mysqli_query($con, $crawl_query);
$rows = array();
while($rows = mysqli_fetch_array($crawl_url))
{
$pages = $rows['url'];
$html = file_get_contents($pages);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$finder = new DomXPath($dom);
$links = $finder->query("//*[contains(@class, 'pages')]");
$array1= array();
foreach ($links as $link){
$array1[] = $link;
$length[] = $array1[0]->getElementsByTagName('a');
$final_length = $length[0]->length -1;
for($i=1; $i<=$final_length; $i++ )
{
if($i==1)
{
echo rtrim($pages, ".html").trim($i, "1");
echo "<br/>";
}
else
{
echo rtrim($pages, ".html")."_".trim($i, "1");
echo "<br/>";
}
}
}
}
All I get is
example.com/content/new
example.com/content/new2
example.com/content/new3
which is the result for the first URL in my database. That URL has 3 pages in its pagination, but I can't get the loop to work on the second URL from mydb.
I'm trying to pull all the data from my users table and display it in XML format. The connection works fine (I have login and registration set up and working), but I can't seem to get this to display anything other than a white screen.
I've found lots of different tutorials on how to do it with mysql but not with mysqli. What am I missing?
generatexml.php
<?php
include 'connection.php';
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
$root_element = $config['users'];
$xml .= "<$root_element>";
if ($result = $mysqli->query("SELECT * FROM users", MYSQLI_USE_RESULT)) {
while($row = $result->fetch_assoc())
{
$xml .= "<".$config['users'].">";
//loop through each key,value pair in row
foreach($result_array as $key => $value)
{
//$key holds the table column name
$xml .= "<$key>";
//embed the SQL data in a CDATA element to avoid XML entity issues
$xml .= "<![CDATA[$value]]>";
//and close the element
$xml .= "</$key>";
}
$xml.="</".$config['users'].">";
echo $xml;
}
}
?>
I struggled a lot to find a solution for this in mysqli and couldn't find one anywhere. Below is the solution I figured out. Run this demo and adapt it to your requirements; it should help.
<?php
//Create file name to save
$filename = "export_xml_".date("Y-m-d_H-i",time()).".xml";
$mysql = new Mysqli('server', 'user', 'pass', 'database');
if ($mysql->connect_errno) {
throw new Exception(sprintf("Mysqli: (%d): %s", $mysql->connect_errno, $mysql->connect_error));
}
//Extract data to export to XML
$sqlQuery = 'SELECT * FROM t1';
if (!$result = $mysql->query($sqlQuery)) {
throw new Exception(sprintf('Mysqli: (%d): %s', $mysql->errno, $mysql->error));
}
//Create new document
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
//add table in document
$table = $dom->appendChild($dom->createElement('table'));
//add row in document
foreach($result as $row) {
$data = $dom->createElement('row');
$table->appendChild($data);
//add column in document
foreach($row as $name => $value) {
$col = $dom->createElement('column', $value);
$data->appendChild($col);
$colattribute = $dom->createAttribute('name');
// Value for the created attribute
$colattribute->value = $name;
$col->appendChild($colattribute);
}
}
/*
** insert more nodes
*/
$dom->formatOutput = true; // set the formatOutput attribute of domDocument to true
// save XML as string or file
$test1 = $dom->saveXML(); // put string in test1
$dom->save($filename); // save as file
$dom->save('xml/'.$filename);
?>
If you have access to the mysql CLI, here's my quick hack for achieving this:
$sql = "SELECT * FROM dockcomm WHERE listname = 'roortoor'
and status IN ('P','E') and comm_type IN ('W','O')
and comm_period NOT IN ('1','2','3','4') order by comment_num";
$cmd = "/usr/bin/mysql -u<person> -h<host> -p<password> <database> --xml -e \"$sql\";";
$res = system($cmd ,$resval);
echo $res;
Here is a solution using PHP only. You were close to getting it right. The key part I changed is "$row as $key => $data": it uses $row instead of $result_array, i.e. it iterates through the row, not the result array (which contains the entire dataset). Hope this helps someone.
if ($result->num_rows > 0) {
// output data of each row
while($row = $result->fetch_assoc()) {
$value .="<record>\r\n";
//loop through each key,value pair in row
foreach($row as $key => $data)
{
//$key holds the table column name
$vals = "\t" . "<". $key . ">" . "<![CDATA[" . $data . "]]>" . "</" . $key . ">" . "\r\n";
$value = $value . $vals;
//echo $value;
}
$value .="</record>\r\n";
$count++;
}
} else {
// echo "0 results";
}
$conn->close();
One possible issue could be this line:
if ($result = $mysqli->query("SELECT * FROM users", MYSQLI_USE_RESULT)) {
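Try the procedural approach instead of the object-oriented approach. I do not know whether $mysqli is defined in connection.php, but it is possible that you mixed the two up. Note that the procedural function takes the connection as its first argument ($con here stands for whatever mysqli_connect() returned in connection.php; the name is my assumption):
if ($result = mysqli_query($con, 'SELECT * FROM users', MYSQLI_USE_RESULT)) {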
This could resolve the white screen error.
I noticed two other things:
(1) One tiny readability issue: use single quotes for the XML declaration:
$xml = '<?xml version="1.0" encoding="UTF-8"?>';
That way you do not need to escape every single quotation mark.
(2) One serious XML issue: The root element needs to be closed before you echo your $xml.
$xml .= "</$root_element>";
echo $xml;
Generally, for your purpose, it would be safer to use PHP's XMLWriter extension, as already proposed.
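As a minimal sketch of what the XMLWriter route could look like: the function below takes rows that have already been fetched (for example with fetch_assoc()) and returns the XML as a string. The function name rowsToXml and the element names users/user are assumptions for illustration, and it assumes every column name is a valid XML element name.

```php
<?php
// Hypothetical helper: $rows is a list of associative arrays, such as the
// rows returned by mysqli_result::fetch_assoc(). Element names are assumptions.
function rowsToXml(array $rows, string $rootName = 'users', string $rowName = 'user'): string
{
    $writer = new XMLWriter();
    $writer->openMemory();          // build the document in memory
    $writer->setIndent(true);       // human-readable output
    $writer->startDocument('1.0', 'UTF-8');

    $writer->startElement($rootName);
    foreach ($rows as $row) {
        $writer->startElement($rowName);
        foreach ($row as $column => $value) {
            // writeElement() escapes &, < and > in the content for us.
            $writer->writeElement($column, (string) $value);
        }
        $writer->endElement();      // closes the row element
    }
    $writer->endElement();          // closes the root element

    $writer->endDocument();
    return $writer->outputMemory();
}
```

Because the writer closes the root element itself and escapes the values, both the unclosed-root problem and the need for hand-written CDATA sections go away. Send a header('Content-Type: text/xml') before echoing the result if a browser should render it as XML.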
I would like to track the scores of some selected users from https://location.services.mozilla.com/leaders. For that I want to scrape their data using the id, as you can see for the first user: <a id="g#v" href="#g#v">g#v</a>
I want to get his data using the id on that anchor. I have written the code below, but couldn't get it to work. Ideally I could create an array like $names=array("g#v","elly","elkos"," grack"); and get the scores of all of them, so that if I add more names I get everyone's score. For now I was trying with a single user:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('//a[@id="g#v"]');
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
$name = $row->nodeValue;
$scores = $pokemon_xpath->query('//.td.tr/td[@class="text-right"]', $row);
foreach($scores as $score){
$lead = $score->nodeValue;
}
echo $name . ": ". $lead;
}
}
}
?>
I tweaked your scraping code to display all users and their score:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('//tr');
if($pokemon_row->length > 0){
// Get each tr
foreach($pokemon_row as $row){
// Get each td in the tr group
$tds = $row->childNodes;
foreach($tds as $td){
// Filter out empty nodes
if(trim($td->nodeValue, " \n\r\t\0\xC2\xA0") !== "") {
$name = $td->nodeValue;
echo $name . " ";
}
}
echo "<br>" . PHP_EOL;
}
}
}
?>
This will print out:
Rank User Points
1. g#v 18444343
2. elly 4458330
3. elkos 4452607
4. grack 2769789
5. sonickydon 2707133
6. SDBoyd 2636721
...
If you want to search for a single user, then just replace the query line with this:
$pokemon_row = $pokemon_xpath->query('//tr[td/a[@id="g#v"]]');
which will print:
1. g#v 18444343
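For the array-of-names case from the question, a rough sketch along the same lines follows. It assumes each table row holds an anchor whose id is the user name and that the points sit in the last td with class "text-right" (the class is taken from the question's code; I have not verified it against the live page). The function name scoresForUsers is hypothetical.

```php
<?php
// Hypothetical helper: returns [userName => points] for the requested names.
// Assumes rows look like <tr><td>..</td><td><a id="NAME">..</a></td>
// <td class="text-right">POINTS</td></tr>.
function scoresForUsers(string $html, array $names): array
{
    if ($html === '') {
        return [];                  // nothing to parse
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $scores = [];
    // Visit every row that contains an anchor with an id attribute.
    foreach ($xpath->query('//tr[td/a[@id]]') as $row) {
        $id = $xpath->query('td/a[@id]', $row)->item(0)->getAttribute('id');
        if (!in_array($id, $names, true)) {
            continue;               // not one of the users we were asked about
        }
        $cells = $xpath->query('td[@class="text-right"]', $row);
        if ($cells->length > 0) {
            // Take the last right-aligned cell as the points column.
            $scores[$id] = trim($cells->item($cells->length - 1)->nodeValue);
        }
    }
    return $scores;
}
```

Filtering in PHP with in_array() rather than building one XPath expression per name avoids having to quote user names inside the XPath string.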
Hope that helps.
I am trying to do this:
I have several thousand XML files. I am reading them and looking for specific text inside a particular tag, but the tags that hold the text I need differ from file to file. What I have done so far is this:
$xml_filename = "xml/".$anzeigen_id.".xml";
$dom = new DOMDocument();
$dom->load($xml_filename);
$value = $dom->getElementsByTagName('FormattedPositionDescription');
foreach($value as $v){
$text = $v->getElementsByTagName('Value');
foreach($text as $t){
$anzeige_txt = $t->nodeValue;
$anzeige_txt = utf8_decode($anzeige_txt);
$anzeige_txt = mysql_real_escape_string($anzeige_txt);
echo $anzeige_txt;
$sql = "INSERT INTO joinvision_anzeige(`firmen_id`,`anzeige_id`,`anzeige_txt`) VALUES ('$firma_id','$anzeigen_id','$anzeige_txt')";
$sql_inserted = mysql_query($sql);
if($sql_inserted){
echo "'$anzeigen_id' from $xml_filename inserted<br />";
}else{
echo mysql_errno() . ": " . mysql_error() . "\n";
}
}
}
Now what I need to do is this:
look for FormattedPositionDescription in the XML, and if that tag is not there, look for anothertag in the same XML file.
How can I do this? Thanks in advance for any help.
Just check the length property of the DOMNodeList:
$value = $dom->getElementsByTagName('FormattedPositionDescription');
if($value->length > 0)
{
// found some FormattedPositionDescription
}
else
{
// didn't find any FormattedPositionDescription, so look for anothertag
$list = $dom->getElementsByTagName('anothertag');
}
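If there are more than two candidate tags, the same length check can be wrapped in a small loop. This is only a sketch; the helper name firstMatchingTag is my own, and the tag names passed in are whatever your files actually use.

```php
<?php
// Hypothetical helper: return the node list for the first tag name that
// occurs in the document, or null when none of the names is present.
function firstMatchingTag(DOMDocument $dom, array $tagNames): ?DOMNodeList
{
    foreach ($tagNames as $tagName) {
        $nodes = $dom->getElementsByTagName($tagName);
        if ($nodes->length > 0) {
            return $nodes;          // first tag that exists wins
        }
    }
    return null;                    // none of the candidate tags was found
}
```

Calling firstMatchingTag($dom, ['FormattedPositionDescription', 'anothertag']) then gives you the list to iterate over, and a null result tells you the file contains neither tag.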