Is it possible to repeat file_get_html() in a PHP loop?

I am having some trouble with a fairly simple scraping program. Hopefully the working parts help someone set up a news aggregator if they're interested. I like to get specific stuff so I find grabbing the content and filtering by keywords helps me avoid the annoying issue of headlines about things that don't pertain to my life.
I would really appreciate any help or suggestions you all might have. This one has been bugging me for a couple days and I haven't come across a good solution.
Basically, program A fetches a set of items from table A to copy into another table. For each entry it processes, it should take that entry's URL and call program B via an include. Program B uses the URL to grab the content of that entry and save it to a file, which is named after other attributes of the current item from table A.
The file is saved to a folder, and the DOM variable from program B is cleared so that the loop can continue.
Here is program A:
<?php
$db = 'newsfeed';
$zeta = 0;
$beta = 0;
// connect to RDS instance MySQL Database Newsfeed
include_once('/var/www/dbfunctions/mysqli_connectdb.php');
// set content source table
$sourcetable = 'feedsources';
$mastertable = 'mastertable';
// set date to remove results older than
date_default_timezone_set("UTC");
$datenow = date_timestamp_get(date_create());
$offset = "86400";
$deldate = $datenow - $offset;
//begin cycling through content data
//delete all "old" entries from the mastertable
//get number of source items present
$itemquery = "SELECT id,name FROM $sourcetable";
$itemresult = mysqli_query($conn, $itemquery);
while ($row = mysqli_fetch_assoc($itemresult)) {
$sourceid = $row['id'];
$sourcename = $row['name'];
// cycle through the data tables
$dataquery = "SELECT * FROM $sourcetable WHERE id = $sourceid;";
$dataresult = mysqli_query($conn, $dataquery);
while ($row = mysqli_fetch_assoc($dataresult)) {
$table = $row['datatable'];
}
// copy all data from the targeted table into the master table
//loop through the targeted table and copy to mysql
$getdata = "SELECT * FROM ".$table.";";
$datareturn = mysqli_query($conn, $getdata);
while ($row = mysqli_fetch_assoc($datareturn)) {
$date = $row['datecreated'];
$title = addslashes($row['title']);
$url = addslashes($row['url']);
$tags = addslashes($row['tags']);
$titleid = $row['id'];
//get content and place in html file in /var/www/html/nuzr/content/
//check whether the item already exists in the table
$checkquery = "select id from ".$mastertable." where title = '".$title."';";
$checkcheck = mysqli_query($conn, $checkquery);
if(mysqli_num_rows($checkcheck) > 0){
}else{
require_once("getcontent.php");
$copy = "INSERT INTO ".$mastertable." VALUES ('NULL','$table','$sourcename','$date','$title','$url','$tags','$filename');";
mysqli_query($conn, $copy);
echo "Beta is ".$beta;
$beta = $beta + 1;
}
}
// clean the master table
$delquery = 'DELETE FROM '.$mastertable.' WHERE datecreated < '.$deldate.';';
mysqli_query($conn, $delquery);
}
function clear()
{
$this->dom = null;
$this->parent = null;
$this->parent = null;
$this->children = null;
}
?>
Program B
<?php
//Check Start
//echo "Program Starts";
// Include the library
include('/var/www/tools/dom/simple_html_dom.php');
$source = $url;
$content = array();
$header1 = array();
$header2 = array();
$i = 0; $y = 0;
// Retrieve the DOM from a given URL
$html = file_get_html($source);
//grab headers in case initial title is a header
foreach($html->find('h1') as $e){
$header1[$i] = $e->outertext;
//echo $e->outertext;
$i = $i + 1;
}
$i = 0;
foreach($html->find('h2') as $e){
$header2[$i] = $e->outertext;
//echo $e->outertext;
$i = $i + 1;
}
//reset counter
$i = 0;
// Find all paragraph tags and print their content into a text file
foreach($html->find('p') as $e){
$content[$i] = $e->outertext;
//echo $e->outertext;
$i = $i + 1;
}
//create the content storage file
$filename = "/var/www/html/nuzr/content/".$table.$titleid.".html";
echo "The filename is".$filename;
$file = fopen($filename,"a");
// write header and link to original article
$titleblurb = "<b>Original article courtesy of <a href='".$url."'>".$sourcename."</a></b>";
fwrite($file, $titleblurb);
// set site specific parameters based on header / footer size
if($sourcename == "The Globe and Mail"){
//Set indexing parameters
$z = $i - 13; $y = 2;
//Add Header content
$text = $header1[0];
fwrite($file, $text);
$text = $header2[1];
fwrite($file, $text);
}elseif($sourcename == "CNN Money"){
//Set indexing parameters
$z = $i - 3; $y = 1;
//Add header content
$text = $header1[0];
fwrite($file, $text);
$text = $header2[1];
fwrite($file, $text);
}elseif($sourcename == "CNN Markets"){
//Set indexing parameters
$z = $i - 3; $y = 1;
//Add header content
$text = $header1[0];
fwrite($file, $text);
//$text = $header2[1];
//fwrite($file, $text);
}elseif($sourcename == "BBC Business"){
//Set indexing parameters
$z = $i - 9; $y = 1;
//Add header content
$text = $header1[0];
fwrite($file, $text);
//$text = $header2[1];
//fwrite($file, $text);
}elseif($sourcename == "BBC Politics"){
//Set indexing parameters
$z = $i - 0; $y = 1;
//Add header content
$text = $header1[0];
fwrite($file, $text);
//$text = $header2[1];
//fwrite($file, $text);
}else{
echo $sourcename;
}
do{
$text = $content[$y];
fwrite($file, $text);
$y = $y +1;
}while($y<$z);
echo "Zeta is".$zeta;
$zeta = $zeta +1;
//close the content file
fclose($file);
//echo "File end.";
$html->clear();
unset($html);
?>
Long story short: when I run this with all the echo statements as progress markers, the included program B appears to run only on the first iteration and then stops. I was also getting an error from file_get_html() until I added the clear() call.

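You need the code from PROGRAM B to run inside
while ($row = mysqli_fetch_assoc($datareturn)) {
fetchURL($url);
}
so that for every entry, once you have the $url, program B is invoked. I suggest turning PROGRAM B into a function and calling that function within the while loop, something like this:
function fetchURL($url) {
// Place PROGRAM B here. You can make the `include` be part of the program A at the top.
}
Because a function has its own variable scope, the other values program B reads ($table, $titleid, $sourcename) would have to be passed in as parameters too, and $filename returned so that program A can use it in the INSERT.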
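One likely reason program B runs only once is the require_once in program A's loop: PHP loads a file through require_once at most once per request, so later iterations skip it silently. A minimal sketch of that behavior, using a throwaway temp file as a stand-in for getcontent.php (the file contents and variable names are made up for the demo):

```php
<?php
// Throwaway stand-in for getcontent.php: it records the loop counter
// each time the file is actually executed.
$script = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($script, '<?php $runs[] = $i;');

$runs = [];
for ($i = 0; $i < 3; $i++) {
    require_once $script; // executed on the first pass only
}
$withRequireOnce = $runs;

$runs = [];
for ($i = 0; $i < 3; $i++) {
    require $script; // executed on every pass
}
$withRequire = $runs;

unlink($script);
echo count($withRequireOnce), ' vs ', count($withRequire), "\n";
```

Switching program A to a plain require would re-run program B each time, but program B also includes simple_html_dom.php with a plain include, so a second run would likely fail with a redeclaration error unless that line becomes include_once. Wrapping program B in a function avoids both problems.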

Related

Php save more Array into MySQL

I'm trying to save arrays to a MySQL database with PHP.
The code inserts only the first row: if I have an array of 5 elements, it inserts the first element and the other 4 are not saved.
Can anyone tell me where I'm going wrong?
Thanks a lot.
<?php
//getting user values
$day = $_POST['Day'];
$nDay = $_POST['n_Day'];
$fieldOne = $_POST['Field_one'];
$fieldTwo = $_POST['Field_two'];
$timeOne = $_POST['Time_one'];
$timeTwo = $_POST['Time_two'];
$idR = $_POST['id_ristorante'];
$day_array = explode(",",$day);
$nDay_array = explode(",",$nDay);
$timeOne_array = explode(",",$timeOne);
$timeTwo_array = explode(",",$timeTwo);
$len = count($day_array and $nDay_array and $timeOne_array and $timeTwo_array);
$output=array();
//require database
require_once('db.php');
//checking if email exists
$conn=$dbh->prepare('SELECT id_ristorante FROM Orari WHERE id_ristorante=:idR');
$conn->bindParam(':idR', $idR, PDO::PARAM_STR);
$conn->execute();
//results
if($conn->rowCount() !==0){
$output['isSuccess'] = 0;
$output['message'] = "Orario già inserito";
} else {
for($i=0;$i<$len;$i++){
$day = $day_array[$i];
$nDay = $nDay_array[$i];
$timeOne = $timeOne_array[$i];
$timeTwo = $timeTwo_array[$i];
$conn=$dbh->prepare('INSERT INTO Orari (Day, n_Day, Field_one, Field_two, Time_one, Time_two, id_ristorante) VALUES (?,?,?,?,?,?,?)');
//encrypting the password
$conn->bindParam(1,$day);
$conn->bindParam(2,$nDay);
$conn->bindParam(3,$fieldOne);
$conn->bindParam(4,$fieldTwo);
$conn->bindParam(5,$timeOne);
$conn->bindParam(6,$timeTwo);
$conn->bindParam(7,$idR);
$conn->execute();
if($conn->rowCount() == 0) {
$output['isSuccess'] = 0;
$output['message'] = "Errore, riprova.";
} elseif($conn->rowCount() !==0){
$output['isSuccess'] = 1;
$output['message'] = "Orari salvati!";
}
}
}
echo json_encode($output);
?>
When you call count on several arrays joined with and:
$len = count($day_array and $nDay_array and $timeOne_array and $timeTwo_array);
the and operator evaluates the arrays as booleans, so count() receives a single boolean rather than an array. On PHP 7 that returns 1 (with a warning from 7.2 on), which is why the loop runs only once and only the first element is inserted into the DB. On PHP 8 the same call throws a TypeError.
If all the arrays have the same length, just count one of them:
$len = count($day_array);
Better practice is to count each of them and then assign $len the minimum value.
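Change this line
$len = count($day_array and $nDay_array and $timeOne_array and $timeTwo_array);
to
$len = min(count($day_array), count($nDay_array), count($timeOne_array), count($timeTwo_array));
Adding the four counts together would make the loop index run past the end of every array, so take the smallest count rather than the sum.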
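The minimum-length approach can be sketched as follows. The sample strings are invented stand-ins for the $_POST values, and one list is deliberately shorter than the others to show that the loop stops at the shortest one:

```php
<?php
// Invented sample input standing in for the comma-separated $_POST fields.
$day_array     = explode(',', 'Mon,Tue,Wed');
$nDay_array    = explode(',', '1,2,3');
$timeOne_array = explode(',', '09:00,09:00,10:00');
$timeTwo_array = explode(',', '18:00,18:00'); // one entry short on purpose

// Iterate only as far as the shortest array so every index exists.
$len = min(
    count($day_array),
    count($nDay_array),
    count($timeOne_array),
    count($timeTwo_array)
);

$rows = [];
for ($i = 0; $i < $len; $i++) {
    // In the real script this is where the prepared INSERT would be executed.
    $rows[] = [$day_array[$i], $nDay_array[$i], $timeOne_array[$i], $timeTwo_array[$i]];
}

echo "Rows to insert: ", count($rows), "\n";
```

Preparing the INSERT statement once before the loop and only executing it inside would also save a round trip per row, though that is a separate improvement.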

Mysqli Result Fetch_Row Memory Leak

I use the mysqli API to query a large table in batches, but my server's memory usage grows very fast until free memory, and even swap, drops to 0. I don't know how to fix it.
The table has 4 million rows, so I query it one batch at a time ($step rows per query in the code below).
Here is my code:
<?php
ini_set('memory_limit','32M');
$config = require_once('config.php');
$attachmentRoot = $config['attachment_root'];
$mysqli = new mysqli($config['DB_HOST'],$config['DB_USER'],$config['DB_PASSWORD'],$config['DB_NAME']);
$mysqli->set_charset('gbk');
if(!$mysqli)
throw new Exception('DB connnect faild:'.mysqli_connect_errno().mysqli_connect_error());
echo "\nRename The Dup Files With Suffix: .es201704111728es \n";
$startTime = microtime(true);
/**
*
* Move dup file to $name + .es201704111728es
*/
$suffix = ".es201704111728es";
$fileLinesLimit = 100000;
$listSuffix = 0;
$lines = 0;
/**
* Create File List.
*/
$fileList = '/tmp/Dupfilelist.txt';
$baseListName = $fileList.$listSuffix;
//$fs = fopen($baseListName,'w');
$totalSize = 0;
$start = 0;
$step = 10000;
$sql = "SELECT id,filepath,ids,duplicatefile,filesize FROM duplicate_attachment WHERE id> $start AND duplicatefile IS NOT NULL LIMIT $step";
$result = $mysqli->query($sql);
while($result->num_rows > 0)
{
while($result->fetch_row())
{
/*$fiepath = $row[1];
$uniqueIdsArray = array_unique(explode(',',$row[2]));if(empty($row[3]))throw new \Exception("\n".'ERROR:'.$row[0]."\n".var_export($row[3],true)."\n");
$uniqueFilesArray = array_unique(explode(',',$row[3]));
$hasFile = array_search($fiepath,$uniqueFilesArray);
if($hasFile !== false)
unset($uniqueFilesArray[$hasFile]);
$num = count($uniqueIdsArray);
$fileNum = count($uniqueFilesArray);
$ids = implode(',',$uniqueIdsArray);
if($num>1 && $fileNum>0){
//echo "\nID: $row[0] . File Need To Rename:".var_export($uniqueFilesArray,true)."\n";
$size = intval($row[4]);
if($lines >= $fileLinesLimit){
$lines = 0;
$listSuffix++;
//$fileList .= $listSuffix;
}
array_map(function($file) use ($attachmentRoot,$suffix,$fiepath,$totalSize,$size,$fileLinesLimit,&$listSuffix,&$lines,$fileList){
//$fs = fopen($fileList.$listSuffix,'a');
if($file === $fiepath)
return -1;
$source = $file;
$target = $source.$suffix;
//rename($source,$target);
//fwrite($fs,$source.','.$target."\n");
//file_put_contents($fileList.$listSuffix, $source.','.$target."\n",FILE_APPEND);
//$totalSize += intval($size);
$lines ++;
//echo memory_get_usage()."\n";
//fclose($fs);
//unset($fs);
//try to write file without amount memory cost
//$ts = fopen('/tmp/tempfile-0412','w');
},$uniqueFilesArray);
//echo "Test Just One Attachment Record.\n";
//echo "Ids:$ids\n";
//exit();
}*/
}
echo memory_get_peak_usage(true)."\n";
if(!$mysqli->ping())
{
echo "Mysql Conncect Failed.Reconnecting.\n";
$mysqli = new mysqli($config['DB_HOST'],$config['DB_USER'],$config['DB_PASSWORD'],$config['DB_NAME']);
$mysqli->set_charset('gbk');
if(!$mysqli)
throw new Exception('DB connnect faild:'.mysqli_connect_errno().mysqli_connect_error());
}
//mysqli_free_result($result);
$result->close();
unset($result);
$start += $step;
$sql = "SELECT id,filepath,ids,duplicatefile,filesize FROM duplicate_attachment WHERE id> $start AND duplicatefile IS NOT NULL LIMIT $step";
$result = $mysqli->query($sql);
}
echo "Dup File Total Size: $totalSize\n";
echo "Script cost time :".(microtime(true)-$startTime)." ms\n";sleep(1000*10);
mysqli_close($mysqli);
exit();
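I had the Xdebug extension enabled. Sorry about that.
After disabling this extension, everything works fine.
I ran into this issue with PHP version 7.3.26 on CentOS 7. I worked around it by using an unbuffered query (instead of a buffered one). In the above example, replace
$result = $mysqli->query($sql)
with
$result = $mysqli->query($sql, MYSQLI_USE_RESULT)
Note that num_rows is not reliable on an unbuffered result until all of its rows have been fetched, so the outer while($result->num_rows > 0) condition would need to change as well, for example by counting the rows fetched in the inner loop.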

how to show random articles

I am working on a project that shows articles using an article manager (a ready-to-use PHP script), but I have a problem: I want to show only four article titles and summaries, chosen at random, from the existing list of ten articles. Any idea how to achieve this?
The summary of each article is auto-generated:
<div class="a1">
<h3><a href={article_url}>{subject}</h3>
<p>{summary}<p></a>
</div>
When a new article is added, the above code is generated and added to the summary page. I want to add it to the side of the main article page, where the user sees only four articles, chosen at random, out of ten or more.
<?php
$lines = file_get_contents('article_summary.html');
$input = array($lines);
$rand_keys = array_rand($input, 4);
echo $input[$rand_keys[0]] . "<br/>";
echo $input[$rand_keys[1]] . "<br/>";
echo $input[$rand_keys[2]] . "<br/>";
echo $input[$rand_keys[3]] . "<br/>";
?>
Thanks for your kindness.
Assuming I understood you correctly, here is a simple solution. It relies on every summary in the file taking up exactly four lines, as in your template.
<?php
// Settings.
$articleSummariesLines = file('article_summary.html', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$showSummaries = 4;
// Check if given file is valid.
$validFile = ((count($articleSummariesLines) % 4 == 0) ? true : false);
if(!$validFile) {
die('Invalid file...');
}
// Count articles and check whether all those articles exist.
$countArticleSummaries = count($articleSummariesLines) / 4;
if($showSummaries > $countArticleSummaries) {
die('Can not display '. $showSummaries .' summaries. Only '. $countArticleSummaries .' available.');
}
// Generate random article indices.
$articleIndices = array();
while(true) {
if(count($articleIndices) < $showSummaries) {
$random = mt_rand(0, $countArticleSummaries - 1);
if(!in_array($random, $articleIndices)) {
$articleIndices[] = $random;
}
} else {
break;
}
}
// Display items.
for($i = 0; $i < $showSummaries; ++$i) {
$currentArticleId = $articleIndices[$i];
$content = '';
for($j = 0; $j < 4; ++$j) {
$content .= $articleSummariesLines[$currentArticleId * 4 + $j];
}
echo($content);
}
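A shorter variant of the same idea, swapping the manual index loop for array_chunk() and array_rand(). This is only a sketch under the same four-lines-per-summary assumption, and the generated sample lines stand in for the contents of article_summary.html:

```php
<?php
// Sample stand-in for file('article_summary.html', ...): 6 articles, 4 lines each.
$lines = [];
for ($n = 1; $n <= 6; $n++) {
    $lines[] = '<div class="a1">';
    $lines[] = "<h3><a href=\"/article/$n\">Subject $n</a></h3>";
    $lines[] = "<p>Summary $n</p>";
    $lines[] = '</div>';
}

// Group the flat list of lines into one 4-line chunk per article.
$articles = array_chunk($lines, 4);

// Pick up to 4 distinct random keys; the cast covers the single-key case,
// where array_rand() returns a scalar instead of an array.
$keys = (array) array_rand($articles, min(4, count($articles)));

$picked = [];
foreach ($keys as $key) {
    $picked[] = implode("\n", $articles[$key]);
}

echo implode("\n", $picked), "\n";
```

array_rand() returns the chosen keys in their original order, so the four summaries appear in file order. Calling shuffle($keys) before the foreach would randomize the display order too.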

PHP - array_push() expects parameter 1 to be array, null given in

I'm currently experiencing issues where array_push() is not working. I have ensured the arrays are directly accessible and declared correctly. Yet I'm still receiving these warnings and the values are not being pushed onto the array.
Here is my code:
include('../connstr.inc');
$email=$_REQUEST["email"];
$datafile=$_REQUEST["datafile"];
$email_safe=preg_replace("/[^a-zA-Z]/","_",$email);
$path="../uploaded_data";
$xml = simplexml_load_file("{$path}/{$email_safe}/{$datafile}.xml");
// Retrieve data details for specified activity
$lapCount = $xml->Activities->Activity->Lap->count();
// Lap Variables
$totalTime = array(); $distance = array(); $maxSpeed = array();
$calories = array(); $intensity = array(); $trigMethod = array();
$avgSpeed = array();
// Convert filename to DateTime format
$datafile = convertID($datafile);
$datafile = date('Y-m-d H:i:s', strtotime($datafile));
// Variables for accurate distance calculations
$polarDistance = true;
$lapID;
$totalLapDistance;
$firstPoint = array();
$secondPoint = array();
// Collect details for each lap
for($x = 0; $x < $lapCount; $x++) {
$totalLapDistance = 0;
$lapNumber = $x+1;
$totalTime[$x] = $xml->Activities->Activity->Lap[$x]->TotalTimeSeconds;
$distance[$x] = $xml->Activities->Activity->Lap[$x]->DistanceMeters;
$maxSpeed[$x] = $xml->Activities->Activity->Lap[$x]->MaximumSpeed;
$calories[$x] = $xml->Activities->Activity->Lap[$x]->Calories;
$intensity[$x] = $xml->Activities->Activity->Lap[$x]->Intensity;
$trigMethod[$x] = $xml->Activities->Activity->Lap[$x]->TriggerMethod;
$avgSpeed[$x] = $xml->Activities->Activity->Lap[$x]->Extensions->LX->AvgSpeed;
// Store activity details into the 'detail' table
$sqlLap = "INSERT INTO lap (lapDate,lapNumber,TotalTime,distance,maxSpeed,avgSpeed,calories,intensity,trigMethod) VALUES (\"$datafile\",\"$lapNumber\",\"$totalTime[$x]\",\"$distance[$x]\",\"$maxSpeed[$x]\",\"$avgSpeed[$x]\",\"$calories[$x]\",\"$intensity[$x]\",\"$trigMethod[$x]\")";
$runLap = mysql_query($sqlLap) or die("unable to complete INSERT action:$sql:".mysql_error());
// Trackpoint variables
$altitude = array(); $tDistance = array(); $latitude = array();
$longitude = array(); $speed = array(); $pointTime = array();
// Retrieve lapID
$lapID = getLapID();
// Find how many tracks exist for specified lap
$trackCount = $xml->Activities->Activity->Lap[$x]->Track->count();
$trackpointTotalCount = 1;
for($t = 0; $t < $trackCount; $t++) {
// Find out how many trackpoints exist for each track
$trackpointCount = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint->count();
// Collect details for each specified track point
for($tp = 0; $tp < $trackpointCount; $tp++) {
$altitude[$tp] = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint[$tp]->AltitudeMeters;
$tDistance[$tp] = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint[$tp]->DistanceMeters;
$pointTime[$tp] = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint[$tp]->Time;
$latitude[$tp] = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint[$tp]->Position->LatitudeDegrees;
$longitude[$tp] = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint[$tp]->Position->LongitudeDegrees;
$speed[$tp] = $xml->Activities->Activity->Lap[$x]->Track[$t]->Trackpoint[$tp]->Extensions->TPX->Speed;
// Check Track point
if(checkTP($altitude[$tp], $tDistance[$tp], $latitude[$tp], $longitude[$tp], $speed[$tp])) {
// Check if accurate distance should be calculated
if($polarDistance) {
$aa = $latitude[$tp];
$bb = $longitude[$tp];
$cc = $altitude[$tp];
if($tp == 0) {
array_push($firstPoint, $aa, $bb, $cc);
} else if($tp != 0) {
array_push($secondPoint, $aa, $bb, $cc);
}
printArray($firstPoint);
printArray($secondPoint);
// Add distance between trackpoints to total lap distance
$totalLapDistance += calcDistance($firstPoint, $secondPoint);
}
// Insert current trackpoint data into 'trackpoint' table
$sqlTC = "INSERT INTO trackpoint (tpDate,tpNumber,altitude,distance,latitude,longitude,speed,pointTime) VALUES (\"$datafile\",\"$trackpointTotalCount\",\"$altitude[$tp]\",\"$tDistance[$tp]\",\"$latitude[$tp]\",\"$longitude[$tp]\",\"$speed[$tp]\",\"$pointTime[$tp]\")";
$runTC = mysql_query($sqlTC) or die("unable to complete INSERT action:$sql:".mysql_error());
}
$trackpointTotalCount++;
if($polarDistance) {
if($tp != 0) {
unset($firstPoint);
$firstPoint = &$secondPoint;
unset($secondPoint);
}
}
}
}
if($polarDistance) {
if($tp != 0) {
// Update lap with more accurate distance
echo $totalLapDistance . '<br />';
$sqlUlap = "UPDATE lap SET accDistance='$totalLapDistance' WHERE lapID = '$lapID' ";
$runUlap = mysql_query($sqlUlap) or die("unable to complete UPDATE action:$sql:".mysql_error());
}
}
}
I didn't include all of the code below as there is quite a lot and I very much doubt it's relevant.
The warnings themselves only appear when trying to push a variable onto $secondPoint:
array_push($secondPoint, $aa, $bb, $cc);
However, values are not being pushed onto either of the variables ($firstPoint, $secondPoint).
As a test I echoed $aa, $bb and $cc, and they did contain the correct values.
Does anybody have an idea of what I'm doing wrong?
EDIT: I have shown more of the code, as I do use these arrays later; however, this should not affect how the values are initially pushed? Below is some code which may affect it, namely the assignment by reference?
if($polarDistance) {
if($tp != 0) {
unset($firstPoint);
$firstPoint = &$secondPoint;
unset($secondPoint);
}
}
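That unset($secondPoint) is the likely cause: once it runs, $secondPoint no longer exists, so the next array_push($secondPoint, ...) receives null instead of an array.
Try this instead: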
if($polarDistance) {
if($tp != 0) {
$firstPoint = $secondPoint;
$secondPoint = array();
}
}
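The difference between the two patterns can be checked in isolation. This is a small sketch with made-up coordinate values, not the full trackpoint loop:

```php
<?php
// Made-up sample points (latitude, longitude, altitude).
$firstPoint  = [1.0, 2.0, 3.0];
$secondPoint = [4.0, 5.0, 6.0];

// Pattern from the question: alias $firstPoint to $secondPoint, then unset.
unset($firstPoint);
$firstPoint = &$secondPoint;
unset($secondPoint);

// $firstPoint keeps the values, but $secondPoint is now undefined, so a
// later array_push($secondPoint, ...) would be handed null.
$secondPointGone = !isset($secondPoint);

// Pattern from the answer: copy the values, then reset to an empty array.
$secondPoint = [7.0, 8.0, 9.0];
$firstPoint  = $secondPoint;
$secondPoint = [];
array_push($secondPoint, 10.0, 11.0, 12.0); // works: $secondPoint is an array

var_dump($secondPointGone);
```

PHP arrays are copied by value on assignment, so the reference is not needed to move the data from one variable to the other.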

pChart a graph for each row

I'm new at using pChart. I take data from database to construct the graph.
I will have a random number of rows and I would like to do a graph for each row.
(one row -> one graph). Is it possible?
So far I can do the graph but all the rows are in the same graph.
Here is my code:
<?php
include("pChart/class/pData.class.php");
include("pChart/class/pDraw.class.php");
include("pChart/class/pImage.class.php");
include("pChart/class/pPie.class.php");
$myData = new pData();
$Requete = "SELECT * FROM `day`";
$Result = mysql_query($Requete, $db);
while($row = mysql_fetch_array($Result)){
$hour = explode(" ", $row["g1"]);
$nb = $row["numb"] * 2;
for($i = 0; $i < $nb; $i++){
if ($i%2 == 1){
$time[$i] = ($hour[$i])/ 60;
$myData->addPoints($time[$i],"year");
}else{
if ($hour[$i] == "00"){
$name[$i] = "On";
}elseif ($hour[$i] == "02"){
$name[$i] = "Off";
}
$myData->addPoints($name[$i],"name");
}
}
}
$myData->setAbscissa("name");
$myData->setSerieDescription("year","Application A");
$myPicture = new pImage(600,300,$myData);
$myPicture->setFontProperties(array("FontName"=>"pChart/fonts/tahoma.ttf","FontSize"=>16));
$PieChart = new pPie($myPicture,$myData);
$PieChart->draw3DPie(340,125,array("DrawLabels"=>TRUE,"Border"=>TRUE));
$myPicture->autoOutput("images/example.png");
?>
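If I understand the question, you want a picture for every row? In that case I think you will have to move the lower part of your code (everything from $myData = new pData() down to the $myPicture->autoOutput() call) inside the while loop, so that each row gets its own pData object. You will probably also need a different output file name for each row, otherwise every picture is written to images/example.png.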
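The per-row part of that loop can be sketched without pChart itself. The helper name rowToSeries, the id column and the file naming scheme are all assumptions made for illustration; only the "00"/"02" codes and the division by 60 come from the question:

```php
<?php
// Hypothetical helper: split one row's "g1" string into labels and values.
function rowToSeries(string $g1, int $numb): array
{
    $parts  = explode(' ', $g1);
    $labels = ['00' => 'On', '02' => 'Off']; // codes taken from the question
    $names  = [];
    $hours  = [];
    for ($i = 0; $i < $numb * 2 && $i < count($parts); $i++) {
        if ($i % 2 === 1) {
            $hours[] = (float) $parts[$i] / 60; // divided by 60 as in the question
        } else {
            $names[] = $labels[$parts[$i]] ?? $parts[$i]; // unknown codes kept as-is
        }
    }
    return ['names' => $names, 'hours' => $hours];
}

// Invented sample rows standing in for the mysql_fetch_array() results.
$rows = [
    ['id' => 1, 'g1' => '00 120 02 30', 'numb' => 2],
    ['id' => 2, 'g1' => '02 90', 'numb' => 1],
];

$files = [];
foreach ($rows as $row) {
    $series = rowToSeries($row['g1'], (int) $row['numb']);
    // In the real loop: create a new pData here, add $series['hours'] and
    // $series['names'] to it, draw the pie, and output it to its own file.
    $files[] = 'images/row_' . $row['id'] . '.png';
}

print_r($files);
```

Building the series per row also keeps points from earlier rows out of later charts, which is what happens when a single pData object is shared across the whole loop.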
