I have the following code:
<?php
error_reporting(E_ALL & ~E_NOTICE);
set_time_limit(1000);

$f = $_GET['location'] . '.txt';
if (!file_exists($f)) {
    die('Location unavailable');
}

$file = fopen($f, "r");
while (!feof($file)) {
    $members[] = fgets($file);
}
fclose($file);

function get_thumbs($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    ob_start();
    curl_exec($ch);
    curl_close($ch);
    $string = ob_get_contents();
    ob_end_clean();
    return $string;
}

foreach ($members as $id) {
    // echo $id.'<br/>'; // do something with each line from the text file here
    $id = preg_replace('/\s+/', '', $id);
    $link = 'http://localhost/cvr/get.php?id=' . $id . '&thumb=yes&title=yes';
    $content = get_thumbs($link);
    echo $content;
}
?>
Here, get.php uses almost the same cURL function as above to grab data from a website and parse it.
The text file holds about 20 ids to fetch data for, but the foreach seems to take a very long time to load, 30+ seconds. Any advice?
I am a PHP beginner, so please don't be too hard on me.
Thanks!
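For reference, the usual cure for a loop of sequential cURL requests is to run them in parallel with curl_multi: the total time then approaches that of the single slowest request instead of the sum of all of them. A minimal sketch, assuming the same localhost get.php endpoint as above (the ids here are placeholders for the lines read from the text file):

```php
<?php
// Fire all requests at once instead of one at a time.
$members = ['id1', 'id2', 'id3']; // hypothetical ids read from the txt file
$mh = curl_multi_init();
$handles = [];

foreach ($members as $id) {
    $id = preg_replace('/\s+/', '', $id);
    $ch = curl_init('http://localhost/cvr/get.php?id=' . $id . '&thumb=yes&title=yes');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until every one has finished.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running && $status === CURLM_OK);

// Collect the responses in the original order.
foreach ($handles as $ch) {
    echo curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

Note also that CURLOPT_RETURNTRANSFER makes the ob_start()/ob_get_contents() dance in get_thumbs() unnecessary even in the sequential version.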
I've written a script in PHP to fetch links from the main page of Wikipedia and write them to a CSV file. The script fetches the links correctly; however, I can't write the results to the CSV file. When I execute the script, it does nothing, and there is no error either. Any help will be highly appreciated.
My try so far:
<?php
include "simple_html_dom.php";

$url = "https://en.wikipedia.org/wiki/Main_Page";

function fetch_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);

    $dom = new simple_html_dom();
    $dom->load($htmlContent);

    $links = array();
    foreach ($dom->find('a') as $link) {
        $links[] = $link->href . '<br>';
    }
    return implode("\n", $links);

    $file = fopen("itemfile.csv", "w");
    foreach ($links as $item) {
        fputcsv($file, $item);
    }
    fclose($file);
}
fetch_content($url);
?>
1. You are using return inside your function, so execution leaves the function at that point and the file-writing code below it never runs.
2. The logic can be simplified to:

$file = fopen("itemfile.csv", "w");
foreach ($dom->find('a') as $link) {
    fputcsv($file, array($link->href));
}
fclose($file);

So the full code needs to be:
<?php
// Comment out these two lines once the script works properly;
// they are here to check for and display all errors.
error_reporting(E_ALL);
ini_set('display_errors', 1);

include "simple_html_dom.php";

$url = "https://en.wikipedia.org/wiki/Main_Page";

function fetch_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);

    $dom = new simple_html_dom();
    $dom->load($htmlContent);

    $file = fopen("itemfile.csv", "w");
    foreach ($dom->find('a') as $link) {
        fputcsv($file, array($link->href));
    }
    fclose($file);
}
fetch_content($url);
?>
The file never gets written because you return out of the function before that code can even be executed.
I've written a script in PHP to scrape titles and their links from a webpage and write them to a CSV file. Because I'm dealing with a paginated site, only the content of the last page remains in the CSV file when I use write mode w; the rest gets overwritten. When I use append mode a instead, I do find all the data in the file.
However, appending means the CSV file is opened and closed multiple times (because of my perhaps wrongly applied loops), which makes the script less efficient and more time-consuming.
How can I do the same thing efficiently, and ideally still using write mode w?
This is what I've written so far:
<?php
include "simple_html_dom.php";

$link = "https://stackoverflow.com/questions/tagged/web-scraping?page=";

function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);

    $dom = new simple_html_dom();
    $dom->load($htmlContent);

    $infile = fopen("itemfile.csv", "a");
    foreach ($dom->find('.question-summary') as $file) {
        $itemTitle = $file->find('.question-hyperlink', 0)->innertext;
        $itemLink = $file->find('.question-hyperlink', 0)->href;
        echo "{$itemTitle},{$itemLink}<br>";
        fputcsv($infile, [$itemTitle, $itemLink]);
    }
    fclose($infile);
}

for ($i = 1; $i < 10; $i++) {
    get_content($link . $i);
}
?>
If you don't want to open and close the file multiple times, move the fopen() call before your for-loop, pass the file handle into the function, and fclose() it after the loop:
function get_content($url, $infile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);

    $dom = new simple_html_dom();
    $dom->load($htmlContent);

    foreach ($dom->find('.question-summary') as $file) {
        $itemTitle = $file->find('.question-hyperlink', 0)->innertext;
        $itemLink = $file->find('.question-hyperlink', 0)->href;
        echo "{$itemTitle},{$itemLink}<br>";
        fputcsv($infile, [$itemTitle, $itemLink]);
    }
}

$infile = fopen("itemfile.csv", "w");
for ($i = 1; $i < 10; $i++) {
    get_content($link . $i, $infile);
}
fclose($infile);
I would consider not echoing or writing results to the file inside get_content at all. I would rewrite it so it only gets the content; the extracted data can then be handled any way you like. Something like this (please read the code comments):
<?php
include "simple_html_dom.php";

$link = "https://stackoverflow.com/questions/tagged/web-scraping?page=";

// This function does not write data to a file or print it. It only extracts
// data and returns it as an array.
function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);

    $dom = new simple_html_dom();
    $dom->load($htmlContent);

    // We don't need the following line anymore
    // $infile = fopen("itemfile.csv","a");

    // We will collect the extracted data in an array
    $result = [];
    foreach ($dom->find('.question-summary') as $file) {
        $itemTitle = $file->find('.question-hyperlink', 0)->innertext;
        $itemLink = $file->find('.question-hyperlink', 0)->href;
        $result[] = [$itemTitle, $itemLink];
        // echo "{$itemTitle},{$itemLink}<br>";
        // No need to write to the file, so we don't need the following either
        // fputcsv($infile,[$itemTitle,$itemLink]);
    }
    // No files opened, so the following line is no longer required
    // fclose($infile);

    // Return the data extracted from this specific URL
    return $result;
}

// Merge the results for each URL (each with a different page parameter).
// With a little refactoring, get_content() could handle this as well.
$result = [];
for ($page = 1; $page < 10; $page++) {
    $result = array_merge($result, get_content($link . $page));
}

// Now do whatever you want with $result, like writing its values to a file
// or printing it. You might want to write a function for this. Since all
// rows are written in one pass, write mode "w" works as requested.
$outputFile = fopen("itemfile.csv", "w");
foreach ($result as $row) {
    fputcsv($outputFile, $row);
}
fclose($outputFile);
?>
I'm trying to build a cache file for a quick JSON response, but I'm having some problems. I have the code below in a PHP file, but I can't work out how to make it cache the JSON response for any $url. My skills are limited; can anyone please help me with this problem?
Here is the code I've tried:
function data_get_curl($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $uaa = $_SERVER['HTTP_USER_AGENT'];
    curl_setopt($ch, CURLOPT_USERAGENT, "User-Agent: $uaa");
    return curl_exec($ch);
}

$cache_option = true;
$id = 'DU6IdS2gVog';
$url = 'https://www.googleapis.com/youtube/v3/videos?key={Youtube_Api}&fields=items(snippet(title%2Cdescription%2Ctags))&part=snippet&id=DU6IdS2gVog';

$data = data_get_curl($url);
header('Content-Type: application/json');
echo $data;

function cache_set($id, $data, $time = 84600){
    if(!$cache_option) return NULL;
    $name = ROOT."/cache/".md5($id).".json";
    $fh = fopen($name, "w");
    fwrite($fh, serialize($data));
    fclose($fh);
}

function cache_get($id){
    if(!$cache_option) return NULL;
    $file = ROOT."/cache/".md5($id).".json";
    if(file_exists($file)){
        if(time() - 84600 < filemtime($file)){
            return unserialize(data_get_curl($file));
        }else{
            unlink($file);
        }
    }
    return NULL;
}
Thanks in advance!
There are multiple errors in your code. You can simply use file_put_contents to write content to the cache file.
Try the following example:
$id = 'DU6IdS2gVog';
$cache_path = 'cache/';
$filename = $cache_path . md5($id);

if (file_exists($filename) && (time() - 84600 < filemtime($filename))) {
    $data = json_decode(file_get_contents($filename), true);
} else {
    $data = data_get_curl('https://www.googleapis.com/youtube/v3/videos?key={API_KEY}&fields=items(snippet(title%2Cdescription%2Ctags))&part=snippet&id=' . $id);
    file_put_contents($filename, $data);
    $data = json_decode($data, true);
}
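The same read-through pattern can be wrapped in one helper so every endpoint gets identical caching behaviour. A sketch under the assumption that the fetcher is passed in as a callable (which also lets the cache logic be exercised without the network); the function name is illustrative:

```php
<?php
// Read-through cache: return a fresh cached copy if one exists,
// otherwise call the fetcher, store its raw JSON body, and decode it.
function cached_json($id, callable $fetch, $cacheDir = 'cache/', $ttl = 84600)
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    $filename = $cacheDir . md5($id) . '.json';

    if (file_exists($filename) && (time() - $ttl < filemtime($filename))) {
        return json_decode(file_get_contents($filename), true); // cache hit
    }

    $raw = $fetch($id);                 // cache miss: do the real request
    file_put_contents($filename, $raw); // store the raw JSON body as-is
    return json_decode($raw, true);
}
```

For the YouTube case above, $fetch would be a closure that builds the full API URL for the given id and calls data_get_curl() on it.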
How can I make a PHP script executed from my browser on a webserver show its results while it's still running? (The code makes many cURL calls and takes about 30 minutes to fully process.) I have two servers: on one, every echo appears as soon as it's called, but on the other everything appears only after 30 minutes, when the script has finished. Both servers run Apache with PHP 5.6.
The code:
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
set_time_limit(2);

$handle = fopen("filmyNasze.txt", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        $line = explode("##", $line);
        $nazwafilmu = trim($line[0]);
        $linkfilmu = 'http://xxx.pl' . trim($line[1]);

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $linkfilmu);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: cda.player=html5"));
        $result = curl_exec($ch); // note: the original called curl_exec() twice, running every request twice
        curl_close($ch);

        libxml_use_internal_errors(true);
        $doc2 = new DOMDocument();
        $doc2->loadHTML($result);

        $divid = str_replace('/video/', '', trim($line[1]));
        foreach ($doc2->getElementsByTagName('div') as $div) {
            if ($div->getAttribute('id') == 'mediaplayer' . $divid) {
                $array = json_decode($div->getAttribute('player_data'), true);
                //echo $array["video"]["file"] . " ## ";
            }
        }
        echo $nazwafilmu . ' ## ' . trim($line[1]) . ' ## ' . $array["video"]["file"] . '<br />';
    }
    fclose($handle);
} else {
    die('brak pliku .txt'); // Polish: "no .txt file"
}
You can flush PHP's output buffer using ob_flush() and flush().
http://php.net/manual/en/function.flush.php
<?php
ob_start();
echo 'Something';
ob_flush();
flush();
ob_end_flush();
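When flush() alone isn't enough (which is likely why one of your servers streams output and the other doesn't), the output is being held in a server-side buffer before PHP's own one. A checklist sketch of settings that commonly cause this; which of them matters depends on your server configuration, so treat these as things to try rather than a guaranteed fix:

```php
<?php
// Disable compression: gzip makes the server buffer the whole response.
@ini_set('zlib.output_compression', '0');
@ini_set('implicit_flush', '1');

// Close any buffers opened by output_buffering in php.ini
// (that directive itself cannot be changed at runtime).
while (ob_get_level() > 0) {
    ob_end_flush();
}
ob_implicit_flush(true); // flush automatically after every echo

echo str_repeat(' ', 4096); // some browsers wait for a minimum chunk before rendering
echo "progress line 1<br />\n";
flush();
```

Apache modules such as mod_deflate can also buffer the response after PHP has flushed it, so check the server config as well.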
From my understanding this should be fairly simple: I should only need to replace the original file_get_contents call, and the rest of the script should still work. I have commented out the old file_get_contents lines and added the cURL code below.
However, after changing from file_get_contents to cURL, the code below no longer produces any output.
//$data = @file_get_contents("http://www.city-data.com/city/".$cityActualURL."-".$stateActualURL.".html");
//$data = file_get_contents("http://www.city-data.com/city/Geneva-Illinois.html");

// Initialize the cURL session (the original initialized it twice, leaking a handle)
$url = "http://www.city-data.com/city/" . $cityActualURL . "-" . $stateActualURL . ".html";
//echo "$url<br>";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
//echo $data;

$details = str_replace("\n", "", $data);
$details = str_replace("\r", "", $details);

$detailsBlock = <<<HTML
~<div style='clear:both;'></div><br/><b>(.*?) on our <a href='http://www.city-data.com/top2/toplists2.html'>top lists</a>: </b><ul style='margin:10px;'>(.*?)<div style='bp_bindex'>~
HTML;
$detailsBlock2 = <<<HTML
~<br/><br/><b>(.*?) on our <a href='http://www.city-data.com/top2/toplists2.html'>top lists</a>: </b><ul style='margin:10px;'>(.*?)</ul></td>~
HTML;
$detailsBlock3 = <<<HTML
~<div style='clear:both;'></div><br/><b>(.*?) on our <a href='http://www.city-data.com/top2/toplists2.html'>top lists</a>: </b><ul style='margin:10px;'>(.*?)</ul></td>~
HTML;

preg_match($detailsBlock, $details, $matches);
preg_match($detailsBlock2, $details, $matches2);
preg_match($detailsBlock3, $details, $matches3);

if (isset($matches[2])) {
    $facts = "<ul style='margin:10px;'>" . $matches[2];
} elseif (isset($matches2[2])) {
    $facts = "<ul style='margin:10px;'>" . $matches2[2];
} elseif (isset($matches3[2])) {
    $facts = "<ul style='margin:10px;'>" . $matches3[2];
} else {
    $facts = "More Information to Come...";
}
If you have a problem with your script you need to debug it. For example:
$data = curl_exec($ch);
var_dump($data); die();
Then you will see exactly what $data contains. Depending on the output, you can decide where to look next for the cause of the malfunction.
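If var_dump($data) shows false, ask cURL itself why the transfer failed; curl_error() and curl_getinfo() usually point straight at the cause. A sketch (the URL here is a stand-in for your city-data.com request):

```php
<?php
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($ch);

if ($data === false) {
    // Transfer-level failure: DNS, timeout, SSL, refused connection, etc.
    echo 'cURL error: ' . curl_error($ch) . "\n";
} else {
    // Transfer worked; check the HTTP status before trying to parse anything.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo "HTTP status: {$status}, " . strlen($data) . " bytes received\n";
}
curl_close($ch);
```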
The following function works great, just pass it a URL.
function file_get_data($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the data instead of printing it to the browser
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
TIP: New lines and carriage returns can be replaced with one line of code.
$details = str_replace(array("\r\n","\r","\n"), '', $data);