cURL not downloading all PDF files in PHP

I'm trying to download multiple PDFs with PHP. I get an array of URLs, and each URL redirects to a page that contains a PDF file; if something is wrong with the URL, it redirects to an HTML page instead. I've been googling and found the following code to download all the PDFs to the server:
public function download($data, $simultaneous, $save_to)
{
    // Process the URLs in chunks of $simultaneous at a time.
    $loops = array_chunk($data, $simultaneous, true);
    foreach ($loops as $key => $value)
    {
        // One easy handle per URL in this chunk.
        foreach ($value as $urlkey => $urlvalue)
        {
            $ch[$urlkey] = curl_init($urlvalue["url"]);
            curl_setopt($ch[$urlkey], CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch[$urlkey], CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch[$urlkey], CURLOPT_SSL_VERIFYHOST, false);
        }
        // Run all handles in this chunk through one multi handle.
        $mh = curl_multi_init();
        foreach ($value as $urlkey => $urlvalue)
        {
            curl_multi_add_handle($mh, $ch[$urlkey]);
        }
        $running = null;
        do {
            curl_multi_exec($mh, $running);
        } while ($running);
        // Save each response and clean up the handles.
        foreach ($value as $urlkey => $urlvalue)
        {
            $response = curl_multi_getcontent($ch[$urlkey]);
            file_put_contents($save_to.$urlvalue["saveas"], $response);
            curl_multi_remove_handle($mh, $ch[$urlkey]);
            curl_close($ch[$urlkey]);
        }
        curl_multi_close($mh);
    }
}
For some reason this downloads only some of the files.
Does anyone have any idea why it isn't working?
Any help would be appreciated.
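One common culprit is that nothing here checks whether each transfer actually succeeded before the response is written to disk, so failed transfers silently produce missing or broken files. A minimal sketch of a more defensive run loop, assuming the same $mh multi handle and $ch handle array as above:

$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

// Report every transfer that finished with an error.
while ($info = curl_multi_info_read($mh)) {
    if ($info['result'] !== CURLE_OK) {
        error_log('Download failed: ' . curl_strerror($info['result']));
    }
}

Checking curl_getinfo($handle, CURLINFO_CONTENT_TYPE) before calling file_put_contents() would also let you skip the HTML error pages mentioned in the question instead of saving them under a .pdf name.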

Related

How to validate image URLs in PHP with curl_multi?

Hello guys, I am checking whether each URL points to an image by using curl_multi. But here's the issue: what if the $testArray array has around 2000 links? I do not want to make 2000 curl requests at a time; I would like to make 50 requests at a time. How can I accomplish this? Please let me know if anything in the code is confusing. Thanks a lot.
function checkImageIfExist($imageLink) {
    $imageLinkArray = array();
    $curl_arr = array();
    $mh = curl_multi_init();
    foreach ($imageLink as $key => $value) {
        $curl_arr[$key] = curl_init();
        curl_setopt($curl_arr[$key], CURLOPT_URL, $value);
        curl_setopt($curl_arr[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $curl_arr[$key]);
        // Note: running the multi handle inside the foreach means each
        // request effectively completes before the next one is added.
        $running = null;
        do {
            curl_multi_exec($mh, $running);
        } while ($running > 0);
        $httpcode = curl_getinfo($curl_arr[$key], CURLINFO_HTTP_CODE);
        if ($httpcode == 200) {
            $imageLinkArray[] = $value;
        }
    }
    print_r($imageLinkArray);
    curl_multi_close($mh);
}
This is how I call the function.
checkImageIfExist($testArray);
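One way to cap the concurrency is to split the link list into batches with array_chunk() and run one multi handle per batch, so no more than 50 requests are in flight at once. A minimal sketch of that idea; the function name checkImagesInBatches and the $batchSize parameter are made up for illustration:

function checkImagesInBatches(array $links, int $batchSize = 50): array
{
    $found = [];
    foreach (array_chunk($links, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_NOBODY, true); // status code is enough here
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // At most $batchSize transfers run concurrently.
        $running = null;
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh);
        } while ($running > 0);
        foreach ($handles as $url => $ch) {
            if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
                $found[] = $url;
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $found;
}

Note that CURLOPT_NOBODY sends a HEAD request; some image hosts answer HEAD differently than GET, so drop that option if the results look wrong.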

Ignore "404" status pages with cURL

I'm using cURL to get information out of an API and write it into a MySQL table.
The URL to the API looks like this:
https://eu.api.blizzard.com/data/wow/item-class/4/item-subclass/1?namespace=static-eu&locale=de_DE&access_token=US6XqgbtQ6rh3EVIqPsejuF62RwP8ljzWn
You can change the number in the URL part 1?namespace to another value, for example to 4?namespace, to get other information from the API.
I'm using PHP's range() to generate the numbers for the URL.
Problem:
Some numbers in the URL lead to a 404 response, since there is no information in the API for them. Example URL:
https://eu.api.blizzard.com/data/wow/item-class/4/item-subclass/18?namespace=static-eu&locale=de_DE&access_token=US6XqgbtQ6rh3EVIqPsejuF62RwP8ljzWn
These "404" pages should be ignored, and nothing should be written to MySQL. How is this possible with cURL?
Complete code:
$ids = range(0, 20);
$userAgent = 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0';
$mh = curl_multi_init();
$channels = [];
foreach ($ids as $id) {
    $fetchURL = 'https://eu.api.blizzard.com/data/wow/item-class/4/item-subclass/' . $id . '?namespace=static-eu&locale=de_DE&access_token=US6XqgbtQ6rh3EVIqPsejuF62RwP8ljzWn';
    $channels[$id] = curl_init($fetchURL);
    curl_setopt($channels[$id], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($channels[$id], CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($channels[$id], CURLOPT_SSL_VERIFYPEER, 0);
    curl_multi_add_handle($mh, $channels[$id]);
}
// Execute all queries simultaneously, and continue when all are complete.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
// Detach the handles from the multi handle.
foreach ($ids as $id) {
    curl_multi_remove_handle($mh, $channels[$id]);
}
curl_multi_close($mh);
$response = [];
foreach ($ids as $id) {
    $res = curl_multi_getcontent($channels[$id]);
    $response[$id] = ($res === false) ? null : json_decode($res, true);
}
echo ("<pre>");
foreach ($response as $item) {
    $sqle = "REPLACE INTO `itemsubclasses`
             (`class_id`, `subclass`, `name`)
             VALUES
             ('{$item['class_id']}', '{$item['subclass_id']}', '{$item['display_name']}')";
    if ($conn->query($sqle) === TRUE) {
        echo "Geklappt";
    } else {
        echo "Problem";
    }
}
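To ignore the 404 pages, check each handle's status code with curl_getinfo() before decoding, and only keep the 200 responses. A minimal sketch using the variable names from the question; it replaces the $response loop above (and note that interpolating API values into SQL like this invites injection, so prepared statements would be safer):

// Collect only the responses that came back with HTTP 200.
$response = [];
foreach ($ids as $id) {
    $httpCode = curl_getinfo($channels[$id], CURLINFO_HTTP_CODE);
    if ($httpCode != 200) {
        continue; // 404s (and any other non-200 results) are skipped entirely
    }
    $res = curl_multi_getcontent($channels[$id]);
    if ($res !== false) {
        $response[$id] = json_decode($res, true);
    }
}

This works because curl_getinfo() and curl_multi_getcontent() can still be called on the easy handles after they have been removed from the multi handle, as long as they have not been curl_close()d.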

Nested loop with recursive function?

I need to recurse over every result suggested by Google, up to a user-defined depth, and save the results in a multidimensional array to be explored later.
I want to get this result:
google
google app
google app store
google app store games
google app store games free
google maps
google maps directions
google maps directions driving
google maps directions driving canada
...
Currently, my recursive function returns duplicated results from the second nesting level onward:
google
google app
google app
google app store
google app store
google app
google app store
google app store
google app store
...
I think the problem comes from the array of parent results that I pass as an argument to recursive_function() in each nested loop:
$child = recursive_function($parent[0][1], $depth, $inc+1);
Recursive function
// keywords, one per line or space-separated
$keywords = explode("\n", trim("facebook"));
$result = recursive_function($keywords, 2);

function recursive_function($query, $depth, $inc = 1)
{
    $urls = preg_filter('/^/', 'http://suggestqueries.google.com/complete/search?client=firefox&q=', array_map('urlencode', $query));
    $parent = curl_multi_function($urls);
    array_multisort($parent[0][1]);
    $out = [];
    if (count($parent[0][1]) === 0 || $inc >= $depth)
    {
        $out[] = $parent[0][1];
    }
    else
    {
        $child = recursive_function($parent[0][1], $depth, $inc+1);
        $out[] = $child;
    }
    return $out;
}
cURL function
function curl_multi_function($data, $options = array())
{
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();
    // loop through $data, create curl handles, then add them to the multi handle
    foreach ($data as $id => $d)
    {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curly[$id], CURLOPT_SSL_VERIFYPEER, 0);
        // POST request?
        if (is_array($d) && !empty($d['post']))
        {
            curl_setopt($curly[$id], CURLOPT_POST, 1);
            curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
        }
        // extra options?
        if (!empty($options))
        {
            curl_setopt_array($curly[$id], $options);
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }
    // execute the handles
    $running = null;
    do
    {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    // get content and remove handles
    foreach ($curly as $id => $c)
    {
        $result[$id] = curl_multi_getcontent($c);
        // decode the JSON result (note: utf8_encode() is deprecated as of PHP 8.2)
        $result[$id] = json_decode(utf8_encode($result[$id]));
        curl_multi_remove_handle($mh, $c);
    }
    // all done
    curl_multi_close($mh);
    return $result;
}
Thanks!
I've changed your recursive_function a little bit:
function recursive_function($query, $depth, $inc = 1)
{
    $urls = preg_filter('/^/', 'http://suggestqueries.google.com/complete/search?client=firefox&q=', array_map('urlencode', $query));
    $parent = curl_multi_function($urls);
    $out = [];
    foreach ($parent as $key => $value) {
        array_multisort($value[1]);
        $words = explode(' ', $value[0]);
        $lastWord = end($words);
        if (count($value[1]) === 0 || $inc >= $depth) {
            $out[$lastWord] = [];
        } else {
            // Drop the first suggestion (it repeats the query itself),
            // then recurse into the remaining suggestions.
            unset($value[1][0]);
            $child = recursive_function($value[1], $depth, $inc+1);
            $out[$lastWord] = $child;
        }
    }
    return $out;
}
It generates an array like this:
[
    google => [
        app => [
            store => [
                games => [
                    free => []
                ]
            ]
        ]
        ...
    ]
]
Is that what you want?
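If you still need the flat list from the question ("google", "google app", "google app store", ...), the nested array can be walked back into phrases. A minimal sketch; printPhrases is a made-up helper name:

// Walk the nested suggestion tree and print each path as a phrase.
function printPhrases(array $tree, string $prefix = '')
{
    foreach ($tree as $word => $children) {
        $phrase = trim($prefix . ' ' . $word);
        echo $phrase, "\n";
        if (is_array($children) && $children !== []) {
            printPhrases($children, $phrase);
        }
    }
}

printPhrases($result); // $result as returned by recursive_function($keywords, 2)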

Can't execute HTTP request to YouTube Data API from PHP

I'm writing a mass YouTube link finder, which imports a list of titles from an array, generates an API URL for each, and then executes them with curl_multi.
However, curl returns blank data for each link. The links themselves are fine, as I can access them correctly via Chrome.
file_get_contents(), tried in another script with one of those URLs, also results in an ERR_EMPTY_RESPONSE in Chrome.
Any help would be much appreciated.
EDIT: Code:
function getYTUrl($urls)
{
    $curls = array();
    $result = array();
    $mh = curl_multi_init();
    foreach ($urls as $key => $value) {
        echo $value;
        $curls[$key] = curl_init();
        curl_setopt($curls[$key], CURLOPT_URL, $value);
        curl_setopt($curls[$key], CURLOPT_HEADER, 0);
        curl_setopt($curls[$key], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curls[$key], CURLOPT_SSL_VERIFYPEER, false);
        curl_multi_add_handle($mh, $curls[$key]);
    }
    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($active);
    foreach ($urls as $key => $value) {
        // Index by $key, not $value: the handles were stored under $key,
        // and the handle itself must be passed to curl_multi_remove_handle().
        $result[$key] = curl_multi_getcontent($curls[$key]);
        curl_multi_remove_handle($mh, $curls[$key]);
    }
    curl_multi_close($mh);
    return $result;
}
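If curl still returns blank data, checking each handle's error state and status code usually reveals why. A minimal sketch, meant to run inside getYTUrl() before curl_multi_close(), while the $curls handles are still open:

foreach ($urls as $key => $value) {
    $err = curl_error($curls[$key]);
    if ($err !== '') {
        echo "Request $key failed: $err\n";
    } else {
        echo "Request $key returned HTTP "
            . curl_getinfo($curls[$key], CURLINFO_HTTP_CODE) . "\n";
    }
}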

Multithreading PHP Function

Currently, when I execute this function with, say, 60 URLs, I get an HTTP 504 error. Is there any way to multithread this so that I no longer get a 504 error and can iterate through the entire list of URLs?
<?php

namespace App\Http\Controllers;

use Request;
use App\Http\Controllers\Controller;

class MainController extends Controller
{
    public function parse()
    {
        $input = Request::all();
        $csv = $input['laraCsv'];
        $new_csv = trim(preg_replace('/\s\s+/', ',', $csv));
        $headerInfo = [];
        $csvArray = str_getcsv($new_csv, ",");
        $csvLength = count($csvArray);
        $i = 0;
        while ($i < $csvLength) {
            if (strpos($csvArray[$i], '.pdf') !== false) {
                print_r($csvArray[$i]);
            } else {
                // get_headers() makes one blocking HTTP request per URL
                array_push($headerInfo, get_headers($csvArray[$i], 1));
            }
            $i++;
        }
        return view('csvViewer')->with('data', $headerInfo)->with('urls', $csvArray);
    }
}
I've used DigitalOcean in the past, but I'm not sure what error codes they give if you run out of time (also, set_time_limit(0); should already be in your code).
See if this works:
<?php

function getHeaders($data)
{
    $curly = array();
    $result = array();
    $mh = curl_multi_init();
    foreach ($data as $id => $url) {
        $curly[$id] = curl_init();
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, true);
        curl_setopt($curly[$id], CURLOPT_NOBODY, true);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $curly[$id]);
    }
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    foreach ($curly as $id => $c) {
        $result[$id] = array_filter(explode("\n", curl_multi_getcontent($c)));
        curl_multi_remove_handle($mh, $c);
    }
    curl_multi_close($mh);
    return $result;
}

$urls = array(
    'http://google.com',
    'http://yahoo.com',
    'http://doesnotexistwillitplease.com'
);

$r = getHeaders($urls);
echo '<pre>';
print_r($r);
So once you've gotten all your URLs into an array, run it like getHeaders($urls);.
If it doesn't work, try it with only 3 or 4 URLs first. Also add set_time_limit(0); at the top, as mentioned before.
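If 60 concurrent requests are still too much for the server, the same function can be fed smaller batches with array_chunk(), so only a handful of transfers run at once. A minimal sketch of that idea, reusing getHeaders() from above:

// Process the URLs in batches of 10, merging results and preserving keys.
$results = [];
foreach (array_chunk($urls, 10, true) as $batch) {
    $results += getHeaders($batch);
}
print_r($results);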
Are you sure it is because of your code? It could also be the server configuration. About HTTP 504:
This problem is entirely due to slow IP communication between back-end computers, possibly including the Web server. Only the people who set up the network at the site which hosts the Web server can fix this problem.
