I need to find a file named _template on more than one remote server.
The file will have one of two extensions:
_template.htm
or
_template.php
I can find a file using cURL on a remote server, which is easy, but I can't wrap my head around how to check for one extension or the other.
The code runs in a loop and will search about 20 different sites for this file. If it finds the file it just needs to say: Found it. If it doesn't find either of the two files it needs to say: None Found.
Doing cURL twice takes too long to load.
Searching through the ftp array also takes too long.
My current code is below. It doesn't work, and I now know why it doesn't work (thanks guys); I would like to know how I can make it work. (This is in a loop.)
$template_file = glob("/_template.{htm,php}", GLOB_BRACE);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $client_link.$template_file);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
$data = curl_exec($ch);
curl_close($ch);
preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/",$data,$matches);
$code = end($matches[1]);
if(!$data)
{
    echo "No Files Found<br>";
}
else
{
    if($code == 200)
    {
        $filefound = 'Found'.$template_file;
    }
    elseif($code == 404)
    {
        $filefound = '404 File not Found';
    }
}
The problem is with the glob function - that function matches paths on the local filesystem. Add a print_r($template_file); after the glob() call and you'll see that it doesn't match anything (or it only matches things on your local system).
What you need to do is build the URLs one-by-one:
foreach(array("htm", "php") as $ext)
{
    // build the URL string
    $url = "http://example.com/_template." . $ext;
    // now do whatever you need to with the URL...
}
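For completeness, here is a minimal sketch that combines this loop with the HEAD-request approach from the question, assuming $client_link holds the site's base URL ending in a slash; it stops as soon as one of the two files responds with 200:
$filefound = 'None Found';
foreach (array("htm", "php") as $ext) {
    $url = $client_link . "_template." . $ext;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send a HEAD request only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // status code after any redirects
    curl_close($ch);
    if ($code == 200) {
        $filefound = 'Found it: _template.' . $ext;
        break;                                       // stop after the first match
    }
}
echo $filefound . "<br>";
This way each site costs at most two small HEAD requests, and usually only one.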
I would like to retrieve all the prices from a Google search in a PHP file.
Example of a price search: https://www.google.com/search?ei=QBN1XIfYDrG5gwfmq6bwDg&q=860+evo+500go&oq=860+evo+500go&gs_l=psy-ab.3..0j0i10j0i22i10i30j0i22i30l3.5044.6363..6572...0.0..0.59.347.6......0....1..gws-wiz.......0i71j0i20i263j0i67j0i203.HYjd3deC288
file_get_contents doesn't work, so I have to use cURL as in this topic:
PHP file_get_contents error 503
Now I don't know how to write the next part of the script.
I guess I have to create a loop and use preg_match to keep only what I need.
Is that right? Could I have an example?
Here is the beginning of my script:
$url = "https://www.google.com/search?ei=QBN1XIfYDrG5gwfmq6bwDg&q=860+evo+500go&oq=860+evo+500go&gs_l=psy-ab.3..0j0i10j0i22i10i30j0i22i30l3.5044.6363..6572...0.0..0.59.347.6......0....1..gws-wiz.......0i71j0i20i263j0i67j0i203.HYjd3deC288";
function curl_get_file_contents($URL) {
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);
    if ($contents) return $contents;
    else return FALSE;
    foreach ($variable as $key => $value) {
        echo $result;
    }
}
This will not work reliably. Google search is quite sensitive to scraping and quickly starts responding with a CAPTCHA check when it suspects automated access.
I would recommend looking into a better source for the information you need; otherwise you risk burning time writing code that won't run because you can't consistently get your data in the first place.
We have a JIRA instance that our custom PHP app, built in Laravel, pulls from; for each issue it checks whether a specific branch or tag exists:
chdir($path . $repo);
exec("git rev-parse --verify ".$branch, $branch_dump, $return_var);
if ($return_var == 0) {
    return true;
} else {
    return false;
}
However, we have migrated all of our git projects to GitLab and that method no longer works, since you need root to get into GitLab's repo data directory.
We looked at GitLab's API and found that we could do this:
http://gitlab/api/v3/projects/10/repository/commits/OUR-TAG-HERE?private_token=XXX
However, this requires us to specify an arbitrary GitLab project ID (10 in this case) and therefore isn't predictable, so we can't programmatically run the search for each JIRA API result like we did before. This method would work if we could simply search for tags using the project name only, but I can't find a way to do that.
Here's an overview of the way the app works:
JIRA contains all issues we want
Each issue contains several custom fields we use to search our git repos with; generically they are "Repo Name" and "Tag Name"
Our Laravel app connects to JIRA's api and harvests all issues into an array we use to build a table listing information about each issue
The two custom fields "Repo Name" and "Tag Name" are matched against our git repositories to determine which of several options to provide the end user (clone tag, create tag if repo exists but no tag exists, or none if neither)
We briefly considered adding another custom field to our JIRA issues which we would fill with GitLab's project ID, but we have hundreds of issues and it is an inelegant solution that really only acts as another potential point of failure, to say nothing of the extra maintenance.
Any ideas?
The best solution I found to this issue was to use the API to get the list of projects and use that list to pair name and ID.
For example, this code will output the tag names for all your projects:
//Get Projects list via API
$header = array("PRIVATE-TOKEN: <YOUR_TOKEN>");
$ch = curl_init("https://<YOUR_GITLAB_DOMAIN>/api/v3/projects/");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
//Parse returned list to an array
$projectsArray= json_decode($result, true);
//Loop over the array of projects accessing the list of tags via the API
foreach ($projectsArray as $project) {
    echo $project["name"] . " Tags:<br>";
    $tagURL = "https://<YOUR_GITLAB_DOMAIN>/api/v3/projects/" . $project["id"] . "/repository/tags";
    $ch2 = curl_init($tagURL);
    curl_setopt($ch2, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch2, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
    $result2 = curl_exec($ch2);
    curl_close($ch2);
    $tagsArray = json_decode($result2, true);
    foreach ($tagsArray as $tag) {
        echo $tag["name"] . "<br>";
    }
    echo "<br>";
}
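Building on that list, a rough, untested sketch of pairing name to ID and then checking a single tag; it reuses $header and $projectsArray from above and the commits endpoint from the question, and the repo and tag names here are just placeholders standing in for the JIRA custom fields:
// Build a name => id map from the projects list above
$projectIds = array();
foreach ($projectsArray as $project) {
    $projectIds[$project["name"]] = $project["id"];
}

// Placeholder values standing in for the JIRA "Repo Name" and "Tag Name" fields
$repoName = "some-repo";
$tagName = "some-tag";

if (isset($projectIds[$repoName])) {
    // Same commits endpoint as in the question, but with the ID looked up by name
    $checkURL = "https://<YOUR_GITLAB_DOMAIN>/api/v3/projects/" . $projectIds[$repoName]
        . "/repository/commits/" . urlencode($tagName);
    $ch = curl_init($checkURL);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    echo ($httpCode == 200) ? "Tag found<br>" : "Tag not found<br>";
}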
Since arbitrary project IDs are still required by the GitLab API for this functionality, we've scrapped the API altogether. Instead we're now simply cURLing HTTP response codes. Here's one of our methods to see if the issue has a tag:
public function HasTag($projectName, $nameSpace, $tagName)
{
    $url = $this->gitLabUrl.'/'.$nameSpace.'/'.$projectName.'/tags/'.$tagName;
    $ch = curl_init(); // Initiate curl
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Disable SSL verification
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Will return the response, if false print the response
    curl_setopt($ch, CURLOPT_URL, $url); // Set the url
    curl_exec($ch); // Execute
    $info = curl_getinfo($ch);
    curl_close($ch);
    if ($info['http_code'] == 200) {
        return true;
    } else {
        return false;
    }
}
And here's our method to check for a branch:
public function HasBranch($projectName, $nameSpace, $branchName)
{
    $url = $this->gitLabUrl.'/'.$nameSpace.'/'.$projectName.'/tree/'.$branchName;
    $ch = curl_init(); // Initiate curl
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Disable SSL verification
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Will return the response, if false print the response
    curl_setopt($ch, CURLOPT_URL, $url); // Set the url
    curl_exec($ch); // Execute
    $info = curl_getinfo($ch);
    curl_close($ch);
    if ($info['http_code'] == 200) {
        return true;
    } else {
        return false;
    }
}
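For illustration, a hypothetical caller (the instance, repository, namespace and tag names below are made up) might use these methods roughly like so:
if ($gitLab->HasTag('some-repo', 'some-group', 'some-tag')) {
    // tag exists: offer "clone tag" to the user
} elseif ($gitLab->HasBranch('some-repo', 'some-group', 'master')) {
    // repo reachable but tag missing: offer "create tag"
}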
As you can see this is pretty simple and hacky, but it works for our implementation because none of the projects being accessed are private (our GitLab instance is purely internal).
Hopefully in the future GitLab will remove the ID requirement from its API.
I am building a system where one of my sites goes to the other to get documents.
On the first site I am using cURL to make a request to get the wanted file:
I am using the solution from Download file from URL using CURL :
function collect_file($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, false);
    curl_setopt($ch, CURLOPT_REFERER, "http://example.com");
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $result = curl_exec($ch);
    curl_close($ch);
    echo $result;
    return($result);
}

function write_to_file($text,$new_filename){
    $fp = fopen($new_filename, 'w');
    fwrite($fp, $text);
    fclose($fp);
}
$curlUrl = 'http://site2.com/file-depository/14R4NP8JkoIHwIyjnexSUmyJibdpHs5ZpFs3NLFCxcs54kNhHj';
$new_file_name = "testfile-new.png";
$temp_file_contents = collect_file($curlUrl);
write_to_file($temp_file_contents,$new_file_name);
I am testing by downloading an image. If I use a direct URL in $curlUrl, for instance http://site2.com/file-depository/image.png, it works perfectly.
What I am doing is that the URL http://site2.com/file-depository/14R4NP8JkoIHwIyjnexSUmyJibdpHs5ZpFs3NLFCxcs54kNhHj is parsed and checked against a database to match the requested document; once a document is matched I need to provide it in the cURL response.
I have tried many ways to read the file, but every time I get a file on the other end that is only 1 KB in size (45 KB expected), and when trying to open it I get an error: unknown file type, etc.
On the second site, once the URL is validated here is what I have:
$file = readfile('some-image.png');
echo $file;
I am guessing that part of the information belonging to the file is missing, but I can't figure it out; any pointers appreciated!
I have replaced
function write_to_file($text,$new_filename){
    $fp = fopen($new_filename, 'w');
    fwrite($fp, $text);
    fclose($fp);
}
by file_put_contents($new_file_name,trim($temp_file_contents));
Please note the trim(); the issue was that I was apparently collecting some empty space in front of the file content.
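One more note on the serving side: readfile() writes the file straight to the output and returns the number of bytes read, so echoing its return value appends that number to the download. A cleaner sketch for site 2, assuming the matched file is the PNG used in the test, would be:
$path = 'some-image.png'; // path resolved from the database lookup
header('Content-Type: image/png');
header('Content-Length: ' . filesize($path));
readfile($path); // outputs the file directly; no echo needed
exit;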
I'm trying to grab a photo from Google Place Photos using curl and save it on my server.
The request format as per the Google API documentation is like this:
https://maps.googleapis.com/maps/api/place/photo?maxwidth=400&photoreference=CoQBegAAAFg5U0y-iQEtUVMfqw4KpXYe60QwJC-wl59NZlcaxSQZNgAhGrjmUKD2NkXatfQF1QRap-PQCx3kMfsKQCcxtkZqQ&sensor=true&key=AddYourOwnKeyHere
So I tried this function:
function download_image1($image_url, $image_file){
    $fp = fopen ($image_file, 'w+');
    $ch = curl_init($image_url);
    // curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // enable if you want
    curl_setopt($ch, CURLOPT_FILE, $fp); // output to file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1000); // some large value to allow curl to run for a long time
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    // curl_setopt($ch, CURLOPT_VERBOSE, true); // Enable this line to see debug prints
    curl_exec($ch);
    curl_close($ch); // closing curl handle
    fclose($fp); // closing file handle
}
download_image1($photo, "test.jpg");
...where $photo holds the request URL.
This is not working; it saves an empty image with header errors, probably because the request URL is not the actual URL of the photo. Also, from the request URL it's not possible to know which image extension I'm going to get (jpg, png, gif, etc.), so that's another problem.
Any help on how to save the photos appreciated.
EDIT: I get the header errors "Can't read file header" in my image viewer software when I try to open the image. The script itself doesn't show any errors.
I found a solution here:
http://kyleyu.com/?q=node/356
It gives a very useful function to return the actual URL after redirection:
function get_furl($url)
{
    $furl = false;
    // First check response headers
    $headers = get_headers($url);
    // Test for 301 or 302
    if(preg_match('/^HTTP\/\d\.\d\s+(301|302)/',$headers[0]))
    {
        foreach($headers as $value)
        {
            if(substr(strtolower($value), 0, 9) == "location:")
            {
                $furl = trim(substr($value, 9, strlen($value)));
            }
        }
    }
    // Set final URL
    $furl = ($furl) ? $furl : $url;
    return $furl;
}
So you pass the Google Place Photos request URL to this function and it returns the actual URL of the photo after the redirection, which can then be used with cURL. It also explains that the curl option curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); doesn't always work.
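Put together with the question's own download function, the idea is roughly:
$finalUrl = get_furl($photo);           // resolve the redirect first
download_image1($finalUrl, "test.jpg"); // then fetch the actual image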
https://stackoverflow.com/a/23540352/2979237
We can simplify the get_furl() code above by passing 1 as the second parameter of the get_headers() function, i.e. $headers = get_headers($url, 1); that returns the headers as an associative array.
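With that change, a short sketch (assuming the server sends a Location header; it can be an array when there are several redirects, in which case take the last one):
$headers = get_headers($url, 1);
if (isset($headers['Location'])) {
    $location = $headers['Location'];
    $furl = is_array($location) ? end($location) : $location;
} else {
    $furl = $url;
}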
If you just enter the URLs into the browser you can see that both work; cdon works even without JavaScript. Have they blocked cURL somehow?
I'm trying to build a scraper to benefit legal movies online, which would benefit them a whole lot; blocking scrapers in general seems stupid, IMHO. Although I'm far from sure that's what's going on here! It might just be an error somewhere.
// Works
get_file1('http://sfanytime.com/sv-SE/Sokresultat/?field=all&q=The+Matrix', '/', 'sfanytime.html');
// Saves a blank 0 KB file
get_file1('http://downloads.cdon.com/index.phtml?action=search&search_terms=The+Matrix', '/', 'cdon.html');
function get_file1($file, $local_path, $newfilename) {
    $out = fopen($newfilename, 'wb');
    if ($out === FALSE) {
        return false;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_FILE, $out);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_URL, $file);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    $error = curl_error($ch);
    if (strlen($error) > 0) {
        echo "<br>Error is : ". $error;
        return false;
    }
    curl_close($ch);
    return true;
}
You should change the line
curl_setopt($ch, CURLOPT_FAILONERROR, true);
...to...
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
CURLOPT_FAILONERROR will cause a "silent fail" - which from what you say, is not what you want. I have replaced this with CURLOPT_FOLLOWLOCATION, because when I visit the second URL, I get redirected to a "choose your country" type page, which will be a response with an empty body - which is why you get an empty file.
There is no problem with your code as such, simply a problem with the way you handle the response from the second URL. You don't see an error because, technically, there wasn't one.
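If you want to confirm that from code rather than the browser, a purely illustrative addition inside get_file1(), just before curl_close(), is to inspect what cURL actually received:
$info = curl_getinfo($ch);
// shows the status code and the URL cURL ended up at
echo "<br>HTTP code: " . $info['http_code'] . " / final URL: " . $info['url'];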