This is all of my code:
<html>
<body>
<form>
    Playlist to Scrape: <input type="text" name="url" placeholder="Playlist URL">
    <input type="submit">
</form>
<?php
if (isset($_GET['url'])) {
    $source = file_get_contents($_GET['url']);
    $regex = '/<a href="(.*?)" class="gothere pl-button" title="/';
    preg_match_all($regex, $source, $output);
    echo '<textarea cols="100" rows="50">';
    $fullUrl = array();
    foreach ($output[1] as $url) {
        array_push($fullUrl, "http://soundcloud.com" . $url);
    }
    $final = implode(";", $fullUrl);
    echo $final;
    echo "</textarea>";
} else {
    echo "borks";
}
?>
</body>
</html>
Yesterday, it worked fine.
What the code should do is:
Take a Soundcloud URL, extract the individual songs, and then print them like song1;song2;song3
Again, this worked fine yesterday, and I haven't changed anything since, I think...
I have tried commenting the other code out, keeping just $source = file_get_contents($_GET['url']); and echoing $source, but it returned blank, which makes me think the problem is with file_get_contents.
If you have any idea on why this is happening, I would appreciate hearing it. Thanks!
What might have happened is that a new SSL certificate was installed on the server that file_get_contents is trying to access. In our case, the target server had a new SSL certificate installed on its domain from another vendor and another wild-card domain.
Changing our config a little bit fixed the problem.
$opts = array(
'http' => array(
'method' => "GET",
'header' => "Content-Type: application/json\r\n".
"Accept: application/json\r\n",
'ignore_errors' => true
),
// VVVVV The extra config that fixed it
'ssl' => array(
'verify_peer' => false,
'verify_peer_name' => false,
)
// ^^^^^
);
$context = stream_context_create($opts);
$result = file_get_contents(THE_URL_WITH_A_CHANGED_CERTIFICATE, false, $context);
I found this solution thanks to this answer, even though it had been downvoted.
It certainly explains how file_get_contents can suddenly stop working.
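A side note on that fix: setting verify_peer and verify_peer_name to false disables TLS validation entirely. If the real cause is a certificate change, a safer variant is to keep verification on and point PHP at a current CA bundle. A minimal sketch, assuming a typical Linux CA bundle path (adjust for your system):
$opts = array(
    'ssl' => array(
        'verify_peer'      => true,
        'verify_peer_name' => true,
        'cafile'           => '/etc/ssl/certs/ca-certificates.crt', // example path, varies by distro
    ),
);
$context = stream_context_create($opts);
$result = file_get_contents('https://example.com/', false, $context);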
Your question doesn't have enough information for someone to help you.
To start with though, I would
Check that the script is receiving the URL GET parameter correctly (var_dump($_GET['url']))
Check what PHP fetches from the URL (var_dump(file_get_contents($_GET['url'])))
My guess is either your server admin turned off URL fopen wrappers (allow_url_fopen in php.ini), or the owner of the site you're scraping decided they didn't want you scraping their site and is blocking requests from your PHP scripts.
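A quick way to test the first guess, as a sketch combining the checks above:
var_dump(ini_get('allow_url_fopen')); // "1" means URL fopen wrappers are enabled
var_dump($_GET['url']);
var_dump(file_get_contents($_GET['url']));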
It also helps to turn error reporting all the way up and set display_errors to 1:
error_reporting(E_ALL);
ini_set('display_errors', 1);
Although if you've been developing without this, chances are there's lots of working-but-warning-worthy code in your application.
Good luck.
In my case (I was also repeatedly downloading one page, though not from SoundCloud) it was because of F5 "bobcmn" JavaScript detection on the server.
When I added something like var_dump($source); to my PHP script to see what the server sent, I saw that the response starts with this code: window["bobcmn"] = ...
More here:
https://blog.dotnetframework.org/2017/10/10/understanding-f5-bobcmn-javascript-detection/
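If you want to detect this situation in code, a rough sketch based on the response described above (the marker string is just what I observed, not an official API):
$source = file_get_contents($url);
if (strpos($source, 'bobcmn') !== false) {
    // We received the JavaScript challenge page, not the actual content.
}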
Related
I am running an IIS 8 / PHP web server and am attempting to write a so-called 'proxy script' as a means of fetching HTTP content and loading it onto an HTTPS page.
Although the script does run successfully (outputting whatever the HTTP page sends) in some cases - for example, Google.com, Amazon.com, etc. - it does not work in fetching my own website and a few others.
Here is the code of proxy.php:
<?php
$url = $_GET['url'];
echo "FETCHING URL<br/>"; // displays this no matter what URL I enter
$ctx_array = array(
    'http' => array(
        'method' => 'GET',
        'timeout' => 10,
    )
);
$ctx = stream_context_create($ctx_array);
$output = file_get_contents($url, false, $ctx); // times out for certain requests
echo $output;
When I set $_GET['url'] to http://www.ucomc.net, the script fails. With most other URLs, it works fine.
I have checked other answers on here and other places but none of them describe my issue, nor do the solutions offered solve it.
I've seen suggestions for similar problems that involve changing the user agent, but when I tried that, it not only failed to solve the existing problem but prevented other sites from loading as well. I also do not want to rely on third-party proxies: I don't trust the free ones, don't want to deal with their query limits, and don't want to pay for the expensive ones.
Turns out that it was just a problem with the firewall. Testing it on a PHP sandbox worked fine, so I just had to modify the outgoing connections settings in the server firewall to allow the request through.
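For reference, on a Windows/IIS server that kind of outbound firewall rule can be added along these lines (a sketch; the rule name and port are examples):
netsh advfirewall firewall add rule name="Allow outbound HTTP" dir=out action=allow protocol=TCP remoteport=80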
upload.php file:
<?php
$incfile = $_REQUEST["file"];
include($incfile);
?>
proxy.php file:
<?php
$context = array(
    'http' => array(
        'proxy' => "tcp://proxy.example.com:80",
        'request_fulluri' => true,
        'verify_peer' => false,
        'verify_peer_name' => false,
    )
);
stream_context_set_default($context);
?>
php.ini:
auto_prepend_file=proxy.php
allow_url_include=1
I browse to http://testexample.com/upload.php?file=http://example.com/file.php, but http://example.com/file.php times out with the error Warning: include(): failed to open stream: Connection timed out. I experimented with echo file_get_contents() on the same URL, and that works fine, as it appears to honor the proxy settings. So does anyone know what the issue is with include, or why it does not use my proxy settings?
Edit: As a workaround I used this code below:
<?php
$incfile = $_REQUEST["file"];
$filecontent = file_get_contents($incfile);
eval($filecontent);
?>
The problem with this, though, is that it reads the PHP in as a string rather than executing the file directly, so I have to remove the PHP opening and closing tags, which changes the GET request body and affects my results. So even though it kind of works, the include function is really what I need.
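(The tag removal that this workaround needs looks something like the sketch below; note that eval() on remote content is dangerous, this only illustrates the point:)
$incfile = $_REQUEST["file"];
$filecontent = file_get_contents($incfile);
// Strip the opening/closing tags so eval() will accept the source.
$filecontent = str_replace(array('<?php', '?>'), '', $filecontent);
eval($filecontent);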
So you need your HTTP requests for example.com to go through proxy.example.com. Would it suffice to simply override DNS for example.com to point to proxy.example.com, perhaps in the hosts file, on this development server? Then you could
include 'http://example.com/file.php';
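The hosts entry would look something like this, with 192.0.2.10 standing in for the proxy's actual IP address:
192.0.2.10    example.com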
If you want to limit the solution to PHP, you could define a custom stream wrapper for your proxy.
http://php.net/manual/en/function.stream-wrapper-register.php
http://php.net/manual/en/stream.streamwrapper.example-1.php
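A minimal sketch of that idea (untested; the class name and proxy address are placeholders): replace the built-in http wrapper with one that performs every fetch through the proxy context, restoring the native wrapper just long enough to do the real request.
<?php
class ProxyHttpWrapper
{
    public $context;
    private $handle;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $opts = array('http' => array(
            'proxy' => "tcp://proxy.example.com:80",
            'request_fulluri' => true,
        ));
        // Swap the native wrapper back in to perform the actual request.
        stream_wrapper_restore('http');
        $this->handle = fopen($path, $mode, false, stream_context_create($opts));
        stream_wrapper_unregister('http');
        stream_wrapper_register('http', __CLASS__);
        return $this->handle !== false;
    }

    public function stream_read($count) { return fread($this->handle, $count); }
    public function stream_eof() { return feof($this->handle); }
    public function stream_stat() { return fstat($this->handle); }
    public function stream_close() { fclose($this->handle); }
}

stream_wrapper_unregister('http');
stream_wrapper_register('http', 'ProxyHttpWrapper');

// include() over http now goes through the proxy
// (allow_url_include must still be enabled in php.ini).
include 'http://example.com/file.php';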
I'm trying to scrape information from the site http://steamstat.us - what I want to get is the status values shown on the site.
Currently I'm only using this code:
<?php
$homepage = file_get_contents('http://www.steamstat.us/');
echo $homepage;
?>
The problem I have here is that "Normal (16h)" and the rest just return 3 dots.
I can't figure out what the problem is.
Anyone have any clue?
EDIT
This is now fixed.
I solved the problem as follows:
<?php
$opts = array('http' => array('header' => "User-Agent:MyAgent/1.0\r\n"));
$context = stream_context_create($opts);
$json_url = file_get_contents('https://crowbar.steamdb.info/Barney', FALSE, $context);
$data = json_decode($json_url);
?>
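From there, var_dump($data); shows the decoded structure; the field names depend entirely on the JSON that endpoint returns.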
It's an HTTPS site, which is not always easy to scrape. Though this website allows it: the access-control-allow-origin header it sends is set to *, which means the content can be requested by any other site.
You are not receiving the content because Normal (16h) is not yet populated on page load. It comes from an AJAX call.
The HTML source says <span class="status" id="repo">…</span>. Those are the three dots you are receiving inside the span tag from file_get_contents.
The only way to do it is to look for the AJAX call in the network log and then use file_get_contents on the URL that AJAX calls.
I was running my WebServer for months with the same Algorithm where I got the content of a URL by using this line of code:
$response = file_get_contents('http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw),'','&'));
But now something must have changed, as all of a sudden it stopped working.
In earlier days the URL looked like it should have been:
http://femoso.de:8019/api/2/getVendorLogin?vendor=100&user=test&pw=test
but now I get an error in my nginx log saying that I requested the following URL which returned a 403
http://femoso.de:8019/api/2/getVendorLogin?vendor=100&amp;user=test&amp;pw=test
I know that something changed on the target server, but I would think that shouldn't affect me, should it?
I already spent hours and hours reading and searching through Google and Stack Overflow, but all the suggested ways, such as
urlencode() or
htmlspecialchars() etc...
didn't work for me.
For your information, the environment is a Zend application with an nginx server on my end and a PHP web service running on Apache on the other end.
Like I said, it changed without any change on my side!
Thanks
Let's find out the culprit!
1) Is it http_build_query? Try replacing:
'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw))
with:
"http://femoso.de:8019/api/2/getVendorLogin?vendor={$vendor}&user={$login}&pw={$pw}"
2) Is some kind of post-processing in place? Try replacing '&' with chr(38).
3) Maybe give cURL a try and play around with it a little?
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw)),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER => true, // include response header in result
    //CURLOPT_FOLLOWLOCATION => true, // uncomment to follow redirects
    CURLINFO_HEADER_OUT => true, // track request header, see var_dump below
));
$data = curl_exec($ch);
$requestHeader = curl_getinfo($ch, CURLINFO_HEADER_OUT); // must be read before curl_close()
curl_close($ch);
var_dump($data, $requestHeader);
exit;
Sounds like your arg_separator.output is set to "&amp;" in your php.ini. Either comment that line out or change it to just "&".
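A quick way to verify this, as a sketch using the same parameters as the question:
var_dump(ini_get('arg_separator.output')); // "&amp;" here would explain the broken URL
// Passing the separator explicitly sidesteps the ini setting:
echo http_build_query(array('vendor' => 100, 'user' => 'test', 'pw' => 'test'), '', '&');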
I'm no expert, but that's how the ampersand gets encoded, since it's a special character in HTML. It's an encoding issue. A simple fix would be to filter the URL using str_replace(), something along those lines.
I would like to get the resulting web page of a specific form submission. This form uses POST, so my goal is to be able to send POST data to a URL and get the HTML content of the result in a variable.
My problem is that I cannot use cURL (it is not enabled), which is why I'm asking whether another solution is possible.
Thanks in advance.
See this, using fsockopen:
http://www.jonasjohn.de/snippets/php/post-request.htm
fsockopen is in the PHP standard library, so every PHP version from 4 onward has it :)
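For reference, a minimal fsockopen POST along those lines; the host, path, and field name here are placeholders:
$host = 'www.example.com';
$path = '/form.php';
$data = http_build_query(array('field' => 'value'));

$fp = fsockopen($host, 80, $errno, $errstr, 10);
if (!$fp) {
    die("Connection failed: $errstr ($errno)");
}

// Build a raw HTTP/1.1 POST request by hand.
$request  = "POST $path HTTP/1.1\r\n";
$request .= "Host: $host\r\n";
$request .= "Content-Type: application/x-www-form-urlencoded\r\n";
$request .= "Content-Length: " . strlen($data) . "\r\n";
$request .= "Connection: Close\r\n\r\n";
$request .= $data;

fwrite($fp, $request);

$response = '';
while (!feof($fp)) {
    $response .= fgets($fp, 1024);
}
fclose($fp);

echo $response; // raw response, headers included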
Try file_get_contents() and a stream context:
$opts = array(
    'http' => array(
        'method' => "POST",
        'content' => http_build_query(array('status' => $message)),
    ),
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
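One detail worth adding: many servers expect a Content-Type header for form posts. A slightly fuller sketch, with the URL and field name still placeholders:
$postdata = http_build_query(array('status' => $message));
$opts = array(
    'http' => array(
        'method'  => "POST",
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => $postdata,
    ),
);
$context = stream_context_create($opts);
$result = file_get_contents('http://www.example.com/', false, $context);
if ($result === false) {
    // Request failed; inspect $http_response_header for the status line.
}
echo $result;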