I found a reference solution to a problem I was having (how to determine a domain's protocol): "How to find whether a domain is HTTP or HTTPS (with or without WWW) using PHP?"
Below are two versions of my code.
The first doesn't work as expected: it only echoes out my domains.
<?php
$url_list = file('urls.txt');
foreach ($url_list as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);
    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo $real_url;
}
?>
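A likely culprit here: file() keeps the trailing newline on every line, so each URL handed to cURL ends in "\n". A minimal sketch of the same loop using the flags that strip newlines and skip blank lines:

<?php
// FILE_IGNORE_NEW_LINES strips the trailing newline from each element;
// FILE_SKIP_EMPTY_LINES drops blank lines such as a final newline at EOF.
$url_list = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($url_list as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), "\n";
    curl_close($ch); // release the handle each iteration
}
?>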
The second version of my code gives me an error about the argument supplied to my foreach statement...
<?php
$fn = fopen("urls.txt", "r");
while (!feof($fn)) {
    $url_list = fgets($fn);
    foreach ($url_list as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_exec($ch);
        $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        echo $real_url;
        #echo $result;
        fclose($fn);
    }
}
?>
What am I doing wrong?
The expected result is the output below, but with the domains read from a file.
Reference code, from: "How to find whether a domain is HTTP or HTTPS (with or without WWW) using PHP?"
<?php
$url_list = ['facebook.com', 'google.com'];
foreach ($url_list as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);
    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo $real_url; // add your DB commands here
}
?>
Output:
test#linux: php domain-fuzzer.php
https://www.facebook.com/http://www.google.com/#
I found the solution; in case anyone runs into the same issue, here is the code:
<?php
$fn = file_get_contents("urls.txt");
$url_list = explode(PHP_EOL, $fn);
foreach ($url_list as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);
    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo nl2br("$real_url \n");
}
?>
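One caveat with explode(PHP_EOL, ...): Windows line endings or a trailing newline can leave stray \r characters or empty entries in the list. A more defensive sketch that splits on any line ending, trims each entry, and drops blanks:

<?php
$contents = file_get_contents("urls.txt");
// \R matches any line ending (\n, \r\n, or \r); trim and filter so cURL
// is never handed a blank or whitespace-padded URL.
$url_list = array_filter(array_map('trim', preg_split('/\R/', $contents)));
foreach ($url_list as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), "\n";
    curl_close($ch); // release the handle each iteration
}
?>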
By default, fgets returns the current line of the file as a string, not an array of all the file's lines.
So your code should be:
<?php
$fn = fopen("urls.txt", "r");
while (!feof($fn)) {
    $url = fgets($fn);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);
    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo $real_url;
}
fclose($fn); // close the handle once, after the loop, not inside it
?>
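As a side note, testing the return value of fgets() directly avoids the classic feof() pitfall of one extra iteration at end of file; a minimal sketch:

<?php
$fn = fopen("urls.txt", "r");
// fgets() returns false at EOF, so the loop ends without a spurious pass.
while (($url = fgets($fn)) !== false) {
    $url = trim($url); // drop the trailing newline that fgets keeps
    if ($url === '') {
        continue; // skip blank lines
    }
    echo $url, "\n"; // run the cURL lookup here, as above
}
fclose($fn);
?>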
Related
I'm trying to get all the URLs of the shows listed here: https://api.tvmaze.com/schedule?country=US&date=2021-03-24, but I'm getting just a blank page and I have no idea why.
function request($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$api_url = 'https://api.tvmaze.com/schedule?country=US&date=2021-03-23';
$data = request($api_url);
$final_data = json_decode($data, TRUE);
for ($x = 0; $x < count($final_data); $x++) {
    echo $finaldata[$x]['show']['url']."\n";
}
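The blank page is most likely the typo in the loop: the decoded array is stored in $final_data but read back as $finaldata, which is undefined. A corrected loop, keeping the request() helper above:

$final_data = json_decode($data, true);
// Guard against a failed request or malformed JSON before iterating.
if (is_array($final_data)) {
    foreach ($final_data as $entry) {
        echo $entry['show']['url'] . "\n"; // $final_data, not $finaldata
    }
}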
I cannot download this link using PHP cURL:
https://www.economy.gov.ae/PublicationsArabic/2%20%D9%86%D8%B4%D8%B1%D8%A9%20%D8%A7%D9%84%D8%B9%D9%84%D8%A7%D9%85%D8%A7%D8%AA%20%D8%A7%D9%84%D8%AA%D8%AC%D8%A7%D8%B1%D9%8A%D8%A9%20%D8%A7%D9%84%D8%B9%D8%AF%D8%AF%20199-%20%D8%A7%D9%84%D9%86%D8%B4%D8%B1%20%D8%B9%D9%86%20%D8%A7%D9%84%D8%B9%D9%84%D8%A7%D9%85%D8%A7%D8%AA%20%D8%A7%D9%84%D9%85%D9%82%D8%A8%D9%88%D9%84%D8%A9.pdf
I tried basic cURL and it didn't work; wget doesn't work either.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_ENCODING, "utf-8");
echo $output = curl_exec($ch);
curl_close($ch);
I get either an empty PDF or one of only 189 bytes.
It turned out the website requires a POST request before the GET request, and you have to replicate that first because the app has its own logic behind it. Now it works correctly.
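A hedged sketch of that pattern, reusing one handle and a cookie jar so the session carries over; the priming endpoint and form fields below are hypothetical placeholders, since the real ones have to be read from the site's own traffic:

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');  // persist the session
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');

// 1) Priming POST; the URL and fields here are placeholders, not the site's real ones.
curl_setopt($ch, CURLOPT_URL, 'https://www.economy.gov.ae/example-endpoint');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(['field' => 'value']));
curl_exec($ch);

// 2) Follow-up GET for the PDF on the same handle and session.
curl_setopt($ch, CURLOPT_URL, $url); // the PDF link from the question
curl_setopt($ch, CURLOPT_HTTPGET, true);
$pdf = curl_exec($ch);
curl_close($ch);
file_put_contents('report.pdf', $pdf);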
Try urldecode before using curl:
$ch = curl_init();
$url = urldecode($url);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_ENCODING, "utf-8");
echo $output = curl_exec($ch);
curl_close($ch);
I am trying to retrieve the content of a page with PHP cURL. It works well on other websites, but on this website it does not work and I don't know why.
Here is my code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://ratings.fide.com/top.phtml');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($curl);
curl_close($curl);
echo $result;
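When curl_exec() returns nothing useful, inspecting the error buffer and the HTTP status code usually points at the cause (TLS failure, timeout, bot blocking, and so on); a minimal diagnostic sketch:

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://ratings.fide.com/top.phtml');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
if ($result === false) {
    // Transport-level failure: DNS, TLS handshake, timeout, etc.
    echo 'cURL error: ' . curl_error($curl) . "\n";
} else {
    // A 403 or 503 here suggests the server is rejecting this client.
    echo 'HTTP status: ' . curl_getinfo($curl, CURLINFO_HTTP_CODE) . "\n";
}
curl_close($curl);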
I have a little problem with PHP DOM.
<?php
$url = "http://www.dogpile.com/search/images?q=" . rawurlencode($_GET['q']);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$output = curl_exec($ch);
curl_close($ch);

libxml_use_internal_errors(true); // real-world HTML triggers parser warnings otherwise
$dom = new DOMDocument;
$dom->loadHTML($output);
print_r($dom->getElementById('imageResults'));
?>
I'm trying to get the thumbnail and full image URLs from the results, but I cannot grab the required info...
Download the Simple HTML DOM library, read the manual, and use it: http://simplehtmldom.sourceforge.net/manual.html
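A minimal sketch of that approach, assuming the library's simple_html_dom.php is on the include path; the #imageResults and img selectors are guesses at Dogpile's markup rather than verified selectors:

<?php
include 'simple_html_dom.php';

// $output is the HTML fetched with cURL above.
$html = str_get_html($output);

// CSS-style selectors; adjust them to the page's actual structure.
foreach ($html->find('#imageResults img') as $img) {
    echo $img->src . "\n"; // thumbnail URL from the <img> tag
}
?>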
I am trying to use PHP cURL to fetch an Amazon web page but get
HTTP/1.1 503 Service Temporarily Unavailable instead. Is Amazon blocking cURL?
http://www.amazon.com/gp/offer-listing/B003B7Q5YY/
<?php
function get_html_content($url) {
    // fake user agent
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
    $string = curl_exec($ch);
    curl_close($ch);
    return $string;
}

echo get_html_content("http://www.amazon.com/gp/offer-listing/B003B7Q5YY");
?>
I simply use:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $offers_page);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$html = curl_exec($ch);
curl_close($ch);
but I have another problem: if you send a lot of queries to Amazon, they start sending you a 500 page.
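A common mitigation, offered as an assumption rather than anything Amazon documents: throttle the loop and back off when 5xx responses appear, so the requests look less like a flood:

<?php
$offer_pages = ['http://www.amazon.com/gp/offer-listing/B003B7Q5YY/'];

foreach ($offer_pages as $offers_page) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $offers_page);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($status >= 500) {
        sleep(30); // back off hard once the server starts returning 5xx pages
        continue;
    }

    // ... process $html here ...

    sleep(rand(2, 5)); // random pause between requests
}
?>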