I'm trying to scrape newark.com.
I wrote the code below and tested it locally, where it works perfectly:
<?php
$link = 'https://www.newark.com/';
$proxy = ['server' => '172.93.142.42:3128'];
$user_agents = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36', 'Mozilla/5.0 (Linux; Android 8.0.0; H3113 Build/50.1.A.10.40; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/185.0.0.39.72;]', 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E302 [FBAN/FBIOS;FBAV/166.0.0.53.95;FBBV/101310068;FBDV/iPhone7,2;FBMD/iPhone;FBSN/iOS;FBSV/11.3.1;FBSS/2;FBCR/vodafoneP;FBID/phone;FBLC/en_GB;FBOP/5;FBRV/102694127]', 'Mozilla/5.0 (Linux; Android 7.0; Studio Mega Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36 OPR/46.3.2246.127744', 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Mobile/15D100 [FBAN/FBIOS;FBAV/168.0.0.57.90;FBBV/103647182;FBDV/iPhone9,3;FBMD/iPhone;FBSN/iOS;FBSV/11.2.6;FBSS/2;FBCR/MEO;FBID/phone;FBLC/pt_PT;FBOP/5;FBRV/104934021]'];
$user_agent = $user_agents[array_rand($user_agents)];
//$user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36';
$curl_handler = curl_init();
curl_setopt_array($curl_handler, array(
CURLOPT_URL => $link,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERAGENT => $user_agent,
CURLOPT_PROXY => $proxy['server'],
));
$result = curl_exec($curl_handler);
curl_close($curl_handler);
$result = mb_convert_encoding($result, 'UTF-8');
header('Content-type: text/html; charset=utf-8');
echo($result);
However, when I run this code on my US servers, it does not work.
The script takes a long time to execute and then nothing is displayed.
But when I change the URL to www.google.com, the script works on my servers as well.
I've added proxies to my code, but that didn't help with the URL I need.
I guess the problem is related to that specific URL. Any help?
I receive JSON responses at frequent intervals and store them in a file. I want to decode them. The file looks like this:
[{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"processed","category":"cat facts","sg_event_id":"QdphcK0Jre4Q7L9Huwm_ug==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"deferred","category":"cat facts","sg_event_id":"vbfCZfSBz32ySl7j5nSayw==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","response":"400 try again later","attempt":"5"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"delivered","category":"cat facts","sg_event_id":"tmFRu_j-NWZ4fZU4zRhDYg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","response":"250 OK"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"open","category":"cat facts","sg_event_id":"LlbAt3ZNgC3yUoTt0ImdXg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"click","category":"cat facts","sg_event_id":"7seySmsaB5gncIjv4dmfGg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255","url":"http://www.sendgrid.com/"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"bounce","category":"cat facts","sg_event_id":"xOeVKsSD2pcarTPw6r6q5g==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","reason":"500 unknown 
recipient","status":"5.0.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"dropped","category":"cat facts","sg_event_id":"-mk0ZOl1WgDTEteRC2olOw==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","reason":"Bounced Address","status":"5.0.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"spamreport","category":"cat facts","sg_event_id":"x-S6eSyCAzeEZoTrJvf0rg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"unsubscribe","category":"cat facts","sg_event_id":"RQ9N3MW13w8AWeGU_fwD7Q==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"group_unsubscribe","category":"cat facts","sg_event_id":"jIDVkh2-1yXXIJBOK-lVmg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255","url":"http://www.sendgrid.com/","asm_group_id":10},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"group_resubscribe","category":"cat facts","sg_event_id":"BtasI8e0rTH1GyCQHYX-Ag==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255","url":"http://www.sendgrid.com/","asm_group_id":10}]
[{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"processed","category":"cat facts","sg_event_id":"QdphcK0Jre4Q7L9Huwm_ug==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"deferred","category":"cat facts","sg_event_id":"vbfCZfSBz32ySl7j5nSayw==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","response":"400 try again later","attempt":"5"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"delivered","category":"cat facts","sg_event_id":"tmFRu_j-NWZ4fZU4zRhDYg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","response":"250 OK"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"open","category":"cat facts","sg_event_id":"LlbAt3ZNgC3yUoTt0ImdXg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"click","category":"cat facts","sg_event_id":"7seySmsaB5gncIjv4dmfGg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255","url":"http://www.sendgrid.com/"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"bounce","category":"cat facts","sg_event_id":"xOeVKsSD2pcarTPw6r6q5g==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","reason":"500 unknown 
recipient","status":"5.0.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"dropped","category":"cat facts","sg_event_id":"-mk0ZOl1WgDTEteRC2olOw==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","reason":"Bounced Address","status":"5.0.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"spamreport","category":"cat facts","sg_event_id":"x-S6eSyCAzeEZoTrJvf0rg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"unsubscribe","category":"cat facts","sg_event_id":"RQ9N3MW13w8AWeGU_fwD7Q==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0"},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"group_unsubscribe","category":"cat facts","sg_event_id":"jIDVkh2-1yXXIJBOK-lVmg==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255","url":"http://www.sendgrid.com/","asm_group_id":10},{"email":"example#test.com","timestamp":1460352083,"smtp-id":"\u003c14c5d75ce93.dfd.64b469#ismtpd-555\u003e","event":"group_resubscribe","category":"cat facts","sg_event_id":"BtasI8e0rTH1GyCQHYX-Ag==","sg_message_id":"14c5d75ce93.dfd.64b469.filter0001.16648.5515E0B88.0","useragent":"Mozilla/4.0 (compatible; MSIE 6.1; Windows XP; .NET CLR 1.1.4322; .NET CLR 2.0.50727)","ip":"255.255.255.255","url":"http://www.sendgrid.com/","asm_group_id":10}]
This is the code I used for decoding
function b() {
$string = file_get_contents("/home/linux/Public/test/a.json");
$json_a = json_decode($string, true);
print_r($json_a);
foreach ($json_a as $person_name) {
print_r($person_name);
}
}
It prints "Invalid argument supplied for foreach()" as an error.
When I removed all but one response from the file, it worked fine.
If you use newlines to separate the JSON documents, you could do:
foreach (file($file_name) as $json_plain) {
print_r(json_decode($json_plain, true));
}
file() splits the file into an array of lines, so you can decode each line separately.
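The line-by-line approach can be made a little more defensive. The sketch below assumes each JSON document sits on exactly one line of the file; the helper name decode_json_lines and the sample data are hypothetical. It skips blank lines and records lines that fail to decode, rather than passing null to foreach. Note that a raw line break inside a string value would split a document across two lines, and both halves would then fail to decode.

```php
<?php
// Hypothetical helper: decode a file holding one JSON document per line.
// Returns [decoded documents, error messages keyed by 1-based line number].
function decode_json_lines(string $file_name): array
{
    $decoded = [];
    $errors = [];
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each line.
    $lines = file($file_name, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return [[], [0 => 'could not read file']];
    }
    foreach ($lines as $index => $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        $value = json_decode($line, true);
        if ($value === null && json_last_error() !== JSON_ERROR_NONE) {
            // Remember which line failed and why, then move on.
            $errors[$index + 1] = json_last_error_msg();
            continue;
        }
        $decoded[] = $value;
    }
    return [$decoded, $errors];
}

// Usage with made-up sample data: two valid lines and one truncated line.
$tmp = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($tmp, "[{\"event\":\"processed\"}]\n[{\"event\":\"open\"}]\n[{\"event\":\n");
[$documents, $errors] = decode_json_lines($tmp);
unlink($tmp);

foreach ($documents as $events) {
    foreach ($events as $event) {
        echo $event['event'], "\n";
    }
}
```

Checking json_last_error() as well as the null return value keeps a document that is literally `null` from being reported as a failure.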
<?php
$file_name = 'a.json';
foreach (file($file_name) as $json_plain) {
print_r(json_decode($json_plain, true));
}
?>
You can use this.
I have this code
<?php
$ua = array(
"Mozilla/5.0 (compatible; MSIE 9.0; AOL 9.7; AOLBuild 4343.19; Windows NT 6.1; WOW64; Trident/5.0; FunWebProducts)",
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; XH; rv:8.578.498) fr, Gecko/20121021 Camino/8.723+ (Firefox compatible)",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1",
"Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko",
"Mozilla/5.0 (X11; U; Linux i686; fr-fr) AppleWebKit/525.1+ (KHTML, like Gecko, Safari/525.1+) midori/1.19",
"Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16",
"Mozilla/5.0 (Linux; U; Android 4.0.3; de-ch; HTC Sensation Build/IML74K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30");
$uar = array_rand($ua);
$url = "sometestserverisetup";
$ip = '127.0.0.1';
$port = '9051';
$auth = 'mypwwhateveritis';
$command = 'signal NEWNYM';
$fp = fsockopen($ip,$port,$error_number,$err_string,10);
if(!$fp) { echo "ERROR: $error_number : $err_string";
return false;
} else {
fwrite($fp,"AUTHENTICATE \"".$auth."\"\n");
$received = fread($fp,512);
fwrite($fp,$command."\n");
$received = fread($fp,512);
}
fclose($fp);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_PROXY, "127.0.0.1:9050");
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch,CURLOPT_USERAGENT,$ua[$uar]);
$response = curl_exec($ch);
echo $response;
?>
Everything works fine with my test site, and it displays correctly. However, certain sites (google.com, amazon.com, youtube, facebook) only display a blank page when I echo the response.
Is there some cURL option that needs to be set for these pages to display properly?
Looking at a var_dump(curl_getinfo($ch)); after calling curl_exec can be helpful.
I tested your code and found that in some cases the sites send a 302 redirect response with a Location header. Because cURL does not follow redirects by default, the request succeeds but the response body is empty.
Adding
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
made every site you mentioned return a response in my tests. Depending on what you are doing (searches, logins, form submissions), you will probably find that redirects are common, so you need to tell cURL to follow them with that option.
Beyond that, you can set CURLOPT_HEADER to true so you can look at the response headers sent to see what's going on in addition to curl_getinfo to make sure the connection was successful (either through Tor or to the site).
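As a rough illustration of that kind of inspection, the sketch below fetches a URL with CURLOPT_HEADER enabled and without following redirects, then uses the header_size and http_code values from curl_getinfo to separate headers from body and report any Location header. The helper names split_response, find_header and inspect_url are hypothetical, and the canned response at the end is made-up data so the parsing part can be tried offline.

```php
<?php
// Hypothetical helper: split a raw response (fetched with CURLOPT_HEADER
// enabled) into its header block and body, given the header size in bytes.
function split_response(string $raw, int $header_size): array
{
    return [substr($raw, 0, $header_size), substr($raw, $header_size)];
}

// Hypothetical helper: return the value of a header, matching the name
// case-insensitively, or null if the header is absent.
function find_header(string $headers, string $name): ?string
{
    foreach (preg_split('/\r?\n/', $headers) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            return trim($parts[1]);
        }
    }
    return null;
}

// Fetch a URL without following redirects and summarise what came back.
function inspect_url(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);  // include headers in the result
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $raw = curl_exec($ch);
    if ($raw === false) {
        return ['error' => curl_error($ch)];
    }
    $info = curl_getinfo($ch);
    [$headers, $body] = split_response($raw, $info['header_size']);
    return [
        'status' => $info['http_code'],
        'location' => find_header($headers, 'Location'),
        'body_length' => strlen($body),
    ];
}

// Offline demonstration with a canned 302 response (made-up data).
$raw = "HTTP/1.1 302 Found\r\nLocation: https://www.example.com/\r\n\r\n";
[$headers, $body] = split_response($raw, strlen($raw));
echo find_header($headers, 'Location'), "\n";
echo strlen($body), "\n";
```

Calling print_r(inspect_url($url)) for a site that shows a blank page would be one way to see whether a 3xx status and a Location header are the cause, or whether cURL reported a connection error instead.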
This code returns the number of online users, but only when I run it on localhost. When I upload it to my web host it fails, even when I extract the token from the page source as in the second call. It only works when the second parameter is a token generated in the site's page source. How can I run this script from the web host?
if (!function_exists('getHistats')) {
function getHistats($sid = 0, $cc = '') {
if (empty($sid) || empty($cc))
return 'error';
$url = 'http://www.histats.com/viewstats/HST_GET_SUMMARY.php';
$result = '';
$ualist = array(
'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)',
'Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.23 (KHTML, like Gecko) Ubuntu/10.04 Chromium/11.0.688.0 Chrome/11.0.688.0 Safari/534.23',
'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.04 (lucid) Firefox/3.6.14 GTB7.1',
'Opera/9.80 (X11; Linux i686; U; en) Presto/2.7.62 Version/11.01',
'Midori/0.2.2 (X11; Linux i686; U; en-us) WebKit/531.2+',
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1150.1 Iron/20.0.1150.1 Safari/536.11'
);
if (function_exists('curl_init')) {
$http_headers = array();
$http_headers[] = 'Host: www.histats.com';
$http_headers[] = 'Referer: www.histats.com/viewstats/?sid='. $sid .'&act=2&f=1';
$http_headers[] = 'X-Requested-With: XMLHttpRequest';
$opts = array();
$opts[CURLOPT_URL] = $url;
$opts[CURLOPT_HTTPHEADER] = $http_headers;
$opts[CURLOPT_CONNECTTIMEOUT] = 5;
$opts[CURLOPT_TIMEOUT] = 10;
$opts[CURLOPT_USERAGENT] = $ualist[rand(0, count($ualist) - 1)];
$opts[CURLOPT_HEADER] = FALSE;
$opts[CURLOPT_RETURNTRANSFER] = TRUE;
$opts[CURLOPT_POST] = 1;
$opts[CURLOPT_POSTFIELDS] = 'AR_REQ[sid]='. $sid .'&AR_REQ[CC]='. $cc .'&dbg=1';
# Initialize PHP/CURL handle
$ch = curl_init();
curl_setopt_array($ch, $opts);
# Create return array
$result = curl_exec($ch);
curl_close($ch);
} elseif (ini_get('allow_url_fopen')) {
$result = file_get_contents($url);
}
if (empty($result) || ($result == 'error=11') || ($result == 'err:1'))
return 'error';
$obj = json_decode($result);
return isset($obj->livearray->livesummary->cur_online) ? $obj->livearray->livesummary->cur_online : 0;
}
}
$html = file_get_contents('http://histats.com/viewstats/?sid=3041076&act=2&f=1');
preg_match("/OBJ_summary.sockTOKEN = '(.*?)'/i", $html, $match);
echo 'Online: '. getHistats('3041076', 'bjh1NStBTVZyMFJzRENTODFHTHNQamJyV0FvY2l4TGRNSk5FczQyYnR3dERlaUhWczJZNUtWQk5lU2p6STlyRTZCQXZUd2t6MWJzS3Z2cWs2d1g4aXc9PQ==');
echo '<br />';
echo 'Token: '. $match[1];
echo '<br />';
echo 'Online: '. getHistats('3041076', $match[1]);
I'm having problems reading the contents of a specific URL. This simple script
<?php
$str = @file_get_contents("http://neginmirsalehi.com");
echo $str;
?>
only outputs: "Error in exception handler."
I tried with cURL as well, but got the same problem.
Is there some kind of protection on that site? It is a strange error.
No, cURL does work; just set the user agent option:
$ch = curl_init('http://neginmirsalehi.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$result = curl_exec($ch);
echo $result;
Also, file_get_contents with a stream context that sets a user agent works as well:
$options = array('http' => array('user_agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)'));
$context = stream_context_create($options);
$response = file_get_contents('http://neginmirsalehi.com', false, $context);
echo $response;
Perhaps you need to specify a file name, not just a URL:
<?php
$str = @file_get_contents("/dir/dir/file_name.extension");
echo $str;
?>
$u = 'http://neginmirsalehi.com';
$c = curl_init($u);
curl_setopt($c, CURLOPT_USERAGENT, 'moz');
$r = curl_exec($c);
print_r($r);
This works fine.