I see search results on SO where get_browser returns Default Browser even for, say, Chrome users. In my case, however, I am not sure what the visitors are using; I suspect the Default Browser results are robots/crawlers/etc.
I'm using the full_php_browscap.ini version, and it still reports Default Browser as the browser value. What is Default Browser, and when does get_browser return it?
I think 'Default Browser' is returned when the user-agent is unknown. Either the agent is missing from browscap or maybe browscap can't be found.
You can use $_SERVER['HTTP_USER_AGENT'] to find out which user-agent string was sent. Maybe the string is simply made up (robots indeed).
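If you want to see which agents trigger it, a small sketch along these lines (assuming the browscap directive is configured in php.ini) logs the raw string whenever get_browser() falls back to Default Browser:
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$info  = get_browser($agent, true); // needs the browscap directive set in php.ini
if ($info === false || $info['browser'] === 'Default Browser') {
    // Unknown agent: log the raw string so you can see whether it is a bot
    // or simply an entry missing from your browscap.ini.
    error_log('Unrecognized user-agent: ' . $agent);
}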
Someone posted the following on the PHP manual page for get_browser:
We are using the get_browser() function for the user agent Mozilla/4.0 (compatible; MSIE 4.01; Windows NT), and get_browser is returning Default Browser with Platform = unknown.
So I added this to my browscap.ini manually:
[Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)]
Parent=IE 4.01
Platform=WinNT
I hope this helps.
I wasted a lot of time learning how to use that function,
only to conclude that you should never use it:
it will kill your performance!
Benchmark with and without get_browser to see for yourself:
ab -c 100 -n 100 http://yourserver/
use preg_match_all('/(opera|chrome|safari|firefox|msie)\/?\s*(\.?\d+(\.\d+)*)/i', $_SERVER['HTTP_USER_AGENT']) instead
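For illustration, a hedged sketch of how that regex could replace get_browser() — the browser name and version land in the capture groups:
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match_all('/(opera|chrome|safari|firefox|msie)\/?\s*(\.?\d+(\.\d+)*)/i', $agent, $matches)) {
    // Only the first match is used; e.g. a Chrome UA also contains "Safari" later on.
    $browser = strtolower($matches[1][0]); // e.g. "chrome"
    $version = $matches[2][0];             // e.g. "31.0.1650.63"
} else {
    $browser = 'unknown';
    $version = '';
}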
I set up a crontab to execute a PHP file every minute.
Now I need to create the PHP file, but I'm clueless about what the contents should be.
All the code needs to do is visit the website URL.
No need to save anything.
It just needs to mimic loading the home page, just like a browser would.
That in turn triggers a chain of events which are already in place.
It is an extremely low traffic site so that’s the reason for it.
I know, I could do it with curl.
But for reasons I won’t get into, it needs to be a php file.
Can anyone point me in the right direction, please? I'm not expecting you to provide code, just direction.
Thanks!
You can use curl in PHP to just send a request to the page:
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, "the.url-of-the-page.here");
curl_exec($curl_handle);
curl_close($curl_handle);
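One note on this: by default curl_exec() prints the response body, which cron will capture (and possibly mail to you). If you only want to trigger the page, you can add this before curl_exec() so the body is returned instead of printed, and simply ignore the return value:
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true); // keep the page body out of cron's output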
You could also do it with one line (note that the whole HTML of the page is retrieved, which takes a bit longer):
file_get_contents('URL');
As Prince Dorcis stated, you could also use curl. If the website is not yours, you may need to use curl and send the request with a user agent (you can find a list here):
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
Marco M is right, but there is a catch (it won't affect most people, but it does come up sometimes).
file_get_contents("https://example.com");
normally does the trick (I use it more than I should), BUT:
There is a setting in php.ini (allow_url_fopen) that needs to be on for that function to be allowed to open URLs!
I ran into that once with a web host; they did not allow it ;)
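A small sketch to check for that setting (allow_url_fopen) before relying on file_get_contents() for URLs, falling back to curl when it is off:
if (ini_get('allow_url_fopen')) {
    $body = file_get_contents('https://example.com');
} else {
    // Host has URL wrappers disabled; fall back to curl instead.
    $ch = curl_init('https://example.com');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);
}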
I have this weird problem, and I don't know how to get rid of it.
Example: I put a var_dump('test') in my code at the top of the page. Just to edit something.
Alt-tab to chrome, cmd-R to refresh.
The var_dump('test') is not there. Cmd-R again. Still not there.
Then I wait for a minute, and refresh... And suddenly it's there.
Basically: I will always see code changes, but not immediately.
I have this problem in PhpStorm and Netbeans, so it's probably not an IDE problem.
Edit: I have also tried this in different browsers, and they all have this as well, so it's not a browser-related problem.
Has anyone had this problem before? Does anyone know a solution to this?
It's really difficult to work efficiently if I have to wait to see my edited code live...
EDIT:
I'm working on my localhost. Server setup is with MAMP.
REQUEST HEADERS:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:no-cache
Connection:keep-alive
Cookie:projekktorplayertracking_prkusruuid=D1A39803-4DE3-4C0B-B199-6650CF0F8DE5; Akamai_AnalyticsMetrics_clientId=C355983152DF60151A0C6375798CD52E8F09B995; __atuvc=4%7C47%2C0%7C48%2C0%7C49%2C17%7C50%2C47%7C51; PHPSESSID=885c62f543097973d17820dca7b3a526; __utma=172339134.2012691863.1384502289.1387377512.1387442224.41; __utmb=172339134.1.10.1387442224; __utmc=172339134; __utmz=172339134.1384502289.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host:local.sos
Pragma:no-cache
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36
RESPONSE HEADERS:
Connection:Keep-Alive
Content-Length:681
Content-Type:text/html
Date:Thu, 19 Dec 2013 09:00:54 GMT
Keep-Alive:timeout=5, max=99
Server:Apache/2.2.25 (Unix) mod_ssl/2.2.25 OpenSSL/0.9.8y DAV/2 PHP/5.5.3
X-Pad:avoid browser bug
X-Powered-By:PHP/5.5.3
EDIT:
I was messing around in MAMP's settings. My PHP version was 5.5.3, but then I couldn't set any PHP Extensions.
When I put PHP version on 5.2.17 (my only other option), I was able to set Cache to XCache.
So... Now my page is always up-to-date when reloaded immediately.
Thanks to anyone that replied and helped me with this!
This was the solution. At first, as described in the edit above, switching MAMP to PHP 5.2.17 (my only other option) and setting Cache to XCache made it work.
But then I found this thread.
In your MAMP dir, go to: /bin/php/php5.5.3/conf/php.ini
and comment out the OPcache lines:
[OPcache]
;zend_extension="/Applications/MAMP/bin/php/php5.5.3/lib/php/extensions/no-debug-non-zts-20121212/opcache.so"
; opcache.memory_consumption=128
; opcache.interned_strings_buffer=8
; opcache.max_accelerated_files=4000
; opcache.revalidate_freq=60
; opcache.fast_shutdown=1
; opcache.enable_cli=1
Now I'm programming in PHP 5.5.3, and my pages are immediately updated.
There are three possible causes (I can think of):
Your browser is caching the file. On development sites you can disable the cache (e.g. in Chrome press F12, click the gear in the bottom right, and check the box to disable the cache while the developer tools are open; keep them open while developing).
Your connection to the server is lagging; this can be caused by delayed uploads from your IDE or by the connection itself. You can test this by opening an SSH connection and checking modification times after saving (e.g. repeatedly running ls -la, or watch -n 1 ls -la, in the file's directory).
In some setups another form of caching exists, such as APC or OPcache. Before blaming this, it is wise to exclude the two causes above first. This step requires analyzing the headers sent by the server, as shown on the Network tab of the devtools (in Chrome); a quick check is also sketched below.
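For the third cause, a small sketch like this, dropped into any page on the dev box, shows whether OPcache is loaded and how often it revalidates files (APC would need a similar check with its own functions):
if (function_exists('opcache_get_status')) {
    $status = opcache_get_status(false);              // false = skip per-script details
    var_dump($status ? $status['opcache_enabled'] : false); // is OPcache actually on?
    var_dump(ini_get('opcache.revalidate_freq'));     // seconds between file stat checks
} else {
    echo 'No OPcache extension loaded';
}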
Not sure about NetBeans, but PhpStorm updates the file as you type (there is no need to save explicitly). HOWEVER, the auto-save debounce is OS dependent, and the Mac may be slower to flush file changes to disk. I can't recall the name of the relevant OS X feature, but a workaround is to save the file explicitly with Command+S.
I had a similar-sounding problem working locally with .php and .less files in IE and Chrome. Something was caching the CSS file, so changes made to the .less file wouldn't show up. We fixed it by putting a timestamp in a PHP variable and appending it to the end of the file name in the source link. The browser treated it as a new file and always loaded it (a rough sketch of this follows below).
I don't have the actual code to do that right now (I'm at home) but will look for it tomorrow at work.
Obviously, this isn't the same problem you're having, but I thought it might give you a new direction to research your issue.
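For reference, the cache-busting idea that answer describes looks roughly like this (a sketch only; the css/style.css path is made up, and filemtime() is used instead of the current time so the URL only changes when the file does):
<?php
// Append the stylesheet's modification time so every edit produces a new URL
// and the browser skips its cached copy.
$cssVersion = filemtime('css/style.css'); // hypothetical path
?>
<link rel="stylesheet" href="css/style.css?v=<?php echo $cssVersion; ?>">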
UPDATE #2:
I have confirmed with my contacts at NOAA that they are having big time interconnectivity problems all across NOAA. For example, they are only getting precipitation data from 2 locations. I am sure this is related. I let NOAA know about this thread and the work you all did to identify this as a connectivity issue.
UPDATE: Now the wget command works from my local server but not from the 1and1.com server. I guess that explains why it works from my browser. Must be a connection issue back east as some of you are also having the same problem. Hopefully this will clear itself as it looks like I can't do anything about it.
EDIT: It is clear that the fetch problem I am having is unique to NOAA addresses, in that:
there is no problem with my code and other sites,
all fetches work just fine in a normal browser, and
nothing I have been able to try will fetch the file from code.
My question is how can I make code that will fetch the file as well as the browser?
I have used this command to get an external web page for almost 2 years now
wget -O <my web site>/data.txt http://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt
I have tried this from two different servers with the same result so I am sure I am not being blocked.
Suddenly this morning it quit working. To make matters worse, it left processes running on the server until there were enough of them that my account was shut down, and all my web sites kept erroring out until we killed the 49 sleeping processes one at a time.
I got no help from 1and1 tech support. They said it was my cron script, which was just the one line above.
So I decided to rewrite the file fetch using PHP. I tried file_get_contents, curl, and fgets as well. None of this worked, so I tried lynx.
Nothing loads this particular URL, but everything I tried works fine on other URLs.
But if I just copy http://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt into a browser, no problem - the file displays promptly.
Obviously it is possible to read this file because the browser is doing it. I have tried Chrome, IE, and Firefox and none had a problem loading this page but nothing I have tried in code works.
What I want to do is read this file and write it to the local server to buffer it. Then my code can parse it for various data requests.
What is a reliable way to read this external web page?
It was suggested that I add a user agent, so I changed my code to the following:
function read_url($url) {
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    $output = curl_exec($ch);
    if (curl_errno($ch)) {
        // curl_error() has to be concatenated; inside double quotes it is not called.
        echo "<!-- " . curl_error($ch) . " -->";
    }
    curl_close($ch);
    return $output;
}
Again, it works on other external web sites but not on this one.
I tried running the wget manually; here is what I got:
(uiserver):u49953355:~ > wget -O <my site>/ships_data.txt http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt
--2013-11-17 15:55:21-- http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt
Resolving www.ndbc.noaa.gov... 140.90.238.27
Connecting to www.ndbc.noaa.gov|140.90.238.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 690872 (675K) [text/plain]
Saving to: `<my site>/ships_data.txt'
0% [ ] 1,066 --.-K/s eta 7h 14m
It just stays at 0%
NOTE: <my-site> is the web address where my data is stored. I did not want to publish the address of my buffer area, but it is like mydomain/buffer/.
I just tried the same thing from another server (not 1and1)
dad#myth_desktop:~$ wget -O ships_data.txt http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt
--13:14:32-- http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt
=> `ships_data.txt'
Resolving www.ndbc.noaa.gov... 140.90.238.27
Connecting to www.ndbc.noaa.gov|140.90.238.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 690,872 (675K) [text/plain]
3% [====> ] 27,046 --.--K/s ETA 34:18
It is stuck at 3% this time.
Both your wget commands worked for me.
It also seems that NOAA is not blocking your requests, since you get the 200 response code, the HTTP headers (content length, type, etc.), and part of the data (1,066 bytes puts you somewhere around rows 7-8 of the data).
It may be that your connection (in general, or specifically to NOAA) is slow or passing through some buffering proxy. Until the proxy gets all or most of the data, to wget it will look like the connection is stalling. Does retrieving this file work: http://www.ndbc.noaa.gov/robots.txt?
wget's --debug option might also help to pinpoint the problem.
Anyway, about the hanging wget processes: you can use the --timeout=60 option to limit the waiting time before failing (http://www.gnu.org/software/wget/manual/wget.html).
wget -O ships_data.txt http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt --timeout=10
If you want to set a user agent (like you did in the PHP script), you can use the "--user-agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)" option.
wget -O ships_data.txt http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt "--user-agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
About curl vs wget, you can just replace the wget commands with curl command (instead of doing it in PHP):
curl -o ships_data.txt http://www.ndbc.noaa.gov/data/realtime2/ship_obs.txt --user-agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
Andrei
The file is available, but even though it's small, it takes a long while to download. In a couple of attempts I experienced up to 3 minutes and 47 seconds to fetch this tiny 23 KB file.
It is clearly some issue with their network; there's not much you can do about it.
Consider using set_time_limit(600) to allow your PHP script up to 10 minutes to download the file, while still not letting it get stuck forever if the download fails; a sketch follows below.
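A sketch of that suggestion combined with hard cURL timeouts, so the script gets enough time but cannot hang forever (the 600/540/30 values are only examples; the URL is the one from the question):
set_time_limit(600); // allow the PHP script itself up to 10 minutes

$ch = curl_init('http://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); // give up if no connection within 30 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 540);       // abort the transfer after 9 minutes
$data = curl_exec($ch);
curl_close($ch);

if ($data !== false) {
    file_put_contents('data.txt', $data); // buffer it locally for later parsing
}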
Since initially the OP was not able to run the wget command manually, my guess was that the server IP was blocked.
Manually running the following command on the hosted server hung, which added weight to that speculation.
wget -O <my web site>/data.txt http://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt
To check whether wget itself was working, the OP ran it against a dummy endpoint, wget -O <web-site>/google.log www.google.com, which worked.
Since the OP mentioned that downloads proceeded sometimes but not always, and that it worked from another server on the same hosted solution, I think we can now pin this down as an issue on the other website's network.
My guess is that the crons are being run at a very short interval (say every minute), like
* * * * * wget -O <my web site>/data.txt http://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt
(or at a similarly short interval), and due to whatever server load the external website has, the earlier requests either time out or do not finish within the time allotted to them (1 minute).
Because of this, the OP is facing a race condition in which multiple cron processes are trying to write to the same file, but none of them manage to write it completely because of the delay in receiving packets (for example, one process hanging since 12:10 AM, another started at 12:11 AM, and one more started at 12:12 AM, none of them finished).
The solution would be to make the crons a little less frequent, or, if the OP wants to keep the same frequency, to download only when a previous download is not still in progress. For checking whether a process is already running, check this; one way to do it is sketched below.
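One hedged way to do that check from inside the PHP cron script is an exclusive, non-blocking lock on a lock file: a second run simply exits while the first is still downloading (the lock file path is just an example):
$lock = fopen('/tmp/noaa_fetch.lock', 'c'); // 'c' creates the file if needed, never truncates it
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    // A previous cron run is still busy downloading; bail out quietly.
    exit(0);
}

// ... perform the wget/curl download here ...

flock($lock, LOCK_UN);
fclose($lock);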
Please consider the request below, from an Apache access log.
119.63.193.131 - - [03/Oct/2013:19:22:19 +0000] "HEAD /blah/blahblah/ HTTP/1.1" 301 - "-" "\"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)\""
Does this request comply with the RFC / standard?
Would Apache pass malformed HEAD requests to PHP?
My configuration is Apache 2.2.15, mod_fcgid 2.3.7, PHP 5.3.3, Linux 2.6.32.60-40 x64, CentOS 6.4
I see nothing obviously wrong about the request in that log entry. It has an unusual user agent (with double quotes in it), but that doesn't make it malformed - it's perfectly valid, and Apache would certainly pass it on to PHP.
I have done a fair few RESTful APIs with PHP and Apache and never came across any such issues. Best would be to isolate the part that you want to be doubly sure is working, which in your case is PHP and Apache. So put together a basic PHP script that dumps $_SERVER and apache_request_headers() (and maybe other global variables), which would give you enough clues as to whether it is working or not. Use curl's -I option as a command-line HTTP client; you may also use the -v option to see exactly what happens from the client's perspective.
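A minimal sketch of that debug script (treat the availability of apache_request_headers() under mod_fcgid as an assumption for your setup; the function_exists() guard covers the case where it is missing):
// Save as e.g. probe.php (name is arbitrary) and hit it with curl -v,
// then with curl -I to compare how a HEAD request comes through.
// For HEAD requests the response body is discarded, so the same dump
// also goes to the error log.
header('Content-Type: text/plain');

$dump = var_export($_SERVER, true);
if (function_exists('apache_request_headers')) { // not every SAPI provides this
    $dump .= "\n" . var_export(apache_request_headers(), true);
}

echo $dump;
error_log($dump);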
I am running a PHP script through cron every 30 minutes which parses and saves some pages of my site on the same server. I need to run the script with a Firefox or Chrome user agent, since the parsed pages have some interface dependency on CSS3 styles.
I tried this within my script:
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
But the Firefox- or Chrome-dependent stylesheets don't load with it. I tried with both double and single quotes.
My question is: is it possible to spoof the user agent for scripts run on the server rather than in a browser, and how?
NOTE: I know that my browser dependency for interface is bad. But I want to know if this is even possible.
EDIT
My script runs through the sitemap on the server and creates an HTML cache of the pages in the sitemap. It doesn't need to execute any JS or CSS file. The only thing needed is to spoof the user agent so that the generated cache contains the extra JS and CSS files for that browser that are included in the header.
You can consider that I am generating cache files for every browser type (IE, WebKit, and Firefox) so that I can serve the cache file to the user based on their browser. At the moment I am serving the same files to all users, that is, without the extra CSS files.
I think I will need to hardcode the CSS file into my page so that it is always included in the cache (non-compatible browsers won't show any change; it will only increase the file overhead for them). Thanks anyway.
When you run a PHP script through cron, the idea is that it is a script, not a webpage being requested. Even if you could spoof the user agent, the CSS and JavaScript isn't going to be executed as it would be inside a real web browser. The point of cron is to run scripts, raw scripts, that do, for example, file operations.
Well, first I would look at your user agent string. I think it is unnecessarily complicated; try simply Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1).
If this does not work for you, you could try executing the curl call as a shell command with exec(). In that case you may run into the problem that the page is not actually rendered. You could work around this with an X virtual framebuffer: the page renders in memory without any screen output, i.e. it behaves like a browser.
You could do it like this:
exec("xvfb-run curl [...]");
You can also set the user agent by using ini_set('user_agent', 'your-user-agent');
Maybe that will help you.
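For completeness, a sketch of that approach — the user_agent ini setting applies to PHP's URL wrappers such as file_get_contents(), not to cURL (the URL and cache path below are made up):
// user_agent affects the HTTP stream wrapper, so file_get_contents() sends this header.
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0');
$html = file_get_contents('http://example.com/page-from-sitemap'); // hypothetical URL
if ($html !== false) {
    file_put_contents('cache/page.html', $html); // hypothetical cache path
}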