Resolving errors in a PHP script with a long execution time - php

I have implemented a web crawler that crawls and retrieves content from the .edu TLD. The HTML content is inserted into MySQL tables as the source code of each page. When a large number of seed URLs is fed to the crawler, the script can run for hours on a decent internet connection. My problem is that the script halts after crawling a number of links without giving any errors. I have used exception handling for the "MySQL server has gone away" error, which already eliminated a lot of problems, and I have added if conditions that echo errors when they are encountered. However, I am not getting any errors; the script simply stops, whether I run it in the browser, in Eclipse PDT or from the CLI. It is worth noting that the number of links crawled differs somewhat between the three ways of running the script. I have altered max_execution_time and other php.ini directives, but that has not helped in any way.
I have coded the script so that it resumes crawling from where it halted, but I want the script to continue without halting so that I don't have to keep monitoring whether it is still running.
Should I make changes to my Apache httpd.conf file? If yes, what should those settings be?
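For context, here is a minimal sketch of the runtime settings plus a shutdown handler that can at least make a silent fatal error leave a trace in a log file (the handler, values and log path are illustrative, not the crawler's actual code):

// Illustrative sketch: lift runtime limits and log any fatal error before the
// process dies, so a silent halt leaves a trace.
set_time_limit(0);                  // no script time limit (the CLI default is already 0)
ini_set('memory_limit', '512M');    // example value; a slow memory leak would otherwise kill the process silently
ini_set('log_errors', '1');
ini_set('error_log', 'D:\\wamp\\logs\\crawler_errors.log'); // example path

register_shutdown_function(function () {
    $err = error_get_last();
    $fatal = array(E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR);
    if ($err !== null && in_array($err['type'], $fatal, true)) {
        error_log(sprintf('Crawler died: %s in %s on line %d', $err['message'], $err['file'], $err['line']));
    }
});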
The descriptions in these earlier questions about my web crawler may also help:
Errors regarding Web Crawler in PHP
Solving "MySQL server has gone away" errors
This is the code that retrieves the HTML from a URL; it is from simple_html_dom.
function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT)
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $defaultBRText);
    // For sourceforge users: uncomment the next line and comment the retreive_url_contents line 2 lines down if it is not already done.
    $contents = file_get_contents($url, $use_include_path, $context, $offset);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
    // $contents = retrieve_url_contents($url);
    if (empty($contents))
    {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}
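Since the comment in file_get_html() already talks about using your own mechanism to control the timeout, here is a cURL-based sketch of such a replacement fetcher (the function name curl_get_html() and the timeout values are my own examples, not part of simple_html_dom):

// Hypothetical replacement for the file_get_contents() call above: fetch the page
// with cURL so connect/transfer timeouts are enforced and redirects are followed.
function curl_get_html($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,    // example: give up connecting after 10 seconds
        CURLOPT_TIMEOUT        => 30,    // example: abort the whole transfer after 30 seconds
        CURLOPT_USERAGENT      => 'MyCrawler/1.0 (example)',
    ));
    $contents = curl_exec($ch);
    curl_close($ch);
    if ($contents === false || $contents === '') {
        return false;
    }
    $dom = new simple_html_dom();
    $dom->load($contents);
    return $dom;
}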
Here is the error log for the following links:
http://www.nust.edu.pk/
http://www.harvard.edu/
http://berkeley.edu/
http://www.columbia.edu/
http://www.princeton.edu/main
http://www.stanford.edu/
And the crawler stopped after crawling this link:
http://itunes.columbia.edu/m/
[01-Jan-2012 22:54:39] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:54:39] PHP Warning: file_get_contents(http://lms.nust.edu.pk) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:54:41] PHP Warning: file_get_contents(http://www.nust.edu.pk/#) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
... (same error repeated twice) ...
[01-Jan-2012 22:55:58] PHP Warning: file_get_contents(http://www.nust.edu.pk/usr/oricdic.aspx#ipo) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:55:58] PHP Warning: file_get_contents(http://www.nust.edu.pk/usr/oricdic.aspx#tto) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:55:59] PHP Warning: file_get_contents(http://www.nust.edu.pk/usr/oricdic.aspx#ilo) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:55:59] PHP Warning: file_get_contents(http://www.nust.edu.pk/usr/oricdic.aspx#mco) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:56:05] PHP Warning: file_get_contents(http://www.nust.edu.pk/#) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
... (same error repeated 18 times) ...
[01-Jan-2012 22:57:33] PHP Warning: file_get_contents(http://www.nust.edu.pk/#ctl00_SiteMapPath1_SkipLink) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:57:33] PHP Notice: Undefined variable: parts in D:\wamp\www\crawler1\AbsoluteUrl\url_to_absolute.php on line 330
[01-Jan-2012 22:57:55] PHP Warning: file_get_contents(http://www.harvard.edu/#skip) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:58:21] PHP Warning: file_get_contents(http://www.harvard.edu/admissions-aid#undergrad) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:58:22] PHP Warning: file_get_contents(http://www.harvard.edu/admissions-aid#grad) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:58:24] PHP Warning: file_get_contents(http://www.harvard.edu/admissions-aid#continue) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 22:58:25] PHP Warning: file_get_contents(http://www.harvard.edu/admissions-aid#summer) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:00:04] PHP Warning: file_get_contents(http://www.harvard.edu/#) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
... (same error repeated 1 time) ...
[01-Jan-2012 23:00:11] PHP Notice: Undefined variable: parts in D:\wamp\www\crawler1\AbsoluteUrl\url_to_absolute.php on line 330
[01-Jan-2012 23:00:41] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:00:41] PHP Warning: file_get_contents(http://directory.berkeley.edu) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:00:47] PHP Notice: Undefined variable: parts in D:\wamp\www\crawler1\AbsoluteUrl\url_to_absolute.php on line 330
[01-Jan-2012 23:01:53] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:01:53] PHP Warning: file_get_contents(http://students.berkeley.edu/uga/) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:01:57] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:01:57] PHP Warning: file_get_contents(http://publicservice.berkeley.edu/) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:02:00] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:02:00] PHP Warning: file_get_contents(http://students.berkeley.edu/osl/leadprogs.asp) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:02:17] PHP Notice: Undefined variable: parts in D:\wamp\www\crawler1\AbsoluteUrl\url_to_absolute.php on line 330
[01-Jan-2012 23:02:25] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:02:25] PHP Warning: file_get_contents(http://bearfacts.berkeley.edu/bearfacts) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:02:28] PHP Warning: file_get_contents() [streams.crypto]: this stream does not support SSL/crypto in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
[01-Jan-2012 23:02:28] PHP Warning: file_get_contents(http://career.berkeley.edu/) [function.file-get-contents]: failed to open stream: Cannot connect to HTTPS server through proxy in D:\wamp\www\crawler1\simplehtmldom_1_5\simple_html_dom.php on line 72
And this is the error log from php-cgi.exe:
Problem signature:
Problem Event Name: APPCRASH
Application Name: php-cgi.exe
Application Version: 5.3.8.0
Application Timestamp: 4e537939
Fault Module Name: php5ts.dll
Fault Module Version: 5.3.8.0
Fault Module Timestamp: 4e537a04
Exception Code: c0000005
Exception Offset: 0000c793
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1033
Additional Information 1: 0a9e
Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
Additional Information 3: 0a9e
Additional Information 4: 0a9e372d3b4ad19135b953a78882e789
Please help me in this regard.

You should check the call stack of the PHP process (if running as CGI or CLI) or of the Apache httpd process (if running as mod_php).
Then you will see in which module/procedure execution has halted.
You can also check the active TCP/IP connections made by your script; there may be an ongoing I/O operation that is causing your script to halt.
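Along the same lines, here is a minimal sketch of passing a stream context with a timeout to file_get_contents(), so a stalled connection fails instead of hanging the crawler (the values and error handling are illustrative, not taken from your code):

// Sketch: enforce a per-request timeout so a stalled server cannot block the crawler.
$context = stream_context_create(array(
    'http' => array(
        'timeout'         => 20,   // example value: give up on the request after 20 seconds
        'follow_location' => 1,    // follow redirects
        'ignore_errors'   => true, // return the body on 4xx/5xx instead of emitting a warning
    ),
));
$contents = @file_get_contents($url, false, $context); // $url comes from your crawl queue
if ($contents === false) {
    // log the failure and move on to the next link instead of stopping
}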
I hope this helps.

Related

php gives error - failed to open stream: HTTP request failed! HTTP/1.1 505 HTTP Version not supported

I am trying to get ScannerStatus from an HP scanner with
fopen($scannerstatus1,"r");
I get errors like
Warning: fopen(http://10.0.0.106:80/eSCL/ScannerStatus): failed to open stream: HTTP request failed! HTTP/1.1 505 HTTP Version not supported in /var/www/html/eSCL/ScannerStatus/index.php on line 14
I have also tried
readfile($scannerstatus1);
and
file_get_contents($scannerstatus1);
but both yield the same error.
I can open the url with a web browser.
Anyone have any ideas?
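One thing that might be worth trying (this is an assumption on my part, not something confirmed in the question): PHP's http stream wrapper sends HTTP/1.0 requests by default, and some embedded web servers answer that with 505. The protocol_version context option switches the request to HTTP/1.1:

// Sketch: ask PHP's http wrapper to send an HTTP/1.1 request, in case the scanner rejects HTTP/1.0.
$context = stream_context_create(array(
    'http' => array(
        'protocol_version' => 1.1,
        'header'           => "Connection: close\r\n", // without this, a keep-alive response can make the read hang
    ),
));
$status = file_get_contents($scannerstatus1, false, $context);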

How to include a PHP page from a different website in the current website?

I have created a domain, but all of its functions and classes are stored on another domain. However, I am unable to get the page using include or require. I am getting the following error:
Warning: include_once() [function.include-once]: Failed to enable crypto in /domain.com/includes/functions.php on line 3 Warning: include_once(https:/other_domain.com/access.php) [function.include-once]: failed to open stream: operation failed in /domain.com/includes/functions.php on line 3 Warning: include_once() [function.include]: Failed opening 'otherdomain.com/access.php' for inclusion (include_path='.:/opt/php52/lib/php') in /domain.com/includes/functions.php on line 3
The crypto error leads me to think this may be an issue with your SSL:
OPENSSL file_get_contents(): Failed to enable crypto
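Also note that include/require over HTTP(S) only gives you the rendered output of the remote script (and needs allow_url_include), not its functions or classes. If plain fetching of the remote page is enough, here is a sketch of an explicit SSL context for diagnosing the crypto failure (loosening verification is for testing only and should not stay in production):

// Sketch: fetch the remote page over HTTPS with relaxed certificate checks,
// only to confirm whether certificate validation is what breaks the handshake.
$context = stream_context_create(array(
    'ssl' => array(
        'verify_peer'       => false, // diagnosis only; re-enable once the certificate/CA setup is fixed
        'allow_self_signed' => true,
    ),
));
$page = file_get_contents('https://other_domain.com/access.php', false, $context);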

Tuleap: cannot attach file in Tracker

When trying to attach a file (an image) to an artifact in Tracker, I get an error:
2018/07/11 13:16:04 [error] 3553#0: *1299 FastCGI sent in stderr: "PHP message: PHP Warning: mkdir(): Permission denied in /usr/share/tuleap/plugins/tracker/include/Tracker/FormElement/Tracker_FormElement_Field_File.class.php on line 955
PHP message: PHP Warning: chown(): Operation not permitted in /usr/share/tuleap/src/common/backend/Backend.class.php on line 185
PHP message: PHP Warning: chgrp(): Operation not permitted in /usr/share/tuleap/src/common/backend/Backend.class.php on line 222
PHP message: PHP Warning: move_uploaded_file(/var/lib/tuleap/tracker/447/105): failed to open stream: No such file or directory in /usr/share/tuleap/plugins/tracker/include/Tracker/FormElement/Tracker_FormElement_Field_File.class.php on line 966
PHP message: PHP Warning: move_uploaded_file(): Unable to move '/tmp/phpinBdbP' to '/var/lib/tuleap/tracker/447/105' in /usr/share/tuleap/plugins/tracker/include/Tracker/FormElement/Tracker_FormElement_Field_File.class.php on line 966" while reading response header from upstream, client: 10.73.12.147, server: tuleap, request: "POST /plugins/tracker/?aid=4&func=artifact-update HTTP/2.0", upstream: "fastcgi://127.0.0.1:9000", host: "tuleap", referrer: "https://tuleap/plugins/tracker/?aid=4"
The artifact is updated without any visible errors in the GUI, but the changeset is empty and the image is not displayed.
I followed the full installation process when setting up Tuleap (no Docker).
SELinux is disabled as suggested in this guide.
CentOS 7, Tuleap™ 10.1.99.104
It seems to have been a bug in this particular build of Tuleap.
Workaround:
chown -R codendiadm:codendiadm /var/lib/tuleap/tracker/
More info in the official bugtracker:
https://tuleap.net/plugins/tracker/?aid=11821

Failed to open stream and twitter won't load

Severity: Warning
Message: file_get_contents(https://api.twitter.com/1/statuses/user_timeline.xml?user_id=432572114&include_entities=true&include_rts=true&count=3): failed to open stream: HTTP request failed! HTTP/1.0 410 Gone
Filename: controllers/ibiza.php
Line Number: 42
Any ideas?
Read https://dev.twitter.com/blog/api-v1-retirement-final-dates
API v1 is no longer available.
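If you migrate to API v1.1, every request has to be OAuth-signed. A minimal sketch using the third-party abraham/twitteroauth library (the keys are placeholders, and this assumes a current version of that library installed via Composer):

// Sketch: authenticated v1.1 call replacing the old unauthenticated v1 XML endpoint.
use Abraham\TwitterOAuth\TwitterOAuth;

require 'vendor/autoload.php'; // abraham/twitteroauth, assumed installed via Composer

$connection = new TwitterOAuth('CONSUMER_KEY', 'CONSUMER_SECRET', 'ACCESS_TOKEN', 'ACCESS_SECRET');
$tweets = $connection->get('statuses/user_timeline', array(
    'user_id'          => '432572114',
    'include_entities' => true,
    'include_rts'      => true,
    'count'            => 3,
));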

facebook access token error

Please help me to solve the following error.
Warning: file_get_contents() [function.file-get-contents]: https:// wrapper is disabled in the server configuration by allow_url_fopen=0 in /home/australi/public_html/fb/index.php on line 27
Warning: file_get_contents(https://graph.facebook.com/me?access_token=AAABempp6Ls0BANc98WmqZAreBbUzPnT1xyer9wtPmbvlwsnZCc4AKwuvCAVosLxw4yItvOkDoIK5hyCvBPZAk90nLx4PZACorrZCZAAi9pGgZDZD) [function.file-get-contents]: failed to open stream: no suitable wrapper could be found in /home/australi/public_html/fb/index.php on line 27
Invalid Access Token
From what I can tell, you need to enable the wrapper by setting allow_url_fopen = 1 in your php.ini config file; the first warning says the https:// wrapper is disabled because allow_url_fopen is 0.
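If your host does not let you change allow_url_fopen, cURL does not depend on that setting; a minimal sketch (the token below is a placeholder, not a real access token):

// Sketch: fetch the Graph API response with cURL, which ignores allow_url_fopen.
$url = 'https://graph.facebook.com/me?access_token=YOUR_ACCESS_TOKEN'; // placeholder token
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
$me = json_decode($response, true);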
