I've got a web application in Drupal that is basically acting as a proxy to a multi-page HTML form somewhere else. I am able to retrieve the page with cURL and parse it with DOMDocument, then embed the contents of the <form> inside a Drupal form:
<?php
function proxy_get_dom($url, $method = 'get', $arguments = array()) {
  // Keep a static cURL resource for speed.
  static $web = NULL;
  if (!is_resource($web)) {
    $web = curl_init();
    // Don't include any HTTP headers in the output.
    curl_setopt($web, CURLOPT_HEADER, FALSE);
    // Return the result as a string instead of echoing directly.
    curl_setopt($web, CURLOPT_RETURNTRANSFER, TRUE);
  }
  // Add any GET arguments directly to the URL.
  if ($method == 'get' && !empty($arguments)) {
    $url .= '?' . http_build_query($arguments, '', '&');
  }
  curl_setopt($web, CURLOPT_URL, $url);
  // Include POST data.
  if ($method == 'post' && !empty($arguments)) {
    curl_setopt($web, CURLOPT_POST, TRUE);
    curl_setopt($web, CURLOPT_POSTFIELDS, http_build_query($arguments));
  }
  else {
    curl_setopt($web, CURLOPT_POST, FALSE);
  }
  $use_errors = libxml_use_internal_errors(TRUE);
  try {
    $dom = new DOMDocument();
    $dom->loadHTML(curl_exec($web));
  }
  catch (Exception $e) {
    // Error handling...
    libxml_use_internal_errors($use_errors);
    return NULL;
  }
  if (!isset($dom)) {
    // Error handling...
    libxml_use_internal_errors($use_errors);
    return NULL;
  }
  libxml_use_internal_errors($use_errors);
  return $dom;
}
function FORM_ID($form, &$form_state) {
  // Set the initial URL if it hasn't already been set.
  if (!isset($form_state['remote_url'])) {
    $form_state['remote_url'] = 'http://www.example.com/form.faces';
  }
  // Get the DOMDocument.
  $dom = proxy_get_dom($form_state['remote_url'], 'post', $_POST);
  if (!isset($dom)) {
    return $form;
  }
  // Pull out the <form> and insert it into $form['embedded'].
  $nlist = $dom->getElementsByTagName('form');
  // assert that $nlist->length == 1
  $form['embedded']['#markup'] = '';
  foreach ($nlist->item(0)->childNodes as $childnode) {
    // It would be better to use $dom->saveHTML but it does not accept the
    // $node parameter until PHP 5.3.6, which we are not guaranteed to be
    // using.
    $form['embedded']['#markup'] .= $dom->saveXML($childnode);
  }
  // Apply some of the attributes from the remote <form> element onto our
  // <form> element.
  if ($nlist->item(0)->hasAttributes()) {
    foreach ($nlist->item(0)->attributes as $attr) {
      if ($attr->nodeName == 'action') {
        $form_state['remote_action'] = $attr->nodeValue;
      }
      elseif ($attr->nodeName == 'class') {
        $form['#attributes']['class'] = explode(' ', $attr->nodeValue);
      }
      elseif ($attr->nodeName != 'method') {
        $form['#attributes'][$attr->nodeName] = $attr->nodeValue;
      }
    }
  }
  return $form;
}
function FORM_ID_submit($form, &$form_state) {
  // Use the remote_action as the remote_url, if set.
  if (isset($form_state['remote_action'])) {
    $form_state['remote_url'] = $form_state['remote_action'];
  }
  // Rebuild the form.
  $form_state['rebuild'] = TRUE;
}
?>
However, the embedded form will not move past the first step. The issue seems to be that the page behind the proxy sets a session cookie, which the code above ignores. I can store the cookies with CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR, but I'm not sure where the file should live: it needs to be a different location for each user, and it must not be a publicly accessible location.
My question is: How do I store and send cookies from cURL per-user in Drupal?
Assuming you're using sessions, use the user's session ID to name the cookie files, e.g.
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
would give everyone the same cookie file and they would all end up sharing the same cookies, but
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie-' . session_id() . '.txt');
will produce a unique cookie file for every user. You will have to remove those files manually, otherwise you'll end up with a huge cookie file repository. And if you change session IDs (e.g. with session_regenerate_id()), you'll "lose" the cookie file because the session ID won't be the same anymore.
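For example, here is a minimal sketch of how that could be wired into the proxy_get_dom() function from the question. It assumes Drupal 6/7 (file_directory_temp() comes from Drupal core) and invents a helper name, proxy_cookie_jar(), purely for illustration; the jar files would still need periodic cleanup, e.g. from hook_cron().
// A minimal sketch: one cookie jar per user session, stored in the site's
// temporary directory rather than anywhere web-accessible.
// proxy_cookie_jar() is a made-up helper name for illustration.
function proxy_cookie_jar() {
  return file_directory_temp() . '/proxy-cookies-' . session_id() . '.txt';
}
// Inside proxy_get_dom(), right after curl_init():
curl_setopt($web, CURLOPT_COOKIEFILE, proxy_cookie_jar());
curl_setopt($web, CURLOPT_COOKIEJAR, proxy_cookie_jar());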
I'm trying to create a simple script that'll let me know if a website is based off WordPress.
The idea is to check whether I'm getting a 404 from a URL when trying to access its wp-admin like so:
https://www.audi.co.il/wp-admin (which returns "true" because it exists)
When I try a URL that does not exist, like "https://www.audi.co.il/wp-blablabla", PHP still returns "true", even though Chrome shows a 404 in the network tab when I paste that link into its address bar.
Why is it so and how can it be fixed?
This is the code (based on another user's answer):
<?php
$file = 'https://www.audi.co.il/wp-blabla';
$file_headers = @get_headers($file);
if (!$file_headers || strpos($file_headers[0], '404 Not Found')) {
    $exists = "false";
}
else {
    $exists = "true";
}
echo $exists;
You can try to request the wp-admin page; if it is not there, then there's a good chance it's not WordPress.
function isWordPress($url)
{
    $ch = curl_init();
    // Set the URL and other appropriate options.
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    // Fetch the URL and read the status code.
    curl_exec($ch);
    $httpStatus = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // Close the cURL resource and free up system resources.
    curl_close($ch);
    if ($httpStatus == 200) {
        return true;
    }
    return false;
}
if ( isWordPress("http://www.example.com/wp-admin") ) {
// This is WordPress
} else {
// Not WordPress
}
This may not be one hundred percent accurate as some WordPress installations protect the wp-admin URL.
I'm probably late to the party, but another way to easily detect a WordPress site is to request /wp-json. If you're using Guzzle, you can do this:
function isWordPress($url) {
    try {
        $http = new \GuzzleHttp\Client();
        $response = $http->get(rtrim($url, "/") . "/wp-json");
        $contents = json_decode($response->getBody()->getContents());
        if ($contents) {
            return true;
        }
    } catch (\Exception $exception) {
        //...
    }
    return false;
}
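A quick usage sketch (the URL is just a placeholder). Note that with Guzzle's default http_errors behaviour, a 404 on /wp-json throws an exception, so the try/catch above is what turns a missing endpoint into false:
var_dump(isWordPress('https://www.example.com')); // bool(true) if /wp-json returns parseable JSON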
I've got a problem...
I have an MVC-like framework, and the redirect mechanism allows me to get snippets of HTML code generated by PHP on a remote host.
I'm getting these snippets by using the file_get_contents() function, with allow_url_fopen turned on.
The problem is that I use session data inside these code fragments, and the session data is lost every time. I assume the new request does not share the same session data, so I need a way to get these fragments without losing my session data.
Any suggestions?
If the files you're accessing are on the same server as the calling file, then you might as well use include(), as in @user574632's answer.
But if not, then to keep the session you will need to handle the cookies the server sends.
Sessions are cookie based: the server sets the session cookie, and your client picks it up and sends it with all subsequent requests.
By default file_get_contents() won't handle cookies, so you need to grab the headers the server sends by reading the $http_response_header array, match the Set-Cookie: header with a regex, and store it. On subsequent requests, create a stream context with the cookie added to the header and pass that to file_get_contents():
<?php
function get_cookies() {
    // Check the cookies folder exists - or make it.
    if (!file_exists('./cookies/')) {
        mkdir('./cookies/', 0755, true);
    }
    $return = null;
    foreach (glob("./cookies/*.txt") as $file) {
        $return .= file_get_contents($file) . ';';
    }
    return $return;
}

function save_cookies($http_response_header) {
    // print_r($http_response_header); // uncomment to inspect the raw headers
    foreach ($http_response_header as $header) {
        if (substr($header, 0, 10) == 'Set-Cookie') {
            if (preg_match('#Set-Cookie: (([^=]+)=[^;]+)#i', $header, $matches)) {
                $fp = fopen('./cookies/' . $matches[2] . '.txt', 'w');
                fwrite($fp, $matches[1]);
                fclose($fp);
            }
        }
    }
}

$opts = array('http' =>
    array('header' => 'Cookie: ' . get_cookies() . "\r\n")
);
$context = stream_context_create($opts);

$contents = file_get_contents('http://mywebsite.com/snippets/', false, $context);
save_cookies($http_response_header);

echo $contents;
?>
Alternatively, you could use cURL instead; it's faster and handles cookies for you.
So, something like the following: use cURL, and fall back to file_get_contents() if cURL is not present, all wrapped up with cookie support in a class so the three functions are contained:
<?php
// Example usage
echo new curl_get_contents('http://example.com/page_that_needs_sessions');

class curl_get_contents {
    public $result;

    function __construct($url) {
        $this->curl_rev_fgc($url);
    }

    function __toString() {
        return $this->result;
    }

    private function get_cookies() {
        $return = null;
        foreach (glob("./cookies/*.txt") as $file) {
            $return .= file_get_contents($file) . ';';
        }
        return $return;
    }

    private function save_cookies($http_response_header) {
        foreach ($http_response_header as $header) {
            if (substr($header, 0, 10) == 'Set-Cookie') {
                if (preg_match('#Set-Cookie: (([^=]+)=[^;]+)#i', $header, $matches)) {
                    $fp = fopen('./cookies/' . $matches[2] . '.txt', 'w');
                    fwrite($fp, $matches[1]);
                    fclose($fp);
                }
            }
        }
    }

    private function curl_rev_fgc($url) {
        // Check the cookies folder exists - or make it.
        if (!file_exists('./cookies')) {
            mkdir('./cookies/', 0755, true);
        }
        $usragent = 'Mozilla/5.0 (compatible; Yourbot/0.1; +https://yoursite/bot.html)';
        // Check cURL is installed, or revert to file_get_contents().
        $curl = function_exists('curl_init');
        if (!$curl) {
            $opts = array(
                'http' => array(
                    'method' => "GET",
                    'header' => 'Cookie: ' . $this->get_cookies() . "\r\n", // cookie support for fgc
                    'user_agent' => $usragent,
                ),
            );
            $context = stream_context_create($opts);
            $result = @file_get_contents($url, false, $context);
            $this->save_cookies($http_response_header);
            if (empty($result)) {
                $this->result = 'Error fetching: ' . htmlentities($url);
            } else {
                $this->result = $result;
            }
            return;
        }
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_TIMEOUT, 60);
        curl_setopt($curl, CURLOPT_USERAGENT, $usragent);
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        if (!file_exists('./cookies/curl.txt')) {
            file_put_contents('./cookies/curl.txt', null);
        }
        curl_setopt($curl, CURLOPT_COOKIEFILE, './cookies/curl.txt');
        curl_setopt($curl, CURLOPT_COOKIEJAR, './cookies/curl.txt');
        $result = curl_exec($curl);
        if (empty($result)) {
            $this->result = 'Error fetching: ' . htmlentities($url);
        } else {
            $this->result = $result;
        }
        curl_close($curl);
        return;
    }
}
?>
Use include instead. If you need to read the output into a variable to display later/elsewhere in the code, as suggested in the comments, use the output buffer:
ob_start();
include('path/to/file.php');
$included = ob_get_clean();
//nothing has been output to the browser yet
//later on
echo $included;
I need to get the response after the form is submitted. The form is submitted to a remote API server.
Here is some of the code:
/*
 * Submits the data via a cURL session.
 */
private function sendDetails() {
    if (true || $_SERVER['REMOTE_ADDR'] != '85.17.27.88') {
        $ch = curl_init($this->parseLink);
        curl_setopt_array($ch, array(
            CURLOPT_FRESH_CONNECT => true,
            CURLOPT_HEADER => false,
            CURLOPT_POST => true,
            CURLOPT_RETURNTRANSFER => true,
            CURLINFO_HEADER_OUT => true,
            CURLOPT_ENCODING => "",
            CURLOPT_POSTFIELDS => $this->parseRequestString($this->result),
        ));
        $this->returned = curl_exec($ch);
        $this->headers = curl_getinfo($ch, CURLINFO_HEADER_OUT);
        $this->result = $this->checkResponse();
        if (!$this->result) {
            $this->setError($this->returned);
        }
        elseif (isset($this->p['mobi-submit'])) {
            // Redirect mobi users.
            $_SESSION['enquire-success-name'] = $this->p['enquire-name'];
            wp_redirect('');
            exit;
        }
    } else {
        echo nl2br(var_export($this->result, true));
        exit;
    }
}
/*
 * Checks the response from the webservice for errors / success.
 */
private function checkResponse() {
    libxml_use_internal_errors(true);
    try {
        $xml = new SimpleXMLElement($this->returned);
    } catch (Exception $e) {
        return false;
    }
    if ($xml) {
        // If the response has a leadid attribute, then we submitted it successfully.
        if (!empty($xml[0]['leadid'])) {
            if (is_string($xml[0]))
                $this->returned = $xml[0];
            return true;
        }
        // If the errorcode is 7, then we have a resubmit and so it was successful.
        elseif (!empty($xml->errors->error[0]['code']) && $xml->errors->error[0]['code'] == "7") {
            if (is_string($xml->errors->error[0]))
                $this->returned = $xml->errors->error[0];
            return true;
        }
        // Otherwise try to set the response to be the error message.
        elseif (!empty($xml->errors->error[0])) {
            if (is_string($xml->errors->error[0]))
                $this->returned = $xml->errors->error[0];
            return false;
        }
        // Otherwise set it to the first XML element and return false.
        else {
            if (is_string($xml[0]))
                $this->returned = $xml[0];
            return false;
        }
    }
    // If the XML parsing failed, revert to a more rudimentary test.
    elseif (stripos($this->returned, $this->expected) !== false) {
        return true;
    }
    // If that also fails, expect error.
    else {
        return false;
    }
}
I did not write this code and I'm not so familiar with cURL. I need to get the $xml[0]['leadid'] response into another PHP file. Is this possible? Do I include the PHP file that has these functions and then create a new function? Or do I store it in my database and then retrieve it from there?
Would appreciate any help or further information!
You can get it from $this->returned['leadid'] if the returned class member is public, or create a function:
public function getReturnedVal($key = 'leadid') {
    if (isset($this->returned[$key])) {
        return $this->returned[$key];
    }
    return "";
}
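A quick usage sketch, where $handler stands in for whatever instance of this class your form code already creates (the name is a placeholder):
$leadid = $handler->getReturnedVal('leadid'); // returns "" if the response had no leadid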
Just as you said in your question, rather than dealing with sessions or databases to store the API response, your easiest option would be to include the script in the PHP file where you need the data (include('curl_api_script_file.php');) and then use a function like @volkinc suggested to echo the data into your PHP response page where you need it.
You could just use file_put_contents('http://path.to.another.php',$this->returned['leadid']); if you have fopen wrappers enabled.
The other PHP script would just have to capture the transferred data.
OK, so I figured this out on my own after playing around with the classes, etc.:
// Get the type of lead that is submitted.
$lead_type = $insureFormResult->p['enquire-type'];

// Get the leadid from the XML.
$xml = $insureFormResult->returned;
if ($xml) {
    $xml1 = new SimpleXMLElement($xml);
    $leadid = $xml1->xpath("/response/@leadid")[0];
}
Thanks for your answers and inputs though!
I'm working on a little web crawler as a side project at the moment, basically having it collect all hrefs on a page and then subsequently parse those. My problem is:
How can I only get the actual page results? At the moment I'm using the following:
foreach ($page->getElementsByTagName('a') as $link)
{
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "")
    {
        $links[] = 'http://' . @$base_url['host'] . '/' . $link->getAttribute('href');
    }
    elseif (@$base_url['host'] == @$compare_url['host'])
    {
        $links[] = $link->getAttribute('href');
    }
}
As you can see, this will bring in JPEGs, EXE files, etc. I only need to pick up web pages like .php, .html, .asp, etc.
I'm not sure if there is some function that can work this out, or whether it will need a regex against some sort of master list?
Thanks
Since the URL string alone tells you nothing about the resource behind it, you will have to go out and ask the web server about it. For this there's an HTTP method called HEAD, so you won't have to download everything.
You can implement this with curl in php like this:
function curl_head($url) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true);
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    $content = curl_exec($curl);
    curl_close($curl);
    // Redirected heads just pile up one after another.
    $parts = explode("\r\n\r\n", trim($content));
    // Return only the last one.
    return end($parts);
}

function is_html($url) {
    $header = curl_head($url);
    // Look for the content-type part of the header response.
    return preg_match('/content-type\s*:\s*text\/html/i', $header);
}
var_dump(is_html('http://github.com'));
This version only accepts text/html responses and doesn't check whether the response is a 404 or another error (although it follows redirects up to 5 jumps). You can tweak the regexp or add some error handling, either from the cURL response or by matching against the header string's first line.
Note: web servers will run scripts behind these URLs to give you responses. Be careful not to overload hosts with probing, or to grab "delete" or "unsubscribe" type links.
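For example, a rough sketch of that error handling, placed inside is_html() before the content-type check. It assumes the $header string returned by curl_head(), whose first line looks like "HTTP/1.1 200 OK":
// Reject anything that isn't a 2xx response.
if (!preg_match('#^HTTP/\S+\s+2\d\d#', $header)) {
    return false;
}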
To check whether a URL has a valid page extension (.html, .php, ...), use this function:
function check($url) {
    $extensions = array("php", "html"); // Add extensions here
    foreach ($extensions as $ext) {
        if (substr($url, -(strlen($ext) + 1)) == "." . $ext) {
            return 1;
        }
    }
    return 0;
}

foreach ($page->getElementsByTagName('a') as $link) {
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "") {
        if (check($link->getAttribute('href'))) {
            $links[] = 'http://' . @$base_url['host'] . '/' . $link->getAttribute('href');
        }
    }
    elseif (@$base_url['host'] == @$compare_url['host']) {
        if (check($link->getAttribute('href'))) {
            $links[] = $link->getAttribute('href');
        }
    }
}
Consider using preg_match to check the type of the link (application, picture, HTML file) and, depending on the result, decide what to do.
Another (simpler) option is to use explode and take the last part of the URL, which comes after a "." (the extension).
For instance:
// If the URL has any one of the following extensions, ignore it.
$forbid_ext = array('jpg', 'gif', 'exe');

foreach ($page->getElementsByTagName('a') as $link) {
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "")
    {
        if (check_link_type($link->getAttribute('href')))
            $links[] = 'http://' . @$base_url['host'] . '/' . $link->getAttribute('href');
    }
    elseif (@$base_url['host'] == @$compare_url['host'])
    {
        if (check_link_type($link->getAttribute('href')))
            $links[] = $link->getAttribute('href');
    }
}

function check_link_type($url)
{
    global $forbid_ext;
    $parts = explode(".", $url);
    $ext = end($parts);
    if (in_array($ext, $forbid_ext))
        return false;
    return true;
}
UPDATE (instead of checking 'forbidden' extensions, let's look for good ones):
$good_ext = array('html', 'php', 'asp');

function check_link_type($url)
{
    global $good_ext;
    $parts = explode(".", $url);
    $ext = end($parts);
    // Keep links with no extension or with one of the "good" extensions.
    if ($ext == "" || in_array($ext, $good_ext))
        return true;
    return false;
}
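A quick sanity check of the whitelist version, with made-up URLs just to show which links would be kept:
var_dump(check_link_type('http://example.com/index.php')); // true  - .php is in $good_ext
var_dump(check_link_type('http://example.com/photo.jpg')); // false - .jpg is not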
Hey all, I have seen several questions on this topic here, but none of them have solved my problem. I have a script on my site which I want to use to generate several different types of emails to my users. I wanted a way to create template files for the different emails, which accept $_POST variables to fill in relevant information, and to simply make a POST request to these templates and get back the response to use as the body of the email. I am attempting to write a function which accepts the location of the template file (either relative or absolute would work, but I would prefer relative) and an array of parameters that I would like to send to the template via POST. So far I have had no luck. Here is my code so far:
private function post_request($url, $data) {
    $output = array();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    $result = curl_exec($ch);
    if ($result) {
        $output['status'] = "ok";
        $output['content'] = $result;
    } else {
        $output['status'] = "failure";
        // Read the error before closing the handle.
        $output['error'] = curl_error($ch);
    }
    curl_close($ch);
    return $output;
}
I have been getting the error "couldn't connect to host" from cURL, but after writing the URL to an error log I verified that copying and pasting it into Firefox shows the page correctly.
Any ideas? I am not married to the idea of using curl, so if there is a better option I would be more than happy to use it instead. Thanks for the help all!
You should be able to use file_get_contents() for this, so long as your host has not prevented it from accessing remote locations (and the $url script is not looking exclusively for POST data).
private function post_request($url, $data) {
    $output = array();
    $url_with_data = '';
    foreach ($data as $k => $v) { // Loop through data and build the request string
        $url_with_data .= '&' . urlencode($k) . '=' . urlencode($v);
    }
    // Remove the first ampersand
    $url_with_data = substr($url_with_data, 1);
    // Request the file
    // Format will be http://url.com?var1=data&var2=data&var3=data
    $result = file_get_contents($url . '?' . $url_with_data);
    if ($result) {
        $output['status'] = "ok";
        $output['content'] = $result;
    } else {
        $output['status'] = "failure";
        $output['error'] = 'Could not open remote file';
    }
    return $output;
}
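Note that this converts the request to GET, so it only works if the template script reads $_GET (or $_REQUEST). If the templates really do need $_POST, a stream context can send a real POST without cURL; here is a rough sketch of a drop-in replacement for the method above, under that assumption:
private function post_request($url, $data) {
    $output = array();
    // Build an actual POST request with the http stream wrapper.
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($data),
        ),
    ));
    $result = file_get_contents($url, false, $context);
    if ($result !== false) {
        $output['status'] = "ok";
        $output['content'] = $result;
    } else {
        $output['status'] = "failure";
        $output['error'] = 'Could not open remote file';
    }
    return $output;
}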
Another option: You say that both files reside on the same server. If that is the case, you could simply require() the template builder.
private function post_request($url, $data) {
    $output = array();
    @require_once('./path/to/template_builder.php');
    if ($result) {
        $output['status'] = "ok";
        $output['content'] = $result;
    } else {
        $output['status'] = "failure";
        $output['error'] = 'Could not open remote file';
    }
    return $output;
}
Then in template_builder.php:
<?php
unset($result);
if (is_array($data)) {
    // Parse $data ...
    $result = $email_template;
}
As it turns out, the issue ended up being a server configuration error. The server was timing out while attempting to contact the file because it was hitting the wrong DNS server. Fixing that solved my problem!