Get html cross site, display it and get an element value

I need to get the complete output from an ASPX page. When the user leaves, I will save what's in some specific elements in cookies. The problem is that the ASPX page is on a domain I don't have access to. I want the output to behave as if it were in an iframe: links need to be clickable, but clicking them must not leave my page.
I am thinking of either AJAX with a PHP proxy or an iframe whose content I can modify.
Is this possible?
If it is possible and it involves server-side code, I would like to know if there are any free web hosts that support the full code (for example, almost every free web host has safe_mode on for PHP).
EDIT: I want to display this page: School scheme. The URL doesn't change; the page just sends requests to the server (I think via JavaScript). When the user leaves, I will check what's selected in the select box id="TypeDropDownList" and in the select box id="ScheduleIDDropDownList".
When the user returns to my page, I will print those values to the page via a URL like this: "http://www.novasoftware.se/webviewer/(S(lv1isca2txx1bu45c3kvic45))/design1.aspx?schoolid=27500&code=82820&type=" + type + "&id=" + id
I tried several PHP proxy scripts on 000webhost before I posted here.
For example, this one:
<?php
ob_start();
function logf($message) {
$fd = fopen('proxy.log', "a");
fwrite($fd, $message . "\n");
fclose($fd);
}
?>
<?
$url = $_REQUEST['url'];
logf($url);
$curl_handle = curl_init($url);
curl_setopt($curl_handle, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "Owen's AJAX Proxy");
$content = curl_exec($curl_handle);
$content_type = curl_getinfo($curl_handle, CURLINFO_CONTENT_TYPE);
curl_close($curl_handle);
header("Content-Type: $content_type");
echo $content;
ob_flush();
?>
But it returns Warning: curl_setopt(): supplied argument is not a valid cURL handle resource in /home/a5379897/public_html/ajax-proxy.php on line 16
I tried to contact them about this because they say they have cURL enabled but they haven't responded yet.
I think it would be possible to just display the two select boxes when the user first visits the page. When an option is selected, an iframe would show the right page by setting its src attribute to "http://www.novasoftware.se/webviewer/(S(lv1isca2txx1bu45c3kvic45))/design1.aspx?schoolid=27500&code=82820&type=" + type + "&id=" + id.
The problem with that is that I would still need to retrieve the select boxes somehow, so I would run into the same problem.

You would need to use PHP, because browsers don't let JavaScript make cross-domain requests unless the other server explicitly allows them. Your PHP code would grab the page the client wants and process it, changing each link's href to point to your own page with a GET variable holding the URL the original href linked to. When the user clicks a link, they are sent to the same page they are on now, but that page grabs the new page and returns it (processing that page too), and so on.
000webhost is a nice free web host that allows most of PHP's functions and doesn't put adverts on your site.
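A minimal sketch of the link-rewriting step described in this answer, assuming the proxy page is a script called proxy.php that reads the target from a `url` GET parameter (both names are hypothetical, as are the helper functions). It only handles double-quoted href attributes and does not normalise `../` segments; a real proxy would also need to rewrite form actions and src attributes.

```php
<?php
// Hypothetical helper: resolve an href against the absolute URL of the page it came from.
// Handles absolute, protocol-relative, root-relative and simple relative links;
// it does not normalise "../" segments.
function resolve_href(string $href, string $pageUrl): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                   // already absolute
    }
    $p = parse_url($pageUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($href, '//') === 0) {
        return $p['scheme'] . ':' . $href;              // protocol-relative
    }
    if (strpos($href, '/') === 0) {
        return $origin . $href;                         // root-relative
    }
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);   // directory of the current page
    return $origin . $dir . $href;                      // document-relative
}

// Rewrite every href="..." so that clicking it loads the target through the proxy
// script (e.g. proxy.php?url=...) instead of leaving your page.
function proxify_links(string $html, string $pageUrl, string $proxyScript): string
{
    return preg_replace_callback(
        '/href="([^"]*)"/i',
        function (array $m) use ($pageUrl, $proxyScript): string {
            $href = html_entity_decode($m[1]);
            // Leave in-page anchors and non-HTTP schemes alone.
            if ($href === '' || $href[0] === '#' || preg_match('#^(mailto|javascript|tel):#i', $href)) {
                return $m[0];
            }
            $target = resolve_href($href, $pageUrl);
            return 'href="' . htmlspecialchars($proxyScript . '?url=' . rawurlencode($target)) . '"';
        },
        $html
    );
}
```

The proxy script would fetch `$_GET['url']`, pass the HTML through `proxify_links()` with that same URL as `$pageUrl`, and echo the result, so every click comes back to the proxy.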

To get the whole ASPX output as a string you can manipulate, use file_get_contents('http://yoursite.com/yourpage.aspx');
For more control over the request (method, headers, cookies), pass a stream context created with stream_context_create() for the http wrapper:
<?php
// Create a stream
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar\r\n"
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
?>

Thanks to greg I could create this script that gets the page.
<html>
<head>
</head>
<body>
<?php
// Create a stream
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar\r\n"
)
);
$context = stream_context_create($opts);
$host = 'http://www.novasoftware.se/webviewer/(S(bkjwdqntqzife4251x4sdx45))/';
// $host already ends with "/", so the page path must not start with one
$url = 'design1.aspx?schoolid=27500&code=82820&type=3&id={7294F285-A5CB-47D6-B268-E950CA205560}';
$changetothis='src="'.$host;
// Open the file using the HTTP headers set above
$file = file_get_contents($host.$url, false, $context);
// Prefix only relative src attributes; absolute ones (http:, https:, //) are left as they are
$changed = preg_replace('#src="(?!https?:|//)#i', $changetothis, $file);
echo $changed;
?>
</body>
</html>
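Once the page is fetched like this, the values the question asks about can be read from the HTML on the server. A sketch using PHP's DOMDocument (the function name is hypothetical). Note that it only sees what the ASPX server rendered; a choice the user makes in the browser afterwards has to be read with JavaScript on the client instead.

```php
<?php
// Hypothetical helper: return the value of the selected <option> inside the <select>
// with the given id, e.g. TypeDropDownList or ScheduleIDDropDownList from the question.
// Falls back to the first option (what a browser shows when nothing is marked selected).
function selected_option_value(string $html, string $selectId): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid; silence warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $select = $doc->getElementById($selectId);
    if ($select === null) {
        return null;                    // no element with that id
    }
    $first = null;
    foreach ($select->getElementsByTagName('option') as $option) {
        // An <option> without a value attribute submits its text instead.
        $value = $option->hasAttribute('value') ? $option->getAttribute('value') : $option->textContent;
        if ($first === null) {
            $first = $value;
        }
        if ($option->hasAttribute('selected')) {
            return $value;
        }
    }
    return $first;
}
```

With the script above, `selected_option_value($file, 'TypeDropDownList')` would give the type value to store in a cookie.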

Related

extract reCaptcha from web page to be completed externally via cURL and then return results to view page

I am creating a web scraper for personal use that scrapes car dealership sites based on my personal input, but several of the sites I am attempting to collect data from are blocked by a captcha page that I get redirected to. The current site I am scraping with cURL returns this HTML:
<html>
<head>
<title>You have been blocked</title>
<style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style>
</head>
<body style="margin:0">
<p id="cmsg">Please enable JS and disable any ad blocker</p>
<script>
var dd={'cid':'AHrlqAAAAAMA1gZrYHNP4MIAAYhtzg==','hsh':'C0705ACD75EBF650A07FF8291D3528','t':'fe','host':'geo.captcha-delivery.com'}
</script>
<script src="https://ct.captcha-delivery.com/c.js"></script>
</body>
</html>
I am using this to scrape the page:
<?php
function web_scrape($url)
{
$ch = curl_init();
$imei = "013977000272744";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_COOKIE, '_ym_uid=1460051101134309035; _ym_isad=1; cxx=80115415b122e7c81172a0c0ca1bde40; _ym_visorc_20293771=w');
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
'imei' => $imei,
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec($ch);
curl_close($ch); // close the handle before returning; code after "return" never runs
return $server_output;
}
echo web_scrape($url);
?>
To reiterate what I want to do: I want to collect the reCAPTCHA from this page so that, when I want to view the page details on an external site, I can fill in the reCAPTCHA on my external site and then scrape the page originally requested.
Any response would be great!

DataDome currently uses reCAPTCHA v2 and GeeTest captchas, so this is what your script should do:
1. Navigate to the redirect URL https://geo.captcha-delivery.com/captcha/?initialCid=….
2. Detect which type of captcha is used.
3. Obtain a token for this captcha using a captcha-solving service such as Anti Captcha.
4. Submit the token and check whether you were redirected to the target page.
5. Sometimes the target page contains an iframe with the address https://geo.captcha-delivery.com/captcha/?initialCid=.. , so you need to repeat from step 2 inside this iframe.
I'm not sure the steps above can be done with PHP, but you can do them with browser automation engines like Puppeteer, a Node.js library. It launches a Chromium instance and emulates a real user's presence. Node.js is a must if you want to build professional scrapers; it is worth investing some time in YouTube lessons.
Here's a script which does all the steps above: https://github.com/MoterHaker/bypass-captcha-examples/blob/main/geo.captcha-delivery.com.js
You’ll need a proxy to bypass GeeTest protection.
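Whichever tool carries out the steps above, the scraper first has to notice that it received the block page rather than the listing it asked for. A minimal sketch in PHP (`parse_datadome_block()` is a hypothetical name) that recognises the block page from the question by its inline `var dd={...}` object and returns those values, so the script can stop, back off, or hand the page over to a browser-based flow. It assumes the object keeps the single-quoted `'key':'value'` format shown in the question.

```php
<?php
// Hypothetical helper: detect the DataDome block page and return the fields of its
// inline "var dd={...}" object (cid, hsh, t, host). Returns null for any other page.
function parse_datadome_block(string $html): ?array
{
    if (!preg_match('/var\s+dd\s*=\s*\{([^}]*)\}/', $html, $m)) {
        return null;   // not the block page
    }
    // The object uses single-quoted keys and values: 'key':'value'
    preg_match_all("/'([^']+)'\s*:\s*'([^']*)'/", $m[1], $pairs, PREG_SET_ORDER);
    $dd = [];
    foreach ($pairs as $p) {
        $dd[$p[1]] = $p[2];
    }
    return $dd;
}
```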

Based on the high demand for code, here is my upgraded scraper that bypassed this specific issue. However, my attempt to obtain the captcha did not work, and I still have not solved how to obtain it.
include "simple_html_dom.php";
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
// This function is where the magic comes from. It gets past every piece of security carsales.com.au has thrown at me so far.
function get_web_page( $url ) {
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_SSL_VERIFYPEER => false // disables SSL certificate checks
);
$ch = curl_init( $url ); // create a cURL handle for the page to scrape
curl_setopt_array( $ch, $options ); // apply all the options above in one call
$content = curl_exec( $ch ); // perform the request; holds the site's HTML (or false on failure)
$err = curl_errno( $ch ); // cURL error number of the request (0 means no error)
$errmsg = curl_error( $ch ); // cURL error message (empty if none); HTTP statuses such as 404 are not errors here, check $header['http_code'] instead
$header = curl_getinfo( $ch ); // transfer information as an array (http_code, final URL after redirects, total_time, ...)
curl_close( $ch ); // free the cURL handle
$header['errno'] = $err; // add the error number to the returned array
$header['errmsg'] = $errmsg; // add the error message
$header['content'] = $content; // add the page's HTML
return $header; // everything in one array, for debugging and for presenting the search results
}
// Using the function we just made, fetch the URL generated by the form and keep the full "developer view" of the request.
$response_dev = get_web_page($url);
// print_r($response_dev);
$response = $response_dev['content']; // only the HTML; the rest of the array is for debugging when the site runs into an issue

file_get_contents not outputting svg

I'm trying to grab an SVG file with PHP and output it onto a page. The site is built on WordPress.
<?php
echo file_get_contents("site.com/wp-content/themes/oaklandcentral/img/fb_logo.svg");
?>
Nothing is displaying. The actual link is correct and goes directly to the svg. Any thoughts?

The path is wrong; use file_get_contents( get_template_directory() . '/img/fb_logo.svg' ), which reads the file from the theme folder on disk.

echo file_get_contents("site.com/wp-content/themes/oaklandcentral/img/fb_logo.svg");
Assumptions:
your "filename" string starts with the protocol, e.g. "http://" (without it, PHP treats the string as a local file path);
you can access the file directly from a browser (i.e. it is not a file/folder permissions issue);
file_get_contents is returning FALSE (does the E_WARNING tell you anything?).
I recall that some servers are not configured to allow file_get_contents (or readfile) with URLs, so I would investigate this first (check allow_url_fopen in php.ini).
If this is not the case (or it is a limitation set by the hosting provider) and you cannot find the cause of the problem, then cURL should work (on most servers!).
$url = 'http://example.com/path-to/fb_logo.svg';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
$svg = curl_exec($ch);
curl_close($ch);
echo $svg;
Edit:
It also appears you could use a PHP include if you remove the XML declaration from your SVG file.
I assume you want to display the SVG image, not its text code, in which case why not simply use <img src="<?php echo $mySvgFileUrl; ?>"> like any other image?
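Reading the file from disk, as the get_template_directory() answer suggests, avoids the HTTP round trip entirely. A sketch of a hypothetical inline_svg() helper that does that and strips the XML declaration (and a DOCTYPE, if present), since those don't belong in the middle of an HTML page. It returns an empty string when the file is missing or unreadable instead of emitting a warning.

```php
<?php
// Hypothetical helper: read an SVG from the local filesystem (not over HTTP) and
// return markup that can be echoed inline into an HTML page.
function inline_svg(string $path): string
{
    if (!is_readable($path)) {
        return '';   // missing file or no read permission
    }
    $svg = file_get_contents($path);
    // Remove the <?xml ... ?> declaration and any DOCTYPE; both are invalid inside HTML.
    $svg = preg_replace('/<\?xml[^>]*\?>/i', '', $svg);
    $svg = preg_replace('/<!DOCTYPE[^>]*>/i', '', $svg);
    return trim($svg);
}
```

Inside a WordPress template it would be called as echo inline_svg(get_template_directory() . '/img/fb_logo.svg');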

Add the header with header("Content-Type: image/svg+xml"); before you echo the SVG. Sometimes the browser assumes the data received is HTML by default.

What could possibly be the problem:
The file isn't where you think it is.
File permissions don't allow the code to read the file.
Try to debug:
<?php
$file = "site.com/wp-content/themes/oaklandcentral/img/fb_logo.svg";
if ( file_exists($file) ) {
echo file_get_contents($file);
} else {
echo "File $file does not exist";
}
?>
Try to get the page with curl. Take a look at the headers and the raw HTTP response.
curl -v URL_TO_PAGE

Try:
<?php echo file_get_contents(get_template_directory().'/theme/img/chevron-right-solid.svg'); ?>
This worked for me, whereas using get_template_directory_uri() did not (it threw an OpenSSL error). get_template_directory() returns a filesystem path, so no HTTPS request is made at all.
Then, you can target the SVG and style how you wish.
For example:
<button class="btn btn-primary">
<?php echo file_get_contents(get_template_directory().'/theme/img/chevron-right-solid.svg'); ?>
</button>
CSS might be:
.btn svg { line-height: 1.2; max-height: 1rem;}

Try the code below.
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar\r\n"
)
);
$context = stream_context_create($opts);
// The URL needs its scheme; without "http://" PHP looks for a local file named "site.com/..."
$file = file_get_contents('http://site.com/wp-content/themes/oaklandcentral/img/fb_logo.svg', false, $context);

"HTTP/1.1 406 Not Acceptable" using "file_get_contents()" - Same domain

I'm using file_get_contents() to get a PHP file which I use as a template to create a PDF.
I need to pass some POST values to it, in order to fill the template and get the produced HTML back into a PHP variable. Then use it with mPDF.
This works perfectly on MY server (a VPS using PHP 5.6.24)...
Now, at the point where I'm installing the fully tested script on the client's live site (PHP 5.6.29),
I get this error:
PHP Warning: file_get_contents(http://www.example.com/wp-content/calculator/pdf_page1.php): failed to open stream: HTTP request failed! HTTP/1.1 406 Not Acceptable
So I guess this can be fixed in php.ini or some config file.
I can ask (I WANT TO!!) my client to contact his host to fix it...
But since I know that hosters are generally not inclined to change server configs...
I would like to know exactly what to change in which file to allow the code below to work.
For my personal knowledge... obviously.
But also to make it look "easy" for the hoster (and my client!!) to change it efficiently. ;)
I'm pretty sure this is just one PHP config param with a strange name...
<?php
$baseAddr = "http://www.example.com/wp-content/calculator/";
// ====================================================
// CLEAR OLD PDFs
$now = date("U");
$delayToKeepPDFs = 60*60*2; // 2 hours in seconds.
if ($handle = opendir('.')) {
while (false !== ($entry = readdir($handle))) {
if(substr($entry,-4)==".pdf"){
$fileTime = filemtime($entry); // Returns unix timestamp;
if($fileTime+$delayToKeepPDFs<$now){
unlink($entry); // Delete file
}
}
}
closedir($handle);
}
// ====================================================
// Random file number
$random = rand(100, 999);
$page1 = $_POST['page1']; // Here are the values, sent via ajax, to fill the template.
$page2 = $_POST['page2'];
// Instantiate mpdf
require_once __DIR__ . '/vendor/autoload.php';
$mpdf = new mPDF( __DIR__ . '/vendor/mpdf/mpdf/tmp');
// GET PDF templates from external PHP
// ==============================================================
// REF: http://stackoverflow.com/a/2445332/2159528
// ==============================================================
$postdata = http_build_query(
array(
"page1" => $page1,
"page2" => $page2
)
);
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
'content' => $postdata
)
);
$context = stream_context_create($opts);
// ==============================================================
$STYLE = file_get_contents("smolov.css"); // local file: the HTTP context is not needed
$PAGE_1 = file_get_contents($baseAddr . "pdf_page1.php", false, $context);
$PAGE_2 = file_get_contents($baseAddr . "pdf_page2.php", false, $context);
$mpdf->AddPage('P');
// Write style.
$mpdf->WriteHTML($STYLE,1);
// Write page 1.
$mpdf->WriteHTML($PAGE_1,2);
$mpdf->AddPage('P');
// Write page 2.
$mpdf->WriteHTML($PAGE_2,2);
// Create the pdf on server
$file = "training-" . $random . ".pdf";
$mpdf->Output(__DIR__ . "/" . $file,"F");
// Send filename to ajax success.
echo $file;
?>
Just to avoid the "What have you tried so far?" comments:
I searched for these keywords in many combinations, but didn't find the setting that would need to be changed: php, php.ini, request, header, content-type, application, HTTP, file_get_contents, HTTP/1.1 406 Not Acceptable

Maaaaany thanks to @Rasclatt for the priceless help! Here is working cURL code, as an alternative to file_get_contents() (I do not quite understand it yet... but it is proven functional!):
function curl_get_contents($url, $fields_url_enc){
# Start curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
# Required to get data back
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
# Notes that request is sending a POST
curl_setopt($ch, CURLOPT_POST, true);
# Send the post data (already URL-encoded)
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_url_enc);
# Send a fake user agent to simulate a browser hit
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Safari/601.1.56');
# Set the endpoint
curl_setopt($ch, CURLOPT_URL, $url);
# Execute the call and get the data back from the hit
$data = curl_exec($ch);
# Close the connection
curl_close($ch);
# Send back data
return $data;
}
# Store post data
$fields = array(
'page1' => $_POST['page1'],
'page2' => $_POST['page2']
);
# Create query string as noted in the curl manual
$fields_url_enc = http_build_query($fields);
# Request to page 1, sending post data
$PAGE_1 = curl_get_contents($baseAddr . "pdf_page1.php", $fields_url_enc);
# Request to page 2, sending post data
$PAGE_2 = curl_get_contents($baseAddr . "pdf_page2.php", $fields_url_enc);
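The working cURL version differs from the file_get_contents() call mainly in the User-Agent it sends. PHP's http stream wrapper sends no User-Agent header unless the user_agent ini setting or context option is set, and some server-side filters (mod_security rules, for instance) reject such requests with 406. Assuming that is the cause here, which the answer does not confirm, a sketch that keeps file_get_contents() and only adds the header through the stream context; post_context() and the PdfBuilder/1.0 agent string are hypothetical.

```php
<?php
// Hypothetical helper: stream-context options for a form POST that also sends a
// User-Agent (and an Accept header), which some server filters require.
function post_context(array $fields, string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'POST',
            'header'     => "Content-Type: application/x-www-form-urlencoded\r\n"
                          . "Accept: text/html,*/*\r\n",
            'user_agent' => $userAgent,
            'content'    => http_build_query($fields),
        ],
    ];
}

$opts = post_context(['page1' => 'a', 'page2' => 'b'], 'Mozilla/5.0 (compatible; PdfBuilder/1.0)');
$context = stream_context_create($opts);
// Then, as in the question:
// $PAGE_1 = file_get_contents($baseAddr . 'pdf_page1.php', false, $context);
```

If the 406 persists with this context, the filter is probably matching something else, and the host's error log is the place to look.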

php Curl clicked links

I need every link (every <a href=...> tag), when clicked, to be fetched via cURL. I can't hard-code these links, as they come from a dynamic site and could be anything. How would I achieve this?
Thanks
Edit: Let me explain more. I have an app on my PC that uses a web front end. It catalogs files and gives you options to rename, delete, etc. I want to add a public view, but if I put it online as is, anyone could delete or rename files. If I cURL the pages, I can remove the menu bars and editing options by using a different CSS. That part all works. The only part that isn't working is that if I click a link on the page, it directs me back to the original link address, which defeats the point because the menu bars are back. I need it to cURL the clicked links. Hope that makes more sense.
Here is my code that fetches the original page with cURL and changes the CSS to point to my own CSS. It points the JavaScript to the original, as I don't need to change that. I now need the "a href" links on the page, when clicked, to be fetched by cURL instead of going to the original destination.
<?php
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, 'http://192.168.0.14:8081/home/');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$curl_response = curl_exec($ch);
curl_close($ch);
//Change link url
$link = $curl_response;
$linkgo = '/sickbeard_public';
$linkfind = 'href="';
$linkreplace = 'href="' . $linkgo ;
$link = str_replace($linkfind, $linkreplace, $link);
//Change js url
$js = $link;
$jsgo = 'http://192.168.0.14:8081';
$jsfind = 'src="';
$jsreplace = 'src="' . $jsgo ;
$js = str_replace($jsfind, $jsreplace, $js);
//Fix on page link errors
$alink = $js;
$alinkgo = 'http://192.168.0.14:8081/';
$alinkfind = 'a href="/sickbeard_public/';
$alinkreplace = 'a href="' . $alinkgo ;
$alink = str_replace($alinkfind, $alinkreplace, $alink);
//Echo page back
echo $alink;
?>

You could grab all the URLs using a regular expression:
// insert general warning about how parsing HTML using regex is evil :-)
preg_match_all('/href="([^"]+)"/', $html, $matches);
$urls = $matches[1]; // capture group 1 of every match
// Now just loop through the array and fetch the URLs with cURL...
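Since the answer itself warns about parsing HTML with a regex, here is the same extraction with PHP's DOMDocument, which also copes with single-quoted or unquoted attributes that the pattern above misses. A sketch; extract_hrefs() is a hypothetical name and the DOM extension is assumed to be enabled.

```php
<?php
// Hypothetical helper: collect the href of every <a> element using the DOM parser
// instead of a regular expression.
function extract_hrefs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // don't print warnings for sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {  // skip named anchors such as <a name="x">
            $urls[] = $a->getAttribute('href');
        }
    }
    return $urls;
}
```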

While I can't imagine why you would do that, I think you should use AJAX.
Attach an event to every a tag and send the URL to a script on your server, where the cURL magic happens.
Anyway, you should explain why you need to fetch data with cURL.

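As far as I can understand your question, you need to get the contents of a URL via cURL, so here is the solution:
Click here to get via curl
Then attach an event to the above <a> tag, e.g. in jQuery:
$("#my_link").click(function(event){
event.preventDefault(); // don't let the browser follow the link itself
var target_url = $(this).attr("href");
// Send an AJAX call to a page of yours, like cURL_wrapper.php, with target_url as a GET parameter
$.get("cURL_wrapper.php", { target_url: target_url }, function(html){
// put the fetched HTML into the page, e.g. $("#content").html(html);
});
});
Then in cURL_wrapper.php do the following:
<?php
// Get the URL sent by the AJAX call
$target_url = isset($_GET['target_url']) ? $_GET['target_url'] : '';
$ch = curl_init($target_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
// Return the response as a string instead of writing it to a file handle
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
?>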
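As written, cURL_wrapper.php will fetch any URL a visitor passes in: other machines on your network, or even file:// URLs if your libcurl allows that protocol. That matters for a page meant to be public. A sketch of a hypothetical check to run before curl_init(), assuming you only want to serve pages from the app at 192.168.0.14 mentioned in the question.

```php
<?php
// Hypothetical guard: allow only http(s) URLs whose host is on an explicit allowlist.
function is_allowed_target(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;   // malformed, or a relative/local path
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;   // rejects file://, ftp://, gopher://, ...
    }
    return in_array(strtolower($parts['host']), $allowedHosts, true);
}
```

In cURL_wrapper.php this would sit right after reading $target_url: if (!is_allowed_target($target_url, ['192.168.0.14'])) { http_response_code(403); exit; }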

Redirecting an HTTP POST

Ok, so in my web app's API I have an incoming HTTP post request.
I would like to pass that POST request on to a different server without losing the POST data. Is this possible? Which type of redirect would I use? PHP examples?
Edit: The HTTP request is coming from a mobile app, not a web browser.
Thanks!

You could use cURL or sockets to re-post the data, but you can't really redirect it.
POSTing to a URL with cURL:
$ch = curl_init('http://www.somewhere.com/that/receives/postdata.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($_POST));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);

RewriteRule current-page.php http://www.newserver.com/newpage.php [NC,P]
The P flag (proxy) makes Apache forward the request to the new server itself, so the POST data is preserved. You'll need to enable Apache's proxy modules (mod_proxy and mod_proxy_http) if they aren't already.

I know this is an old question, but this may help people who stumble upon it. You should be able to send an HTTP 307 response code to make the user agent redirect to the new URL and continue to use the same method and data. This answer has more details.

If the client's (i.e. the mobile app's) HTTP library supports it, you can return HTTP 307 from the server, which states that "the request should be repeated with another URI ... with the same method". This is essentially a temporary redirect, but it tells the client to use the same method, a POST.
The client making the request must be able to handle the HTTP 307 response and follow the redirect with the same method; for many libraries this may require an additional flag or setting.

You cannot reliably tell a browser to repeat a POST through an HTTP header: with the usual 301/302/303 redirects, the Location header makes the browser follow up with a GET, so the POST data is lost.
You can work around this limitation by displaying a page with a hidden form with the method attribute set to POST and the action set to the URL you want the browser to post to, then automatically submit it on page load. Example:
<body onload="document.getElementById('form').submit();">
<form id="form" action="http://example.com/form_handler.php" method="POST">
<input type="hidden" name="param_1" value="data">
</form>
</body>
Alternately, you can make the POST request on your server and then display the results.

I used the following code to forward a POST. In my case I am only using the application/octet-stream content type, so make sure you take that into consideration.
$request = file_get_contents ( "php://input" );
$arrContextOptions=array(
"http" => array(
"method" => "POST",
"header" =>
'Content-Type: application/octet-stream'. "\r\n".
'Content-Length: ' . strlen($request) . "\r\n",
"content" => $request,
),
"ssl"=>array(
"allow_self_signed"=>true,
"verify_peer"=>false,
),
);
$arrContextOptions = stream_context_create($arrContextOptions);
header ( "HTTP/1.1 200 OK" );
header ( "Content-Type: application/octet-stream" );
$result = file_get_contents('http://thenewaddress.yes.it.works', false, $arrContextOptions);
file_put_contents("php://output", $result);

I think the solution I'm about to go with is something like:
<?php
$url = 'http://myserver.com/file.php';
$text = '';
foreach ($_POST as $key => $value) {
$text .= (strlen($text) > 0 ? '&' : '');
$text .= urlencode($key) . '=' . urlencode($value);
}
header('Location: ' . $url . '?' . $text);
exit;
?>
Can anyone think of a reason why this is a bad idea?

If you want to take data from a POST request and simply POST it to another server, then use cURL.
--or--
If you want to take data from a POST request and redirect the client to that other server while POSTing the data, then use this method...
Dynamically generate a form with all of the POST data. Something like this...
echo "<form name=\"someform\" method=\"post\" action=\"http://www.somewhereelse.com/someform.whatever\">";
foreach ($_POST as $key=>$value) {
echo "<input type=\"hidden\" name=\"" . htmlspecialchars($key) . "\" value=\"" . htmlspecialchars($value) . "\" />";
}
echo "</form>";
Then, submit that form with some JavaScript when the page is done loading...
document.forms['someform'].submit();
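A sketch of that hidden-form technique as one hypothetical helper, build_repost_form(). It escapes every name and value, handles nested fields such as items[] by flattening them with http_build_query(), sets method="post" on the form (without it the browser would submit a GET), and includes the auto-submit script.

```php
<?php
// Hypothetical helper: build a self-submitting form that re-POSTs $fields to $action.
function build_repost_form(string $action, array $fields): string
{
    $inputs = '';
    // http_build_query() flattens nested arrays into names like "items[0]"; split it back
    // into name/value pairs so every leaf value becomes one hidden input.
    foreach (explode('&', http_build_query($fields)) as $pair) {
        if ($pair === '') {
            continue;   // no fields at all
        }
        [$name, $value] = array_map('urldecode', array_pad(explode('=', $pair, 2), 2, ''));
        $inputs .= '<input type="hidden" name="' . htmlspecialchars($name)
                 . '" value="' . htmlspecialchars($value) . '">' . "\n";
    }
    return '<form id="repost" method="post" action="' . htmlspecialchars($action) . '">' . "\n"
         . $inputs
         . '</form>' . "\n"
         . '<script>document.getElementById("repost").submit();</script>';
}
```

A page would then simply echo build_repost_form('http://www.somewhereelse.com/someform.whatever', $_POST);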
