php scrape dynamically loaded content via ajax using knockout.js


I need to scrape some data from a website that loads it via AJAX using knockout.js (I don't know exactly which technology it uses).
The site is www.msc.com. I am searching for schedules there, for example from Barcelona to Miami. The result is loaded via AJAX, but the request doesn't show up in the console or in Firebug.
I have tried many times. Any help or suggestions would be appreciated.

Their script is located at: https://www.msc.com/CMSTemplates/CraftedCMS/WebServices/RouteFinder.svc/Routes
They prevent you from calling it directly in a browser tab/window, probably because what you're trying to do is against their policy. If they wanted people to scrape their DATA, they would not block direct requests to their API or they would provide another publicly documented API for you to use on your server.
With that said, you can see in your browser console that their web service returns JSON objects. You will have to hack (maybe illegally) your way in by faking protocol variables in order to accomplish what you're aiming for. The first thing to consider is that they only return results through that web service when:
a) The call is made through XMLHttpRequest as POST. (You can fake this easily, but the next points, not so much...)
b) The call is made using a referer, in this precise case, the referer is: https://www.msc.com/routefinder?fromId=406&isCountryFrom=false&toId=83&isCountryTo=false
c) The call passes a cookie to the server, which is encrypted and signed, so each session is in their database and your key is probably unique, so good luck decrypting this: CMSPreferredCulture=fr-FR; ASP.NET_SessionId=gza5rfjrog2eb21ukrzma223; BIGipServerkentico.app~kentico_pool=439883018.20480.0000; bbbbbbbbbbbbbbb=LIKIGEACDJHDJPGPEOKGJBKODKDGOMHNKAEGEGKNODEDAEILEICBMLNLEFMAOIPPKMOIBBFAILFEEKJPIJDCBDDLFNBBMBPBGGKAIDOCMGHBEEIDMLPMIJJAMNFNIFMI; rxVisitor=1497537754979PTPODMSFNIR8BFVAKK353FS76M2D1KNN; dtPC=3$537845860_975h-vCQTABPJMGEOKDPDVNLHPCPDASGAPMCPCBA; rxvt=1497539656937|1497537754995; dtSa=-; dtLatC=8; _ga=GA1.2.1247106544.1497537756; _gid=GA1.2.879601947.1497537756; _gali=results; cookiePolicyApproved=true; MSCAgencyId=355840; _gat=1; _gat_local=1; dtCookie=3$B74DFC30736F7DBF485B79C31C55B167|www.msc.com|1

Related

How can I secure JSON web service?

I have location data (lat, long, location_name) to be shown on a map. Only logged-in users can see this map. I used PHP to run a select * against a MySQL DB, formatted the result with json_encode, and echoed it so the front end can grab it and use it in the map API. The PHP file that echoes the JSON is called mapData.php.
I want this file not to be directly accessible, even to a logged-in user. I looked into checking the session and request headers in mapData.php (the internal API file), but if a hacker signs up for my service and opens the dev console, he/she can see the received file and, with a request tool, send the same headers and get the data. Another option might be changing the access level in Linux, but I have no idea how.
Another method is uglifying and minifying the JSON, but since I have 29,000 rows in my DB plus an inner join, I think it will slow down the process. Any suggestions for securing this internal API so that even a logged-in user cannot access it?

I would hide the map data file in a subdirectory, then use a service to access the data file and retrieve just the data you need. If you absolutely need the 29,000 rows at once, then there's not much you can do. Even if you encrypt it, eventually the data is going to be in native JavaScript format, and then it's just a matter of running a debugger and peering in the data structures.
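As a rough sketch of "retrieve just the data you need", the service could return only the locations inside the map's current viewport instead of all 29,000 rows. The function name, the bounding-box parameters, and the sample rows below are assumptions for illustration; only the lat, long and location_name fields come from the question.

```php
<?php
// Hypothetical helper: keep only the locations inside a bounding box.
// It does not handle viewports that cross the 180th meridian.
function filterByBounds(array $rows, float $south, float $west, float $north, float $east): array
{
    $inside = function (array $row) use ($south, $west, $north, $east): bool {
        return $row['lat'] >= $south && $row['lat'] <= $north
            && $row['long'] >= $west && $row['long'] <= $east;
    };
    // array_values() reindexes so json_encode() emits a JSON array.
    return array_values(array_filter($rows, $inside));
}

// Sample rows standing in for the MySQL result set.
$rows = [
    ['lat' => 41.39, 'long' => 2.17, 'location_name' => 'Barcelona'],
    ['lat' => 25.76, 'long' => -80.19, 'location_name' => 'Miami'],
];

// Only the rows inside the requested viewport are encoded and sent.
echo json_encode(filterByBounds($rows, 40.0, 0.0, 45.0, 5.0)), PHP_EOL;
```

In practice the same filter would more likely live in the SQL WHERE clause, so the full table never leaves the database. A logged-in user can still read whatever the map shows them, just not everything in one request.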

Using Delphi and HTTP POST to do web actions

I have a web application which I wrote in PHP. Each of my forms does an HTTP POST to a PHP file which processes the data and returns a result.
Now I want to use RAD Studio's Delphi XE4 to create an application which can be used on phones to perform basic functions on the site.
For example...
I have a function in my PHP file called F.
F does some calculations with parameters passed using the $_REQUEST[''] superglobal.
So my question is: is there a way that I can use Delphi to post to my website and get the result back?
I've searched for similar requests but no-one seems to have done this before.
I would even use a JavaScript file if someone can tell me how I can incorporate it?
I know jQuery has a $.ajax method, is there maybe a way to implement that?

I can assure you that you're not the first person to do an HTTP request via Delphi :)
You state that you're fetching the request data via $_REQUEST, so you'll get both POST and GET data, so perhaps these links might be of interest:
What's the simplest way to call Http GET url using Delphi?
What’s the simplest way to call Http POST url using Delphi?

PHP Get Cookie by Session ID (or otherwise pass data between two different connections)

Normally I try to format my question as a basic question and then explain my situation, but the solution I'm looking for might be the wrong one altogether, so here's the problem:
I'm building a catalog application for an auction website that has the ability to save individual lots. So far this has worked great by simply creating a cookie with a comma-separated list of IDs for those lots, via something like this:
setcookie("MyLots_$AuctionId", implode(",", $arrayOfIds));
The problem I'm now hitting is that when I go to print the lots, I'm using wkhtmltopdf through the command-line to request the url of the printout I want, like this:
exec("wkhtmltopdf '$urlofmylots' filename.pdf");
The problem is that I can't pass a cookie to this call, because Apache sees an internal request, not the request of the user. I tried putting it in the get string, but once I have more than a pre-set limit for GET parameters, that value disappears from the $_GET array on the target url. I can't seem to find a way to send POST data between them. My next possible ideas are the following:
Maybe just pass the sessionID to the url, and see if there's a way that I can use PHP to dig through the cookies for that session and pull the right cookie, but that sounds like it'd be risky security-wise for a PHP server to allow (letting one session be aware of another). Example:
exec("wkhtmltopdf '$urlofmylots?sessionId=$sessionIdFromThisRequest' filename.pdf");
Possibly set a session variable and then pass that session Id, and see if I can use PHP to wade through that information instead (rather than using the cookie).
Would I be able to just create an array and somehow have that other script be aware of it, possibly by including it? That doesn't really solve the problem of wkhtmltopdf expecting a web-facing address as its first parameter.
(not really an idea, but some reasoning) In other instances of using this, I've just passed an ID to the script that generates the markup for wkhtmltopdf to parse, and the script uses that ID to get data from the database. I don't want to store this data in a file or the database for the simple purpose of transferring data from the caller to the callee in this case. Cookies and sessions seem cleaner since apache/php handle memory allocation for these sessions.
The ultimate problem here is that I'm trying to get my second script (referenced here by $urlofmylots) to be aware of data available to the calling script but it's being executed as if it were an external web request, not two php scripts being called from the web root.
Can anyone offer some insight here?

You might consider rendering whatever the output of $urlofmylots?lots=$lots_to_print would be to a temporary file and running wkhtmltopdf against that file.
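A minimal sketch of that idea, assuming the calling script can produce the printout markup itself (for example by capturing the template output with ob_start() and ob_get_clean()). The helper name preparePdfJob and the sample markup are hypothetical, and the exec() call is left commented out so the sketch runs without wkhtmltopdf installed.

```php
<?php
// Hypothetical helper: saves already-rendered HTML to a temporary file and
// returns that path plus the wkhtmltopdf command line for it.
function preparePdfJob(string $html, string $pdfPath): array
{
    // Build a unique .html path in the system temp directory.
    $htmlPath = sys_get_temp_dir() . '/lots_' . bin2hex(random_bytes(8)) . '.html';
    if (file_put_contents($htmlPath, $html) === false) {
        throw new RuntimeException("Could not write $htmlPath");
    }
    // escapeshellarg() keeps odd characters in either path from breaking the command.
    $command = 'wkhtmltopdf ' . escapeshellarg($htmlPath) . ' ' . escapeshellarg($pdfPath);
    return [$htmlPath, $command];
}

// Stand-in markup; the real script would render the saved lots here,
// reading the IDs from the user's cookie in the same request.
$html = '<html><body><h1>My lots</h1><p>Lot 12, Lot 47</p></body></html>';

[$htmlPath, $command] = preparePdfJob($html, 'filename.pdf');
echo $command, PHP_EOL;

// exec($command);     // run the conversion where wkhtmltopdf is installed
// unlink($htmlPath);  // then remove the temporary file
```

Because the markup is rendered inside the user's own request, the cookie never has to reach wkhtmltopdf. Relative URLs for stylesheets or images may need to be made absolute, since the file is no longer served from the site.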

AJAX security and user management

I am working on a web application that will be hosted on a server that is "on the internet", not a LAN.
The app uses quite a bit of AJAX calls and has about 12 ajax handler files for the functions.
My question is: instead of asking anybody here to write a tutorial on AJAX security, does anybody know of any good resources (website, book, whatever) that can help me with securing these files?
Right now, as long as you know the variable name it's looking for, you can freely get data from the database.
I was thinking maybe session validation, or something along those lines for the logged in user.
Anyways if you have any good resources I'll do the homework myself.
Thanks

AJAX calls are generally used to access web services, which is what it seems you are using them for here. If that is the case then what you need to be concerned about is the security layer that you have provided in the server-side scripting language you are using (looks like you are using PHP as per your question's tags).
The same way that you do authentication and protection for other pages on your site that aren't accessed via AJAX calls you can implement for your web services. For instance, if you require authentication for your application then you can store the user's ID in $_SESSION. From there you can check to make sure the user is logged in via $_SESSION whenever one of your web services is requested.
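The session check described above might look roughly like this. The function name and the 'user_id' session key are assumptions; the session data is passed in as a plain array so the sketch can run on its own.

```php
<?php
// Hypothetical guard shared by every AJAX handler. 'user_id' is an assumed
// session key that the login code would set.
function ajaxAuthStatus(array $session): int
{
    // 200 when a user is logged in, 401 (Unauthorized) otherwise.
    return isset($session['user_id']) ? 200 : 401;
}

// Example calls with stand-in session data; a real handler would call
// session_start() and pass $_SESSION instead.
echo ajaxAuthStatus(['user_id' => 42]), PHP_EOL; // logged-in user
echo ajaxAuthStatus([]), PHP_EOL;                // anonymous visitor
```

A handler would send that status with http_response_code() and stop before touching the database whenever it is not 200.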

I've often seen AJAX handlers that check the X-Requested-With HTTP header to "verify" that the request originated from AJAX. Depending on how you're sending your AJAX calls (with XMLHttpRequest or a JS library), you can either use the standard value for this header or set it to a custom value. That way, you can do something similar to this in PHP to check whether the page was requested with AJAX (see http://davidwalsh.name/detect-ajax):
if (!empty($_SERVER['HTTP_X_REQUESTED_WITH']) &&
    strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest') {
    // The request claims to come from XMLHttpRequest.
}
It is important to note that since it's an HTTP header, it can be spoofed, so it is by no means foolproof.

Here is a good resource. Securing Ajax Applications: Ensuring the Safety of the Dynamic Web
However, a very simple method is to use an MD5 hash with a private key, e.g. USER_NAME+PRIVATE_KEY. If you know the user's name on the website/login, you can compute that MD5 hash on the server and set it in a JavaScript variable. Then simply pass the user's name and the hash in your AJAX request, and the REST service can take the same private key plus the user's name, compute the hash again, and compare the two hashes. You're only sending a hash and the user name across. It's simple and effective, and virtually impossible to reverse unless you have a simple private key.
So in your javascript you might have this set:
var user='username';
var hash='925c35bae29a5d18124ead6fd0771756';
Then, when you send your request you send something like this:
myService.php?user=username&hash=925c35bae29a5d18124ead6fd0771756&morerequests=goodthings
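When you check it in the service, you would do something like this:
<?php
// Recompute the hash from the submitted user name and the server-side key.
$expected = md5($_REQUEST['user'] . "_privatekey");
// Strict comparison avoids PHP's loose string-to-number juggling on hashes.
if ($expected === $_REQUEST['hash']) {
    echo 'passed validation';
} else {
    echo 'sorry charlie';
}
?>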
Obviously you would need to use PHP or something else to generate the hash with the private key, but I think you get the general idea. _privatekey should be something complex in the event you do have a troll that tries to hack it.
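For comparison, here is a sketch of the same idea using HMAC-SHA256 instead of plain MD5. This swaps in a different technique from the one in the answer: PHP's hash_hmac() does the keyed hashing and hash_equals() does a timing-safe comparison. The constant SECRET_KEY and both function names are hypothetical.

```php
<?php
// Assumed secret; it stays on the server and is never sent to the browser.
const SECRET_KEY = 'replace-with-a-long-random-secret';

// Produces the value the page would embed in its JavaScript for this user.
function signUser(string $user, string $key = SECRET_KEY): string
{
    return hash_hmac('sha256', $user, $key);
}

// Recomputes the signature in the service and compares in constant time.
function isValidSignature(string $user, string $hash, string $key = SECRET_KEY): bool
{
    return hash_equals(signUser($user, $key), $hash);
}

$hash = signUser('username');
echo isValidSignature('username', $hash) ? 'passed validation' : 'sorry charlie', PHP_EOL;
echo isValidSignature('someone_else', $hash) ? 'passed validation' : 'sorry charlie', PHP_EOL;
```

Either way, the hash only shows that the user name was issued by someone holding the key. Anyone who can read the page source can replay the same user/hash pair, so this does not replace a session check.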

how to prevent 'manual execution' of external PHP script

In simplest terms, I utilize external PHP scripts throughout my client's website for various purposes such as getting search results, updating content, etc.
I keep these scripts in a directory:
www.domain.com/scripts/scriptname01.php
www.domain.com/scripts/scriptname02.php
www.domain.com/scripts/scriptname03.php
etc..
I usually execute them using jQuery AJAX calls.
What I'm trying to do is find a piece of code that will detect (from within) whether these scripts are being executed from a file via AJAX or MANUALLY via URL by the user.
IS THIS POSSIBLE??
I have searched absolutely everywhere and tried various methods using the $_SERVER[] array, but still no success.

What I'm trying to do is find a piece of code that will detect (from within) whether these scripts are being executed from a file via AJAX or MANUALLY via URL by the user.
IS THIS POSSIBLE??
No, not with 100% reliability. There's nothing you can do to stop the client from simulating an Ajax call.
There are some headers you can test for, though, namely X-Requested-With. They would prevent an unsophisticated user from calling your Ajax URLs directly. See Detect Ajax calling URL

Most AJAX frameworks will send an X-Requested-With: header. Assuming you are running on Apache, you can use the apache_request_headers() function to retrieve the headers and check for it/parse it.
Even so, there is nothing preventing someone from manually setting this header - there is no real 100% foolproof way to detect this, but checking for this header is probably about as close as you will get.
Depending on what you need to protect and why, you might consider requiring some form of authentication, and/or using a unique hash/PHP sessions, but this can still be reverse engineered by anyone who knows a bit about Javascript.
As an idea of things that you can verify: if you check all of these before servicing your request, it will afford a degree of certainty (although not much, and none if someone is deliberately trying to circumvent your system):
Store a unique hash in a session value, and require it to be sent back to you by the AJAX call (in a cookie or a request parameter) so you can compare them on the server side to verify that they match
Check the X-Requested-With: header is set and the value is sensible
Check that the User-Agent: header is the same as the one that started the session
The more things you check, the more chance an attacker will get bored and give up before they get it right. Equally, the longer/more system resources it will take to service each request...
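The three checks listed above can be sketched as one function. The session keys 'ajax_token' and 'user_agent' and the function name are assumptions; the request and session data are passed in as arrays so the sketch runs without a web server.

```php
<?php
// Hypothetical combined check; true only when all three tests pass.
function looksLikeOwnAjaxCall(array $server, array $session, string $sentToken): bool
{
    // 1. The token stored in the session must match the one sent back.
    $tokenOk = isset($session['ajax_token'])
        && hash_equals((string) $session['ajax_token'], $sentToken);

    // 2. X-Requested-With must carry the value most AJAX libraries send.
    $xhrOk = strtolower($server['HTTP_X_REQUESTED_WITH'] ?? '') === 'xmlhttprequest';

    // 3. The User-Agent must equal the one recorded when the session began.
    $agentOk = isset($session['user_agent'])
        && ($server['HTTP_USER_AGENT'] ?? '') === $session['user_agent'];

    return $tokenOk && $xhrOk && $agentOk;
}

// Stand-ins for $_SESSION and $_SERVER.
$session = ['ajax_token' => 'abc123', 'user_agent' => 'ExampleBrowser/1.0'];
$server  = ['HTTP_X_REQUESTED_WITH' => 'XMLHttpRequest', 'HTTP_USER_AGENT' => 'ExampleBrowser/1.0'];

var_dump(looksLikeOwnAjaxCall($server, $session, 'abc123'));
var_dump(looksLikeOwnAjaxCall($server, $session, 'wrong-token'));
```

Every value checked here is supplied by the client, so this only raises the effort needed, as the answer says.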

There is no 100% reliable way to prevent a user, if he knows the address of your request, from invoking your script.
This is why you have to authenticate every request to your script. If your script is only to be called by authenticated users, check for the authentication again in your script. Treat it as you would treat any incoming user input: validate and sanitize everything.
On Edit: The same could be said for any script which the user can access through the URL. For example, consider profile.php?userid=3
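For the profile.php?userid=3 case, a minimal sketch of validating the parameter and checking it against the logged-in user could look like this. The rule that a user may only open their own profile, the 'user_id' session key, and the function name are all assumptions for illustration.

```php
<?php
// Hypothetical check for a URL such as profile.php?userid=3.
function canViewProfile(array $session, $requestedId): bool
{
    // Accept only a positive integer; anything else becomes false.
    $id = filter_var($requestedId, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);

    // The visitor must be logged in and asking for their own profile.
    return $id !== false
        && isset($session['user_id'])
        && (int) $session['user_id'] === $id;
}

// Stand-ins for $_SESSION and $_GET['userid'].
var_dump(canViewProfile(['user_id' => 3], '3'));
var_dump(canViewProfile(['user_id' => 3], '4'));
var_dump(canViewProfile(['user_id' => 3], '3 OR 1=1'));
```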
