I want to load external websites inside a div and scale the content down a bit so it fits the div more neatly,
just like Google search does.
I tried this:
$("#targetDiv").load("www.google.com");
but it is not working.
I tried an iframe, but it still has two problems:
scrolling is still enabled via the arrow keys and Page Up/Page Down
I don't know how to make the contents inside the iframe smaller
I don't know which method I should use, which one is more efficient, or whether there is a better alternative.
What you're trying to do is not going to work. Unfortunately, JavaScript isn't allowed to make cross-domain requests for security reasons (reference: http://en.wikipedia.org/wiki/Same_origin_policy).
If you create a PHP script on your own server that submits the request, that could work, but the user wouldn't have a valid session, and there's a risk that links from the other site won't work if they're relative.
Example:
$('#targetDiv').load('load.php?url=www.google.com')
You could also have a look at jquery-crossframe. I've never used it but it claims to do what you're looking for.
The best option is to use an iframe element.
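For the "make it smaller" part, the usual trick with an iframe is a CSS transform. A minimal sketch, assuming a fixed layout size for the framed page (www.example.com, the 1280x800 size, and targetDiv are placeholders; older browsers may need vendor-prefixed transforms):

// lay the page out at full size, then shrink it visually with scale()
var frame = document.createElement('iframe');
frame.src = 'http://www.example.com/';   // the external site
frame.style.width = '1280px';            // full layout size...
frame.style.height = '800px';
frame.style.border = '0';
frame.style.transform = 'scale(0.5)';    // ...scaled down to half
frame.style.transformOrigin = '0 0';     // scale from the top-left corner
document.getElementById('targetDiv').appendChild(frame);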
You are not going to be able to make a cross-domain Ajax call like that with jQuery. From http://api.jquery.com/load/:
Additional Notes:
Due to browser security restrictions, most "Ajax" requests are subject to the same origin policy; the request can not successfully retrieve data from a different domain, subdomain, or protocol.
If an iframe is not an option, you can retrieve the data via an Ajax call to a PHP page that uses cURL.
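A minimal sketch of what such a proxy page might look like (the load.php name comes from the example above; in real code you would validate or whitelist the URL):

<?php
// load.php - fetch the remote page server-side with cURL and echo it
// back, so the browser's same-origin policy never comes into play.
$url = 'http://' . $_GET['url']; // validate/whitelist this in real code!

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$html = curl_exec($ch);
curl_close($ch);

echo $html;
?>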
Francois is right in that your Ajax requests are restricted by the same-origin policy. That means you cannot load contents from other websites directly. What you are trying to achieve is possible, however, if your source supports JSONP. If you specifically want to load Google search engine results, check out the Google Custom Search API.
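If the source does support JSONP, a minimal jQuery sketch looks like this (the endpoint here is a placeholder; this only works if the remote service actually wraps its JSON in a callback):

// "callback=?" tells jQuery to make a JSONP request: it injects a
// <script> tag instead of using XHR, which sidesteps the same-origin policy.
$.getJSON('http://api.example.com/search?q=test&callback=?', function (data) {
    $('#targetDiv').text(JSON.stringify(data)); // use the returned JSON
});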
Please let me know: is it possible to scrape some info that is loaded via Ajax, using PHP? So far I have only used SIMPLE_HTML_DOM for static pages.
Thanks for any advice.
Scraping the entire site
Scraping dynamic content requires you to actually render the page. A PHP server-side scraper will just do a simple file_get_contents or similar. Most server-based scrapers won't render the entire site and therefore don't load the dynamic content generated by the Ajax calls.
Something like Selenium should do the trick. A quick Google search finds numerous examples of how to set it up. Here is one.
Scraping JUST the Ajax calls
Though I wouldn't consider this scraping, you can always examine an Ajax call using your browser's dev tools. In Chrome, while on the site, hit F12 to open the dev tools.
Then hit the Network tab and click Chrome's refresh button. This will show every request made between you and the site, and you can filter out specific kinds of requests.
For example, if you are interested in Ajax calls you can select the XHR filter.
You can then click on any of the listed requests in the table to get more information about it.
Using file_get_contents on an Ajax call
Depending on how robust the APIs behind these Ajax calls are, you could do something like the following.
<?php
// Call the Ajax endpoint directly and grab its raw response.
$url = "http://www.example.com/test.php?ajax=call";
$content = file_get_contents($url);
?>
If the response is JSON, then add:
$data = json_decode($content);
However, you are going to have to do this for each AJAX request on a site. Beyond that you are going to have to use a solution similar to the ones presented [here].
Finally, you can also use PhantomJS to render an entire site.
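For example, a minimal PhantomJS render script along these lines (the fixed timeout is a naive way of waiting for the Ajax content to arrive; www.example.com is a placeholder):

// render.js - run with: phantomjs render.js
var page = require('webpage').create();
page.open('http://www.example.com/', function (status) {
    if (status !== 'success') {
        console.log('Failed to load the page');
        phantom.exit(1);
        return;
    }
    // give Ajax-driven content a moment to render before dumping the DOM
    window.setTimeout(function () {
        console.log(page.content); // the fully rendered HTML
        phantom.exit();
    }, 2000);
});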
Summary
If all you want is the data returned by specific Ajax calls, you might be able to get it using file_get_contents. However, if you are trying to scrape an entire site that also uses Ajax to manipulate the document, then you will NOT be able to use SIMPLE_HTML_DOM.
Finally, I worked around my problem. I just took the POST URL with all its parameters from the Ajax call and made the same request with the SIMPLE_HTML_DOM class.
I have noticed that many web apps use # in their URLs.
For example, Google Analytics.
This address is in the URL bar when I am viewing the visitors' language page:
https://www.google.com/analytics/web/?hl=en#report/visitors-language/a33185827w60383872p61754588/
This address is in the address bar when I am viewing the visitors' geolocation page:
https://www.google.com/analytics/web/?hl=en#report/visitors-geo/a33185827w60383872p61754588/
I think that this is the Google Analytics web app passing #report/visitors-language and #report/visitors-geo.
I know that Google Analytics uses an <iframe>. It seems that only the main content box changes when displaying content.
Is # used because of the <iframe> functionality?
There are several answers but none cover the backend part.
Here is a URL, one from your own example:
www.google.com/analytics/web/?hl=en#report/visitors-language/a33185827w60383872p61754588/
You can think about the post-hash (including the hash #) part as a client-side request.
The web server will never know what was entered after the hash sign. It is the browser pointing to a specific ID on the page.
For basic web pages, if you have this HTML: <a name="main">welcome</a>
on a web page at www.example.com/welcome, going to www.example.com/welcome#main will scroll your browser viewport to the welcome text in the <a> HTML tag.
The web server will not know whether #main was in the URL or not.
Values in the URL after a question mark are called URL parameters, e.g. www.example.com/?foo=bar. The web server can deliver different content based on those values.
However, there is a technique called AJAX (Asynchronous JavaScript and XML), popularized by Google's web apps, that makes use of the # part of the URL to deliver different content without a full page load. It's not using an <iframe>.
Using JavaScript, you can react to a change in the URL's post-hash part and request a specific part of the page from the server, for example for the URL www.example.com/welcome#main2. Even if an element named main2 does not exist, you can show one using JavaScript.
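A minimal sketch of that pattern (the #content element and the /fragments/ endpoint are hypothetical):

// Load a fragment from the server whenever the hash changes,
// e.g. #main2 fetches /fragments/main2.html into the page.
$(window).on('hashchange', function () {
    var section = location.hash.slice(1); // "main2" from "#main2"
    if (section) {
        $('#content').load('/fragments/' + section + '.html');
    }
});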
A hashbang is #!. It is used to make search-engine indexing easier by indicating that this part of the URL identifies a dynamically generated page.
This is the "hash" in the URL.
Most browsers support the hashchange event in JavaScript.
As far as I know, the hash change was a real step forward for Ajax callbacks: when the user clicks any link with a hash, the hashchange event fires and you can run whatever JavaScript you like in response.
One more thing: hash changes are recorded in the browser history, so the back and forward buttons keep working.
See the related question SEO and the use of !# in a url for more.
'#! is called a "hashbang" and they are the root of all that is evil in web development.'
Basically, weak web developers decided to use #anchor names as a kludgy hack to get "web 2.0" things to work on their pages, then complained to Google that their page rank suffered. Google made a workaround for their kludge by enabling the hashbang.
Weak web developers took this workaround as gospel. Don't use it. It is a crutch.
Web development that depends on hashbangs is web-development done wrong.
This article is far better worded than anything I could write, and deals with the Gawker Media fiasco that followed their migration to a (failed) hashbang-centric website. It tells you WHAT is happening and why it's bad.
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
Correct me if I'm wrong, but the hash in that URL is used as an anchor to scroll the page to an element with a matching id. For example, if I send you to the URL http://example.com/sample#example, the page will scroll to (just display) that element (I'm using a div as an arbitrary example; it could be anything).
Ajax and the hash mark in the URL are mostly used for quick actions.
If you have a part of your site that only becomes visible after firing an event (mostly a click), it is hard to share a link to it. With a hash mark in the URL you can (via JavaScript) make the browser act as though the required action had been performed, so it displays the relevant part.
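For illustration, a minimal sketch of replaying that action when the page loads (the data-panel markup is an assumption about your HTML):

// If the visitor arrives at /app#settings, act as if the
// corresponding element had been clicked so the right part shows.
$(function () {
    var target = location.hash.slice(1);
    if (target) {
        $('[data-panel="' + target + '"]').trigger('click');
    }
});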
Normally the '#' in a URL makes the browser find the element on that page whose id matches what comes after the '#'. Using this, we can view content in the middle of the page as well.
I'm creating a website where users enter a URL and it's displayed in an iFrame, to be brief. I know a lot of websites have code to break out of iFrames (popular example, Google).
Is there any way to check, with JavaScript or PHP whether a given URL will break out of an iFrame?
As a side note, I don't mind taking a website snapshot instead, but I haven't found an adequate existing service and I can't seem to install wkhtmltoimage/pdf... but that's a different question.
As long as the iframe's URL is different from that of the parent (your website), the iframe's JavaScript cannot access anything in its parent.
For cross-domain iframe communication to work, one might use HTML5's postMessage (which has decent support as of right now) or pass params via the URL of the iframe.
Both of these methods require the parent (your website) to explicitly intercept the 'calls' from the iframe and do whatever...
All in all, for security reasons an iframe from an unknown source can't simply alter the parent site holding the iframe.
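To illustrate the postMessage route mentioned above, a minimal sketch (the origin shown is a placeholder, and this only works if the framed page cooperates):

// Inside the iframe: send a message up to the parent page.
parent.postMessage({ type: 'resize', height: document.body.scrollHeight }, '*');

// In the parent: listen for messages and check who sent them.
window.addEventListener('message', function (event) {
    if (event.origin !== 'http://trusted-frame.example.com') return; // placeholder origin
    console.log('Message from iframe:', event.data);
});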
I understand this may come across as an open question, but I need a way to solve this issue.
I can either make the site do Ajax requests that load the body content, or I can have links that re-load the whole page.
I need the site to be SEO-compliant, and I would really like the header not to re-load when the content changes; the reason is that we have a media player that plays live audio.
Is there a way so that Googlebot, or someone without JavaScript enabled, gets the site with normal hrefs, but visitors with Ajax/JavaScript available get the Ajax way?
Build the website without JS first and ensure it works as wished, with each link leading to a new, unique page. Google parses your site without JS, so what you see with JS off is what it sees.
Then add the JS, with click handlers to prevent the default page reload and do your Ajax logic instead. You could use jQuery and .load() to do this quite easily, for example:
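A minimal sketch of that pattern (the a.ajax-nav class and the #content wrapper are assumptions about your markup):

// Plain <a href> links work for crawlers and no-JS visitors;
// with JS on, hijack the click and swap only the body content.
$(document).on('click', 'a.ajax-nav', function (e) {
    e.preventDefault(); // stop the full page reload (the player keeps playing)
    // " #content > *" loads only that fragment of the target page
    $('#content').load(this.href + ' #content > *');
});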
Other solution, you could use the recommended Google method ( https://developers.google.com/webmasters/ajax-crawling/ ), but it's more work and less effective SEO-wise.
Or you can put your audio player in an iframe...
I'm new to YQL, and just trying to learn how to do some fairly simple tasks.
Let's say I have a list of URLs and I want to get their HTML source as a string in JavaScript (so I can later insert it into a database via Ajax). How would I go about getting this info back in JavaScript? Or would I have to do it in PHP? I'm fine with either, really - whatever works.
Here's the example queries I'd run on their console:
select * from html where url="http://en.wikipedia.org/wiki/Baroque_music"
And the goal is to essentially save the HTML or maybe just the text or something, as a string.
How would I go about doing this? I somewhat understand how the querying works, but not really how to integrate it with JavaScript and/or PHP (say I have a list of URLs and I want to loop through them, getting the HTML at each one and saving it somewhere).
Thanks.
You can't read other pages with JavaScript due to a built-in security feature in web browsers called the same-origin policy.
The usual method is to scrape the content of these sites from the server using PHP.
There is another option with JavaScript called a bookmarklet.
You add the bookmarklet to your bookmarks bar, and each time you want the content of a site, you click the bookmark.
A script is then loaded into the host page; it can read the content and post it back to your server.
Oddly enough, the same-origin policy does not prevent you from POSTing data from this host page to your domain. You need to POST a form to an iframe whose source is hosted on your domain.
You won't be able to read the response you get back from the POST.
But you can poll with setInterval, making a JSONP call to your domain, to find out whether the POST was successful.
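A rough sketch of that form-to-iframe POST (every name here is a placeholder):

// Build a hidden iframe and a form that targets it, then POST the
// host page's HTML to your own server. The response can't be read.
var iframe = document.createElement('iframe');
iframe.name = 'postTarget';
iframe.style.display = 'none';
document.body.appendChild(iframe);

var form = document.createElement('form');
form.method = 'POST';
form.action = 'http://www.yourdomain.example/save.php'; // your server
form.target = 'postTarget'; // the response lands in the hidden iframe

var field = document.createElement('input');
field.type = 'hidden';
field.name = 'content';
field.value = document.documentElement.outerHTML; // the page's HTML
form.appendChild(field);

document.body.appendChild(form);
form.submit();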