I'm trying to pull a piece of data from the website www.coinmarketcap.com; specifically, the market cap number at the top.
I've been trying to figure this out for the past hour or so and have read about MANY different ways people use these web scrapers, but I have not been successful at all. Could someone shed some light?
There are multiple ways, but the easiest is to just take their URL:
https://files.coinmarketcap.com/generated/stats/global.json
Please note: they might not like this. Maybe they don't want external parties to use their scripts. So also build a check for whether the file still exists and doesn't return a 403.
How did I find this:
When the page loads, the header with the information fills in after the document is ready, so it cannot have been rendered by the server and has to come from AJAX.
Now that we know it is AJAX, we want to know which file. You do this by opening your browser's developer console. All browsers have a network tab showing all the resources being loaded. When you filter by XHR, you see all the AJAX requests. Then you try to find the right one.
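For example, here is a minimal sketch in browser JavaScript that fetches the file and does exactly that existence/403 check; the field name inside global.json is an assumption, so inspect the actual response in the network tab to confirm it:

    // Fetch the stats file, checking that it still exists and isn't blocked.
    fetch('https://files.coinmarketcap.com/generated/stats/global.json')
      .then(function (response) {
        if (!response.ok) {
          // The file may have moved, or they may have started returning 403.
          throw new Error('stats file unavailable: ' + response.status);
        }
        return response.json();
      })
      .then(function (stats) {
        // Field name is an assumption -- verify it against the real JSON.
        console.log('Market cap (USD):', stats.total_market_cap_by_available_supply_usd);
      })
      .catch(function (err) {
        console.error(err);
      });

Note that running this from your own page in a browser is subject to the usual cross-origin rules; fetching the file server-side avoids that.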
I'm trying to make a page divided into 2 frames: one that shows an external page... a shop, for example, and another (mine) that offers content related to what is being shown in the external page.
I've been searching for how to find out the URL of the frame, but it can't be done, due to security reasons (XSS, clickjacking, etc.).
But I know there has to be some way to do it. I recently read that some plugins, like Facebook's, have the ability to know where the client is while they are navigating.
I hope that my intention is clear. Do you know if this is possible?
Without having the site's owner install some JS on their site, you will find it very hard to find a cross-browser way to do this. FB does have people include a script, and this is how they are able to access information cross-domain.
You may be able to create browser plugins to do this; however, you will have to create one per browser.
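For completeness: if the shop's owner is willing to include a small script (the same approach FB uses), window.postMessage gives a cross-browser channel for the framed page to report its URL to your outer page. A minimal sketch, with placeholder origins:

    // In the external (framed) page -- the one line its owner would add:
    //   parent.postMessage(location.href, 'https://yoursite.example');

    // In your outer page, listen for those reports:
    window.addEventListener('message', function (event) {
      // Only trust messages from the origin you expect (placeholder below).
      if (event.origin !== 'https://shop.example') return;
      console.log('Frame is now at:', event.data);
    });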
For one project, I need to get the source of a Facebook page (the HTML) via a PHP application.
I've tried lots of methods like cURL, file_get_contents, changing my ini_set, etc., but Facebook never lets me get the HTML result.
Can anyone help?
For example, this page:
    // Forward the visitor's user agent so the request looks like a browser.
    ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']);
    // Fetch the page; the second argument (use_include_path) should be false.
    $data = file_get_contents("http://apps.facebook.com/is_cool/?cafe_action=album&view=scroll", false);
    // Print the text content with all tags stripped.
    print strip_tags($data);
Thanks a lot.
Damien
Comment 1:
I need to create 2 applications. I want to parse the HTML code to get some information from one into the other. I don't want to duplicate or take the Facebook code. I just want to do a "view source" (like IE or Firefox) and put it in a file, without asking my users. When my user is logged in to my first application, I just want to use their credentials to get the other content.
The reason you're having problems is that the majority of the Facebook homepage content is loaded via AJAX. The data is not hardcoded into the page the server sends.
You should think of a different way to accomplish your goals. If you tell us a little more about what you're trying to do, we can probably help you find an alternate method.
I am thinking about writing a script that will perform a sort of checkout procedure automatically, similar to an eBay sniping program.
I will know exactly what the page looks like. All I really want to do is load a page from a different domain than the one running my script into an iframe, have jQuery insert the data into the appropriate fields, and then use JavaScript to click the submit button.
I have been reading about security issues with accessing information across different domains. On the domain I am trying to submit to, I would like to call a few jQuery functions, such as .find(), to get the IDs of the submit buttons so I can programmatically click on them.
This might sound malicious or something, which it's not: something is going on sale that will sell out quickly, and I will not be around to click refresh one hundred times to try to buy it. I figured it would be a cool project to make a script that buys it for me.
Anyway, my first question is: is this possible? Secondly, what would be the best way to solve this problem? I was going to use PHP/JavaScript/jQuery. Will this even work / be allowed? Also, if anyone has any other information that might help me out, that would be great. Thanks.
Not going to happen... The only way I think you can accomplish this, for yourself, is by inserting your code with Firebug (or the like) on a per-use basis, or perhaps in a Greasemonkey configuration... but it's not something you could publish so that others would get the same functionality just by going to your page.
In Firefox, you can create a bookmark that runs JavaScript (instead of navigating to another page). So now you can inject any JavaScript into the page you are currently viewing.
With this, you can load jQuery from another domain along with any other scripts and automate whatever you like.
This only works for you, the person with the special bookmark, but you can hand the bookmark to others for their use.
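A minimal sketch of such a bookmarklet, which loads jQuery from a CDN and then fills a field and clicks a submit button; the selectors are made-up placeholders, so substitute the real ones from the target page:

    javascript:(function () {
      // Inject jQuery into the current page.
      var s = document.createElement('script');
      s.src = 'https://code.jquery.com/jquery-3.7.1.min.js';
      s.onload = function () {
        // Fill a field and click submit once jQuery is ready.
        // '#quantity' and '#checkout-submit' are placeholders.
        jQuery('#quantity').val('1');
        jQuery('#checkout-submit').trigger('click');
      };
      document.body.appendChild(s);
    })();

(Browsers expect this on a single line when you actually save it as a bookmark.)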
Hi, I'm using AJAX to load all the pages into the main page, but I'm not able to control the refresh: if somebody refreshes, the page goes back to the main page. Can anybody give me any solutions? I would really appreciate the help...
You could add an anchor (#something) to your URL and, on every AJAX event, change it to something you can decode back into a particular page state.
Then, in body.onload, check the anchor and decode it back into that state.
The back button (at least in Firefox) will work all right too. If you want the back button to work in IE6, you have to add some iframe magic.
Check the various JavaScript libraries designed to support the back button and history in an AJAX environment; this is probably what you really need. For example, the jQuery history plugin.
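A minimal sketch of the anchor approach; loadPageViaAjax stands in for whatever AJAX loader you already have, and the state names are placeholders:

    // Navigating: encode the state into the URL; the hashchange
    // handler below does the actual loading.
    function goToPage(name) {
      location.hash = '#' + name;   // e.g. #profile, #settings
    }

    // Decode the anchor back into a page state.
    function restoreFromHash() {
      var name = location.hash.replace('#', '') || 'main';
      loadPageViaAjax(name);        // your existing AJAX loader (assumption)
    }
    window.onload = restoreFromHash;       // handles refresh
    window.onhashchange = restoreFromHash; // handles back/forward in modern browsers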
You can rewrite the current URL so it gives pointers to where the user was; see Facebook for examples of this.
I always store the 'current' state in the PHP session. That way, the user can refresh at any time and the page will still be the same.
"If somebody refreshes, the page goes back to the main page. Can anybody give me any solutions?"
This is a feature, not a bug, in the browser. You need to change the URL for different pages. Nothing is worse than websites that use some kind of magic, on either the client side or the server side, which causes a bunch of completely different pages to use the same URL. Why? How the heck am I going to link to a specific page? What if I like something and want to copy and paste the URL into an IM window?
In other words, consider the use cases. What constitutes a "page"? For example, if you have a website for stock quotes, should each stock have a unique URL? Yes. Should you have a unique URL for every variation you can make to the graph (e.g., logarithmic vs. linear)? It depends; if you don't, at least provide a "share this" link like Google Maps does, so you have some kind of URL that you can share.
That all said, I agree with the suggestion to mess with the #anchor and parse it out. Probably the most elegant solution.
I own an image hosting site and would like to generate one popup per visitor per day. The easiest way for me to do this was to write a PHP script that calls subdomains, like ads1.sitename.com
ads2.sitename.com
Unfortunately, most of my advertisers want to give me a block of JavaScript code to use rather than a direct link, so I can't just make the individual subdomains header redirects. I'd rather use the subdomains so that I can manage multiple advertisers without changing any code on the page, just code in my PHP admin page. Any ideas on how I can stick this JavaScript into the page so I don't need to worry about a blank ads1.sitename.com as well as the popup coming up?
I doubt you'll find much sympathy for help with pop-up ads.
How about appending a simple window.close() after the advertising code? That way their popup is displayed and your window closes neatly.
I'm not sure that I've ever had a browser complain that the window is being closed. This method has always worked for me. (IE, Firefox, etc.)
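The subdomain page would then look roughly like this; this is a sketch, and where the advertiser's block goes is marked:

    <!-- ads1.sitename.com: host page for one advertiser's code -->
    <script>
      /* ...paste the advertiser's popup block here... */
    </script>
    <script>
      // After the ad code has run (and opened its popup), close this
      // now-blank window. window.close() is allowed here because this
      // window was itself opened by script.
      window.onload = function () {
        window.close();
      };
    </script>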
At the risk of helping someone who wants to deploy popup ads (which is bound to fail due to most popup blockers anyway): why can't you just have the subdomains load pages that load the block of JavaScript the advertisers give you?
Hey, cut the guy some slack. Popups might not be very nice, but at least he's trying to reduce the amount of them. And popup blockers are going to fix most of it anyway. In any case, someone else might find this question with more altruistic goals (not sure how they'd fit that with popups, but hey-ho).
I don't quite follow your question, but here are some ideas:
Look into Server Side Includes (SSI) to easily add a block of JavaScript to each page (though you could also do it with a PHP include instead).
Do your advertiser choosing in your PHP script rather than calling the subdomains (a client-side variant is sketched below).
Decipher the JavaScript to work out what it's doing and put a modified version in the subdomain page so it doesn't need an additional popup. Shouldn't be too hard.
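For the second idea, a client-side variant is to have a single page pick an advertiser's script at random and inject it, instead of rotating subdomains. A sketch, with hypothetical script URLs:

    // Pick one advertiser block at random and inject it into the page.
    var adScripts = [
      'https://ads1.sitename.com/ad.js',   // hypothetical URLs: each file
      'https://ads2.sitename.com/ad.js'    // would hold one advertiser's block
    ];
    var pick = adScripts[Math.floor(Math.random() * adScripts.length)];
    var tag = document.createElement('script');
    tag.src = pick;
    document.body.appendChild(tag);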