I would like to get the source of a page that has a flash/HTML5 video player. If I use a normal curl request, I only get the flash code, but I want to get the HTML5 video code.
Is it possible to change the headers (I already tried changing x-flash-version in the header, but it doesn't work) or do something else to tell the JavaScript that checks whether I am using flash that I am not using it?
Thank you in advance,
Noro
Without the URL of the page you are trying to get the HTML for, it is hard to know how the page decides whether to render the HTML5 video code or the flash code.
It is very likely that JavaScript on the page is changing the DOM at runtime to insert the video tags.
If you have a browser that renders the video tags, look at the headers being sent in the POST/GET request and match those headers in your curl call. You can experiment by using the -H flag on the curl command line to send different headers until you get the result you need.
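The same experiment translates to PHP's curl bindings. A minimal sketch, assuming a placeholder URL; the iPad user agent is just one plausible way to present a flash-less client, so copy whatever headers your own browser actually sends:

<?php
// Sketch: replay browser-like headers so the server sees a client
// without flash. The URL is a placeholder; the header values are
// examples, copy the real ones from your browser's network inspector.
$ch = curl_init('http://example.com/video-page');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// iOS devices have no flash, so some sites serve them HTML5 video:
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) AppleWebKit/534.46');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml',
));
$html = curl_exec($ch);
curl_close($ch);
echo $html;

Bear in mind that if the page swaps flash for HTML5 purely in client-side JavaScript, no header will change what curl receives.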
The URL I want to scrape is https://www.tokopedia.com/juraganlim/info
and I just want to get the number of transactions shown in this image (the boxed value is what I need to extract).
I am really confused by the AJAX because I don't know which URL the data comes from.
When I inspect with Firefox it shows so many requests.
Can anyone give me a clue, or even the script itself?
I am using PHP to develop a Twitter search API which is able to search Twitter and save posted images from tweets.
It all works fine, but for Facebook, instead of the image being loaded with the web page, it is loaded afterwards in a response. Using Firebug's Net tab, I can see the HTML source code I need under the response tab for a getphoto() request. I am looking to grab an img src from this HTML text, but
Facebook seems to load the basic structure, then reload the page with the image on it.
My question is: How can I get this 'response body'?
I have used get_headers() before, but I don't think it will work in this situation, and I have trawled the net looking for an answer, but none has appeared.
Any help would be much appreciated, thanks in advance.
I don't think my code will help explain, but I'm willing to put some up.
EDIT:
example Facebook URL: https://www.facebook.com/photo.php?pid=1258064&l=acb54aab14&id=110298935669685
That would take you to the page containing the image.
This is the image tag:
<img class="fbPhotoImage img" id="fbPhotoImage" src="https://fbcdn-sphotos-a.akamaihd.net/hphotos-ak-ash3/522357_398602740172635_110298935669685_1258064_1425533517_n.jpg" alt="">
But this does not show up until the response comes through.
I have a get_headers() function in place to expand shortened URLs, due to Twitter's love for them, and this can get an image from other third-party photo sites through multiple shortens/redirects.
I have not used cURL before; is it the best/only way?
Thanks again.
"instead of the image being loaded with the web page, it is loaded afterwards in a response"
I don't know what this means.
I can only guess that the URL you are trying to fetch the HTML from, which your code is expected to parse to extract an image URL, is actually issuing a redirect.
Use curl for your transfers and tell it to follow redirects. Note that this only works with header redirects, not meta http-equiv redirects, meta refresh redirects, nor JavaScript location redirects.
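In PHP's curl bindings that looks roughly like this, reusing the example URL from the question:

<?php
// Follow Location: header redirects until we land on the final page.
$ch = curl_init('https://www.facebook.com/photo.php?pid=1258064&l=acb54aab14&id=110298935669685');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // chase header redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        // guard against redirect loops
$html = curl_exec($ch);
curl_close($ch);
// $html now holds the final page's markup; parse the img src out of it.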
(Maybe Facebook doesn't want you to leech their content?)
How is it possible to grab the page source from an AJAX-type web page?
curl doesn't seem to be able to get AJAX-generated source.
Sorry if this is a duplicate, but looking through existing questions I didn't find an answer.
If the page you want to grab uses AJAX to compose different parts of it, then the content does not exist until all the loading is done.
You can't do this with curl, as curl acts as a client requesting only the URL you give it; it has no JavaScript engine to interpret the scripts and load the other parts of the page.
If the content you are looking for is in one of the parts loaded through AJAX, use the Chrome inspector's Network tab to find the exact URL of the loaded part, then load that URL directly with curl.
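For example, once the Network tab has shown you the endpoint, a sketch like this fetches it directly with PHP's curl bindings. The endpoint URL and the X-Requested-With header are assumptions; use whatever the inspector actually showed:

<?php
// Fetch the AJAX endpoint itself rather than the host page.
// 'https://example.com/ajax/data' is a placeholder for the URL
// you found in the Network tab.
$ch = curl_init('https://example.com/ajax/data');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Some endpoints only respond to XHR-style requests:
curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Requested-With: XMLHttpRequest'));
$fragment = curl_exec($ch);
curl_close($ch);
echo $fragment; // the HTML/JSON fragment the page would have loaded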
Is there any way to detect in PHP whether a GET request was made by an embedded src in the browser?
<img src="xxxxx.php">
I tried using the "Referer" header, but it is not a good solution.
I don't know a lot about HTTP, but maybe if the browser uses the <img> tag it sends an Accept header for images, or something similar that I can read in PHP?
I just want to create a script that will display the picture when it is embedded, but when the URL is opened directly in the browser it will redirect the user to another page.
"Is there any way to detect in PHP whether a GET request was made by an embedded src in the browser?"
No.
If you hide your files in a folder outside the web root, create a PHP file that all images are routed through (image.php?lovely_duck.jpg), and then look at $_SERVER['HTTP_REFERER'], that would be possible. But $_SERVER['HTTP_REFERER'] can be forged and isn't always reliable.
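A minimal sketch of that routing idea; the allowed host name, the folder path, and the redirect target are all placeholders:

<?php
// image.php?lovely_duck.jpg -- serve the file only when the Referer
// looks like our own site. The Referer can be forged or absent,
// so treat this as a deterrent, not real protection.
$allowedHost = 'www.example.com'; // assumption: your own domain
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if (parse_url($referer, PHP_URL_HOST) === $allowedHost) {
    $file = '/path/outside/webroot/' . basename($_SERVER['QUERY_STRING']);
    header('Content-Type: image/jpeg');
    readfile($file);
} else {
    header('Location: /somewhere-else.html'); // direct visitors get redirected
}
exit;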
If you're trying to find a way to stop people getting your images, they'll always find a way.
You can check the Accept header (but I would NOT suggest it).
If the request was triggered by an <img src=...>, it should NOT contain "text/html".
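As a rough sketch only (not recommended, per the caveats below; the image file and redirect target are placeholders):

<?php
// Heuristic: browsers typically send "text/html" in Accept when a URL
// is opened directly, but not for requests triggered by <img src=...>.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

if (strpos($accept, 'text/html') === false) {
    header('Content-Type: image/png');
    readfile('picture.png');              // embedded: serve the image
} else {
    header('Location: /other-page.html'); // opened directly: redirect
}
exit;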
But again: I would not suggest using this.
At the moment it will break the download-image functionality in Firefox and possibly other browsers.
Additionally, it won't prevent "evil guys" from downloading all your images, because they can simply set the required HTTP header.
I'm new to YQL, and just trying to learn how to do some fairly simple tasks.
Let's say I have a list of URLs and I want to get their HTML source as a string in JavaScript (so I can later insert it into a database via AJAX). How would I go about getting this info back in JavaScript? Or would I have to do it in PHP? I'm fine with either, really; whatever works.
Here are the example queries I'd run on their console:
select * from html where url="http://en.wikipedia.org/wiki/Baroque_music"
And the goal is essentially to save the HTML, or maybe just the text, as a string.
How would I go about doing this? I somewhat understand how the querying works, but not really how to integrate it with JavaScript and/or PHP (say I have a list of URLs and I want to loop through them, getting the HTML at each one and saving it somewhere).
Thanks.
You can't read other pages with JavaScript due to a built-in security feature in web browsers. It is called the same-origin policy.
The usual method is to scrape the content of these sites from the server using PHP.
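A minimal server-side sketch: loop over the URL list and fetch each page's HTML with curl. The storage step is only a hypothetical stub:

<?php
// Fetch the HTML of each URL in the list server-side.
$urls = array(
    'http://en.wikipedia.org/wiki/Baroque_music',
    // ...more URLs
);

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);

    // save_to_database($url, $html); // hypothetical storage function
}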
There is another option with JavaScript called a bookmarklet.
You can add the bookmarklet to your bookmarks bar, and each time you want the content of a site, click the bookmark.
A script will be loaded into the host page; it can read the content and post it back to your server.
Oddly enough, the same-origin policy does not prevent you from POSTing data from the host page to your domain. You need to POST a form to an iframe whose source is hosted on your domain.
You won't be able to read the response you get back from the POST.
But you can poll with setInterval, making a JSONP call to your domain, to find out whether the POST was successful.
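On your own domain, the two server-side pieces might look roughly like this in PHP. The file names, the 'content' field, and the session flag are all assumptions for illustration:

<?php
// receive.php -- the iframe-targeted form POSTs here from the host page.
session_start();
if (isset($_POST['content'])) {           // assumed form field name
    file_put_contents('scraped.html', $_POST['content']);
    $_SESSION['post_ok'] = true;          // flag for the poller below
}

<?php
// status.php -- the bookmarklet polls this via JSONP with setInterval.
session_start();
$cb = isset($_GET['callback']) ? $_GET['callback'] : 'callback';
$cb = preg_replace('/\W/', '', $cb);      // sanitize the callback name
header('Content-Type: application/javascript');
echo $cb . '(' . json_encode(array('success' => !empty($_SESSION['post_ok']))) . ');';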