I want to parse a streaming website in PHP to find the streaming links and codes.
The streaming page, e.g. index.htm, looks like:
<html>
<iframe something something src="xyz.aspx?id=abc"></iframe>
</html>
When I parse the main page I can see those codes, but I also need to parse the contents of xyz.aspx?id=....
If I request that page directly, it doesn't show the streaming codes.
The main page and the iframe page are somehow connected.
Is there any way to parse both together, perhaps by sending a referrer or something, so that the second page thinks it is being loaded from the main streaming page?
Any clue? cURL?
Since it seems like you are only searching for a couple of codes, you might want to write a regex (GASP! I KNOW, REGEX + HTML = BAD, but hear me out). I'd write a regex that captures all <iframe> tags, then parse those for the src attribute, and then fetch the information that you need.
Edit: I don't think I understood your question entirely, you also want to embed the iframe'd page as if it were viewed in a browser?
If so, read on!
OK, now that you have the src attribute of the <iframe>, just download the linked page (I'm a little rusty with PHP, but I'm sure you can find one of the thousands of functions that would work) and str_replace() the <iframe> tag with the downloaded code.
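A minimal sketch of that idea, combined with the referrer trick from the question, might look like this. The URLs and the regex are placeholders, and it assumes cURL is available and that the streaming page does not also require cookies or JavaScript:

<?php
// Rough sketch: extract the iframe src from the main page, then fetch it
// with cURL while sending the main page as the Referer header.
$mainUrl  = 'http://example.com/index.htm';   // placeholder main page URL
$mainHtml = file_get_contents($mainUrl);

// Capture the whole <iframe>...</iframe> element and its src attribute.
if (preg_match('/<iframe[^>]*src=["\']?([^"\'\s>]+)[^>]*>.*?<\/iframe>/is', $mainHtml, $m)) {
    $iframeTag = $m[0];
    $iframeUrl = $m[1];

    // Resolve a relative src against the main page's directory if needed.
    if (!preg_match('#^https?://#i', $iframeUrl)) {
        $iframeUrl = dirname($mainUrl) . '/' . ltrim($iframeUrl, '/');
    }

    // Fetch the iframe page while pretending we came from the main page.
    $ch = curl_init($iframeUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_REFERER, $mainUrl);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $iframeHtml = curl_exec($ch);
    curl_close($ch);

    // Swap the downloaded markup in where the <iframe> element was.
    echo str_replace($iframeTag, $iframeHtml, $mainHtml);
}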
I'm trying to parse a website's HTML with PHP's fopen(). That works well so far, but the problem is that there are several posts on the site that aren't included in the HTML, because you have to scroll far down before those posts load.
As an example, I'm trying to count the total number of comments on my own Facebook page. (Just an example; if it's shown somewhere on Facebook, that doesn't help me.)
How can I make the HTML load completely?
Thank you
You cannot, directly. What you are doing is called scraping. Content that only appears when you scroll is loaded by JavaScript through additional requests, so you have to inspect the queries made by the browser in your developer tools when viewing that page yourself, and reproduce those queries in PHP through fopen() or any other means (cURL, etc.).
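As a rough illustration of reproducing one of those browser queries, the snippet below replays a hypothetical JSON endpoint that a page might call when you scroll. The URL, parameters and response shape are made up; you would copy the real ones from the Network tab of your developer tools:

<?php
// Hypothetical endpoint copied from the browser's Network tab.
// The URL, parameters and headers below are placeholders.
$url = 'https://example.com/ajax/comments?offset=20&limit=20';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'X-Requested-With: XMLHttpRequest',   // many sites expect this for AJAX calls
    'Accept: application/json',
));
$response = curl_exec($ch);
curl_close($ch);

// If the endpoint returns JSON, decode it and count the items.
$data = json_decode($response, true);
if (is_array($data) && isset($data['comments'])) {
    echo count($data['comments']) . " comments in this batch\n";
}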
I am using PHP to develop a Twitter search API which is able to search Twitter and save posted images from tweets.
It all works fine, but for Facebook, instead of the image being loaded with the web page, it's loaded afterwards in a separate response. Using Firebug, under the Net tab, I can see the HTML source I need under the response tab of a getphoto() request. I am looking to grab an img src from this HTML, but
Facebook seems to load the basic structure first, then reload the page with the image on it.
My question is: how can I get this 'response body'?
I have used get_headers() before, but I don't think it will work in this situation, and I have trawled the net looking for an answer, but none has appeared.
Any help would be much appreciated, thanks in advance.
I don't think my code will help explain this, but I'm willing to post some.
EDIT:
Example Facebook URL: https://www.facebook.com/photo.php?pid=1258064&l=acb54aab14&id=110298935669685
That URL would take you to the page containing the image.
This is the image tag:
img class="fbPhotoImage img" id="fbPhotoImage" src="https://fbcdn-sphotos-a.akamaihd.net/hphotos-ak-ash3/522357_398602740172635_110298935669685_1258064_1425533517_n.jpg" alt=""
But this does not show up until the response comes through.
I have a get_headers() function in place to expand shortened URLs, due to Twitter's love for them, and it can get an image from other third-party photo sites through multiple shortens/redirects.
I have not used cURL before; is it the best/only way?
Thanks again
instead of the image being loaded with the web page, it's loaded afterwards in a separate response
I don't know what this means.
I can only guess that the URL you are trying to fetch the HTML from, which your code is expected to parse to extract an image URL, is actually issuing a redirect.
Use cURL for your transfers and tell it to follow redirects. Note that this will only work with header (Location:) redirects, not <meta http-equiv="refresh"> redirects nor JavaScript location redirects.
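A minimal sketch of that, using the example URL from the question; whether Facebook actually serves the final markup this way is not guaranteed, so treat it as a starting point:

<?php
// Fetch a page with cURL, following any header (Location:) redirects.
$url = 'https://www.facebook.com/photo.php?pid=1258064&l=acb54aab14&id=110298935669685';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow Location: header redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);          // don't loop forever
$html = curl_exec($ch);
curl_close($ch);

// Then pull the photo URL out of the returned markup.
if (preg_match('/<img[^>]+id="fbPhotoImage"[^>]+src="([^"]+)"/i', $html, $m)) {
    echo $m[1];
}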
(Maybe Facebook doesn't want you to leech their content?)
Does anyone have any idea how to generate an excerpt from any given article page (so it could work for many types of sites)? Something like what Facebook does when you paste a URL into a post. Thank you.
What you're looking to do is called web scraping. The basic method would be to fetch the page (you can scrape a URL using file_get_contents()), and then parse it for the content that you want (i.e. pull out the content of the <body> tag).
In order to parse the returned HTML, you should use a DOM parser. PHP has its own DOM classes which you can use.
Here is a video tutorial about how to do that:
http://net.tutsplus.com/tutorials/php/how-to-create-blog-excerpts-with-php/
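A rough sketch of the DOM approach follows; the URL is a placeholder, there is no error handling, and real article pages usually need smarter selection than just the first few paragraphs:

<?php
// Fetch a page and build a short plain-text excerpt from its paragraphs.
$url  = 'http://example.com/some-article';          // placeholder URL
$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true);                   // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();

// Grab the page title.
$title = '';
$titleNodes = $doc->getElementsByTagName('title');
if ($titleNodes->length > 0) {
    $title = trim($titleNodes->item(0)->textContent);
}

// Collect text from <p> tags until we have enough for an excerpt.
$excerpt = '';
foreach ($doc->getElementsByTagName('p') as $p) {
    $excerpt .= ' ' . trim($p->textContent);
    if (strlen($excerpt) > 300) {
        break;
    }
}

echo $title . "\n" . substr(trim($excerpt), 0, 300) . "...\n";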
How can I hide an iframe URL in the HTML source code? I have two applications; one application loads the URL of the other application into its iframe, so that URL shows up in its source code. I don't want the other application's URL to appear in the source code.
I think you would need to set the iframe URL via JavaScript. The JavaScript could then be obfuscated, so that the URL would not be in plain text. Please see the following link for the obfuscator:
http://www.javascriptobfuscator.com/Default.aspx
i.e. if it was jQuery...
$("#myiFrame").attr('src','http://www.google.com');
becomes:
var _0xc1cb=["\x73\x72\x63","\x68\x74\x74\x70\x3A\x2F\x2F\x77\x77\x77\x2E\x67\x6F\x6F\x67\x6C\x65\x2E\x63\x6F\x6D","\x61\x74\x74\x72","\x23\x6D\x79\x69\x46\x72\x61\x6D\x65"];$(_0xc1cb[3])[_0xc1cb[2]](_0xc1cb[0],_0xc1cb[1]);
You can't hide it per se, but you can run it through something like TinyURL so that anyone interested would need to take an extra step. Anyway, that's the only thing I can think of. However, if you are displaying that page in a frame, what's the harm in having the URL in the source code? There really isn't a good, foolproof way to prevent someone determined from finding out the location of that iframe page.
You can create a PHP script which uses cURL to fetch the URL on your own server, then use this script as your iframe source.
If you have an issue with relative links and sub-directories, you can put your cURL script inside the sub-directory.
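A bare-bones version of that proxy idea might look like the following. The target URL is a placeholder and is hard-coded on purpose so it never appears in your page's HTML; you would not want to pass arbitrary user-supplied URLs through a script like this:

<?php
// proxy.php - hypothetical iframe source that hides the real URL.
$target = 'https://other-app.example.com/page';   // placeholder URL

$ch = curl_init($target);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$body = curl_exec($ch);
curl_close($ch);

echo $body;

Your page would then simply embed <iframe src="proxy.php"></iframe>, and the real URL only ever lives server-side.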
Hello there everybody! I've run into a problem lately when coding in PHP with file_get_contents(). My problem is that when I load a website like this:
<? echo file_get_contents($_GET['url']); ?>
the pictures of the website I load don't show. For example, when I go to Google, no pictures are shown. This happens for every website I visit. How can I fix this?
The HTML page you are displaying assumes you also have the images available, which you don't as they are on the original page's server (e.g. Google.com).
The quickest way to make sure everything on the HTML page loads is to add <base href="http://www.google.com/" />. This tells the browser to resolve the rest of the content, including images, CSS and scripts, against the original site.
You'll want to inject that between the <head></head> tags of the HTML page you're displaying. You could use a regular expression or Simple HTML DOM.
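A quick regex-based version of that injection, assuming the fetched page has a normal <head> tag (and you should validate $_GET['url'] before fetching it):

<?php
// Fetch the remote page and inject a <base> tag so relative URLs resolve
// against the original site instead of yours.
$url  = $_GET['url'];                       // e.g. http://www.google.com/
$html = file_get_contents($url);

// Build the base from the scheme and host of the requested URL.
$parts = parse_url($url);
$base  = $parts['scheme'] . '://' . $parts['host'] . '/';

// Insert <base href="..."> right after the opening <head> tag.
$html = preg_replace('/<head(\s[^>]*)?>/i', '<head$1><base href="' . $base . '" />', $html, 1);

echo $html;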
Hope that helps
Don't do this. You're stealing other web sites' content. It also doesn't work well, as you've noticed, since all relative URLs are broken.
Can you use an iframe instead? As in:
<iframe src="<?php echo htmlspecialchars($_GET['url']) ?>"></iframe>
This is nicer since you're not hiding the web site you're proxying from the end user.
I think this is because the image URLs are relative, e.g. <img src="/img/foo.png">, meaning the browser looks for the images on your server instead of, say, Google's. Fixing this requires going through all the URLs in the source and changing them from relative to absolute.
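A rough sketch of that rewrite using PHP's DOMDocument; it only handles root-relative src/href attributes and assumes the original URL is known:

<?php
// Rewrite relative src/href attributes in fetched HTML to absolute URLs.
$url  = 'http://www.google.com/';            // placeholder original URL
$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$parts  = parse_url($url);
$origin = $parts['scheme'] . '://' . $parts['host'];

foreach (array('img' => 'src', 'link' => 'href', 'script' => 'src', 'a' => 'href') as $tag => $attr) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $value = $node->getAttribute($attr);
        // Only touch root-relative URLs like /img/foo.png.
        if ($value !== '' && $value[0] === '/' && substr($value, 0, 2) !== '//') {
            $node->setAttribute($attr, $origin . $value);
        }
    }
}

echo $doc->saveHTML();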
file_get_contents() does what it says: it gets the content of the file or URL supplied as its argument.
An HTML page doesn't have images inside it; they are not part of the page's content. The HTML page only has references to external files, which have their own content.