How to display image URLs from website sub-pages using PHP code - php

I am using the PHP code below to display images from webpages. The code can display image URLs from the main page, but not from sub-pages.
<?php
include_once('simple_html_dom.php');

$target_url = "http://fffmovieposters.com/";
$html = new simple_html_dom();
$html->load_file($target_url);

foreach ($html->find('img') as $img) {
    echo $img->src . "<br />";
    echo $img . "<br/>";
}
?>

If by sub-page you mean a page that http://fffmovieposters.com is linking to, then of course that script won't show any of those since you're not loading those pages.
You basically have to write a spider that not only finds images, but also anchor tags and then repeats the process for those links. Just remember to add some filters so that you don't process pages more than once or start processing the entire internet by following external links.
Pseudo'ish code, sketched here with Simple HTML DOM:
$todo = ['http://fffmovieposters.com'];
$done = [];
$images = [];
while (!empty($todo)) {
    $link = array_shift($todo);
    $done[] = $link;
    $html = new simple_html_dom();
    $html->load_file($link);
    // collect image URLs from this page
    foreach ($html->find('img') as $img) {
        $images[] = $img->src;
    }
    // queue internal links that haven't been visited or queued yet
    foreach ($html->find('a') as $a) {
        $href = $a->href;
        if (strpos($href, 'http://fffmovieposters.com') === 0
                && !in_array($href, $done) && !in_array($href, $todo)) {
            $todo[] = $href;
        }
    }
}
Or something like that...

Related

Trouble printing array of titles and urls

I'm not very good at PHP.
I have a folder of .html files that will change often. I want to search through the folder, parse the <h1> tags, and then print/echo each <h1> tag and its URL.
Getting the <h1> tags out of the .html files was easy enough with some Googling, but I cannot seem to print a list of <h1> titles and their corresponding URLs.
Here is what I have so far:
$url_list = glob('posts/*.html'); // Finds all files in the posts/ directory that end in .html
foreach ($url_list as $url) { // Builds an array of post URLs and <h1> title tags
    $post = new DOMDocument(); // Creates a DOM document for the blog post
    $post->loadHTMLFile($url); // Loads the blog post into $post from its URL
    $h1_tags = $post->getElementsByTagName('h1'); // Finds all <h1> tags
    $first_h1 = $h1_tags->item(0); // Gets the first <h1> tag
    $title = $first_h1->nodeValue; // Sets $title to the value of the first <h1> tag
    if (!empty($title)) { // Will only run on files which have a date in their metadata
        $post_list[$url] = $title;
        $post_list[$title] = $url;
    }
}
sort($post_list); // Sorts the list of posts in alphabetical order
$num = 1;
foreach ($post_list as $title) {
    echo "<h2>".($num++).". {$title} = {$url}</h2>";
}
You are adding the titles and URLs into the same list, once in each direction. If you build up your data as...
if (!empty($title)) { // Will only run on files which have a date in their metadata
    $post_list[$title] = $url;
}
This adds each post only once, keyed by title; then output it like...
foreach ($post_list as $title => $url) {
    echo "<h2>".($num++).". {$title} = {$url}</h2>";
}
Edit: change sort() to asort(), which sorts by value while keeping the title keys intact (plain sort() would discard the keys and reindex numerically).
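
Putting both changes together, a minimal corrected version of the whole script might look like this (same posts/ folder assumed):
<?php
$url_list = glob('posts/*.html'); // All .html files in the posts/ folder
$post_list = [];
libxml_use_internal_errors(true); // silence warnings from imperfect HTML
foreach ($url_list as $url) {
    $post = new DOMDocument();
    $post->loadHTMLFile($url);
    $first_h1 = $post->getElementsByTagName('h1')->item(0);
    if ($first_h1 !== null && !empty($first_h1->nodeValue)) {
        $post_list[$first_h1->nodeValue] = $url; // title => url, added once
    }
}
asort($post_list); // sorts by URL, keeping title keys; use ksort() to sort by title instead
$num = 1;
foreach ($post_list as $title => $url) {
    echo "<h2>".($num++).". {$title} = {$url}</h2>";
}
?>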

Change src attribute of img, using the Simple HTML DOM PHP library

I'm totally new to PHP, and I'm having a hard time changing the src attribute of img tags.
I have a website that pulls part of a page using Simple HTML DOM; here is the code:
<?php
include_once('simple_html_dom.php');

$html = file_get_html('http://www.tabuademares.com/br/bahia/morro-de-sao-paulo');
foreach ($html->find('img') as $item) {
    $item->outertext = '';
}
$html->save();

$elem = $html->find('table[id=tabla_mareas]', 0);
echo $elem;
?>
This code correctly returns the part of the page I want, but the img tags come with the src paths of the original page, e.g. /assets/svg/icon_name.svg.
What I want is to change the original src so that it looks like this: http://www.mywebsite.com/wp-content/themes/mytheme/assets/svg/icon_name.svg
In other words, I want to put the URL of my site in front of assets/svg/icon_name.svg.
I already tried some tutorials, but I could not make any of them work.
Could someone please help a PHP noob?
I managed to make it work. So if someone has the same question, here is how I got the code working.
<?php
// Note: you must download simple_html_dom.php from
// https://sourceforge.net/projects/simplehtmldom/files/
// then include it
include_once('simple_html_dom.php');

// Load the target website
$html = file_get_html('http://the_target_website.com');

// Loop through all images of the HTML DOM
foreach ($html->find('img') as $item) {
    // Get the attribute (for a non-value attribute such as checked or selected, this returns true or false)
    $value = $item->src;
    // Set the attribute
    $item->src = 'http://yourwebsite.com/' . $value;
}

// Save the changes back into the document
$html->save();

// Find the div whose content you want
$elem = $html->find('div[id=container]', 0);

// Output it
echo $elem;
?>
That's it!
Did you read the documentation on how to read and modify attributes?
As per that:
// Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
$value = $e->href;
// Set an attribute
$e->href = 'yoursitename' . $value;
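
One caveat with both snippets: they prefix every src, even values that are already absolute. A small guard, assuming the same Simple HTML DOM loop (the parse_url scheme check here is just one way to detect an absolute URL):
foreach ($html->find('img') as $item) {
    $value = $item->src;
    // Only prefix relative paths; absolute URLs already carry a scheme
    if (parse_url($value, PHP_URL_SCHEME) === null) {
        $item->src = 'http://yourwebsite.com/' . ltrim($value, '/');
    }
}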

Scraping images from a URL using PHP

I am trying to make a page that allows me to grab and save images from another link. Here's what I want to add on my page:
A text box (to enter the URL I want to get images from).
A save dialog box to specify the path to save the images to.
What I am trying to do is save images only from that URL and only from inside a specific element.
For example, in my code I say: go to example.com and, from inside the element with class="images", grab all images.
Note: not all images from the page, just the ones inside the element,
whether the element has 3 images in it or 50 or 100, I don't care.
Here's what I tried, and it worked, using PHP:
<?php
$html = file_get_contents('http://www.tgo-tv.net');
preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);
echo $matches[1][0];
?>
This gets the image name and path, but what I am trying to make is a save dialog box, and the code must save the images directly to that path instead of echoing them out.
Hope you understand.
Edit 2
It's OK not to have a save dialog box; I can specify the save path in the code.
If you want something generic, you can use:
<?php
$the_site = "http://somesite.com";
$the_tag = "div";
$the_class = "images";

$html = file_get_contents($the_site);

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//'.$the_tag.'[contains(@class,"'.$the_class.'")]/img') as $item) {
    $img_src = $item->getAttribute('src');
    print $img_src."\n";
}
Usage:
Change the site and the tag, which can be a div, span, a, etc.; also change the class name.
For example, change the values to:
$the_site = "https://stackoverflow.com/questions/23674744/what-is-the-equivalent-of-python-any-and-all-functions-in-javascript";
$the_tag = "div";
$the_class = "gravatar-wrapper-32";
Output:
https://www.gravatar.com/avatar/67d8ca039ee1ffd5c6db0d29aeb4b168?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24da669dda96b6f17a802bdb7f6d429f?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24780fb6df85a943c7aea0402c843737?s=32&d=identicon&r=PG
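
The question also asked to save the images rather than print them. A minimal sketch of that, reusing the $xpath query above; the downloads folder and the basename-derived filenames are assumptions, and relative src values would still need to be resolved against $the_site first:
$save_dir = __DIR__ . '/downloads'; // assumed target folder
if (!is_dir($save_dir)) {
    mkdir($save_dir, 0755, true);
}
foreach ($xpath->query('//'.$the_tag.'[contains(@class,"'.$the_class.'")]/img') as $item) {
    $img_src = $item->getAttribute('src');
    $data = @file_get_contents($img_src); // fetch the image bytes
    if ($data !== false) {
        // name the local file after the last path segment of the source URL
        file_put_contents($save_dir . '/' . basename(parse_url($img_src, PHP_URL_PATH)), $data);
    }
}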
Maybe you should try the HTML DOM Parser for PHP. I found this tool recently and, to be honest, it works pretty well. It has jQuery-like selectors, as you can see on the site. I suggest you take a look and try something like:
<?php
require_once("./simple_html_dom.php");

$html = file_get_html("http://the_target_website.com");

// Start from the root and pick the parent tag you want to search in,
// e.g. "div" if you want to search in all divs
foreach ($html->find("div") as $div) {
    // Search for <img> tags in each div found
    foreach ($div->find("img") as $img) {
        // Output the src attribute: for <img src="www.example.com/cat.png">
        // this prints www.example.com/cat.png
        echo $img->src . "<br>";
    }
}
?>
I hope I helped, more or less.

Get and return media URL (m3u8) using PHP

I have a website that hosts videos from a client. On the website the files load externally via m3u8 link.
The client would now like to have those videos on a Roku channel.
If I simply use the m3u8 link from the site, it gives an error, because the generated URL is tied to a cookie, so a client must click the link to generate a new code for themselves.
What I would like, if possible (and I have not seen this covered here), is to scrape the HTML page and just return the link to the Roku via a PHP script on the website.
I know how to get titles and such using pure PHP, but I am having problems returning the m3u8 link.
I do have code to show; I am not looking for handouts and actually am trying.
This is what I have used for getting the title name, for example.
Note: I would like to know if it is possible to have one PHP script that fills in the page per URL, so I do not have to use a different PHP file for each video with the URL pre-typed in.
<?php
$html = file_get_contents('http://example.com'); // get the html returned from the given url
$movie_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); // disable libxml errors
if (!empty($html)) { // if any html is actually returned
    $movie_doc->loadHTML($html);
    libxml_clear_errors(); // remove errors for yucky html
    $movie_xpath = new DOMXPath($movie_doc);
    // get all the titles
    $movie_row = $movie_xpath->query('//title');
    if ($movie_row->length > 0) {
        foreach ($movie_row as $row) {
            echo $row->nodeValue . "<br/>";
        }
    }
}
?>
There is a simple approach for this, which involves using regex.
In this example let's say the video M3u8 file is located at: http://example.com/theVideoPage
You would point the video URL Source in your XML to your PHP file.
http://thisPhpFileLocation.com
<?php
$html = file_get_contents("http://example.com/theVideoPage");
preg_match_all(
    '/(http.*m3u8)/',
    $html,
    $posts,         // will contain the matched m3u8 links
    PREG_SET_ORDER  // formats data into an array of matches
);
foreach ($posts as $post) {
    $link = $post[0];
    header("Location: $link");
}
?>
Now, if you want a URL where you can append the page name at the end, it could look something like this; for the video page above you would use an address such as:
http://thisPhpFileLocation.com?id=theVideoPage
<?php
$id = $_GET['id'];
$html = file_get_contents("http://example.com" . $id);
preg_match_all(
    '/(http.*m3u8)/',
    $html,
    $things,        // will contain the matched m3u8 links
    PREG_SET_ORDER  // formats data into an array of matches
);
foreach ($things as $thing) {
    $link = $thing[1];
    // clear out the output buffer so the header can still be sent
    while (ob_get_status()) {
        ob_end_clean();
    }
    // redirect to the extracted link
    header("Location: $link");
}
?>
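
One detail worth noting: header('Location: ...') calls replace one another by default, so the loop above effectively redirects to the last match. If the first match is the one you want, a shorter variant (same assumed URL scheme) could be:
<?php
$id = $_GET['id'];
$html = file_get_contents("http://example.com" . $id);
// Stop at the first m3u8 URL instead of looping over every match
if (preg_match('/(http.*m3u8)/', $html, $match)) {
    header("Location: " . $match[1]);
    exit; // make sure nothing else is sent after the redirect
}
?>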

PHP - Get all images from class with simple html dom parser

I need to get all images from the infobox on a Wikipedia page. I made this code, but it gets all images from the page, not only those in the infobox. I need some help.
include("simple_html_dom.php");
$wikilink = "http://en.wikipedia.org/wiki/Aberdeen_F.C.";
//Wikipedia page to parse
$html = file_get_html($wikilink);
$images_array = array();
foreach ($html->find('table.infobox vcard td, img') as $element) {
$allimages = strtok($element->src . '|', '|');
array_push($images_array, $allimages);
}
print_r($images_array);
A screenshot in the original post showed the infobox HTML elements to extract.
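
The selector is the likely problem: in Simple HTML DOM a comma separates alternative selectors, just as in CSS, so 'table.infobox vcard td, img' also matches every <img> anywhere on the page. A minimal sketch that finds the infobox table first and only then searches for images inside it:
include("simple_html_dom.php");

$wikilink = "http://en.wikipedia.org/wiki/Aberdeen_F.C.";
$html = file_get_html($wikilink);

$images_array = array();
// Grab the first table with class "infobox", then search only inside it
$infobox = $html->find('table.infobox', 0);
if ($infobox) {
    foreach ($infobox->find('img') as $element) {
        $images_array[] = strtok($element->src . '|', '|'); // strip any |-suffix as before
    }
}
print_r($images_array);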
