I have a problem opening the URLs. After I extract the URLs from the MySQL database and output the list in PHP, nothing is opened for each URL.
Here is the php:
<?php
//Connect to the database
require_once('config.php');
$qrytable1="SELECT links FROM channels_list";
$result1=mysql_query($qrytable1) or die('Error:<br />' . $qrytable1 . '<br />' . mysql_error());
while ($row = mysql_fetch_array($result1))
{
echo $row["links"];
$baseUrl = file_get_contents($row["links"]);
$domdoc = new DOMDocument();
$domdoc->strictErrorChecking = false;
$domdoc->recover=true;
$domdoc->loadHTML($baseUrl);
$links = $domdoc->getElementsByTagName('a');
foreach ($links as $link)
{
echo "we are now opening for each url";
}
}
Here is the output for the urls:
http://example.com.com/some_name/?id=963
http://example.com.com/some_name/?id=102
http://example.com.com/some_name/?id=103
http://example.com.com/some_name/?id=104
http://example.com.com/some_name/?id=171
http://example.com.com/some_name/?id=106
http://example.com.com/some_name/?id=107
http://example.com.com/some_name/?id=108
http://example.com.com/some_name/?id=402
http://example.com.com/some_name/?id=403
http://example.com.com/some_name/?id=404
http://example.com.com/some_name/?id=405
http://example.com.com/some_name/?id=406
http://example.com.com/some_name/?id=408
http://example.com.com/some_name/?id=407
http://example.com.com/some_name/?id=409
http://example.com.com/some_name/?id=435
http://example.com.com/some_name/?id=436
http://example.com.com/some_name/?id=439
http://example.com.com/some_name/?id=440
http://example.com.com/some_name/?id=410
http://example.com.com/some_name/?id=411
http://example.com.com/some_name/?id=413
http://example.com.com/some_name/?id=414
http://example.com.com/some_name/?id=415
http://example.com.com/some_name/?id=417
http://example.com.com/some_name/?id=418
http://example.com.com/some_name/?id=421
I think there is a problem with this code:
$links = $domdoc->getElementsByTagName('a');
My PHP page doesn't output any HTML tags; it only shows the strings of the actual URLs, as shown above.
What I expect is to open each URL once I have the list of URLs from MySQL.
Can you please help me open each URL after fetching the URLs from the MySQL database?
I'm not exactly sure what you mean by "open for each url".
If you want to output a list of links that you can click on:
while ($row = mysql_fetch_array($result1))
{
echo "<a href='".$row["links"]."'>".$row["links"]."</a>";
}
If you want to download the contents of each URL:
while ($row = mysql_fetch_array($result1))
{
$content_string = file_get_contents($row["links"]);
}
$content_string then holds the content of the page as a string; I'm not sure what you want to do with it.
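If "open for each url" means fetching every URL from the database and then walking the links inside each fetched page (which is what the code in your question attempts), a minimal sketch could look like this. It keeps the deprecated mysql_* API from the question for consistency and assumes the links column holds full absolute URLs:
while ($row = mysql_fetch_array($result1))
{
    // Fetch the remote page behind this URL
    $html = @file_get_contents($row["links"]);
    if ($html === false) {
        echo "Could not open " . $row["links"] . "<br />";
        continue;
    }
    // Parse the fetched page and list the anchors it contains
    $domdoc = new DOMDocument();
    @$domdoc->loadHTML($html);
    foreach ($domdoc->getElementsByTagName('a') as $link) {
        echo $link->getAttribute('href') . "<br />";
    }
}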
So I'm working on a URL crawler, but I get a lot of paths without the domain and the http prefix. I want to write a function that adds the domain and scheme to a path when it doesn't contain them. Here is my code:
<?php
$source_url = 'http://www.google.com/';
$html = file_get_contents($source_url);
$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses warnings from malformed HTML
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
$input_url = $link->getAttribute('href');
echo $input_url . "<br>";
}
?>
If there is no way to do that, how can I extract just the URLs containing http?
You can use a regular expression to check whether the link is an absolute URL or a relative one, i.e. whether it contains the domain or not. What I have done is check whether the link starts with http:// or https://. If it doesn't, the source domain is prepended to the link.
foreach ($links as $link) {
$input_url = $link->getAttribute('href');
if (!preg_match('/^https?:\/\//', $input_url)) {
$input_url = $source_url . preg_replace('/^\//', '', $input_url);
}
echo $input_url . "<br>";
}
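One edge case the snippet above does not cover is scheme-relative URLs (those starting with //), which begin with neither http:// nor a plain path. A hedged extension of the same loop, assuming you want to force http: for those:
foreach ($links as $link) {
    $input_url = $link->getAttribute('href');
    if (preg_match('/^\/\//', $input_url)) {
        // Scheme-relative URL: prepend a scheme only
        $input_url = 'http:' . $input_url;
    } elseif (!preg_match('/^https?:\/\//', $input_url)) {
        // Relative path: prepend the source domain as before
        $input_url = $source_url . preg_replace('/^\//', '', $input_url);
    }
    echo $input_url . "<br>";
}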
I'm writing a PHP script; the script must copy the list of publications from the homepage and copy the information that is inside these publications.
I need to copy the content from my previous site and add it to the new site!
I have had some success: my PHP script copies the list of publications on the home page. Now I need the script to pull the information inside each publication (title, photo, full text)!
For this, I wrote a function that extracts a link to each post.
Help me write a function that will copy the information at a given link!
<?php
header('Content-type: text/html; charset=utf-8');
require 'phpQuery.php';
function print_arr($arr){
echo '<pre>' . print_r($arr, true) . '</pre>';
}
$url = 'http://goruzont.blogspot.com/';
$file = file_get_contents($url);
$doc = phpQuery::newDocument($file);
foreach($doc->find('.blog-posts .post-outer .post') as $article){
$article = pq($article);
$text = $article->find('.entry-title a')->html();
print_arr($text);
$texturl = $article->find('.entry-title a')->attr('href');
echo $texturl;
$text = $article->find('.date-header')->html();
print_arr($text);
$img = $article->find('.thumb a')->attr('style');
echo $img . "<br>";
// Pull the image URL out of the inline style, e.g. background:url(...) no-repeat
if (preg_match('!background:url\((.+)\) no!', $img, $match)) {
    $imgurl = $match[1];
    echo "<img src='$imgurl'>";
}
}
?>
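No answer follows here, but a minimal sketch of the requested function might look like the one below. It downloads a single post URL and pulls out the title, body, and first image with phpQuery; the selectors .entry-title, .post-body, and .post-body img are assumptions about this particular Blogspot theme and may need adjusting:
function fetchPost($texturl) {
    // Download the single post page
    $html = file_get_contents($texturl);
    $doc = phpQuery::newDocument($html);
    $post = array();
    $post['title'] = $doc->find('.entry-title')->text();
    $post['body']  = $doc->find('.post-body')->html();          // full post HTML
    $post['image'] = $doc->find('.post-body img')->attr('src'); // first image, if any
    return $post;
}
Inside the existing loop it could be called as $post = fetchPost($texturl); print_arr($post); to inspect what comes back.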
I need some help with my web crawler exercise. I should write a crawler that saves the first page of some websites and all of their content in a MySQL database. I am using the XAMPP MySQL database. Here is my code:
<?php
$main_url="webpage";
$str = file_get_contents($main_url);
// Gets Webpage Title
if(strlen($str)>0)
{
$str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
preg_match("/\<title\>(.*)\<\/title\>/i",$str,$title); // ignore case
$title=$title[1];
}
// Gets Webpage Description
$b =$main_url;
@$url = parse_url($b);
@$tags = get_meta_tags($url['scheme'].'://'.$url['host']); // @ suppresses warnings if the host cannot be reached
$description=$tags['description'];
// Gets Webpage Internal Links
$doc = new DOMDocument;
@$doc->loadHTML($str); // @ suppresses warnings from malformed HTML
$items = $doc->getElementsByTagName('a');
foreach($items as $value)
{
$attrs = $value->attributes;
$sec_url[]=$attrs->getNamedItem('href')->nodeValue;
}
$all_links=implode(",",$sec_url);
//Gets Webpage images
require_once('C:\xampp\htdocs\simple_html_dom.php');
require_once('C:\xampp\htdocs\url_to_absolute.php');
$url = 'webpage';
$html = file_get_html('webpage');
$images_arr = array();
foreach ($html->find('img') as $element) {
    $src = url_to_absolute($url, $element->src); // resolve relative image paths
    echo $src, "\n";
    $images_arr[] = $src;
}
$images = implode(",", $images_arr); // store as a comma-separated list, like the links
// Store Data In Database
$host="localhost";
$username="root";
$password="";
$databasename="db";
$connect=mysql_connect($host,$username,$password);
$db=mysql_select_db($databasename);
mysql_query("insert into webpage_details values('$main_url','$title','$description','$all_links','$images')");
?>
I should save everything from the home page, and I have a problem saving the images. Any ideas?
Are there any other items I need to save?
Thanks!
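No answer follows in this thread, but one improvement worth sketching: the final insert builds SQL by string interpolation, so any quote in a title or description will break it. A hedged rewrite of just the storage step using mysqli with a prepared statement (the table and column names are assumed from the insert above):
$conn = new mysqli("localhost", "root", "", "db");
// Column layout assumed: url, title, description, links, images
$stmt = $conn->prepare(
    "INSERT INTO webpage_details (url, title, description, links, images)
     VALUES (?, ?, ?, ?, ?)"
);
$stmt->bind_param("sssss", $main_url, $title, $description, $all_links, $images);
$stmt->execute();
$stmt->close();
$conn->close();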
I am adding an RSS feed to my website. I created the RSS.xml index file and next I want to display its contents in a nicely formatted way in a webpage.
Using PHP, I can do this:
$index = file_get_contents ($path . 'RSS.xml');
echo $index;
But all that does is dump the contents as a long stream of text with the tags removed.
I know that treating RSS.xml as a link, like this:
<a href="../blogs/RSS.xml">
<img src="../blogs/feed-icon-16.gif">Blog Index
</a>
causes my browser to parse and display it in a reasonable way when the user clicks on the link. However I want to embed it directly in the web page and not make the user go through another click.
What is the proper way to do what I want?
Use the following code:
include_once('Simple/autoloader.php');
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->enable_cache(false);
$feed->set_output_encoding('utf-8');
$feed->init();
$i=0;
$items = $feed->get_items();
foreach ($items as $item) {
$i++;
/*You are getting title,description,date of your rss by the following code*/
$title = $item->get_title();
$url = $item->get_permalink();
$desc = $item->get_description();
$date = $item->get_date();
}
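The loop above extracts the fields but never prints them; to actually display the feed, you could echo them inside the same loop. A minimal sketch:
foreach ($items as $item) {
    $title = $item->get_title();
    $url   = $item->get_permalink();
    $desc  = $item->get_description();
    $date  = $item->get_date();
    // Render each feed item as a linked heading with its description and date
    echo "<h3><a href='$url'>$title</a></h3>";
    echo "<p>$desc</p><small>$date</small>";
}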
Download the Simple folder data from : https://github.com/jewelhuq/Online-News-Grabber/tree/master/worldnews/Simple
Hope it will work for you. Here, $url means your RSS feed URL. If it works, please respond.
Turns out, it's simple using PHP's SimpleXML parser functions:
$xml = simplexml_load_file ($path . 'RSS.xml');
$channel = $xml->channel;
$channel_title = $channel->title;
$channel_description = $channel->description;
echo "<h1>$channel_title</h1>";
echo "<h2>$channel_description</h2>";
foreach ($channel->item as $item)
{
$title = $item->title;
$link = $item->link;
$descr = $item->description;
echo "<h3><a href='$link'>$title</a></h3>";
echo "<p>$descr</p>";
}
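One caveat: simplexml_load_file() returns false on failure, so it is worth guarding the call before touching $xml->channel. A minimal sketch:
$xml = simplexml_load_file($path . 'RSS.xml');
if ($xml === false) {
    // Bail out rather than dereferencing false
    die('Could not parse RSS.xml');
}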
I have a program that removes certain pages from a website; I then want to traverse the remaining pages and "unlink" any links to those removed pages. I'm using simplehtmldom. My function takes a source page ($source) and an array of pages ($skipList). It finds the links, and I'd then like to manipulate the DOM to convert each matching element into its $link->innertext, but I don't know how. Any help?
function RemoveSpecificLinks($source, $skipList) {
// $source is the html source file;
// $skipList is an array of link destinations (hrefs) that we want unlinked
$docHtml = file_get_contents($source);
$htmlObj = str_get_html($docHtml);
$links = $htmlObj->find('a');
if (isset($links)) {
foreach ($links as $link) {
if (in_array($link->href, $skipList)) {
$link->href = ''; // Should convert to simple text element
}
}
}
$docHtml = $htmlObj->save();
$htmlObj->clear();
unset($htmlObj);
return($docHtml);
}
I have never used simplehtmldom, but this is what I think should solve your problem:
function RemoveSpecificLinks($source, $skipList) {
// $source is the HTML source file;
// $skipList is an array of link destinations (hrefs) that we want unlinked
$docHtml = file_get_contents($source);
$htmlObj = str_get_html($docHtml);
$links = $htmlObj->find('a');
if (isset($links)) {
foreach ($links as $link) {
if (in_array($link->href, $skipList)) {
$link->outertext = $link->plaintext; // THIS SHOULD WORK
// IF THIS DOES NOT WORK TRY:
// $link->outertext = $link->innertext;
}
}
}
$docHtml = $htmlObj->save();
$htmlObj->clear();
unset($htmlObj);
return($docHtml);
}
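For context on why this works: in simple_html_dom, outertext is the element's entire HTML including its own tag, while plaintext is its text content with all tags stripped, so assigning plaintext to outertext replaces the whole <a>...</a> element with just its visible text.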
Please give me some feedback on whether this worked or not, specifying which method worked, if any.
Update: Maybe you would prefer this:
$link->outertext = $link->href;
This way the URL of the link is displayed as plain text, visible but not clickable.