Get content from a URL faster using PHP

I am using PHP and I want to get the content of a URL faster.
Here is the code I use:
Code:(1)
<?php
$content = file_get_contents('http://www.filehippo.com');
echo $content;
?>
There are many other methods to read files, such as fopen() and readfile(), but I think file_get_contents() is faster than these.
When you execute the code above, you see that it gives everything from this website, even images and ads. I want to get only the plain HTML text, without CSS styles, images, and ads. How can I do this?
This illustrates what I mean:
CODE:(2)
<?php
$content = file_get_contents('http://www.filehippo.com');
// do something to remove css-style, images and ads.
// return the plain html text in $mod_content.
echo $mod_content;
?>
But if I do it like this, I am going the wrong way, because I have already fetched the full content into $content and only then modify it.
Is there any function, method, or anything else that gets the plain HTML text directly from a URL?
The code below is only for illustration; it is not real PHP code.
IDEAL CODE:(3)
<?php
$plain_content = get_plain_html('http://www.filehippo.com');
echo $plain_content; // no css-style, images and ads.
?>
If such a function existed, it would be much faster than the alternatives. Is this possible?
Thanks.

Try this.
<?php
// Wrapper class so that $this->html and the private helper have somewhere to live.
class Mobilizer
{
    private $html;

    public function __construct($url)
    {
        $this->html = file_get_contents($url);
    }

    public function getHtml()
    {
        return $this->html;
    }

    public function process()
    {
        // header
        $this->_replace('/.*<head>/ism', "<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE html PUBLIC '-//WAPFORUM//DTD XHTML Mobile 1.0//EN' 'http://www.wapforum.org/DTD/xhtml-mobile10.dtd'><html xmlns='http://www.w3.org/1999/xhtml'><head>");
        // title
        $this->_replace('/<head>.*?(<title>.*<\/title>).*?<\/head>/ism', '<head>$1</head>');
        // strip out divs with little content:
        // _stripContentlessDivs() is part of phpmobilizer and is not shown in this answer
        // $this->_stripContentlessDivs();
        // divs/p
        $this->_replace('/<div[^>]*>/ism', '');
        $this->_replace('/<\/div>/ism', '<br/><br/>');
        $this->_replace('/<p[^>]*>/ism', '');
        $this->_replace('/<\/p>/ism', '<br/>');
        // h tags
        $this->_replace('/<h[1-5][^>]*>(.*?)<\/h[1-5]>/ism', '<br/><b>$1</b><br/><br/>');
        // remove align/height/width/style/rel/id/class attributes
        $this->_replace('/\salign=(\'?\"?).*?\\1/ism', '');
        $this->_replace('/\sheight=(\'?\"?).*?\\1/ism', '');
        $this->_replace('/\swidth=(\'?\"?).*?\\1/ism', '');
        $this->_replace('/\sstyle=(\'?\"?).*?\\1/ism', '');
        $this->_replace('/\srel=(\'?\"?).*?\\1/ism', '');
        $this->_replace('/\sid=(\'?\"?).*?\\1/ism', '');
        $this->_replace('/\sclass=(\'?\"?).*?\\1/ism', '');
        // remove comments
        $this->_replace('/<\!--.*?-->/ism', '');
        // remove script/style
        $this->_replace('/<script[^>]*>.*?\/script>/ism', '');
        $this->_replace('/<style[^>]*>.*?\/style>/ism', '');
        // multiple \n
        $this->_replace('/\n{2,}/ism', '');
        // remove multiple <br/>
        $this->_replace('/(<br\s?\/?>){2}/ism', '<br/>');
        $this->_replace('/(<br\s?\/?>\s*){3,}/ism', '<br/><br/>');
        // tables
        $this->_replace('/<table[^>]*>/ism', '');
        $this->_replace('/<\/table>/ism', '<br/>');
        $this->_replace('/<(tr|td|th)[^>]*>/ism', '');
        $this->_replace('/<\/(tr|td|th)[^>]*>/ism', '<br/>');
    }

    private function _replace($pattern, $replacement, $limit = -1)
    {
        $this->html = preg_replace($pattern, $replacement, $this->html, $limit);
    }
}

$mobilizer = new Mobilizer('http://www.filehippo.com');
$mobilizer->process();
echo $mobilizer->getHtml();
For more, see https://code.google.com/p/phpmobilizer/
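Note that file_get_contents() makes a single HTTP request and returns only the page's HTML; images, stylesheets, and ads loaded through scripts or iframes appear because the browser requests them while rendering the echoed markup. Any filtering therefore has to happen after the HTML has been downloaded. As an alternative to the regex chain above, here is a minimal sketch using PHP's built-in DOM extension (a swapped-in technique, not phpmobilizer's). The helper name strip_to_plain_html() and the list of removed elements are assumptions about what counts as "plain" HTML.

```php
<?php
// Hypothetical helper: drop elements that are not "plain" page content.
function strip_to_plain_html(string $html): string
{
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Copy the matches into an array first so removal does not disturb the iteration.
    $nodes = iterator_to_array($xpath->query('//script | //style | //img | //iframe | //link'));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    // Remove inline CSS as well.
    foreach (iterator_to_array($xpath->query('//@style')) as $attr) {
        $attr->ownerElement->removeAttribute('style');
    }

    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $doc->saveHTML();
}

// Usage (needs network access):
// echo strip_to_plain_html(file_get_contents('http://www.filehippo.com'));
```

Parsing with DOMDocument is more tolerant of attribute order and quoting than regexes, at the cost of building a full document tree.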

You can use regular expressions to delete style, script, and image tags: just replace them with an empty string.
preg_replace($pattern, $replacement, $string);
For more details on the function, see: http://php.net/manual/en/function.preg-replace.php
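For example, a minimal sketch along those lines. The patterns are assumptions about which tags to drop, and regexes like these can be fooled by unusual markup (for instance a > inside an attribute value), so treat them as a quick filter rather than a sanitizer. preg_replace() accepts an array of patterns, so all of them are applied in one call.

```php
<?php
// Hypothetical helper: remove <script>/<style> blocks (with their contents)
// and self-contained <img>/<link> tags.
function strip_media_tags(string $html): string
{
    $patterns = [
        '#<script\b[^>]*>.*?</script>#is', // script blocks, including inline code
        '#<style\b[^>]*>.*?</style>#is',   // embedded CSS
        '#<(img|link)\b[^>]*>#i',          // images and linked stylesheets (void elements)
    ];
    return preg_replace($patterns, '', $html);
}

echo strip_media_tags('<p>Text<img src="a.png"></p><script>var x = 1;</script>');
// prints: <p>Text</p>
```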

Related

Is it possible to change original html text in php?

I am trying to make a "manner-friendly" website. We use different declinations depending on gender and other factors. For example:
You did = robili
It did = robilo
She did = robila
Linguistically this is a very simplified (and unlucky) example! I would like to change the HTML text in a PHP file where appropriate. For example:
<?php
something
?>
html text of the page and somewhere is the word "robil"
<div>we tried to robil^i|o|a^</div>
<?php something ?>
Now I would like to find all occurrences of tokens like ^characters|characters|characters^ and replace each one with one of its internal values according to "gender".
It is easy in JavaScript on the client side, but the user will see all this weird "tokenizing" before JavaScript replaces it.
On the server side, I do not know an elegant solution.
Or do you have better idea?
Thanks for advice.
You can add these scripts before and after the HTML:
<?php
// start output buffering
ob_start();
?>
<html>
<body>
html text of the page and somewhere is the word "robil"
<div>we tried to robil^i|o|a^, but also vital^si|sa|ste^, borko^mal|mala|malo^ </div>
</body>
</html>
<?php
$use = 1; // indicate which declination to use (0,1 or 2)
// get buffered html
$html = ob_get_contents();
ob_end_clean();
// match anything between '^' that's not a control character or '^', min 3 and max 20 characters.
if (preg_match_all('/\^[^[:cntrl:]\^]{3,20}\^/',$html,$matches))
{
// replace all
foreach (array_unique($matches[0]) as $match)
{
$choices = explode('|',trim($match,'^'));
$html = str_replace($match,$choices[$use],$html);
}
}
echo $html;
This returns:
html text of the page and somewhere is the word "robil" we tried to
robilo, but also vitalsa, borkomala
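The same substitution can also be done in a single pass with preg_replace_callback(), which calls a function for each token it finds, so the array_unique()/str_replace() loop is not needed. A minimal sketch reusing the answer's pattern; apply_declination() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: replace every ^a|b|c^ token with the alternative at index $use.
function apply_declination(string $html, int $use): string
{
    return preg_replace_callback(
        '/\^([^[:cntrl:]\^]{3,20})\^/',
        function (array $m) use ($use): string {
            $choices = explode('|', $m[1]);
            // Leave the token untouched if it has no alternative at that index.
            return $choices[$use] ?? $m[0];
        },
        $html
    );
}

echo apply_declination('we tried to robil^i|o|a^, but also vital^si|sa|ste^', 1);
// prints: we tried to robilo, but also vitalsa
```

With output buffering as in the answer, the final lines would become echo apply_declination(ob_get_clean(), $use);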

How to chain in phpquery (almost everything can be a chain)

Good day everyone,
I'm very new to phpQuery, and this is my first post here on Stack Overflow, because I can't find the correct syntax for phpQuery chaining. I'm sure someone knows what I've been looking for.
I only want to remove a certain div inside a div.
<div id = "content">
<p>The text that i want to display</p>
<div class="node-links">Stuff i want to remove</div>
</div>
These few lines of code work perfectly:
pq('div.node-links')->remove();
$text = pq('div#content');
print $text; //output: The text that i want to display
But when I tried
$text = pq('div#content')->removeClass('div.node-links'); //or
$text = pq('div#content')->remove('div.node-links');
//output: The text that i want to display (+) Stuff i want to remove
Can someone tell me why the second block of code is not working?
Thanks!
The first line of code will only work if you're trying to remove the class from div.node-links; it won't remove the node.
If you are trying to remove the class you need to change it from:
$text = pq('div#content')->removeClass('div.node-links');
// to
$text = pq('div#content')->find('.node-links')->removeClass('node-links')->end();
which will output:
<div id="content">
<p>The text that i want to display</p>
<div>Stuff i want to remove</div>
</div>
As for the second line of code, I'm not exactly sure why it is not working; it seems like you're not selecting .node-links, but I was able to get the desired results using these:
// $markup = file_get_contents('test.html');
// $doc = phpQuery::newDocumentHTML($markup);
$text = $doc->find('div#content')->children()->remove('.node-links')->end();
// or
$text = pq('div#content')->find('.node-links')->remove()->end();
// or
$text = pq('div#content > *')->remove('.node-links')->parent();
Hope that helps
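remove() does not search inside the matched elements: its optional selector only filters the set that is already matched. pq('div#content')->remove('div.node-links') therefore removes nothing, because div#content itself is not div.node-links. Select the inner div directly instead: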
$text = pq('div#content div.node-links')->remove();
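If phpQuery is not available, the same removal can be done with PHP's built-in DOM extension. This is a swapped-in technique, not phpQuery's API; remove_node_links() is a hypothetical helper, and the XPath class test is the usual way to match one class inside a multi-class attribute.

```php
<?php
// Hypothetical helper: remove every div.node-links inside div#content
// and return the serialized div#content.
function remove_node_links(string $html): string
{
    $prev = libxml_use_internal_errors(true); // tolerate imperfect markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Matches a div whose class list contains "node-links", below #content.
    $query = '//div[@id="content"]//div[contains(concat(" ", normalize-space(@class), " "), " node-links ")]';
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    $content = $xpath->query('//div[@id="content"]')->item(0);
    $out = $content !== null ? $doc->saveHTML($content) : '';
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $out;
}
```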

Save the contents of manipulated div to a variable and pass to php file

I have tried to use AJAX, but nothing I come up with seems to work correctly. I am creating a menu editor. I echo part of a file using php and manipulate it using javascript/jquery/ajax (I found the code for that here: http://www.prodevtips.com/2010/03/07/jquery-drag-and-drop-to-sort-tree/). Now I need to get the edited contents of the div (which has an unordered list in it) I am echoing and save it to a variable so I can write it to the file again. I couldn't get that resource's code to work so I'm trying to come up with another solution.
If there is code I can put into the $("#save").click(function(){ }); part of the JavaScript file, that would work, but $.post doesn't seem to work for me. If there is a way to trigger a PHP preg_match from an onclick, that would be the easiest.
Any help would be greatly appreciated.
The code to get the file contents.
<button id="save">Save</button>
<div id="printOut"></div>
<?php
$header = file_get_contents('../../../yardworks/content_pages/header.html');
preg_match('/<div id="nav">(.*?)<\/div>/si', $header, $list);
$tree = $list[0];
echo $tree;
?>
The code to process the new div and send to php file.
$("#save").click(function(){
$.post('edit-menu-process.php',
{tree: $('#nav').html()},
function(data){$("#printOut").html(data);}
);
});
Everything is working EXCEPT that something about how the passed data is encoded makes it read as plain text instead of HTML. How do I turn it back into HTML?
EDIT: I was able to get this to work correctly. I'll make an attempt to switch this over to DOMDocument.
<?php
$path = '../../../yardworks/content_pages/header.html';
$menu = htmlentities(stripslashes(utf8_encode($_POST['tree'])), ENT_QUOTES);
$menu = str_replace("&lt;", "<", $menu);
$menu = str_replace("&gt;", ">", $menu);
$divmenu = '<div id="nav">'.$menu.'</div>';
/* Search for div contents in $menu and save to variable */
preg_match('/<div id="nav">(.*?)<\/div>/si', $divmenu, $newmenu);
$savemenu = $newmenu[0];
/* Get file contents */
$header = file_get_contents($path);
/* Find placeholder div in user content and insert slider contents */
$final = preg_replace('/<div id="nav">(.*?)<\/div>/si', $savemenu, $header);
/* Save content to original file */
file_put_contents($path, $final);
?>
Menu has been saved.
To post the contents of a div with ajax:
$.post('/path/to/php', {
my_html: $('#my_div').html()
}, function(data) {
console.log(data);
});
If that's not what you need, then please post some code with your question. It is very vague.
Also, you mention preg_match and html in the same question. I see where this is going and I don't like it. You can't parse [X]HTML with regex. Use a parser instead. Like this: http://php.net/manual/en/class.domdocument.php
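Since the asker plans to switch to DOMDocument, here is a minimal sketch of that approach. replace_nav_contents() is a hypothetical helper; it assumes the page contains one element with id="nav" and that the posted menu is reasonably well-formed HTML. loadHTML() assumes ISO-8859-1 unless the markup declares a charset, and saveHTML() adds a doctype and html/body wrappers when the input is only a fragment, so check the output before writing it back to header.html.

```php
<?php
// Hypothetical helper: replace the children of the element with id="nav"
// in $page by the nodes parsed from $newInner.
function replace_nav_contents(string $page, string $newInner): string
{
    $prev = libxml_use_internal_errors(true); // collect parse warnings instead of printing them

    $doc = new DOMDocument();
    $doc->loadHTML($page);
    $nav = (new DOMXPath($doc))->query('//*[@id="nav"]')->item(0);
    if ($nav === null) {
        libxml_use_internal_errors($prev);
        throw new RuntimeException('No element with id="nav" found');
    }

    // Parse the new menu separately, inside a wrapper whose children we can copy.
    $frag = new DOMDocument();
    $frag->loadHTML('<div id="new-nav">' . $newInner . '</div>');
    $wrapper = (new DOMXPath($frag))->query('//*[@id="new-nav"]')->item(0);

    // Remove the old menu, then import the new nodes into $doc.
    while ($nav->firstChild !== null) {
        $nav->removeChild($nav->firstChild);
    }
    foreach (iterator_to_array($wrapper->childNodes) as $child) {
        $nav->appendChild($doc->importNode($child, true));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $doc->saveHTML();
}

// Hypothetical usage with the asker's file:
// $path = '../../../yardworks/content_pages/header.html';
// file_put_contents($path, replace_nav_contents(file_get_contents($path), $_POST['tree']));
```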

Tag stripping allowing some html tags - Facebook-ish

I am building something like a posting feature in a local app. It works fine, but it lacks validation, and the validation I did write was a mess. I'm using jQuery oEmbed.
What I want is to print the illegal HTML tag(s) as-is and activate/render (I don't know the right term) the HTML tags I have allowed.
Any suggestions?
This is the best solution I came up with.
First replace every < and > with its HTML entity, then restore the allowed tags.
<?php
$original_str = "<html><b>test</b><strong>teste</strong></html>";
$allowed_tags = array("b", "strong");
$sans_tags = str_replace(array("<", ">"), array("&lt;", "&gt;"), $original_str);
$regex = sprintf("~&lt;(/)?(%s)&gt;~", implode("|", $allowed_tags));
$with_allowed = preg_replace($regex, "<\\1\\2>", $sans_tags);
echo $with_allowed;
echo "\n";
Result:
guax#trantor:~$ php teste.php
&lt;html&gt;<b>test</b><strong>teste</strong>&lt;/html&gt;
I wonder if there's any solution for replacing all at once. But it works.
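A variant of the same idea, sketched under the assumption that allowed tags never carry attributes: htmlspecialchars() escapes every special character at once (including &, which the str_replace() version leaves alone), and a single case-insensitive regex then restores the whitelisted tags. allow_tags() is a hypothetical name.

```php
<?php
// Hypothetical helper: escape everything, then re-enable only whitelisted,
// attribute-free tags such as <b> and </b>.
function allow_tags(string $input, array $allowed): string
{
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
    $names = implode('|', array_map(fn($t) => preg_quote($t, '~'), $allowed));
    return preg_replace("~&lt;(/?)($names)&gt;~i", '<$1$2>', $escaped);
}

echo allow_tags('<html><b>test</b><strong>teste</strong></html>', ['b', 'strong']);
// prints: &lt;html&gt;<b>test</b><strong>teste</strong>&lt;/html&gt;
```

Allowed tags written with attributes, such as <b class="x">, stay escaped, which is the safer failure mode.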

PHP - remove <img> tag from string

Hey, I need to delete all images from a string and I just can't find the right way to do it.
Here is what I tried, but it doesn't work:
preg_replace("/<img[^>]+\>/i", "(image) ", $content);
echo $content;
Any ideas?
Try dropping the \ in front of the >.
Edit: I just tested your regex and it works fine. This is what I used:
<?php
$content = "this is something with an <img src=\"test.png\"/> in it.";
$content = preg_replace("/<img[^>]+\>/i", "(image) ", $content);
echo $content;
?>
The result is:
this is something with an (image) in it.
You need to assign the result back to $content as preg_replace does not modify the original string.
$content = preg_replace("/<img[^>]+\>/i", "(image) ", $content);
I would suggest using the strip_tags() function.
Sean, it works fine; I've just used this code:
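strip_tags() removes every tag by default, so it drops formatting as well as images; its optional second argument lists tags to keep. A minimal sketch with a made-up sample string:

```php
<?php
$content = '<p>Intro <img src="a.png" alt="x"> text</p>';

// With no second argument, strip_tags() removes every tag, not only <img>.
echo strip_tags($content), "\n";        // Intro  text
// The second argument lists tags to keep; everything else (including <img>) is removed.
echo strip_tags($content, '<p>'), "\n"; // <p>Intro  text</p>
```

Kept tags retain their attributes, so strip_tags() is not a substitute for proper sanitization.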
$content = preg_replace("/<img[^>]+\>/i", " ", $content);
echo $content;
// the result is only the plain text. It works!
I wanted to display the first 300 words of a news story as a preview, which unfortunately meant that if a story had an image within the first 300 words, the image was displayed in the list of previews and really messed up my layout. I used the above code to hide all of the images in the string taken from my database, and it works wonderfully!
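$news = $row_latest_news['content'];
$news = preg_replace("/<img[^>]+\>/i", "", $news);
if (strlen($news) > 300) {
    // cut at the first space after character 300; fall back to 300 if there is none
    $cut = strpos($news, ' ', 300);
    echo substr($news, 0, $cut === false ? 300 : $cut) . '...';
} else {
    echo $news;
}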
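The preview code above counts characters (substr()/strpos()), not words. If a word limit is what is wanted, here is a minimal sketch; preview_words() is a hypothetical helper. It assumes whitespace-separated words and collapses runs of whitespace, so an HTML tag containing spaces (such as <a href="...">) can be split and left unclosed, just as with the character-based version.

```php
<?php
// Hypothetical helper: strip <img> tags, then keep at most $limit words.
function preview_words(string $text, int $limit = 300): string
{
    $text = preg_replace('/<img[^>]*>/i', '', $text);
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $limit) {
        return implode(' ', $words);
    }
    return implode(' ', array_slice($words, 0, $limit)) . '...';
}

echo preview_words('one <img src="x.png"> two three four', 3);
// prints: one two three...
```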
$this->load->helper('security');
$h=mysql_real_escape_string(strip_image_tags($comment));
If the user inputs
<img src="#">
only the character # is inserted into the database table.
Works for me.
Simply use CodeIgniter's form_validation class:
strip_image_tags($str).
$this->load->library('form_validation');
$this->form_validation->set_rules('nombre_campo', 'label', 'strip_image_tags');
