How to chain in phpquery (almost everything can be a chain) - php

Good day everyone,
I'm very new with phpquery and this is my first post here at stackoverflow for a reason that i cant find the correct for syntax for the phpquery chaining. I know someone knows what i been looking for.
I only want to remove the a certain div inside a div.
<div id = "content">
<p>The text that i want to display</p>
<div class="node-links">Stuff i want to remove</div>
</content>
This few lines of codes works perfect
pq('div.node-links')->remove();
$text = pq('div#content');
print $text; //output: The text that i want to display
But when I tried
$text = pq('div#content')->removeClass('div.node-links'); //or
$text = pq('div#content')->remove('div.node-links');
//output: The text that i want to display (+) Stuff i want to remove
Can someone tell me why the second block of code is not working?
Thanks!

The first line of code will only work if your trying to remove the class from div.node-links, it won't remove the node.
If you are trying to remove the class you need to change it from:
$text = pq('div#content')->removeClass('div.node-links');
// to
$text = pq('div#content')->find('.node-links')->removeClass('node-links')->end();
which will output:
<div id="content">
<p>The text that i want to display</p>
<div>Stuff i want to remove</div>
</div>
As for the second line of code.. I'm not exactly sure why it is not working, it seems like your not selecting .node-links but I was able to get the desired results using these.
// $markup = file_get_contents('test.html');
// $doc = phpQuery::newDocumentHTML($markup);
$text = $doc->find('div#content')->children()->remove('.node-links')->end();
// or
$text = pq('div#content')->find('.node-links')->remove()->end();
// or
$text = pq('div#content > *')->remove('.node-links')->parent();
Hope that helps

Since remove() does not take any parameter, you can do:
$text = pq('div#content div.node-links')->remove();

Related

How to format plaintext in PHP Simple HTML DOM Parser?

I'm trying to extract the content of a webpage in plain text - without the html tags. Here's some sample code:
$dom = \Sunra\PhpSimple\HtmlDomParser::file_get_html($url);
$result['body'] = $dom->find('body', 0)->plaintext;
The problem is that what I get in $result['body'] is very messy. The HTML was removed, sure, but sentences often merge into others since there are no spaces or periods to delimit where the text from one HTML tag ended, and text from the following tag begins.
An example:
<body>
<div class="H2">Header</div>
<div class="P">this is a paragraph</div>
<div class="P">this is another paragraph</div>
</body>
Results in:
"Headerthis is a paragraphthis is another paragraph"
Desired result:
"Header. this is a paragraph. this is another paragraph"
Is there any way to format the result from plaintext or perhaps apply extra manipulation on the innertext before using plaintext to achieve clear delimiters for sentences?
EDIT:
I'm thinking of doing something like this:
foreach($dom->find('div') as $element) {
$text = $element->plaintext;
$result['body'] .= $text.'. ';
}
but there's a problem when the divs are nested, since it would add the content of the parent, which includes text from all children, and then add the content of the children, effectively duplicating the text. This can be fixed simply by checking if there is a </div> inside the $text though.
Perhaps I should try callbacks.
Possibly something like this? Tested.
<?php
require_once 'vendor/autoload.php';
$dom = \Sunra\PhpSimple\HtmlDomParser::file_get_html("index.html");
$result['body'] = implode('. ', array_map(function($element) {
return $element->plaintext;
}, $dom->find('div')));
echo $result['body'];
<body>
<div class="H2">Header</div>
<div class="P">this is a paragraph</div>
<div class="P">this is another paragraph</div>
</body>
Try this code:
$result = array();
foreach($html->find('div') as $e){
$result[] = $e->plaintext;
}

Is it possible to change original html text in php?

I am trying to make "manner friendly" website. We use different declination dependent on gender and other factors. For example:
You did = robili
It did = robilo
She did = robila
Linguisticaly this is very simplified (and unlucky) example! I would like to change html text in php file where appropriate. For example
<? php
something
?>
html text of the page and somewhere is the word "robil"
<div>we tried to robil^i|o|a^</div>
<? php something ?>
Now I would like to replace all occurences of different tokens ^characters|characters|characters^ and replace them by one of their internal values according to "gender".
It is easy in javascript on the client side, but you will see all this weird "tokenizing" before javascript replace it.
Here I do not know the elegant solution.
Or do you have better idea?
Thanks for advice.
You can add these scripts before and after the HTML:
<?php
// start output buffering
ob_start();
?>
<html>
<body>
html text of the page and somewhere is the word "robil"
<div>we tried to robil^i|o|a^, but also vital^si|sa|ste^, borko^mal|mala|malo^ </div>
</body>
</html>
<?php
$use = 1; // indicate which declination to use (0,1 or 2)
// get buffered html
$html = ob_get_contents();
ob_end_clean();
// match anything between '^' than's not a control chr or '^', min 5 and max 20 chrs.
if (preg_match_all('/\^[^[:cntrl:]\^]{3,20}\^/',$html,$matches))
{
// replace all
foreach (array_unique($matches[0]) as $match)
{
$choices = explode('|',trim($match,'^'));
$html = str_replace($match,$choices[$use],$html);
}
}
echo $html;
This returns:
html text of the page and somewhere is the word "robil" we tried to
robilo, but also vitalsa, borkomala

How to extract HTML element from a source file

I need to replace a HTML section identified by a tag id in a source code, which is combination of HTML and PHP using PHP. In case it's pure HTML, DOM parser could be used; in case there is no DIV in DIV, I can imagine how to use preg_match. This is what I am trying to do - I have a code (loaded into a string) like:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div>
<div>
<img >
</div>
</div>
</div>
and my task is to replace content of "mydiv" DIV with a new one e.g.
<div id="newdiv>
some text
</div>
so the string will look like this after the change:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div id="newdiv>
some text
</div>
</div>
I have already tried:
1) parsing the code using DOMdocument's loadHTML => it produces a lot of errors in case PHP code is included.
2) I played around a bit with regexes like preg_match_all('/<div id="myid"([^<]*)<\/div>/', $src, $matches), which fails in case more child divs are included.
The best approach I have found so far is:
1) find id="mydiv" string
2) search for '<' and '>' chars and count them like '<'=1 and '>'=-1 (not exactly, but it gives the idea)
3) once I get sum == 0 I should be on position of the closing tag, so I know, which portion string I should exchange
This is quite "heavy" solution, which can stop working in some cases, where the code is different (e.g. onpage PHP code contains the chars as well instead of just simple "include"). So I am looking so some better solution.
You could try something like this:
$file = 'filename.php';
$content = file_get_contents($file);
$array_one = explode( '<div id="mydiv">' , $content );
$my_div_content = explode("</div>" , $array_one[1] )[0];
Or use preg_match like you said:
preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $matches)
Yes there is. First you need to use a function that will get the content of the file. Lets call the file homepage.php:
$homepageString = file_get_contents('homepage.php');
Now you have a string with all the content. The next thing you would do is use the preg_replace() function to take out the part of code that you want to take out:
$newHomepageString = preg_replace('/id="mydiv"/',"", $homepageString);
Now you overwrite the existing homepage.php file with the new source code:
file_put_contents("homepage.php", $newHomepageString);
Let me know if it worked for you! :)

Get content in faster way from url using php

I am using php, I want to get the content from url in faster way.
Here is a code which I use.
Code:(1)
<?php
$content = file_get_contents('http://www.filehippo.com');
echo $content;
?>
Here is many other method to read files like fopen(), readfile() etc. But I think file_get_contents() is faster than these method.
In my above code when you execute it you see that it give every thing from this website even images and ads. I want to get only plan html text no css-style, images and ads. How can I get this.
See this to understand.
CODE:(2)
<?php
$content = file_get_contents('http://www.filehippo.com');
// do something to remove css-style, images and ads.
// return the plain html text in $mod_content.
echo $mod_content;
?>
If I do that like above then I am going in wrong way, because I already get the full content in variable $content and then modify it.
Can here is any function method or anything else which get the directly plain html text from url.
Below code is written only to understanding, this is not the original php code.
IDEAL CODE:(3);
<?php
$plain_content = get_plain_html('http://www.filehippo.com');
echo $plain_content; // no css-style, images and ads.
?>
If I can get this function it will be much faster than others. Can it is possible.
Thanks.
Try this.
$content = file_get_contents('http://www.filehippo.com');
$this->html = $content;
$this->process();
function process(){
// header
$this->_replace('/.*<head>/ism', "<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE html PUBLIC '-//WAPFORUM//DTD XHTML Mobile 1.0//EN' 'http://www.wapforum.org/DTD/xhtml-mobile10.dtd'><html xmlns='http://www.w3.org/1999/xhtml'><head>");
// title
$this->_replace('/<head>.*?(<title>.*<\/title>).*?<\/head>/ism', '<head>$1</head>');
// strip out divs with little content
$this->_stripContentlessDivs();
// divs/p
$this->_replace('/<div[^>]*>/ism', '') ;
$this->_replace('/<\/div>/ism','<br/><br/>');
$this->_replace('/<p[^>]*>/ism','');
$this->_replace('/<\/p>/ism', '<br/>') ;
// h tags
$this->_replace('/<h[1-5][^>]*>(.*?)<\/h[1-5]>/ism', '<br/><b>$1</b><br/><br/>') ;
// remove align/height/width/style/rel/id/class tags
$this->_replace('/\salign=(\'?\"?).*?\\1/ism','');
$this->_replace('/\sheight=(\'?\"?).*?\\1/ism','');
$this->_replace('/\swidth=(\'?\"?).*?\\1/ism','');
$this->_replace('/\sstyle=(\'?\"?).*?\\1/ism','');
$this->_replace('/\srel=(\'?\"?).*?\\1/ism','');
$this->_replace('/\sid=(\'?\"?).*?\\1/ism','');
$this->_replace('/\sclass=(\'?\"?).*?\\1/ism','');
// remove coments
$this->_replace('/<\!--.*?-->/ism','');
// remove script/style
$this->_replace('/<script[^>]*>.*?\/script>/ism','');
$this->_replace('/<style[^>]*>.*?\/style>/ism','');
// multiple \n
$this->_replace('/\n{2,}/ism','');
// remove multiple <br/>
$this->_replace('/(<br\s?\/?>){2}/ism','<br/>');
$this->_replace('/(<br\s?\/?>\s*){3,}/ism','<br/><br/>');
//tables
$this->_replace('/<table[^>]*>/ism', '');
$this->_replace('/<\/table>/ism', '<br/>');
$this->_replace('/<(tr|td|th)[^>]*>/ism', '');
$this->_replace('/<\/(tr|td|th)[^>]*>/ism', '<br/>');
// wrap and close
}
private function _replace($pattern, $replacement, $limit=-1){
$this->html = preg_replace($pattern, $replacement, $this->html, $limit);
}
for more - https://code.google.com/p/phpmobilizer/
you can use regular expression to delete css-script's tags and image's tags, just replace those codes with blank space
preg_replace($pattern, $replacement, $string);
for more detail of function go here: http://php.net/manual/en/function.preg-replace.php

Add a span tag to Title without javascript

UPDATE:
Yes I am Using PHP in my pages.
Hello Friends I was thinking..... Is there a way to add a <span> tag to the title without using javascript?
May be using Regex or php or some other method. I dont really know.
Let me explain....
My HTML is like this:
<h3 class="title">The Title Goes Here</h3>
What I want is to automatically add a span tag, so the the final HTML looks like this.
<h3 class="title"><span>The </span>Title Goes Here</h3>
I want to wrap only the first word of the title in a <span> tag.
I know this can easily be dont using Javascript but I am looking for a non-javascript solution.
Please Help!
You can do this with DOMDocument in PHP if you don't want to do it with the javascript DOM:
$html = '<h3 class="title">The Title Goes Here</h3>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
foreach($xp->query('//h3[#class="title"]') as $parent) {
$title = $parent->nodeValue;
list($first, $rest) = explode(' ', $title, 2);
$span = new DOMElement('span', $first. ' ');
$parent->nodeValue = $rest;
$parent->insertBefore($span, $parent->firstChild);
}
foreach($doc->getElementsByTagName('body')->item(0)->childNodes as $node)
{
echo $doc->saveHTML($node);
}
My answer is that the cannot be done. You can't manipulate a page in the browser without JavaScript. This can only be achieved by editing the page on the server manually, or by dynamically generating it using PHP logic, or an equivalent solution, of which there are many.
If you are doing this for a corporate solution that is only used on a single corporate standard browser, you could look into building a plugin for the browser.

Categories