Getting unstyled text using QueryPath in PHP

I'm just getting to grips with QueryPath after using Simple HTML DOM for quite some time, and I'm finding that the QP documentation doesn't offer many examples for its functions.
At the moment I'm trying to retrieve some text from an HTML doc that doesn't make much use of IDs or classes, so I'm a little outside my comfort zone.
Here's the HTML:
<div class="blue-box">
<div class="top">
<h2><img src="pic.gif" alt="Advertise"></h2>
<p>Some uninteresting stuff</p>
<p>More stuff</p>
</div>
</div>
<div class="blue-box">
<div class="top">
<h2><img src="pic2.gif" alt="Location"></h2>
**I NEED THIS TEXT**
<div style="margin:stuff">
<img src="img3.gif">
</div>
</div>
</div>
I was thinking about selecting the class 'blue-box' as the starting point and then descending from there. The issue is that there could be any number of blue-box divs in the HTML doc.
So I was thinking that maybe I should select the image with alt="Location" and then use ->next()->text() or something along those lines?
I've tried about 15 variations so far and none of them gets the text I need.
Assistance most appreciated!

Can you have a look at this example: http://jsfiddle.net/Pedro3M/mujtk/
I did as you suggested and used the alt attribute, assuming you can confirm it is always unique:
$("img[alt='Location']").parent().parent().text();

How about:
$doc->find('div.top:has(img[alt="Location"])')->text();
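Note that both selectors above return all the text inside div.top. If only the bare text node after the h2 is wanted, one option is XPath. The following is a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than QueryPath (a swapped-in technique, not QueryPath's API); the markup is a trimmed copy of the question's HTML and the variable names are only illustrative.

```php
<?php
// Sketch with PHP's built-in DOM extension (not QueryPath).
// $html is a trimmed copy of the markup from the question.
$html = <<<'HTML'
<div class="blue-box">
  <div class="top">
    <h2><img src="pic.gif" alt="Advertise"></h2>
    <p>Some uninteresting stuff</p>
  </div>
</div>
<div class="blue-box">
  <div class="top">
    <h2><img src="pic2.gif" alt="Location"></h2>
    I NEED THIS TEXT
    <div style="margin:stuff"><img src="img3.gif"></div>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Start at the <h2> that wraps the "Location" image, then take the first
// non-empty text node among its following siblings.
$nodes = $xpath->query(
    '//h2[img[@alt="Location"]]/following-sibling::text()[normalize-space()][1]'
);

$text = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $text, PHP_EOL;
```

This relies on alt="Location" being unique on the page, as discussed above.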

Related

Clean HTML Sourcecode with PHP

I'm looking for a way to keep the HTML that my PHP code outputs clean.
If you look at the page source, the result looks like this:
<section><div class="card">
<div class="card-body">
<h5 class="card-title">Special title treatment</h5>
<p class="card-text">With supporting text below as a natural lead-in</p>
Go somewhere </div>
</div>
</section><section><div class="card">
<div class="card-body">
<h5 class="card-title">Special title treatment</h5>
<p class="card-text">With supporting text below as a natural lead-in content.</p>
Go somewhere </div>
</div></section>
I want it to look like this:
<section>
<div class="card">
<div class="card-body">
<h5 class="card-title">Special title treatment</h5>
<p class="card-text">With supporting text below as a natural lead-in</p>
Go somewhere </div>
</div>
</section>
<section>
<div class="card">
<div class="card-body">
<h5 class="card-title">Special title treatment</h5>
<p class="card-text">With supporting text below as a natural lead-in</p>
Go somewhere </div>
</div>
</section>
This is my PHP output code:
ob_start();
include_once ROOT.'/global/header.php';
print $content_output; // the included files
include_once ROOT.'/global/footer.php';
$output = ob_get_contents();
ob_end_clean();
echo $output;
The reason for this is that I am building a scaffold where blocks are created for a website. For example the start page consists of block2, block7, block1 and block5. At the end the customer gets a clean HTML, which consists of the above mentioned blocks.

If your PHP fully renders the HTML, why would you want it to look good? It is not like any other developer is going to look inside the compiled HTML, right?
The browser does not care how your HTML is formatted, if it is valid HTML, it is valid HTML. This should also not affect the SEO of your webpage.
If you are manually writing HTML in PHP code, you should avoid echoing full HTML strings. You can do this by using inline PHP as much as you can. For example:
<?php if ($condition): ?>
<h1><?= $test ?></h1>
<?php endif; ?>
This way you know PHP is not going to affect the indentation of the markup.

You can use DOMDocument to process & format HTML. DOMDocument is tough to use & much of it could use better documentation.
If all you want to do is pretty print the html, something like this should do what you need:
$html = '<div>Happy Coding todayyy</div>';
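$doc = new \DOMDocument();
// The constructor takes a version and encoding, not markup; load the HTML explicitly.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);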
$doc->formatOutput = true;
$cleanHtml = $doc->saveHTML();
You could also look for an HTML beautifier, but it doesn't look like there are any particularly mature projects for that.
I also want to add that running DOMDocument on every single request to format html adds additional overhead. More cpu cycles means more energy, so something to be mindful of. You probably won't see any real change in script execution time though.
Here are some existing projects that might make DOM work easier for you, as things to try if DOMDocument doesn't do quite what you want (I'm not 100% sure the code above will do the trick, nor do I know if any of these repos can definitely solve your problem):
Voku's port of simple_html_dom <- simple_html_dom has been around for a while. I haven't tried Voku's port, but his repos that I've reviewed are usually very good quality.
A DomDocument extension by ivopetkov <- I think this one is the most mature
Another option by scotteh <- Don't know anything about it
A DomDocument extension by me. Stable, small but nice feature set

I want to remove certain parent- and child-divs in all my wordpress posts with php or some other script

Is there a quick way, via a script maybe, to remove a certain set of divs from all my WordPress posts? For example:
I want to go from this:
<div class="single_textimage">
<div class="youtube_play"><iframe src="-,-"></iframe></div>
<div class="single_textimage_text">Some text.</div>
<div class="single_textimage_copyright">Some text.</div>
</div>
To this:
<div class="youtube_play"><iframe src="-,-"></iframe></div>
AND
From this:
<div class="single_textimage">
<img class="aligncenter size-full wp-image-1700" src="-,-" />
<div class="single_textimage_text">Some text.</div>
<div class="single_textimage_copyright">Some text.</div>
</div>
To this:
<img class="aligncenter size-full wp-image-1700" src="-,-" />
So I want the divs: single_textimage, single_textimage_text and single_textimage_copyright to go.
I hope there is an easy script, or difficult for that matter. Via "php", "mysql" or "jquery" for example, that I can put in test.php in the root or something...
I hope I supplied you with enough information. If I haven't made myself clear enough, please reply. :)

Seems to me like you should be able to take those out of whatever template you're using, probably in a PHP include, but I don't really use WordPress, so I wouldn't know where without seeing all your files. If you're bent on using jQuery instead of modifying the template, I would throw in some CSS too, to hide the elements that will be removed:
.single_textimage, .single_textimage_text, .single_textimage_copyright{
display:none;
}
Then you can take the elements you want to keep out of their parent DIVs, and place them right after (or before):
$('.youtube_play, .wp-image-1700').each(function(){
$(this).parent().after($(this));
});
Then you can remove the elements you don't want from the page:
$('.single_textimage, .single_textimage_text, .single_textimage_copyright').remove();
Here's a fiddle: https://jsfiddle.net/3uztorzL/

I would use this search and replace utility to update all of the content in the DB:
https://interconnectit.com/products/search-and-replace-for-wordpress-databases/
You'll need a regex to replace <div class="single_textimage_text">Some text.</div> (assuming the "some text" is different in each post). The utility supports regex replace. This may do it:
<div class="single_textimage_text">(.*?)</div>
Make sure you make a backup before you do the replace.
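As an alternative to a regex, the unwrapping itself could be done with a DOM parser. Below is a rough sketch using PHP's built-in DOMDocument. The function name unwrap_single_textimage is made up for illustration, the class matching assumes each div carries exactly one class as in the question, and looping over the posts and saving them back to the database is deliberately left out.

```php
<?php
// Hypothetical helper (name is illustrative): removes the caption and
// copyright divs and replaces each wrapper div with its remaining children.
function unwrap_single_textimage(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8 to the parser; the extra container
    // lets us pull the fragment back out after editing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // 1. Drop the text and copyright divs entirely.
    $captions = $xpath->query(
        '//div[@class="single_textimage_text" or @class="single_textimage_copyright"]'
    );
    foreach (iterator_to_array($captions) as $div) {
        $div->parentNode->removeChild($div);
    }

    // 2. Move whatever is left out of each wrapper, then remove the wrapper.
    $wrappers = $xpath->query('//div[@class="single_textimage"]');
    foreach (iterator_to_array($wrappers) as $wrapper) {
        while ($wrapper->firstChild) {
            $wrapper->parentNode->insertBefore($wrapper->firstChild, $wrapper);
        }
        $wrapper->parentNode->removeChild($wrapper);
    }

    // 3. Serialize only the children of our temporary container.
    $out = '';
    $container = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($container->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}

$post = '<div class="single_textimage">
<div class="youtube_play"><iframe src="-,-"></iframe></div>
<div class="single_textimage_text">Some text.</div>
<div class="single_textimage_copyright">Some text.</div>
</div>';

echo unwrap_single_textimage($post), PHP_EOL;
```

As with the search-and-replace approach, try it on a backup copy of the content first.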

Add <div> on a MediaWiki geshi syntax highlight extension

I use MediaWiki to take notes on the procedures I follow, and the source code I write there is highlighted with the GeSHi SyntaxHighlight extension. I want to modify this extension so that a box naming the programming language appears above the highlighted code. I looked through the extension's sources in my MediaWiki install, but I couldn't find the place where the <div> is generated. I also read material on writing new MediaWiki extensions to understand how they work, but I still don't understand where the box is created.
I use the syntax highlighter like this:
some_code
and this is the HTML that MediaWiki generates:
<div class="mw-geshi mw-code mw-content-ltr" dir="ltr">
<div class="bash source-bash">
<pre class="de1">
some_code
</pre>
</div>
</div>
I want to prepend a div before the first div, like this:
<div class='gsh-lang-label'>Language bash</div>
<div class="mw-geshi mw-code mw-content-ltr" dir="ltr">
<div class="bash source-bash">
<pre class="de1">
some_code
</pre>
</div>
</div>
Can you explain whether this is possible and how I should approach the problem?

I think ordinary jQuery will solve this problem. Something like:
$(".mw-geshi").each(function(){
$(this).before("<div class='gsh-lang-label'>" +
$(this).children().first().attr("class").split(' ')[0] +
"</div>")
})
Put this in [[MediaWiki:Common.js]], so this script will be run for every user.

Get specific html content from other site with PHP

I want to get the latest movie I checked on the iCheckMovies site and display it on my website. I don't know how. I've read about file_get_contents() and then getting an element, but the specific element I want is rather deep in the DOM structure. It's in a div in a div in a list in a ...
So, this is the link I want to get my content from: http://www.icheckmovies.com/profiles/robinwatchesmovies and I want to get the first title of the movie in the list.
Thanks so much in advance!
EDIT:
So using the file_get_contents() method
<?php
$html = file_get_contents('http://www.icheckmovies.com/profiles/robinwatchesmovies/');
echo $html;
?>
I got this HTML output. Now I just need to get 'Smashed', which is the text of the link inside the h3, inside a list, inside nested divs. This is where I don't know how to proceed.
...
<div class="span-7">
<h2>Checks</h2>
<ol class="itemList">
<li class="listItem listItemSmall listItemMovie movie">
<div class="listImage listImageCover">
<a class="dvdCoverSmall" title="View detailed information on Smashed (2012)" href="/movies/smashed/"></a>
<div class="coverImage" style="background: url(/var/covers/small/10/1097928.jpg);"></div>
</div>
<h3>
<a title="View detailed information on Smashed (2012)" href="/movies/smashed/">Smashed</a>
</h3>
<span class="info">6 days ago</span>
</li>
<li class="listItem listItemSmall listItemMovie movie">
<li class="listItem listItemSmall listItemMovie movie">
</ol>
<span>
</div>
...

There are some libraries which could help you!
One I've used for the same purpose, a long time ago, is this: http://simplehtmldom.sourceforge.net/
I hope it helps you!

Follow these steps to achieve this:
STEP1:-
First get the contents using file_get_contents in a php file
ex: getcontent.php
<?php
echo file_get_contents("http://www.icheckmovies.com/movies/checked/?user=robinwatchesmovies");
?>
STEP2:-
Call the above script with an AJAX request and add the response to a hidden element (visibility: hidden) in the HTML.
ex:
$('#hidden_div').html(response);
html:-
<html>
<body>
<div id='hidden_div' style='visibility:hidden'>
</div>
</body>
</html>
STEP3:-
Now extract whichever element you want from the hidden div, for example by its id.

What you are asking for is called web scraping. I did this a few months back; the process goes like this:
Make an HTTP request to the site you need the content from (check the PHP class for it)
Use a DOM parsing library to handle the downloaded page (it will be HTML); Simple HTML DOM would be a good choice
Extract the information you need
Here are some tutorials for you:
HTML Parsing and Screen Scraping with the Simple HTML DOM Library
Beginning web page scraping with php
SO posts:
HTML Scraping in Php
And best of all, Google is your friend: just search for "PHP scraping".
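The parse-and-extract part of that process can be sketched roughly as follows. This uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, the function name first_checked_title is made up for illustration, and $sample is a cut-down stand-in for the page markup quoted in the question; a real script would fetch the page first and check that the download succeeded.

```php
<?php
// Hypothetical helper (name is illustrative): returns the text of the link
// in the <h3> of the first item in the "itemList" list, or null if absent.
function first_checked_title(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // First <li> of the list, then the link inside its <h3>.
    $links = $xpath->query('//ol[@class="itemList"]/li[1]/h3/a');

    return $links->length > 0 ? trim($links->item(0)->textContent) : null;
}

// Stand-in for the downloaded page. In practice this string would come from
// file_get_contents($url) or cURL, after checking for a false return value.
$sample = <<<'HTML'
<ol class="itemList">
  <li class="listItem listItemSmall listItemMovie movie">
    <div class="listImage listImageCover">
      <a class="dvdCoverSmall" href="/movies/smashed/"></a>
    </div>
    <h3><a href="/movies/smashed/">Smashed</a></h3>
    <span class="info">6 days ago</span>
  </li>
</ol>
HTML;

echo first_checked_title($sample), PHP_EOL;
```

The XPath expression is tied to the class names shown in the question, so it will need adjusting if the site changes its markup.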

Using columns in Wordpress page that include header

I'm a little bit of a newb when it comes to PHP, but know enough to get around and would like to know if this can be done. sample
I want to break my wordpress page into 2 columns, but also want to have the header in the 1st column.... along with other text. I don't want the header floating over both columns...
The second column will house images only...
is that possible? In my head it makes sense, but then when I try and work it out, I'm just not sure....
And I just got thinking... I have my home page static with the smooth slider on it, so that is now going to cause more grief.
Any help, advice or pointers would be greatly appreciated.
Thanks in advance

This is more CSS/HTML than PHP, however that's fine. The first thing you need to do is understand how to make a two column layout. Then you will need to have the post title in the first column, something like this:
<article>
<div id="col1">
<h1>Post Title</h1>
Lorem ipsum dolor sit amet...
</div>
<div id="col2">
<img src="" />
</div>
</article>
To make this into WordPress you will of course need to add the WordPress Tags:
<div id="col1">
<h1><?php the_title(); ?></h1>
<?php the_content(); ?>
</div>
Finally, adding the image(s) on the right. It can be done easily using WordPress' built in functionality, if you only need one image: (Note you will have to add something in your theme's functions.php file as per the WordPress Docs)
<div id="col2">
<?php
if ( has_post_thumbnail() ) { // check if the post has a Post Thumbnail assigned to it.
the_post_thumbnail();
}
?>
</div>
To add multiple images, it gets more complex and you'll have to start looking for a plugin to achieve that goal.
