Xpath grabbing URL

Usually I get what I want with XPath, but this time I can't grab the URL I need. On this page there are a couple of URLs inside the div with class "btn-cont col-md-8 typesquare_tags", and I am trying to grab just one href, which is this: href="https://www.31sumai.com/mfr/K1503/outline.html"
I am using this code block, but I can't reach it.
foreach ($links as $href) {
    $getContent = pageContent($href);
    $getXpath = new \DOMXPath($getContent);
    $Route = $getXpath->query("//div[@class='btn-cont col-md-8 typesquare_tags']/a[3]");
    foreach ($Route as $link3) {
        $linkBOX[] = trim($link3->getAttribute('href'));
    }
}
Am I missing something here?
PS: pageContent is a function that wraps DOMDocument / loadHTML.

The "typesquare_tags" class name is added dynamically, so it is probably not present in the HTML that PHP fetches. Try to locate the div by its first two class names:
"//div[@class='btn-cont col-md-8']/a[3]"
or
"//div[contains(@class, 'btn-cont') and contains(@class, 'col-md-8')]/a[3]"
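For illustration, here is a minimal sketch of the second query run against a made-up fragment. The markup below is an assumption about what the page looks like before any script adds "typesquare_tags"; only the third href comes from the question.

```php
<?php
// Hypothetical static markup: "typesquare_tags" is absent, as it would be
// if a script adds that class only after the page loads in a browser.
$html = <<<HTML
<div class="btn-cont col-md-8">
  <a href="/first.html">First</a>
  <a href="/second.html">Second</a>
  <a href="https://www.31sumai.com/mfr/K1503/outline.html">Outline</a>
</div>
HTML;

libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match the div by the two class names that are present in the raw HTML,
// then take its third <a> child.
$query = "//div[contains(@class, 'btn-cont') and contains(@class, 'col-md-8')]/a[3]";

$linkBOX = [];
foreach ($xpath->query($query) as $link) {
    $linkBOX[] = trim($link->getAttribute('href'));
}
print_r($linkBOX);
```

Note that contains() is a substring test, so 'btn-cont' would also match a class such as 'btn-container'. That is usually fine for a one-off scrape, but worth keeping in mind.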

Related

scraping images from url using php

I am trying to make a page that lets me grab and save images from another link. Here is what I want on my page:
a text box (to enter the URL I want to get images from).
a save dialog box to specify the path to save the images to.
However, I want to save images only from that URL and only from inside a specific element.
For example, in my code I say: go to example.com and grab all images inside the element with class="images".
Notes: not all images on the page, just those inside the element.
Whether the element has 3 images in it or 50 or 100, I don't care.
Here is what I tried, and it worked, using PHP:
<?php
$html = file_get_contents('http://www.tgo-tv.net');
preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|i',$html, $matches );
echo $matches[ 1 ][ 0 ];
?>
This gets the image name and path, but what I am trying to make is a save dialog box, and the code must save the image directly into that path instead of echoing it.
I hope that makes sense.
Edit 2
It's OK not to have a save dialog box; I can specify the save path in the code.

If you want something generic, you can use:
<?php
$the_site = "http://somesite.com";
$the_tag = "div"; #
$the_class = "images";
$html = file_get_contents($the_site);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//'.$the_tag.'[contains(@class,"'.$the_class.'")]/img') as $item) {
    $img_src = $item->getAttribute('src');
    print $img_src."\n";
}
Usage:
Change the site and the tag (which can be div, span, a, etc.), and also change the class name.
For example, change the values to:
$the_site = "https://stackoverflow.com/questions/23674744/what-is-the-equivalent-of-python-any-and-all-functions-in-javascript";
$the_tag = "div"; #
$the_class = "gravatar-wrapper-32";
Output:
https://www.gravatar.com/avatar/67d8ca039ee1ffd5c6db0d29aeb4b168?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24da669dda96b6f17a802bdb7f6d429f?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24780fb6df85a943c7aea0402c843737?s=32&d=identicon&r=PG
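The loop above only prints the image URLs. To save them to a path set in the code, as the question's Edit 2 asks, something like the following could work. This is a rough sketch, not part of the original answer: saveImages and its $fetch parameter are hypothetical names, and it assumes the src values are absolute URLs.

```php
<?php
/**
 * Save each image URL into $saveDir and return the written file paths.
 * $fetch is a hypothetical injection point so the sketch can be tried
 * without network access; by default it is file_get_contents.
 */
function saveImages(array $urls, string $saveDir, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    if (!is_dir($saveDir) && !mkdir($saveDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $saveDir");
    }
    $saved = [];
    foreach ($urls as $url) {
        $data = $fetch($url);
        if ($data === false) {
            continue; // skip images that could not be downloaded
        }
        // Use the last path segment as the file name; fall back to a hash.
        $name = basename((string) parse_url($url, PHP_URL_PATH));
        if ($name === '') {
            $name = md5($url);
        }
        $path = rtrim($saveDir, '/') . '/' . $name;
        file_put_contents($path, $data);
        $saved[] = $path;
    }
    return $saved;
}
```

Usage would be to collect each $img_src from the loop into an array and then call saveImages($urls, '/path/to/save'). Two images with the same file name would overwrite each other, and relative src values would need to be resolved against the site URL first.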

Maybe you should try the Simple HTML DOM parser for PHP. I found this tool recently and, to be honest, it works pretty well. It has jQuery-like selectors, as you can see on its site. I suggest you take a look and try something like:
<?php
require_once("./simple_html_dom.php");
$html = file_get_html("http://www.example.com"); // Load the page to search (placeholder URL)
foreach ($html->find("div") as $parent) // Start from the root (<html></html>) and find the parent tag you want to search in (here "div", to search in all divs)
{
    foreach ($parent->find("img") as $img) // Search for img tags in each div you found
    {
        echo $img->src . "<br>"; // Output the img's src attribute (if the found tag is <img src="www.example.com/cat.png"> you will get www.example.com/cat.png as the result)
    }
}
?>
I hope this helps.

Modifying Database Data with PHP DOM (enclosing in tags)

I have many articles, divided into sections, stored in a database. Each section consists of a section tag, followed by a header (h2) and a primary div. Some also have subheaders (h3). The raw display looks something like this:
<section id="ecology">
<h2 class="Article">Ecology</h2>
<div class="Article">
<h3 class="Article">Animals</h3>
I'm using the following DOM script to add some classes, IDs and glyphicons:
$i = 1; // initialize counter
// initialize DOMDocument
$dom = new DOMDocument;
@$dom->loadHTML($Content); // load the markup, suppressing parser warnings
$sections = $dom->getElementsByTagName('section'); // get all section tags
if ($sections->length > 0) { // if there are indeed section tags inside
    // work on each section
    foreach ($sections as $section) { // for each section tag
        $section->setAttribute('data-target', '#b' . $i); // set data-target for section tag
        // get h2 headers inside each section
        foreach ($section->getElementsByTagName('h2') as $h2) {
            if ($h2->getAttribute('class') == 'Article') { // if this h2 has class Article
                $h2->setAttribute('id', 'a' . $i); // set id for h2 tag
            }
        }
        // get divs inside each section
        foreach ($section->getElementsByTagName('div') as $div) {
            if ($div->getAttribute('class') == 'Article') { // if this div has class Article
                $div->setAttribute('id', 'b' . $i); // set id for div tag
            }
        }
        $i++; // increment counter
    }
}
// back to string again, get all contents inside body
$Content = '';
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $Content .= $dom->saveHTML($child); // convert to string and append to the container
}
I'd like to modify the above code so that it places certain examples of "inner text" between tags.
For example, consider these headings:
<h3 class="Article">Animals</h3>
<h3 class="Article">Plants</h3>
I would like the DOM to change them to this:
<h3 class="Article"><span class="label label-default">Animals</span></h3>
<h3 class="Article"><span class="label label-default">Plants</span></h3>
I want to do something similar with the h2 tags. I don't yet know the DOM terminology well enough to search for good tutorials - not to mention confusion with DOM programs and jQuery. ;)
I think these are the basic functions I need to focus on, but I don't know how to plug them in:
$text = $data->textContent;
elementNode.textContent=string
Two Notes: 1) I understand I can do this with jQuery (perhaps a lot easier), but I think PHP might be better, as they say some users can have JavaScript disabled. 2) I'm using the class "Article" largely to distinguish elements I want to be styled by PHP DOM. A header with a different class, or no class at all, should not be affected by the DOM script.
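One possible approach, offered as a sketch rather than a tested drop-in: select the headings with XPath, create a span, move the heading's existing children into it, then append the span back to the heading. The sample $Content below is made up to mirror the markup in the question, and the h3 without a class is there only to show it is left alone.

```php
<?php
// Sample markup modelled on the question; the class-less h3 is an added
// assumption to show that headings without class "Article" are skipped.
$Content = '<section id="ecology"><h2 class="Article">Ecology</h2>'
         . '<div class="Article"><h3 class="Article">Animals</h3>'
         . '<h3>Untouched</h3></div></section>';

libxml_use_internal_errors(true); // HTML5 tags like <section> trigger warnings
$dom = new DOMDocument;
$dom->loadHTML($Content);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//h3[@class="Article"]') as $h3) {
    $span = $dom->createElement('span');
    $span->setAttribute('class', 'label label-default');
    // appendChild() moves a node, so this empties the heading into the span.
    while ($h3->firstChild) {
        $span->appendChild($h3->firstChild);
    }
    $h3->appendChild($span);
}

// Back to a string: everything inside <body>.
$out = '';
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out;
```

Changing the query to '//h2[@class="Article"] | //h3[@class="Article"]' should cover the h2 tags as well. Moving child nodes instead of copying textContent keeps any inline markup that is already inside a heading.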

simple html dom - space in class name

I'm using PHP Simple HTML DOM to get elements from the source code of a site (not mine). When I look for a ul whose class is "board List", it is not found. I think the space might be the problem, but I don't know how to solve it.
This is a piece of the PHP code:
$html = str_get_html($result['content']); // get the HTML of the site
$board = $html->find('.board List'); // find all elements with class="board List"; this doesn't work here, although other class names do
And this is a piece of the site's HTML:
<!-- OTHER HTML CODE BEFORE THIS --><ul class="board List"><li id="c111131" class="skin_tbl">
<table class="mback" cellpadding="0" cellspacing="0" onclick="toggleCat('c111131')"><tr>
<td class="mback_left"><div class="plus"></div><td class="mback_center"><h2 class="mtitle">presentiamoci</h2><td class="mback_right"><span id="img_c111131"></span></table>
<div class="mainbg">
<div class="title top"><div class="aa"></div><div class="bb">Forum</div><div class="yy">Statistiche</div><div class="zz">Ultimo Messaggio</div></div>
<ul class="big_list"><!-- OTHER HTML AFTER THIS -->

I solved it by removing board from the find parameter, like this:
$board = $html->find('.List');
Now the parser seems to work correctly.

With Simple HTML DOM you would probably want to use:
$html->find('*[class="board List"]', 0);
If you really want to use:
$html->find('.board.List', 0);
Then use this one.

The answer is that you cannot use spaces in class names. Spaces are the separators between classes.
If you have <div class="container wrapper-something anothersomething"></div>, then you can use .container, .wrapper-something or .anothersomething as a selector, and you always match that div.
In your code you have <ul class="board List">, so to get a match with a CSS selector ($html->find('{here_comes_the_css_selector}');) you can use either .board or .List as the selector.
Therefore your line $board = $html->find('.board List'); should look more like this:
$board = $html->find('.board.List');
// matches every element that has class 'board' AND 'List'
// Here it is really important that there are no spaces between those 2 selectors
// or
$board = $html->find('.List');
// matches every element that has class 'List'
// or
$board = $html->find('.board');
// matches every element that has class 'board'
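To see the "spaces separate classes" rule in action without depending on a third-party parser, here is a small sketch that uses PHP's built-in DOMXPath instead of Simple HTML DOM. The classPredicate helper is a made-up name; it builds the usual XPath test for one whole class token.

```php
<?php
// Hypothetical helper: XPath predicate that matches one whole class token,
// so 'List' does not accidentally match 'big_list'.
function classPredicate(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$html = '<ul class="board List"><li>one</li></ul>'
      . '<ul class="big_list"><li>two</li></ul>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Element with both classes (like the CSS selector .board.List).
$both = $xpath->query('//ul[' . classPredicate('board') . ' and ' . classPredicate('List') . ']');
// Element with class List only (like .List).
$list = $xpath->query('//ul[' . classPredicate('List') . ']');
// Class names are case-sensitive here, so 'list' matches nothing.
$lower = $xpath->query('//ul[' . classPredicate('list') . ']');

echo $both->length, ' ', $list->length, ' ', $lower->length, "\n"; // prints "1 1 0"
```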

$board = $html->find('[class="board List"]');
With this syntax, Simple HTML DOM finds elements whose class attribute contains multiple classes.

DOM Document - How to get the text inside a tag without the inner tags

Assume I have a DOM document containing the following HTML, stored in a variable called $dom_document:
<div>
<a href='something'>some text here</a>
I want this
</div>
What I would like is to retrieve the text that is directly inside the div tag ('I want this'), but not the text of the a tag. What I do is the following:
$dom_document->nodeValue;
Unfortunately, this statement also returns the text of the a tag. I hope someone can help. Thank you in advance. Cheers. Marc

You can use XPath for it:
$xpath = new DOMXpath($dom_document);
$textNodes = $xpath->query('//div/text()');
foreach ($textNodes as $txt) {
    echo $txt->nodeValue;
}
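Here is the same idea as a self-contained sketch, with the document built from the HTML in the question. The trim() call is an addition: text() also returns the whitespace around the a tag, so the raw result carries extra line breaks.

```php
<?php
// The markup from the question, loaded into a DOMDocument.
$html = "<div>\n<a href='something'>some text here</a>\nI want this\n</div>";

libxml_use_internal_errors(true);
$dom_document = new DOMDocument();
$dom_document->loadHTML($html);

$xpath = new DOMXPath($dom_document);

// text() selects only the div's direct text-node children,
// so the text inside <a> is not included.
$own = '';
foreach ($xpath->query('//div/text()') as $txt) {
    $own .= $txt->nodeValue;
}
$own = trim($own); // drop the surrounding line breaks

echo $own; // prints "I want this"
```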

Zend_Dom_Query query element issue

I have a div that doesn't have a class or id. Is it possible to select a div element when I know its inner text? For example:
<div class="thishere"></div>
<div>Search on a this text</div>
If not, the div before it has a class, so how do I find its next sibling?
$selector = new Zend_Dom_Query($response->getBody());
$nodes = $selector->query('????');

Using JavaScript, you can loop through every element on the page and find the div with the special class. The next element in the loop will then be the second div, and you can get its contents using element.innerHTML.

$text = <<<text
<div class="thishere"></div>
<div>Search on a this text</div>
text;
$selector = new Zend_Dom_Query($text);
$nodes = $selector->queryXpath('//div[contains(text(),"Search on a this text")]');
foreach ($nodes as $node)
{
    ...
}
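For the second part of the question (finding the next sibling of the div that has a class), XPath has a following-sibling axis. The sketch below uses PHP's built-in DOMXPath rather than Zend_Dom_Query, so it runs without the framework; the same expression should be usable with queryXpath, though that is not verified here.

```php
<?php
$text = '<div class="thishere"></div><div>Search on a this text</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);

// Start at the div with the known class, then step to the first
// div that follows it at the same level.
$nodes = $xpath->query('//div[@class="thishere"]/following-sibling::div[1]');

$found = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
echo $found; // prints "Search on a this text"
```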
