Get the current ID of parent element using domXpath - php

I'm working on getting the ranking of a domain on Baidu.
I want the position of the result in which the domain appears. I can already get the domain name; my problem is the position.
I need the id (which is the position) of the result c-container element in which the domain appears. I hope you can help.
Thanks.
$finder = new DomXPath($document);
$results = $finder->query("//*[contains(@class, 'result c-container')]");
if($element){
$data = array();
foreach ($results as $result) {
# code...
$as = $result->getElementsByTagName('a');
foreach ($as as $a){
if ($a->getAttribute('class') === 'c-showurl') {
$textUrl = $a->nodeValue;
if (($pos = strpos($textUrl, "}")) !== FALSE) {
$textUrl = substr($textUrl, $pos+1);
}
$domain = trimUrl($domain);
if(preg_match("/{$domain}/i", $textUrl)) {
$data['domain'] = $textUrl;
$data['id'] = ?
}
}
}
}
array_push($res, $data);
}else{
$data = array();
array_push($res, $data);
}

From the documentation, every node exposes its parent element through parentNode:
$item->parentNode->tagName
Example:
if($item->parentNode->tagName == "h2") {
$href = $item->getAttribute("href");
$text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue));
$links[] = [
'href' => $href,
'text' => $text
];
}
source: https://www.the-art-of-web.com/php/html-xpath-query/#section_3
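To connect the parentNode idea to the question, here is a minimal sketch that climbs from the matching link up to its result container and reads that container's id. The markup, the domain `target.com`, and the helper `closestWithClass()` are all assumptions made up for illustration; the real page structure may differ.

```php
<?php
// Hypothetical markup standing in for the results page; the real page may differ.
$html = '<div id="1" class="result c-container"><a class="c-showurl">www.example.com/</a></div>'
      . '<div id="2" class="result c-container"><h3><a class="c-showurl">www.target.com/page</a></h3></div>';

$document = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$document->loadHTML($html);
libxml_clear_errors();

/** Hypothetical helper: climb parentNode links until an element carries $class. */
function closestWithClass(DOMNode $node, string $class): ?DOMElement
{
    for ($n = $node->parentNode; $n instanceof DOMElement; $n = $n->parentNode) {
        if (strpos($n->getAttribute('class'), $class) !== false) {
            return $n;
        }
    }
    return null;
}

$finder = new DOMXPath($document);
$domain = 'target.com';             // assumed domain to look for
$data = [];

foreach ($finder->query("//a[@class='c-showurl']") as $a) {
    if (stripos($a->nodeValue, $domain) === false) {
        continue;                   // this link does not mention the domain
    }
    $container = closestWithClass($a, 'c-container');
    if ($container !== null) {
        $data = [
            'domain' => trim($a->nodeValue),
            'id'     => $container->getAttribute('id'),
        ];
        break;                      // stop at the first matching result
    }
}

print_r($data);
```

In the question's own loop the outer `$result` already is the container, so `$result->getAttribute('id')` would likely be enough there; the climbing helper is only needed when you start from the link.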

Related

How to arrange data in arrays after Scraping from a website using PHP?

I am trying to scrape lottery data from the page "https://www.controcampus.it/estrazioni/". The goal is a separate array for each type of lottery, each holding three pieces of information: the "name" of the lottery, the "date" of the announcement, and the "values" (winning numbers) of that lottery. So far I have been able to get and arrange the "name" and "date", but I am not able to put the "values" into the same array.
I can grab all the "values" from the website, but I cannot arrange them separately as described above. If anyone more experienced could help, I would really appreciate it. I am just a beginner with PHP. Thank you very much!
My code is given below:
<?php
# imdb_birthdates.php
require 'vendor/autoload.php';
use Goutte\Client;
use GuzzleHttp\Client as GuzzleClient;
$goutteClient = new \Goutte\Client();
$guzzleClient = new \GuzzleHttp\Client(array(
'timeout' => 6000,
));
// $goutteClient->setClient($guzzleClient);
$crawler = $goutteClient->request('GET', 'https://www.controcampus.it/estrazioni');
$tables= [];
$dt =[];
$crawler
->filter('div.fullwidth h3')
->each(function ($node) use ($crawler) {
$name = $node->text();
$words = explode(' ', $name);
$date = '';
foreach ($words as $key => $word) {
if(is_numeric($word)) {
$date = $words[$key] . ' ' . $words[$key+1] . ' ' .$words[$key+2];
unset($words[$key], $words[$key+1], $words[$key+2]);
break;
}
}
$name = implode(' ', $words);
//$word = array("Estrazione", "VinciCasa", "MillionDAY", "MillionDAY EXTRA", " ", "Lotto", "SuperEnalotto", "10 e Lotto", "EuroJackpot");
//$realWord = implode(" ", $word);
$word1 = "VinciCasa";
$word2 = "MillionDAY";
$word3 = "MillionDAY EXTRA";
$word4 = "del Lotto";
$word5 = "SuperEnalotto";
$word6 = "10 e Lotto";
$word7 = "EuroJackpot";
// Test if string contains the word
if(strpos($name, $word1) !== false){
$name = $word1;
}
//for word2
if(strpos($name, $word2) !== false){
$name = $word2;
}
//for word3
if(strpos($name, $word3) !== false){
$name = $word3;
}
//for word4
if(strpos($name, $word4) !== false){
$name = $word4;
}
//for word5
if(strpos($name, $word5) !== false){
$name = $word5;
}
//for word6
if(strpos($name, $word6) !== false){
$name = $word6;
}
//for word7
if(strpos($name, $word7) !== false){
$name = $word7;
}
$dt[] = [
'name' => $name,
'date' => $date,
'values' => '',
];
print_r($dt);
});
$table = [];
// $crawler->filter('tr')->each(function($node) {
// print_r($node->text());
// });
$index = 0;
$crawler->filter('tr')->each(function($node) use($index) {
$node->filter('td')->each(function($nested_node) use($index) {
$table[$index] = $nested_node->text();
print_r($table);
});
$index++;
});
// $crawler->filter(' tr')->each(function($node) use ($table){
// // $th = $element->filter('td')->eq(0)->text();
// // $td = $element->filter('td')->eq(1)->text();
// // $table[] = $th;
// // $table[] = $td;
// $node->filter('td')->each(function($nested_node) {
// $table[] = $nested_node->text();
// });
// print_r($table);
// });
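One thing that may explain why $table and $index never fill up in the code above: a PHP closure copies the variables named in use (...) when it is created, and a variable that is not listed there (such as $table) is just a new local inside the closure. The sketch below shows by-reference capture with `use (&$table, &$index)`. It replaces the crawler with a plain array, `$rows`, which is a made-up stand-in for the scraped table rows.

```php
<?php
// Hypothetical stand-in for the scraped rows: each inner array holds one row's cell texts.
$rows = [
    ['12', '34', '56'],
    ['7', '8', '9'],
];

$table = [];
$index = 0;

// The ampersands capture both variables by reference, so writes made
// inside the closure are still visible after it returns.
array_walk($rows, function (array $cells) use (&$table, &$index) {
    foreach ($cells as $cell) {
        $table[$index][] = $cell;   // append to the current row instead of overwriting it
    }
    $index++;
});

print_r($table);
```

The same `use (&...)` form should apply to the nested callbacks in the question, where each level that writes to the shared array needs its own by-reference capture.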

Extract some string from the URL

I have a URL, e.g:
https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/.
From the above URL, I want to extract my-product-name-display if the URL contains it. If it does not, I want the string after /ex/ (BYADE3323), as in the URL below, which does not contain my-product-name-display:
https://www.example.com/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/
I have tried the code below:
`$url_param = "https://www.example.com/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/";`
or
`$url_param = "https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/";`
$e_product_title = explode('.com/', $url_param);
if(isset($e_product_title)){
$product_title = $e_product_title[1];
//now explode the ex
$get_asin = explode('/ex/',$product_title);
$final_product_title = str_replace('-',' ',$get_asin[0]);
$get_asin_final = explode('/', $get_asin[1]);
$asin_v2 = $get_asin_final[0];
}
else{
$get_asin = explode('/ex/',$url_param);
print_r($get_asin);
}
echo $final_product_title." ".$asin_v2;
Thanks in advance.
You can explode() the string, then:
Check whether my-product-name-display and BYADE3323 are in the array.
If they are present, find the index of BYADE3323.
Add 1 to it and check whether the next element is present.
<?php
$str = 'https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/';
$str = str_replace('://', '__', $str);
$arr = explode('/', $str);
$return = '';
if (in_array('my-product-name-display', $arr) && in_array('BYADE3323', $arr)) {
$idx = array_search('BYADE3323', $arr);
$idx2 = $idx + 1;
if (! empty($idx) && ! empty($arr[$idx2])) {
$idx += 1;
$return = $arr[$idx2];
}
}
echo $return;
EDIT:
As per the OP's comments, the following program handles an array of URLs and an array of search strings.
<?php
$searchStrings = [];
$searchStrings[] = ['my-product-name-display', 'BYADE3323'];
$searchStrings[] = ['your-product-name-display', 'BYADE4434'];
$urls = [];
$urls[] = 'https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/';
$urls[] = 'https://www.example.com/your-product-name-display/ex/BYADE4434/wgsi?nfh3420000ooo2323nfnf/';
$urls[] = 'https://www.example.com/their-product-name-display/ex/TEST343/wgsi?nfh3420000ooo2323nfnf/';
$urls[] = 'https://www.example.com/my-product-name-display/ex/ANASDF33/wgsi?nfh3420000ooo2323nfnf/';
$urls[] = 'https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/';
$return = [];
if (! empty($urls)) {
foreach ($urls as $url) {
if (! empty($searchStrings)) {
foreach ($searchStrings as $searchString) {
$str = implode('/ex/', $searchString);
if (strpos($url, $str) !== false) {
$arr = explode('/', $url);
// Look up the code of the pair that matched, not a hard-coded one.
$idx = array_search($searchString[1], $arr);
$idx2 = $idx + 1;
if (! empty($idx) && ! empty($arr[$idx2])) {
$idx += 1;
$return[] = $arr[$idx2];
}
}
}
}
}
}
echo '<pre>';
print_r($return);
echo '</pre>';
Output:
Array
(
[0] => wgsi?nfh3420000ooo2323nfnf
[1] => wgsi?nfh3420000ooo2323nfnf
[2] => wgsi?nfh3420000ooo2323nfnf
)
Try this to fetch values from the URL.
Pass the URL to the function below to extract them.
Here is the URL:
https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/
Suppose you want only the value BYADE3323.
The function returns everything after the second slash, that is, the host name followed by the rest of the URL.
Split that string on / again and you can pick out every value after the host name (www.example.com here).
function GetStringAfterSecondSlashInURL($the_url)
{
$parts = explode("/",$the_url,3);
if(isset($parts[2]))
return $parts[2];
}
Use the parse_url() function; it will definitely help you.
You can look it up on the official PHP site: parse-url.
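As a rough illustration of the parse_url() suggestion, the sketch below takes only the path of the URL, splits it into segments, and then looks at what sits around the `ex` segment. The function name `extractFromUrl()` is made up for this example, and it assumes the marker segment is always literally `ex`.

```php
<?php
/**
 * Hypothetical helper: returns the segment before "/ex/" when there is one,
 * otherwise the segment right after "/ex/". Returns null if "ex" is absent.
 */
function extractFromUrl(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);   // drops scheme, host and query string
    if (!is_string($path)) {
        return null;
    }

    // Split the path and discard the empty strings produced by leading/trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), fn($s) => $s !== ''));

    $exIndex = array_search('ex', $segments, true);
    if ($exIndex === false) {
        return null;
    }
    if ($exIndex > 0) {
        return $segments[$exIndex - 1];      // the product name precedes "ex"
    }
    return $segments[$exIndex + 1] ?? null;  // no product name: take the code after "ex"
}

$withName    = extractFromUrl('https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/');
$withoutName = extractFromUrl('https://www.example.com/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/');

echo $withName, "\n";
echo $withoutName, "\n";
```

Working on the parsed path rather than the raw string means the host name and the query string never end up in the result by accident.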
You can use strpos to check whether 'my-product-name-display' exists in the URL and execute code accordingly.
strpos($url_param, 'my-product-name-display') !== false
Modified code:
function get_product_title($url_param) {
$get_asin = explode('/ex/', $url_param);
$get_asin_final = explode('/', $get_asin[1]);
$asin_v2 = $get_asin_final[0];
return $asin_v2;
}
$url_param = "https://www.example.com/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/";
$url_param = "https://www.example.com/my-product-name-display/ex/BYADE3323/wgsi?nfh3420000ooo2323nfnf/";
$product_name = '';
if (strpos($url_param, 'my-product-name-display') !== false) {
$e_product_title = explode('.com/', $url_param);
if (isset($e_product_title)) {
$product_title = $e_product_title[1];
//now explode the ex
$product_name = get_product_title($product_title);
}
echo "my product name display" . $product_name;
}
else {
$product_name = get_product_title($url_param);
echo $product_name;
}

How to get both array key from value in 2 dimensional array (PHP)

$arr['animal'][0] = 'Dog';
$arr['animal'][1] = 'Cat';
Given that array, I need a function that takes a value as its parameter and gives me both array keys for it.
For example:
find_index('Cat');
Output :
The result is animal, 1
You could probably do something like
// $arr is passed in explicitly: a function body cannot see the caller's local $arr.
function find_index($value, array $arr) {
foreach ($arr as $index => $index2) {
$exists = array_search($value, $index2);
if ($exists !== false) {
echo "The result is {$index}, {$exists}";
return true;
}
}
return false;
}
Try this:
$arr['animal'][0] = 'Dog';
$arr['animal'][1] = 'Cat';
function find_index($searchVal, $arr){
return array_search($searchVal, $arr);
}
print_r(find_index('Cat', $arr['animal']));
Consider this array:
$arr['animal'][] = 'Dog';
$arr['animal'][] = 'Cat';
$arr['insects'][] = 'Insect1';
$arr['insects'][] = 'Insect2';
Here is the iterator method:
$search = 'Insect1'; // a value that exists in the sample array above
$matches = [];
$arr_array = new RecursiveArrayIterator($arr);
$arr_array_iterator = new RecursiveIteratorIterator($arr_array);
foreach($arr_array_iterator as $key => $value)
{
if($value === $search)
{
$fill = [];
$fill['category'] = $arr_array->key();
$fill['key'] = $arr_array_iterator->key();
$fill['value'] = $value;
$matches[] = $fill;
}
}
if($matches)
{
// One or more Match(es) Found
}
else
{
// Not Found
}
$arr['animal'][] = 'Dog';
$arr['animal'][] = 'Cat';
$arr['insects'][] = 'Insect1';
$arr['insects'][] = 'Insect2';
$search_for = 'Cat';
$search_result = [];
// each() was removed in PHP 8; foreach walks the same key/value pairs.
foreach ($arr as $key => $values) {
$found = array_search($search_for, $values);
if(is_int($found)) {
$fill = [ 'key1' => $key, 'key2' => $found ];
$search_result[] = $fill;
}
}
echo 'Found '.count($search_result).' result(s)';
print_r($search_result);

One result array

I'm trying to add the results of a script to an array, but when I look into it there is only one item in it. I'm probably being silly with the placement.
function crawl_page($url, $depth)
{
static $seen = array();
$Linklist = array();
if (isset($seen[$url]) || $depth === 0) {
return;
}
$seen[$url] = true;
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($url);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href');
if (0 !== strpos($href, 'http')) {
$href = rtrim($url, '/') . '/' . ltrim($href, '/');
}
if(shouldScrape($href)==true)
{
crawl_page($href, $depth - 1);
}
}
echo "URL:",$url;
echo http_response($url);
echo "<br/>";
$Linklist[] = $url;
$XML = new DOMDocument('1.0');
$XML->formatOutput = true;
$root = $XML->createElement('Links');
$root = $XML->appendChild($root);
foreach ($Linklist as $value)
{
$child = $XML->createElement('Linkdetails');
$child = $root->appendChild($child);
$text = $XML->createTextNode($value);
$text = $child->appendChild($text);
}
$XML->save("linkList.xml");
}
$Linklist[] = $url; adds a single item to the $Linklist array. I think this line needs to be in a loop.
I think you need static $Linklist = array(); so that the list survives across the recursive calls, though the code could use a cleanup.
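Both answers point at the same cause: $Linklist is re-created as an empty array on every call, so each call only ever sees its own URL. As an alternative to a static variable, the sketch below passes one shared list by reference and builds the XML once, after the crawl has finished. The function `crawl()` and the in-memory `$pages` map are made up so the example runs without fetching anything.

```php
<?php
// Hypothetical in-memory link graph standing in for fetched pages.
$pages = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a' => ['http://example.com/b'],
    'http://example.com/b' => [],
];

// $linkList and $seen are taken by reference, so every recursive call
// appends to the same arrays instead of a fresh local copy.
function crawl(string $url, int $depth, array $pages, array &$linkList, array &$seen): void
{
    if (isset($seen[$url]) || $depth === 0) {
        return;
    }
    $seen[$url] = true;
    $linkList[] = $url;

    foreach ($pages[$url] ?? [] as $href) {
        crawl($href, $depth - 1, $pages, $linkList, $seen);
    }
}

$linkList = [];
$seen = [];
crawl('http://example.com/', 3, $pages, $linkList, $seen);

// Build the XML once, after the crawl, from the complete list.
$xml = new DOMDocument('1.0');
$xml->formatOutput = true;
$root = $xml->appendChild($xml->createElement('Links'));
foreach ($linkList as $link) {
    $child = $root->appendChild($xml->createElement('Linkdetails'));
    $child->appendChild($xml->createTextNode($link));
}

$output = $xml->saveXML();
echo $output;
```

Moving the XML step out of the recursive function also avoids rewriting linkList.xml once per visited page.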

Build Array Tree from URLs

I need to build a tree (with arrays) from given URLs.
I have the following list of URLs:
http://domain.com/a/a.jsp
http://domain.com/a/b/a.jsp
http://domain.com/a/b/b.jsp
http://domain.com/a/b/c.jsp
http://domain.com/a/c/1.jsp
http://domain.com/a/d/2.jsp
http://domain.com/a/d/a/2.jsp
Now I need an array like this:
domain.com
  a
    a.jsp
    b
      a.jsp
      b.jsp
      c.jsp
    c
      1.jsp
    d
      2.jsp
      a
        2.jsp
How can I do this with PHP?
I thought Mark's solution was a bit complicated, so here's my take on it:
(Note: when I get to the filename part of the URI, I set it as both the key and the value. I wasn't sure what was expected there, and the nested sample didn't give much insight.)
<?php
$urls = array(
'http://domain.com/a/a.jsp',
'http://domain.com/a/b/a.jsp',
'http://domain.com/a/b/b.jsp',
'http://domain.com/a/b/c.jsp',
'http://domain.com/a/c/1.jsp',
'http://domain.com/a/d/2.jsp',
'http://domain.com/a/d/a/2.jsp'
);
$array = array();
foreach ($urls as $url)
{
$url = str_replace('http://', '', $url);
$parts = explode('/', $url);
krsort($parts);
$line_array = null;
$part_count = count($parts);
foreach ($parts as $key => $value)
{
if ($line_array == null)
{
$line_array = array($value => $value);
}
else
{
$temp_array = $line_array;
$line_array = array($value => $temp_array);
}
}
$array = array_merge_recursive($array, $line_array);
}
print_r($array);
?>
$urlArray = array( 'http://domain.com/a/a.jsp',
'http://domain.com/a/b/a.jsp',
'http://domain.com/a/b/b.jsp',
'http://domain.com/a/b/c.jsp',
'http://domain.com/a/c/1.jsp',
'http://domain.com/a/d/2.jsp',
'http://domain.com/a/d/a/2.jsp'
);
function testMapping($tree,$level,$value) {
foreach($tree['value'] as $k => $val) {
if (($val == $value) && ($tree['level'][$k] == $level)) {
return true;
}
}
return false;
}
$tree = array();
$i = 0;
foreach($urlArray as $url) {
$parsed = parse_url($url);
if ((!isset($tree['value'])) || (!in_array($parsed['host'],$tree['value']))) {
$tree['value'][$i] = $parsed['host'];
$tree['level'][$i++] = 0;
}
$path = explode('/',$parsed['path']);
array_shift($path);
$level = 1;
foreach($path as $k => $node) {
if (!testMapping($tree,$k+1,$node)) {
$tree['value'][$i] = $node;
$tree['level'][$i++] = $level;
}
$level++;
}
}
echo '<pre>';
for ($i = 0; $i < count($tree['value']); $i++) {
echo str_repeat(' ',$tree['level'][$i]*2);
echo $tree['value'][$i];
echo '<br />';
}
echo '</pre>';
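For comparison with the two answers above, here is a third way to get the nested array: walk each URL's segments with a reference that descends into the tree, creating folders as needed. This is only a sketch. It assumes every URL ends in a file name, and it stores files as plain values inside their folder's array, which is one possible reading of the desired output.

```php
<?php
$urls = [
    'http://domain.com/a/a.jsp',
    'http://domain.com/a/b/a.jsp',
    'http://domain.com/a/b/b.jsp',
    'http://domain.com/a/b/c.jsp',
    'http://domain.com/a/c/1.jsp',
    'http://domain.com/a/d/2.jsp',
    'http://domain.com/a/d/a/2.jsp',
];

$tree = [];
foreach ($urls as $url) {
    $host = parse_url($url, PHP_URL_HOST);
    $path = trim((string) parse_url($url, PHP_URL_PATH), '/');

    // Host first, then each folder; the last segment is treated as the file.
    $parts = array_merge([$host], explode('/', $path));
    $file  = array_pop($parts);

    $node = &$tree;                 // start at the root for every URL
    foreach ($parts as $part) {
        if (!isset($node[$part])) {
            $node[$part] = [];      // create the folder on first sight
        }
        $node = &$node[$part];      // descend one level
    }
    $node[] = $file;                // files are stored as values in their folder
    unset($node);                   // break the reference before the next URL
}

print_r($tree);
```

Folders end up as string keys and files as numerically indexed values, so is_array() on an entry tells the two apart when rendering the tree.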
