How to parse web form fields with xPath? - php

I want to retrieve the name/value pairs of the form fields.

This should do it:
$dom = new DOMDocument;
$dom->loadHTMLFile('somefile.html'); // load() expects well-formed XML; loadHTMLFile() parses HTML
$xpath = new DOMXPath($dom);
$data = array();
$inputs = $xpath->query('//input');
foreach ($inputs as $input) {
if ($name = $input->getAttribute('name')) {
$data[$name] = $input->getAttribute('value');
}
}
$textareas = $xpath->query('//textarea');
foreach ($textareas as $textarea) {
if ($name = $textarea->getAttribute('name')) {
$data[$name] = $textarea->nodeValue;
}
}
$options = $xpath->query('//select/option[@selected="selected"]');
foreach ($options as $option) {
if ($name = $option->parentNode->getAttribute('name')) {
$data[$name] = $option->getAttribute('value');
}
}
If your HTML contains more than one form, you may want to pass the form node as the second argument to query() so that each query stays inside that form, and add an outer loop over the forms.
You will also have to tweak it a bit if you use array-style names (e.g. yourfield[]), because later values would otherwise overwrite earlier ones.
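The two tweaks mentioned above can be sketched roughly as follows. This is only an illustration: the sample markup, the form ids and the helper name extractFormFields() are made up for the example, and names ending in [] are simply collected into a list. Like the snippet above, it does not look at the checked state of checkboxes or radio buttons.

```php
<?php
// Hypothetical sample markup; in practice this would come from a file or an HTTP response.
$html = <<<'HTML'
<form id="login">
  <input type="text" name="user" value="alice">
  <input type="text" name="tags[]" value="a">
  <input type="text" name="tags[]" value="b">
  <textarea name="bio">Hello</textarea>
  <select name="lang">
    <option value="en">EN</option>
    <option value="de" selected="selected">DE</option>
  </select>
</form>
<form id="search"><input type="text" name="q" value="xpath"></form>
HTML;

// Hypothetical helper: collects name/value pairs for one form node.
function extractFormFields(DOMXPath $xpath, DOMNode $form): array
{
    $data = [];
    // Store a value; names ending in "[]" are gathered into a list.
    $store = function (string $name, string $value) use (&$data) {
        if (substr($name, -2) === '[]') {
            $data[substr($name, 0, -2)][] = $value;
        } else {
            $data[$name] = $value;
        }
    };
    // The leading "." keeps each query relative to $form (the context node).
    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        $store($input->getAttribute('name'), $input->getAttribute('value'));
    }
    foreach ($xpath->query('.//textarea[@name]', $form) as $textarea) {
        $store($textarea->getAttribute('name'), $textarea->nodeValue);
    }
    foreach ($xpath->query('.//select[@name]/option[@selected]', $form) as $option) {
        $store($option->parentNode->getAttribute('name'), $option->getAttribute('value'));
    }
    return $data;
}

$dom = new DOMDocument;
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Outer loop: one result array per form, keyed by the form's id attribute.
$forms = [];
foreach ($xpath->query('//form') as $form) {
    $forms[$form->getAttribute('id')] = extractFormFields($xpath, $form);
}
print_r($forms);
```

Passing $form as the second argument only has an effect when the expression is relative (starts with "."); an expression starting with "//" would still search the whole document.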

Related

Passing more than one path to an XPath query

I have links with different paths, and I am trying to retrieve data from those links. I don't want to run each query separately, so I made a list of queries and used foreach on that list.
function passPath($list){
$list = [
"//li[@class='out']/a[1]",
"//ul[@class='ul right_ul clearfix']/li[2]/a",
"//ul[@class='ul right_ul clearfix']/li[2]/a"
];
foreach($list as $val){
return $val;
}
}
Then used that function inside DOMXpath's query.
function getPath($urls){
foreach($urls as $k => $val){
$url = $urls;
$html = content($val);
$path = new \DOMXPath($html);
$xPath = passPath($val);
$route = $path->query($xPath);
foreach($route as $value){
if ($value->nodeValue != false) {
$urls [] = trim($value->getAttribute('href'));
unset($urls[$k]);
}
}
}
return array_unique($urls);
}
It runs without an error, but there is a problem with the foreach: it only retrieves the data for one element and does not continue with the others. What am I missing here?
$data = getPath($urls);
var_dump($data);
By the way, content() is a file_get_contents/loadHTML wrapper function.
I changed your code so that it collects a list of href values.
# You want to parse every page in the URL list, so you created a function named `getPath($urls)`.
function getPath($urls) {
# I suggest declaring $ret to hold the values to return.
$ret = [];
# With foreach you can visit every URL.
foreach ($urls as $k => $url) { # Each value of $urls is a URL, so I renamed $val to $url.
# content() is a file_get_contents/loadHTML wrapper function.
$html = content($url);
# Create a new DOMXPath object from $html.
$path = new \DOMXPath($html);
# passPath() is not required: its return statement sits inside the loop, so it only ever returns the first path.
# By the way, the second and third elements of $xPathList are identical, so the third is probably not needed.
// $xPath = passPath($url);
$xPathList = [
"//li[@class='out']/a[1]",
"//ul[@class='ul right_ul clearfix']/li[2]/a",
"//ul[@class='ul right_ul clearfix']/li[2]/a"
];
foreach ($xPathList as $xPath) {
$nodes = $path->query($xPath);
foreach ($nodes as $node) {
if ($node->nodeValue != false) {
$ret[] = trim($node->getAttribute('href'));
}
}
}
}
return array_unique($ret);
}
$data = getPath($urls);
var_dump($data);
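As an alternative to looping over the list, XPath 1.0 has a union operator (|), so several paths can be joined into one expression and passed to a single query() call. The following is a minimal sketch under that assumption; the sample markup is invented to match the class names used in the question.

```php
<?php
// Hypothetical markup using the class names from the question.
$html = '<ul class="nav"><li class="out"><a href="/a">A</a><a href="/x">X</a></li></ul>'
      . '<ul class="ul right_ul clearfix"><li><a href="/b">B</a></li><li><a href="/c">C</a></li></ul>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$xPathList = [
    "//li[@class='out']/a[1]",
    "//ul[@class='ul right_ul clearfix']/li[2]/a",
];

// Join the paths with the XPath union operator and run a single query.
$hrefs = [];
foreach ($xpath->query(implode(' | ', $xPathList)) as $node) {
    $hrefs[] = trim($node->getAttribute('href'));
}
$hrefs = array_values(array_unique($hrefs));
print_r($hrefs);
```

A union returns each matching node once, in document order, so the result no longer tells you which of the paths matched a given node. If that matters, the nested foreach shown above is the better fit.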

output and call array from class function (rollingcurl)

Excuse my English, please.
I use Rollingcurl to crawl various pages.
Rollingcurl: https://github.com/LionsAd/rolling-curl
My class:
<?php
class Imdb
{
private $release;
public function __construct()
{
$this->release = "";
}
// SEARCH
public static function most_popular($response, $info)
{
$doc = new DOMDocument();
libxml_use_internal_errors(true); //disable libxml errors
if (!empty($response)) {
//if any html is actually returned
$doc->loadHTML($response);
libxml_clear_errors(); //remove errors for yucky html
$xpath = new DOMXPath($doc);
//get all the h2's with an id
$row = $xpath->query("//div[contains(@class, 'lister-item-image') and contains(@class, 'float-left')]/a/@href");
$nexts = $xpath->query("//a[contains(@class, 'lister-page-next') and contains(@class, 'next-page')]");
$names = $xpath->query('//img[@class="loadlate"]');
// NEXT URL - ONE TIME
$Count = 0;
$next_url = "";
foreach ($nexts as $next) {
$Count++;
if ($Count == 1) {
/*echo "Next URL: " . $next->getAttribute('href') . "<br/>";*/
$next_link = $next->getAttribute('href');
}
}
// RELEASE NAME
$rls_name = "";
foreach ($names as $name) {
$rls_name .= $name->getAttribute('alt');
}
// IMDB TT0000000 RLEASE
if ($row->length > 0) {
$link = "";
foreach ($row as $row) {
$tt_info .= @get_match('/tt\\d{7}/is', $doc->saveHtml($row), 0);
}
}
}
$array = array(
$next_link,
$rls_name,
$tt_info,
);
return ($array);
}
}
Output/Return:
$array = array(
$next_link,
$rls_name,
$tt_info,
);
return ($array);
Call:
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
function get_match($regex, $content, $pos = 1)
{
/* do your job */
preg_match($regex, $content, $matches);
/* return our result */
return $matches[intval($pos)];
}
require "RollingCurl.php";
require "imdb_class.php";
$imdb = new Imdb;
if (isset($_GET['action']) || isset($_POST['action'])) {
$action = (isset($_GET['action'])) ? $_GET['action'] : $_POST['action'];
} else {
$action = "";
}
echo " 2222<br /><br />";
if ($action == "most_popular") {
$popular = '&num_votes=1000,&production_status=released&groups=top_1000&sort=moviemeter,asc&count=40&start=1';
if (isset($_GET['date'])) {
$link = "https://www.imdb.com/search/title?title_type=feature,tv_movie&release_date=,".$_GET['date'].$popular;
} else {
$link = "https://www.imdb.com/search/title?title_type=feature,tv_movie&release_date=,2018".$popular;
}
$urls = array($link);
$rc = new RollingCurl([$imdb, 'most_popular']); //[$imdb, 'most_popular']
$rc->window_size = 20;
foreach ($urls as $url) {
$request = new RollingCurlRequest($url);
$rc->add($request);
}
$stream = $rc->execute();
}
If I output everything as "echo" in the class, everything is also displayed. However, I want to call everything individually.
If I now try to output it like this, it doesn't work.
$stream[0]
$stream[1]
$stream[3]
Does anyone have any idea how this might work?
Thank you very much in advance.
RollingCurl doesn't do anything with the return value of the callback, and doesn't return it to the caller. $rc->execute() just returns true when there's a callback function. If you want to save anything, you need to do it in the callback function itself.
You should make most_popular a non-static method and give the class a property $results that you initialize to [] in the constructor. Then the method can do:
$this->results[] = $array;
After you do
$rc->execute();
you can do:
foreach ($imdb->results as $result) {
echo "Release name: $result[1]<br>TT Info: $result[2]<br>";
}
It would be better to put the data you extract from the document into arrays rather than concatenated strings, e.g.
$this->rls_names = [];
foreach ($names as $name) {
$this->rls_names[] = $name->getAttribute('alt');
}
$this->tt_infos = [];
foreach ($rows as $row) { // assumes the query result was renamed from $row to $rows
$this->tt_infos[] = @get_match('/tt\\d{7}/is', $doc->saveHtml($row), 0);
}
$this->next_link = $nexts->item(0)->getAttribute('href'); // no need for a loop to get the first node of the list
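The idea of saving results inside the callback can be sketched without RollingCurl at all. In the example below, fakeCrawl() is a made-up stand-in for the crawler (it is not part of RollingCurl), and ResultCollector, its handle() method and the example.test URLs are likewise hypothetical. The only point is that the callback's return value is discarded, so the data has to be stored on the object.

```php
<?php
class ResultCollector
{
    /** @var array[] One entry per processed response. */
    public $results = [];

    // Same parameter shape as the callback in the question: ($response, $info).
    public function handle($response, $info)
    {
        // Hypothetical extraction step; a real callback would run its XPath queries here.
        $this->results[] = [
            'url'    => $info['url'] ?? null,
            'length' => strlen((string) $response),
        ];
        return 'ignored'; // the caller below never looks at this value
    }
}

// Stand-in for the crawler: invokes the callback per page and discards what it returns.
function fakeCrawl(array $pages, callable $callback): bool
{
    foreach ($pages as $url => $body) {
        call_user_func($callback, $body, ['url' => $url]);
    }
    return true;
}

$collector = new ResultCollector();
$ok = fakeCrawl(
    [
        'http://example.test/1' => '<html>one</html>',
        'http://example.test/2' => '<html>second</html>',
    ],
    [$collector, 'handle']
);

// After the crawl has finished, read the collected data from the object.
foreach ($collector->results as $result) {
    echo $result['url'], ': ', $result['length'], " bytes\n";
}
```

Because the callback is passed as [$collector, 'handle'], it runs on that specific instance, which is why a non-static method is needed.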

Invalid argument supplied for foreach() inside if else PHP [duplicate]

Why am I getting this PHP Warning?
Invalid argument supplied for foreach()
Here is my code:
// look for text file for this keyword
if (empty($site["textdirectory"])) {
$site["textdirectory"] = "text";
}
if (file_exists(ROOT_DIR.$site["textdirectory"].'/'.urlencode($q).'.txt')) {
$keywordtext =
file_get_contents(ROOT_DIR.$site["textdirectory"].'/'.urlencode($q).'.txt');
}
else {
$keywordtext = null;
}
$keywordsXML = getEbayKeywords($q);
foreach($keywordsXML->PopularSearchResult as $item) {
$topicsString = $item->AlternativeSearches;
$relatedString = $item->RelatedSearches;
if (!empty($topicsString)) {
$topics = split(";",$topicsString);
}
if (!empty($relatedString)) {
$related = split(";",$relatedString);
}
}
$node = array();
$node['keywords'] = $q;
$xml = ebay_rss($node);
$ebayItems = array();
$totalItems = count($xml->channel->item);
$totalPages = $totalItems / $pageSize;
$i = 0;
foreach ($xml->channel->item as $item) {
$ebayRss =
$item->children('http://www.ebay.com/marketplace/search/v1/services');
if ($i>=($pageSize*($page-1)) && $i<($pageSize*$page)) {
$newItem = array();
$newItem['title'] = $item->title;
$newItem['link'] = buyLink($item->link, $q);
$newItem['image'] = ebay_stripImage($item->description);
$newItem['currentbid'] = ebay_convertPrice($item->description);
$newItem['bidcount'] = $ebayRss->BidCount;
$newItem['endtime'] = ebay_convertTime($ebayRss->ListingEndTime);
$newItem['type'] = $ebayRss->ListingType;
if (!empty($ebayRss->BuyItNowPrice)) {
$newItem['bin'] = ebay_convertPrice($item->description);
}
array_push($ebayItems, $newItem);
}
$i++;
}
$pageNumbers = array();
for ($i=1; $i<=$totalPages; $i++) {
array_push($pageNumbers, $i);
}
// get user guides
$guidesXML = getEbayGuides($q);
$guides = array();
foreach ($guidesXML->guide as $guideXML) {
$guide = array();
$guide['url'] = makeguideLink($guideXML->url, $q);
$guide['title'] = $guideXML->title;
$guide['desc'] = $guideXML->desc;
array_push($guides,$guide);
}
What causes this warning?
You should check that what you are passing to foreach is an array by using the is_array function.
If you are not sure it is going to be an array, you can always check with code like this:
if (is_array($variable)) {
foreach ($variable as $item) {
//do something
}
}
This means that you are running foreach on something that is not an array.
Check each of your foreach statements and make sure that the expression before the as keyword is actually an array. Use var_dump to inspect it.
Then fix the one where it isn't an array.
How to reproduce this error:
<?php
$skipper = "abcd";
foreach ($skipper as $item){ //the warning happens on this line.
print "ok";
}
?>
Make sure $skipper is an array.
Because, on whatever line the error is occurring at (you didn't tell us which that is), you're passing something to foreach that is not an array.
Look at what you're passing into foreach, determine what it is (with var_export), find out why it's not an array... and fix it.
Basic, basic debugging.
Try this.
if(is_array($value) || is_object($value)){
foreach($value as $item){
//somecode
}
}
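In code like the question's, where foreach runs over properties of parsed XML, one plausible cause (an assumption, since the helper functions are not shown) is that the XML failed to load and the helper returned false. A minimal sketch of guarding against that, with a hypothetical safeItems() helper and made-up sample XML:

```php
<?php
// Hypothetical helper: returns the value if foreach can iterate it, otherwise an empty array.
function safeItems($value): iterable
{
    return is_iterable($value) ? $value : [];
}

libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$good = simplexml_load_string(
    '<rss><channel><item><title>One</title></item><item><title>Two</title></item></channel></rss>'
);
$bad = simplexml_load_string('<rss><channel>'); // malformed XML: returns false
libxml_clear_errors();

$titles = [];
foreach ([$good, $bad] as $xml) {
    // Only dereference ->channel->item when parsing actually succeeded.
    $items = ($xml instanceof SimpleXMLElement) ? $xml->channel->item : null;
    foreach (safeItems($items) as $item) {
        $titles[] = (string) $item->title;
    }
}
print_r($titles);
```

The guard keeps the loop from failing, but it also hides the underlying problem, so it is worth logging the case where parsing returned false.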

Decoding mixture of array and stdObject to Database

I am trying to decode a collection of JSON files into a MySQL database and return the decoded values to a datatable for presentation. I have one table called ec2_instances, and I want to send it an array of values located at cfi configuration, which works fine. I have now added a new column called aws account id, which is an object rather than an array. I have updated the model to include the new column, but I am struggling to populate it.
<?php
function from_camel_case($input)
{
preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
$ret = $matches[0];
foreach ($ret as &$match)
{
$match = $match == strtoupper($match) ? strtolower($match) : lcfirst($match);
}
return implode('_', $ret);
}
$resource_types = array();
$resource_types['AWS::EC2::Instance'] = 'EC2Instance';
$resource_types['AWS::EC2::NetworkInterface'] = 'EC2NetworkInterface';
$resource_types['AWS::EC2::VPC'] = 'VPC';
$resource_types['AWS::EC2::Volume'] = 'Volume';
$resource_types['AWS::EC2::SecurityGroup'] = 'EC2SecurityGroup';
$resource_types['AWS::EC2::Subnet'] = 'Subnet';
$resource_types['AWS::EC2::RouteTable'] = 'RouteTable';
$resource_types['AWS::EC2::EIP'] = 'EIP';
$resource_types['AWS::EC2::NetworkAcl'] = 'NetworkAcl';
$resource_types['AWS::EC2::InternetGateway'] = 'InternetGateway';
$accounts = DB::table('aws_account')->get();
$account_id = array($accounts);
$account_id_exists = array_add($account_id, 'key', 'value');
foreach(glob('../app/views/*.json') as $filename)
{
//echo $filename;
$data = file_get_contents($filename);
if($data!=null)
{
$decoded=json_decode($data,true);
if(isset($decoded["Message"]))
{
//echo "found message<br>";
$message= json_decode($decoded["Message"]);
if(isset($message->configurationItem))
{
// echo"found cfi<br>";
$insert_array = array();
$cfi = $message->configurationItem;
switch ($cfi->configurationItemStatus)
{
case "ResourceDiscovered":
//echo"found Resource Discovered<br>";
if (array_key_exists($cfi->resourceType,$resource_types))
{
//var_dump($cfi->resourceType);
$resource = new $resource_types[$cfi->resourceType];
foreach ($cfi->configuration as $key => $value)
{
if (in_array($key,$resource->fields))
{
$insert_array[from_camel_case($key)] = $value;
}
}
if (array_key_exists($cfi->awsAccountId,$resource_types))
{
$resource = new $resource_types[$cfi->awsAccountId];
foreach ($cfi->awsAccountId as $key => $value)
{
if (in_array($key,$resource->fields))
{
$insert_array[from_camel_case($key)] = $value;
}
}
$resource->populate($insert_array);
if (!$resource->checkExists())
{
$resource->save();

Show XML tag full path with php

Let's assume we want to process this Feed: http://tools.forestview.eu/xmlp/xml_feed.php?aid=1094&cid=1000
I'm trying to show the nodes of an XML file this way:
deals->deal->dealsite
deals->deal->deal_id
deals->deal->deal_title
This is in order to be able to process feeds that we don't know what their XML tags are. So we will let the user choose that deals->deal->deal_title is the Deal Title and will recognize it that way.
I have been trying ages to do this with this code:
class HandleXML {
var $root_tag = false;
var $xml_tags = array();
var $keys = array();
function parse_recursive(SimpleXMLElement $element)
{
$get_name = $element->getName();
$children = $element->children(); // get all children
if (empty($this->root_tag)) {
$this->root_tag = $this->root_tag.$get_name;
}
$this->xml_tags[] = $get_name;
// only show children if there are any
if(count($children))
{
foreach($children as $child)
{
$this->parse_recursive($child); // recursion :)
}
}
else {
$key = implode('->', $this->xml_tags);
$this->xml_tags = array();
if (!in_array($key, $this->keys)) {
if (!strstr('>', $key) && count($this->keys) > 0) { $key = $this->root_tag.'->'.$key; }
if (!in_array($key, $this->keys)) {
$this->keys[] = $key;
}
}
}
}
}
$xml = new SimpleXMLElement($feed_url, null, true);
$handle_xml = new HandleXML;
$handle_xml->parse_recursive($xml);
foreach($handle_xml->keys as $key) {
echo $key.'<br />';
}
exit;
but here's what I get instead:
deals->deal->dealsite
deals->deal_id
deals->deal_title
Notice that the deal-> part is missing on the 2nd and 3rd lines.
I have also tried with this code: http://pastebin.com/FkPWXF64 but it's definitely not the best way to go and it doesn't always work.
No matter how many times I tried, I couldn't get it to work.
In one of my sites I use a little different approach to handle xml feed. In your case it would look like:
$xml = simplexml_load_file("http://tools.forestview.eu/xmlp/xml_feed.php?aid=1094&cid=1000");
foreach($xml->{'deal'} as $deal)
{
$dealsite = $deal->{'dealsite'};
$deal_id = $deal->{'deal_id'};
$deal_title = $deal->{'deal_title'};
$deal_url = $deal->{'deal_url'};
$deal_city = $deal->{'deal_city'};
$deal_category = $deal->{'deal_category'};
// and so on for the rest
// do some stuff with the variables like insert into MySQL
}
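If the tag names are not known in advance, as in the question, the full paths can also be collected by passing the parent's path down through the recursion instead of keeping it in shared state. This is only a sketch: collectTagPaths() is a hypothetical name, and the sample XML is invented to mirror the tags listed in the question.

```php
<?php
// Hypothetical helper: returns each distinct leaf path once, e.g. "deals->deal->deal_id".
function collectTagPaths(SimpleXMLElement $element, string $prefix = '', array &$paths = []): array
{
    $name = $element->getName();
    $path = ($prefix === '') ? $name : $prefix . '->' . $name;

    $children = $element->children(); // non-namespaced children only
    if (count($children) === 0) {
        $paths[$path] = true; // leaf: using the path as a key removes duplicates
    } else {
        foreach ($children as $child) {
            collectTagPaths($child, $path, $paths); // each child receives its parent's full path
        }
    }
    return array_keys($paths);
}

// Invented sample feed with the same tag names as in the question.
$xml = new SimpleXMLElement(
    '<deals>'
    . '<deal><dealsite>x</dealsite><deal_id>1</deal_id><deal_title>T</deal_title></deal>'
    . '<deal><dealsite>y</dealsite><deal_id>2</deal_id><deal_title>U</deal_title></deal>'
    . '</deals>'
);

$keys = collectTagPaths($xml);
foreach ($keys as $key) {
    echo $key, "\n";
}
```

Since the prefix is a parameter of each call, it is never reset when a leaf is reached, which is what made the deal-> part disappear in the class-based version.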
