I have an XML feed with deals, and I need to get the titles and the images.
I use XPath; for example: /deals/deal/deal_title (for the deal title)
/deals/deal/deal_image (for the deal image)
The problem is that some deals don't have the deal image set at all, so when I link a deal title with a deal image I sometimes get the wrong image.
In order to track down the problem I created two separate arrays: one with titles and the other one with images.
The weird thing that causes the problem is that in the images array the empty instances are moved to the end of the array.
For example, if we assume that "deal title2" has no image and "deal title3" has one, then "deal title3"'s image is used for "deal title2".
Use this link to see the code I made: http://pastebin.com/HEuTJQjZ
The interesting part starts from: $doc = new DOMDocument();
Basically what it does is to execute many xpath queries to get titles, images, prices etc and then it adds them to the database.
The problem starts when a deal doesn't have a tag set, so it just uses the next value.
I don't understand how it magically moves all the empty instances to the bottom. XPath isn't supposed to reorder the results, right?
I have even tried using the [] operators to get a specific image, but that doesn't help since the results are sorted the wrong way.
Example feed: http://www.clickbanner.gr/xml/?xml_type=deals&affiliate_ID=14063
EDIT:
The real problem is that XPath does not order the results by document order and modifies the expected order. Is this a bug, or is there a way to force the results into document order? See also: XPath query result order
Thank you in advance.
Evaluate the following two XPath expressions for any values of $k in the interval [1, count(/deals/deal)]:
/deals/deal[$k]/deal_title
and
/deals/deal[$k]/deal_image
In this way you know whether an image was selected, or not.
For example, if count(/deals/deal) is 3, then you will evaluate these XPath expressions:
/deals/deal[1]/deal_title and /deals/deal[1]/deal_image
/deals/deal[2]/deal_title and /deals/deal[2]/deal_image
/deals/deal[3]/deal_title and /deals/deal[3]/deal_image
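A minimal sketch of this per-index approach in PHP, using DOMXPath. The feed structure (/deals/deal/…) follows the question; the inline sample XML is made up for illustration, standing in for the real feed URL:

```php
<?php
// Pair titles with images by querying per-deal index, so a missing
// <deal_image> stays missing instead of shifting to the next deal.
$xml = <<<XML
<deals>
  <deal><deal_title>deal title1</deal_title><deal_image>img1.jpg</deal_image></deal>
  <deal><deal_title>deal title2</deal_title></deal>
  <deal><deal_title>deal title3</deal_title><deal_image>img3.jpg</deal_image></deal>
</deals>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$count = $xpath->evaluate('count(/deals/deal)');
$pairs = [];
for ($k = 1; $k <= $count; $k++) {
    $title = $xpath->evaluate("string(/deals/deal[$k]/deal_title)");
    // Check whether the image node exists before reading it
    $imageNodes = $xpath->query("/deals/deal[$k]/deal_image");
    $image = $imageNodes->length ? $imageNodes->item(0)->textContent : null;
    $pairs[] = ['title' => $title, 'image' => $image];
}
print_r($pairs);
```

With the sample above, "deal title2" correctly ends up with no image rather than stealing img3.jpg.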
I think you should try this way:
Walk through the /deals/deal/deal_id tag values.
Then search for the pair of tags /deals/deal[deal_id=$deal_id]/deal_title and /deals/deal[deal_id=$deal_id]/deal_image, using the real deal_id in place of $deal_id.
You will get pairs of deal_title and deal_image for each deal, and they will match each other correctly.
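A sketch of this deal_id-matching idea with SimpleXML. The sample XML and the deal_id values are invented for illustration; the real feed may use different element names:

```php
<?php
// For each <deal_id>, select the sibling title/image with an XPath
// predicate, so title/image pairs cannot drift apart.
$xml = <<<XML
<deals>
  <deal><deal_id>10</deal_id><deal_title>A</deal_title><deal_image>a.jpg</deal_image></deal>
  <deal><deal_id>11</deal_id><deal_title>B</deal_title></deal>
</deals>
XML;

$sx = new SimpleXMLElement($xml);
$result = [];
foreach ($sx->xpath('/deals/deal/deal_id') as $idNode) {
    $id = (string) $idNode;
    $title = $sx->xpath("/deals/deal[deal_id='$id']/deal_title");
    $image = $sx->xpath("/deals/deal[deal_id='$id']/deal_image");
    $result[$id] = [
        'title' => $title ? (string) $title[0] : null,
        'image' => $image ? (string) $image[0] : null,
    ];
}
```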
Thanks for the help with my previous PHPWord issue. I have another.
I'm creating documents with tables where each table is basically used as a stylized container for a list of items. For example, in my case I have a collection of legal definitions. Each definition has a code, title, and textual description that appears in a table. So when there are multiple definitions, each definition has its own table, and appears like so:
and so on. Each table isn't really a table; I'm kind of hijacking tables to stylize my document. The problem is that when I have a long list of items, invariably some of the tables get split between pages, where the top row of the table is at the very bottom of one page and the bottom row is at the very top of the next page, like so
This is very undesirable. Is there any way to tell PHPWord, "hey, if this table is going to be split between pages, just put the whole table on the next page"?
I'm also using pdfmake for making PDFs, and it has a pageBreakBefore function that can be used for exactly this purpose. I notice that paragraphs have a pageBreakBefore style which can force each paragraph to appear on a new page, but this isn't what I'm looking for. Is there some way I can maybe hook into how PHPWord builds the document and put in a conditional test?
Any input is greatly appreciated thanks.
I had the same problem, and found the answer today in the "normal" MS Word documentation. I found the equivalent in the PHPWord documentation, tried it, and it works:
In your paragraph formats, set 'keepNext' => true
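A hedged sketch of how that might look (it assumes phpoffice/phpword is installed; the PHPWord calls are commented out so only the style array is live here, and the table contents are illustrative). 'keepNext' keeps a paragraph on the same page as the one that follows, so applying it to the text in each cell except the last row should stop Word from splitting the table across a page break:

```php
<?php
// Paragraph style that asks Word to keep this paragraph with the next
// one and to keep its lines together on one page.
$keepTogether = ['keepNext' => true, 'keepLines' => true];

/*
$phpWord = new \PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
$table   = $section->addTable(['borderSize' => 6]);
$table->addRow();
$table->addCell(9000)->addText('Definition title', ['bold' => true], $keepTogether);
$table->addRow();
$table->addCell(9000)->addText('Definition body text', null, $keepTogether);
*/
```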
Before anyone asks: I've googled my question, and I've also looked at the 'Questions that may already have your answer' suggestions, and none of them work.
What I want to do is pagination. However, I don't want to use databases, as I've never had to, and I'd rather not give up and switch to them now since XML does everything I need.
The code I have is the following:
$files = glob('include/articles/*.xml');
foreach($files as $file){
$xml = new SimpleXMLElement($file, 0, true);
}
I've tried these ones already: XML pagination with PHP, PHP XML pagination and Pagination Filtered XML file and have achieved nothing. I have also tried a lot of Javascript 'pagination' scripts and still nothing.
So to sum it up: I have four articles (more to be added) and I want to show 2 articles per page. The following information will be pulled from the XML file: ID, TITLE, CONTENT, PICTURE, AUTHOR, DATE, by doing $xml->id and so on for the rest of them. Does anyone know of a way to do this? I've spent the past four hours on it (it's 4:04 AM GMT) and have found nothing that works yet. (If I find anything that does work, I'll make sure to update the question with the working code in case anyone else out there needs help with this too.)
For a start define the order in which you want your articles to appear. I.e. which article goes on page 1, which one on page 2, etc. This is important, because that order will be the base for your pagination algorithm. Please note that glob() is not guaranteed to return results in any specific order, which means the order can change from one invocation of your script to another (notably when you add new articles) -- almost certainly not what you want.
Then the second step is to introduce another variable which is part of your URL that denotes the actual page (number) you're on. The URL query string would be a natural choice for putting this information, so your URL's look like: article.php?page=1. On the PHP side you can use the $_GET superglobal to retrieve the query string parameters.
Thirdly, use the new style URL's whenever you link to your article.php script. Additionally, validate the input --especially when you also want to display the current page based on this parameter (or you will end up with an injection vulnerability). This also means you want to have a default value (in case the value is invalid/wrong/ or not supplied at all for some reason).
Finally, filter your articles based on the two key pieces of information: the order of the articles and the page number. I.e., compute the actual articles that should appear on the current page.
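The steps above can be sketched as a small helper; the file path and per-page count are illustrative, and the invalid-page fallback is one reasonable choice among several:

```php
<?php
// Given a stable, sorted list of items, a requested page number, and a
// per-page count, return the items for that page. Invalid or missing
// page numbers fall back to page 1.
function paginate(array $items, $page, $perPage)
{
    $totalPages = max(1, (int) ceil(count($items) / $perPage));
    $page = (int) $page;
    if ($page < 1 || $page > $totalPages) {
        $page = 1; // default for missing/invalid ?page= values
    }
    return array_slice($items, ($page - 1) * $perPage, $perPage);
}

// Usage sketch:
$files = glob('include/articles/*.xml');
sort($files); // glob() order is not guaranteed, so impose one
$page = isset($_GET['page']) ? $_GET['page'] : 1;
foreach (paginate($files, $page, 2) as $file) {
    $xml = new SimpleXMLElement($file, 0, true);
    // render $xml->id, $xml->title, $xml->content, etc.
}
```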
I'm looking for a way to parse a URL that's in the atom format, for example, the results shown here - http://search.twitter.com/search.atom?q=Stackoverflow&:)&since:2011-05-24&rpp=100&page=1
So far, I tried using the file_get_contents() function and saving the result to a text document, but it's only outputting 21 KB chunks (each time I re-run the script, it appends an extra 21 KB onto the end of the existing file).
I need to be able to find the amount of times the string <published> occurs in the document (in order to find how many tweets are published on the page). Is there a function I can use to either search&count in the HTML of the URL directly, or one to save the HTML of the URL (the entirety of it, around 120kb) to a file locally, and then search&count that file?
All I can think of here is using SimpleXML to parse it, using XPath to find just the published tags, and then counting the number of results from that XPath query. This is probably the way I'd do it, but then again you could always use preg_match_all, which returns the number of times your regex matches in the string.
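A minimal sketch of both approaches. Atom elements live in a namespace, so the XPath route needs a registered prefix; the inline two-entry feed below stands in for the real Twitter search URL:

```php
<?php
// Count <published> elements in an Atom document, once via XPath and
// once via preg_match_all, as described above.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><published>2011-05-24T10:00:00Z</published></entry>
  <entry><published>2011-05-24T11:00:00Z</published></entry>
</feed>
XML;

$sx = new SimpleXMLElement($atom);
// Register a prefix for the Atom namespace so XPath can see the tags
$sx->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
$tweetCount = count($sx->xpath('//a:published'));

// The regex alternative: count opening <published> tags directly
$regexCount = preg_match_all('/<published>/', $atom, $m);
```

For a live feed you would replace $atom with file_get_contents() on the URL; the parsing approach is more robust than the regex if the markup ever changes.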
I'd like to have a script where I can input a URL and it will intelligently grab the first paragraph of the article... I'm not sure where to begin other than just pulling text from within <p> tags. Do you know of any tips/tutorials on how to do this kind of thing?
update
For further clarification, I'm building a section of my site where users can submit links like on Facebook, it'll grab an image from their site as well as text to go with the link. I'm using PHP and trying to determine the best method of doing this.
I say "intelligently" because I'd like to try to get content on that page that's important, not just the first paragraph, but the first paragraph of the most important content.
If the page you want to grab is foreign, or even if it is local but you don't know its structure in advance, I'd say the best way to achieve this would be using the PHP DOM functions.
function get_first_paragraph($url)
{
    $page = file_get_contents($url);
    $doc = new DOMDocument();
    /* Suppress warnings from real-world, non-well-formed HTML */
    libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    /* Gets all the paragraphs */
    $p = $doc->getElementsByTagName('p');
    if ($p->length === 0) {
        return null;
    }
    /* Extracts the first one (DOMNodeList::item, not items) */
    $first = $p->item(0);
    /* Returns the paragraph's content */
    return $first->textContent;
}
Short answer: you can't.
In order to have a PHP script "intelligently" fetch the "most important" content from a page, the script would have to understand the content of the page. PHP is not a natural language processor, nor is this a trivial area of study. There might be some NLP toolkits for PHP, but even then I doubt it would be easy.
A solution that can be achieved with reasonable effort would be to fetch the entire page with an HTML parser and then look for elements with certain class names or ids commonly found in blog engines. You could also parse for hAtom microformats, or look for meta tags within the document and other more clearly defined information.
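An illustrative sketch of that "common class names/ids" idea using DOMXPath. The candidate selectors here are guesses at what blog engines commonly emit, not a standard, and real pages will need their own list:

```php
<?php
// Probe the parsed HTML for containers that blog engines typically
// use for the main article, returning the first match's text.
function find_main_content(DOMDocument $doc)
{
    $xpath = new DOMXPath($doc);
    $candidates = [
        "//*[@id='content']",
        "//*[@id='main']",
        "//*[contains(@class, 'entry-content')]",
        "//*[contains(@class, 'post-body')]",
        "//article",
    ];
    foreach ($candidates as $query) {
        $nodes = $xpath->query($query);
        if ($nodes->length > 0) {
            return trim($nodes->item(0)->textContent);
        }
    }
    return null; // nothing recognizable found
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<html><body><div id="content"><p>Hello world</p></div></body></html>');
$main = find_main_content($doc);
```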
I wrote a Python script a while ago to extract a web page's main article content. It uses a heuristic to scan all text nodes in a document and group together nodes at similar depths, and then assume the largest grouping is the main article.
Of course, this method has its limitations, and no method will work on 100% of web pages. This is just one approach, and there are many other ways you might accomplish it. You may also want to look at similar past questions on this subject.
Apologies for the awkward wording in this question; I'm still trying to wrap my head around the beast that is Drupal and haven't quite gotten the vocabulary down yet.
I'm looking to access all rows in a view as an array (so I can apply some array sorting and grouping functions before display) in a display output. As best I can tell, you are able to access individual rows as arrays using row-style output, but seemingly not in display output.
Thanks!
You have to change the Row style setting to Node.
Click on Theme Information.
Create a file with one of the names listed under the Display output section (I would use the second one, e.g. views-view--portfolio.tpl.php).
Now you can use your own node template and access the $node variable.
Ultimately, I had to use node_load on each item and load the results of that into an array. Inefficient, but it worked.
I found this thread on Drupal.org about this question, but those solutions don't quite work.
How to get a "result array" with views_get_view() (as with views_get_current_view())
Those solutions return only the list of IDs, not the actual rendered fields.