Bibliography in PHP with CSL

I am trying to display a bibliography in PHP and want to use CSL to format it, but I am coming up short of good examples of how to implement this. Basically, I am looking for a library or script that can take a bibliography in BibTeX, JSON, or a similar format and output it as HTML through PHP.
Formatting with CSL, for example through citeproc-php, would accommodate a vast variety of output styles. Does anyone know of any examples of this, or of up-to-date libraries for doing so?

The author of citeproc-php answered an issue on GitHub with some details:
<?php
include 'vendor/autoload.php';
use \AcademicPuma\CiteProc\CiteProc;
$bibliographyStyleName = 'apa';
$lang = "en-US";
$csl = CiteProc::loadStyleSheet($bibliographyStyleName);
$citeProc = new CiteProc($csl, $lang);
$file = file_get_contents("citations.json");
$data = json_decode($file);
echo "<ul>";
foreach ($data as $item) {
echo "<li>".$citeProc->render($item)."</li>";
}
echo "</ul>";
?>
And this works as expected with a sample citations.json from citeproc-js.
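For anyone without a sample file at hand, here is a rough sketch of what citations.json might contain. The field names follow the CSL-JSON data model; the entry itself (id, title, author, publisher) is made up for illustration, and the snippet only prints the JSON, which would then be saved as citations.json.

```php
<?php
// Hypothetical sample entry; only the field names come from the CSL-JSON data model.
$items = [
    [
        'id' => 'doe2001',
        'type' => 'book',
        'title' => 'An Example Book',
        'author' => [
            ['family' => 'Doe', 'given' => 'Jane'],
        ],
        // Dates are nested arrays: one [year, month, day] list per date.
        'issued' => ['date-parts' => [[2001]]],
        'publisher' => 'Example Press',
    ],
];

// Encode as a JSON array of items, the shape the foreach loop above iterates over.
$json = json_encode($items, JSON_PRETTY_PRINT);
echo $json, PHP_EOL;
```

Since json_decode() without a second argument returns objects, each $item in the loop above would be a stdClass with these properties.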

Related

PHP Crawler not crawling all elements

I'm trying to make a PHP crawler (for personal use).
The code should display "found" for each eBay auction item that ends in less than 1 hour, but there seems to be a problem: the crawler can't get all the span elements, and the "remaining time" element is a span.
simple_html_dom.php is downloaded and not edited.
<?php include_once('simple_html_dom.php');
//url which i want to crawl -contains GET DATA-
$url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';
$html = new simple_html_dom();
$html->load_file($url);
foreach($html->find('span') as $part){
echo $part;
//when i echo $part it does display many span elements but not the remaining time ones
$cur_class = $part->class;
//the class attribute of an auction item that ends in less than an hour is equal with "MINUTES timeMs alert60Red"
if($cur_class == 'MINUTES timeMs alert60Red'){
echo 'found';
}
}
?>
Any answers would be useful, thanks in advance
Looking at the fetched HTML, it seems that the class alert60Red is set through JavaScript. You can't find it because the crawler never executes JavaScript.
Searching for just MINUTES timeMs looks stable as well.
<?php
include_once('simple_html_dom.php');
$url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';
$html = new simple_html_dom();
$html->load_file($url);
foreach ($html->find('span') as $part) {
$cur_class = $part->class;
if (strpos($cur_class, 'MINUTES timeMs') !== false) {
echo 'found';
}
}
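The strpos() check above depends on the two class names appearing next to each other and in that order. A slightly more tolerant variant, sketched here with a hypothetical helper called has_classes() (it is not part of simple_html_dom), splits the class attribute into tokens and checks that every required class is present:

```php
<?php
// has_classes() is a hypothetical helper, not part of simple_html_dom.
function has_classes(string $classAttr, array $required): bool
{
    // Split the attribute value on any run of whitespace into class names.
    $tokens = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    // Every required class must be present, in any order.
    return count(array_diff($required, $tokens)) === 0;
}

var_dump(has_classes('MINUTES timeMs alert60Red', ['MINUTES', 'timeMs'])); // bool(true)
var_dump(has_classes('timeMs  MINUTES', ['MINUTES', 'timeMs']));           // bool(true)
var_dump(has_classes('HOURS timeMs', ['MINUTES', 'timeMs']));              // bool(false)
```

In the crawler loop this could be called as has_classes((string) $part->class, ['MINUTES', 'timeMs']); the cast is there on the assumption that elements without a class attribute do not yield a string.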
If a snippet of code is included in another PHP file, or HTML is embedded in PHP, your browser cannot see it.
So no web-crawling API can detect it. I think your best bet is to find the location of simple_html_dom.php and try to crawl that file somehow. You may not even be able to get access to it. It's tricky.
You could also try finding by ID, if your API has that function.

How can I read a whole page from a variable address using PHP?

How can I make this code dynamic in PHP? My address is variable.
<?php
$pagecontents = file_get_contents("http://google.com");
$html = htmlentities($pagecontents);
echo $html;
?>
I'm not sure I understood what the goal is, but if you want to do the same thing you showed in the question with multiple sites, you can do it with a simple loop:
$sites = ["http://aa.com/a", "http://aa.com/b"]; // use array(...) with PHP versions before 5.4
foreach($sites as $url) {
$pagecontents = file_get_contents($url);
echo htmlentities($pagecontents);
}
If this is not what you are looking for, then please rephrase the question so that it clearly explains what you want to do!
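If "variable address" means the URL is supplied at run time, something along these lines might work. This is only a sketch: page_source_as_html() is a made-up helper name, and it assumes the address arrives as the first command-line argument (in a web page it might come from $_GET instead).

```php
<?php
// page_source_as_html() is a hypothetical helper name.
function page_source_as_html(string $url): ?string
{
    // Accept only well-formed http(s) URLs, since the address comes from outside.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (filter_var($url, FILTER_VALIDATE_URL) === false
        || !in_array(strtolower((string) $scheme), ['http', 'https'], true)) {
        return null;
    }
    // Fetch the page; file_get_contents() returns false on failure.
    $contents = @file_get_contents($url);
    // Escape the markup so the browser shows the source instead of rendering it.
    return $contents === false ? null : htmlentities($contents);
}

// Assumption: the address is passed as the first command-line argument.
if (isset($argv[1])) {
    $html = page_source_as_html($argv[1]);
    echo $html ?? 'Could not read that address.', PHP_EOL;
}
```

Restricting the scheme is a design choice: without it, a user-supplied address such as a local file path would be read just as happily as a web page.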

Is it possible to parse and output an XML document to HTML in document order using PHP?

I apologize if this is something that has been discussed thoroughly here. I spent a good amount of time searching for an answer and can't seem to dig a definitive one up. Also, I'm pretty much an amateur when it comes to PHP.
I have an XML document that I'm trying to render as HTML using PHP. The XML is actually "theoretically" used for single-sourcing, meaning the hope is for using it as a source for producing both PDF and HTML output. So, the XML is actually made up of elements that serve as the parts of a manual. See below:
<topic>
<title>Some topic.</title>
<caution>
<para>Cautionary note 1.</para>
</caution>
<para>Paragraph 1</para>
<caution>
<para>Cautionary note 2.</para>
</caution>
<para>Paragraph 2</para>
<figure>
<graphic src="../some_path"/>
</figure>
</topic>
For PDF, I plan on writing an XSLT transform. For HTML, I have a script that successfully converts all the elements of the XML into HTML. The problem is, it seems to be converting the XML as parsed arrays. The above example renders with (using HTML equivalents):
All figure elements at the top.
All para elements in the middle (excluding para elements within the caution elements).
All caution elements at the bottom.
So, the document order is not preserved. Since this source is used to put together a manual, the document order is important.
My question: Is it possible to use PHP to parse XML into HTML and preserve document order? Again, I'm pretty much a PHP amateur, so I apologize if my question seems obtuse. Any advice or pointers are welcome.
EDITED TO ADD SOME OF THE PHP
The PHP script is pretty lengthy so I will try to include as much of the relevant bits as possible.
In terms of the cautionary notes, the script initially defines a function:
//Display the cautions from the xml
function display_cautions($cautions)
{
foreach( $cautions as $caution )
{
foreach( $caution->para as $cautionpara )
{
printf("<p class='Text'>");
foreach( $cautionpara->{'attention-icon'} as $attentionicon )
{
printf("<img src='../../images/%s' width='25px' height='25px'>", htmlspecialchars($attentionicon['srcfile']));
}
$cautionpara = str_ireplace('<emph>', '<b>', $cautionpara);
$cautionpara = str_ireplace('</emph>', '</b>', $cautionpara);
printf("%s</p>", htmlspecialchars($cautionpara));
}
}
}
Then it constructs the document:
if (isset($_GET['ID']))
{
$introsections = $doc->{'intro-section'};
//printf("got here 1");
foreach( $introsections as $introsection )
{
//printf("got here 2");
$sectno = $introsection['sect-no'];
if ($sectno == $_GET['ID'])
{
$smtitle = $introsection->title->nodeValue;
$introgenerals = $introsection->{'intro-general'};
foreach( $introgenerals as $introgeneral )
{
//printf("got here 3");
$smtitle = $introgeneral->title;
$introid = $introgeneral['id'];
//Display the title
printf("<h2>%s</h2>", htmlspecialchars($smtitle));
//Match found
$topics = $introgeneral->topic;
foreach( $topics as $topic )
{
//Display the title
printf("<p class='Text12Bold'>%s</p>", htmlspecialchars($topic->title));
$topicparas = $topic->para;
//Get all the paragraphs
foreach( $topicparas as $topicpara )
{
//MH added: code to include xrefs here
//check for xrefs in this para
$xrefs = $topicpara->xref;
if (simple_xml_count($xrefs) > 0) {
//printf("<p class='Text'>%s</p>\n", htmlspecialchars($topicpara));
display_referral_links($xrefs, $topicpara);
} else {
$topicpara = str_ireplace('<emph>', '<b>', $topicpara);
$topicpara = str_ireplace('</emph>', '</b>', $topicpara);
?>
<p class='Text'><?php printf($topicpara) ?></p>
<?php
}
}
Then it calls the caution function:
//Get all the cautions with images where applicable
display_cautions($topic->caution);
In between the document construction and the function call, it constructs tables, subtopics and figures. Similar functions are in place for warnings, notes, and lists.
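The reordering appears to come from looping over one element name at a time ($topic->para, then $topic->caution, and so on), so each group is printed together. One possible way to keep document order, sketched here under assumptions rather than taken from the original script, is to walk $topic->children() once and dispatch on each child's name. render_topic() is a hypothetical function, and the CSS classes simply mirror the ones used above.

```php
<?php
// Sample XML copied from the example in the question.
$xml = <<<XML
<topic>
<title>Some topic.</title>
<caution>
<para>Cautionary note 1.</para>
</caution>
<para>Paragraph 1</para>
<caution>
<para>Cautionary note 2.</para>
</caution>
<para>Paragraph 2</para>
<figure>
<graphic src="../some_path"/>
</figure>
</topic>
XML;

// render_topic() is a hypothetical helper, not part of the original script.
function render_topic(SimpleXMLElement $topic): string
{
    $out = '';
    // children() yields the child elements in document order, whatever their names.
    foreach ($topic->children() as $child) {
        switch ($child->getName()) {
            case 'title':
                $out .= sprintf("<p class='Text12Bold'>%s</p>\n", htmlspecialchars((string) $child));
                break;
            case 'para':
                $out .= sprintf("<p class='Text'>%s</p>\n", htmlspecialchars((string) $child));
                break;
            case 'caution':
                // A caution wraps one or more paragraphs.
                foreach ($child->para as $para) {
                    $out .= sprintf("<p class='Text'>%s</p>\n", htmlspecialchars((string) $para));
                }
                break;
            case 'figure':
                foreach ($child->graphic as $graphic) {
                    $out .= sprintf("<img src='%s'>\n", htmlspecialchars((string) $graphic['src'], ENT_QUOTES));
                }
                break;
        }
    }
    return $out;
}

echo render_topic(simplexml_load_string($xml));
```

The existing display_cautions() style functions could be called from the matching case branches instead of the inline sprintf() calls, one element at a time.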

Is using PHP's explode() for HTML scraping considered a bad practice?

I have been coding for a while now but just can't seem to get my head around regular expressions.
This brings me to my question which is the following: is it bad practice to use PHP's explode for breaking up a string of html code to select bits of text? I need to scrape a page for various bits of information and due to my horrific regex knowledge (In a full software engineering degree I had to write maybe one....) I decided upon using explode().
I have provided my code below so someone more seasoned than me can tell me if it's essential that I use regex for this or not!
public function split_between($start, $end, $blob)
{
$strip = explode($start,$blob);
$strip2 = explode($end,$strip[1]);
return $strip2[0];
}
public function get_abstract($pubmed_id)
{
$scrapehtml = file_get_contents("http://www.ncbi.nlm.nih.gov/m/pubmed/".$pubmed_id);
$data['title'] = $this->split_between('<h2>','</h2>',$scrapehtml);
$data['authors'] = $this->split_between('<div class="auth">','</div>',$scrapehtml);
$data['journal'] = $this->split_between('<p class="j">','</p>',$scrapehtml);
$data['aff'] = $this->split_between('<p class="aff">','</p>',$scrapehtml);
$data['abstract'] = str_replace('<p class="no_t_m">','',str_replace('</p>','',$this->split_between('<h3 class="no_b_m">Abstract','</div>',$scrapehtml)));
$strip = explode('<div class="ids">', $scrapehtml);
$strip2 = explode('</div>', $strip[1]);
$ids[] = $strip2[0];
// Check that a second "ids" block exists before searching it for a PMCID.
if (isset($strip[2]) && strpos($strip[2], "PMCID") !== false)
{
$step = explode('</div>', $strip[2]);
$ids[] = $step[0];
}
$id_count = 0;
foreach ($ids as &$value) {
$value = str_replace("<h3>", "", $value);
$data['ids'][$id_count]['id'] = str_replace("</h3>", "", str_replace('<span>','',str_replace('</span>','',$value)));
$id_count++;
}
$jsonAbstract = json_encode($data);
echo $this->indent($jsonAbstract);
}
I highly recommend you try out the PHP Simple HTML DOM Parser library. It handles invalid HTML and has been designed to solve the same problem you're working on.
A simple example from the documentation is as follows:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It's not essential to use regular expressions for anything, although it'll be useful to get comfortable with them and know when to use them.
It looks like you're scraping PubMed, which I'm guessing has fairly static mark-up. If what you have works and performs as you hope, I can't see any reason to switch over to regular expressions; they're not necessarily going to be any quicker in this example.
Learn regular expressions and try to use a language that has libraries for this kind of task, like Perl or Python. It will save you a lot of time.
At first they might seem daunting, but they are really easy for most tasks.
Try reading this: http://perldoc.perl.org/perlre.html
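As an alternative to both explode() and regular expressions, PHP's bundled DOM extension can pull elements out by tag or class. The following is only a sketch: the HTML string is a made-up stand-in modelled on the class names in the question (nothing is fetched from PubMed), and first_text() is a hypothetical helper.

```php
<?php
// Hypothetical markup modelled on the class names in the question.
$html = '<html><body><h2>Sample title</h2>'
      . '<div class="auth">A. Author, B. Writer</div>'
      . '<p class="j">J Example. 2012</p></body></html>';

// first_text() is a hypothetical helper: text of the first node matching an
// XPath query, or null when nothing matches.
function first_text(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Collect parser warnings internally instead of printing them;
// real-world pages are rarely valid HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$data = [
    'title'   => first_text($xpath, '//h2'),
    'authors' => first_text($xpath, '//div[@class="auth"]'),
    'journal' => first_text($xpath, '//p[@class="j"]'),
    'aff'     => first_text($xpath, '//p[@class="aff"]'), // absent in the sample
];
echo json_encode($data, JSON_PRETTY_PRINT), PHP_EOL;
```

Unlike split_between(), a missing element here yields null instead of an undefined-offset error on $strip[1].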

Parsing XML with PHP (simplexml)

Firstly, may I point out that I am a newcomer to all things PHP, so apologies if anything here is unclear; I'm afraid the more layman the response, the better. I've been having real trouble parsing an XML file into PHP to then populate an HTML table for my website. At the moment, I have been able to get the full XML feed into a string, which I can then echo and view, and all seems well. I then thought I would be able to use SimpleXML to pick out specific elements and print their content, but I have been unable to do this.
The XML feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed into the right format within a string, although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
echo "{$name->awCat}\r\n";
}
Many, many thanks in advance,
Chris
PS The actual feed
Since no one followed my closevote, I think I can just as well put my own comments as an answer:
First of all, SimpleXml can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXml API and drill down to the wanted elements (like shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
echo $awCat, PHP_EOL;
}
Live Demo
Note that fetching all awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" as their value. Of course, you can also mix XPath and the implicit API: fetch just the prod elements and then drill down to their various children.
Using XPath should be somewhat faster than iterating over the SimpleXMLElement object graph, though the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see
A simple program to CRUD node and node values of xml file and
PHP Manual - SimpleXml Basic Examples
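To try both approaches without downloading the feed, here is a small self-contained sketch. The two-product XML string is made up for illustration; only the element names (merchantProductFeed, merchant, prod, cat, awCat) are taken from the paths used above.

```php
<?php
// Hypothetical two-product sample; element names follow the paths used above.
$sample = <<<XML
<merchantProductFeed>
  <merchant>
    <prod><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>
    <prod><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>
  </merchant>
</merchantProductFeed>
XML;

$feed = new SimpleXMLElement($sample);

// 1) Implicit API: drill down through properties.
$viaProperties = [];
foreach ($feed->merchant->prod as $prod) {
    $viaProperties[] = (string) $prod->cat->awCat;
}

// 2) XPath: select the awCat elements directly.
$viaXpath = [];
foreach ($feed->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
    $viaXpath[] = (string) $awCat;
}

// Both routes collect the same strings in the same order.
var_dump($viaProperties === $viaXpath);
```

The (string) casts matter: without them the arrays would hold SimpleXMLElement objects rather than plain text.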
Try this...
$url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/";
$zd = gzopen($url, "r");
$data = gzread($zd, 1000000);
gzclose($zd);
if ($data !== false) {
$xml = simplexml_load_string($data);
foreach ($xml->merchant->prod as $pr) {
echo $pr->cat->awCat . "<br>";
}
}
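One caveat with the snippet above: gzread($zd, 1000000) returns at most 1,000,000 uncompressed bytes, so a larger feed would be cut off and fail to parse. A possible workaround, sketched here with a hypothetical read_gz() helper and a temporary local file standing in for the real feed, is to read in chunks until gzeof():

```php
<?php
// read_gz() is a hypothetical helper that reads an entire gzip stream.
function read_gz(string $path): string
{
    $zd = gzopen($path, 'r');
    if ($zd === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $data = '';
    // Keep reading fixed-size chunks until the end of the stream.
    while (!gzeof($zd)) {
        $chunk = gzread($zd, 8192);
        if ($chunk === false) {
            break;
        }
        $data .= $chunk;
    }
    gzclose($zd);
    return $data;
}

// Stand-in for the real feed: a tiny gzip-compressed XML file in the temp directory.
$tmp = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($tmp, gzencode(
    '<merchantProductFeed><merchant><prod><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod></merchant></merchantProductFeed>'
));

$xml = simplexml_load_string(read_gz($tmp));
unlink($tmp);

foreach ($xml->merchant->prod as $pr) {
    echo $pr->cat->awCat, PHP_EOL;
}
```

For the real feed, read_gz($url) would take the place of the temporary file, assuming URL wrappers are enabled as they evidently are in the original snippet.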
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
// $line->cat->awCat -> you can use this
}
more information here
Use print_r($xml) to see the structure of the parsed XML feed.
Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
print $prod->pId;
print $prod->text->name;
print $prod->cat->awCat; # <-- which is what you wanted
print $prod->price->buynow;
}
$url = 'your url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name)
{
echo (string) $name->cat->awCatId, "\r\n";
}
