My client asked me to implement an affiliate feed that is delivered as XML. However, the file is huge, around 650k lines. I tried parsing it with SimpleXML and it worked, but it is extremely slow; as a result, the website sometimes fails to load.
<?php
$html = "";
$url = "http://www.digitick.com/rss/distributeur/fluxAffiliation195_815.xml";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
    $event = $xml->content->eventList->event[$i];
    $title = $event->eventName;
    $link = $event->eventUrl;
    $description = $event->eventPresentation;
    $dateStart = $event->dateStart;
    $img = $event->pictureUrl;
    $html .= "<a href='$link'><h3>$title</h3></a>";
    $html .= "<img src='$img'>";
    $html .= "$description";
    $html .= "<br />$dateStart<hr />";
}
echo $html;
?>
What can I do to handle this dynamic file (it is updated every morning at 5 AM)?
Thank you in advance!
I would import the file into an SQLite database with a cron job, then use SQL to request just the parts you need from your application. As an alternative, cache the generated output so the large file is only read once a day.
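A rough sketch of what the import side could look like (the table layout and file names are assumptions, not part of the feed; the element names are taken from the question):

<?php
// Daily import, scheduled via cron shortly after 5 AM.
$db = new PDO('sqlite:' . __DIR__ . '/events.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS events (
    name TEXT, url TEXT, presentation TEXT, date_start TEXT, picture TEXT
)');
$db->exec('DELETE FROM events');
$insert = $db->prepare(
    'INSERT INTO events (name, url, presentation, date_start, picture)
     VALUES (?, ?, ?, ?, ?)'
);

// XMLReader streams the document, so the 650k-line file is never held
// in memory all at once the way simplexml_load_file() keeps it.
$reader = new XMLReader();
$reader->open('http://www.digitick.com/rss/distributeur/fluxAffiliation195_815.xml');
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'event') {
        $event = simplexml_load_string($reader->readOuterXml());
        $insert->execute(array(
            (string) $event->eventName,
            (string) $event->eventUrl,
            (string) $event->eventPresentation,
            (string) $event->dateStart,
            (string) $event->pictureUrl,
        ));
    }
}
$reader->close();
?>

The page that renders the events then only runs something like SELECT name, url, picture, date_start FROM events LIMIT 10 against the SQLite file, which stays fast no matter how large the feed grows.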
Related
I have an XML RSS feed that I am using on my website. With this code I am generating HTML from the XML file:
$html = "";
$url = "http://books.com/new_bookss/?format=xml";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
    $link = $xml->resource[$i]->book_link;
    $title = $xml->resource[$i]->book_title;
    $img = $xml->resource[$i]->image_url;
    $html .= "<img src=\"$img\"><br>$title";
}
echo $html;
The generated $link and $img look like this:
http://books.com/new_books/booktitle/ /*this is for $link*/
http://images.books.com/img/booktitle.jpg /* this is for $img*/
I have to change these URLs as follows:
http://books.com/new_books/booktitle/ to http://mywebsite/new_books/booktitle/
http://images.books.com/img/booktitle.jpg to http://mywebsite//img/booktitle.jpg
The URL structure looks the same every time:
http://books.com/new_books/booktitle/
http://books.com/new_books/something/
http://books.com/new_books/else/
The structure on my website is the same:
http://mywebsite.com/new_books/booktitle/
http://mywebsite.com/new_books/something/
http://mywebsite.com/new_books/else/
The same goes for $img, so the only thing I have to change is books.com to mywebsite.com.
This is how I did it: I added
$link = str_replace("books.com", "mywebsite.com", $link);
right after
$link = $xml->resource[$i]->book_link;
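Putting that together, the loop would look roughly like this (a sketch; it assumes images.books.com should also become plain mywebsite.com, as the examples above suggest):

for ($i = 0; $i < 10; $i++) {
    $link  = $xml->resource[$i]->book_link;
    $title = $xml->resource[$i]->book_title;
    $img   = $xml->resource[$i]->image_url;
    // Only the host changes; the path structure stays the same.
    $link = str_replace("books.com", "mywebsite.com", (string) $link);
    $img  = str_replace("images.books.com", "mywebsite.com", (string) $img);
    $html .= "<img src=\"$img\"><br>$title";
}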
I am using PHP Simple HTML DOM Parser. I have a list of URLs (urls.txt) that I need to download as plain text. What I am trying to achieve is iterating over the URLs, extracting the HTML/text, and appending the extracted text to a text file (plain.txt) incrementally. I have written two separate scripts, but I need more insight into integrating them into a single one in order to automate the process. Thank you.
<?php
include('simple_html_dom.php');
$Handler = fopen("urls.txt", "a+");
$Urls = fgets($Handler);
while (!feof($Handler)) {
    $Urls = fgets($Handler);
    echo $Urls . "<br />\n";
}
fclose($Handler);
?>
<?php
$html = file_get_html('http://example.com')->plaintext;
$Dump = fopen("plain.txt", "a+");
fwrite($Dump, $html);
fclose($Dump);
?>
You can create a function for the second script:
function func($url) {
$html = file_get_html($url)->plaintext;
$Dump = fopen("plain.txt", "a+");
fwrite($Dump, $html);
fclose($Dump);
}
and then your first script becomes:
include('simple_html_dom.php');
$Handler = fopen("urls.txt", "r");
while (($Urls = fgets($Handler)) !== false) {
    $Urls = trim($Urls); // strip the trailing newline
    if ($Urls !== '') {
        func($Urls);
    }
}
fclose($Handler);
I have multiple folders with small videos and I need to build an RSS feed from them.
The structure of the files is like this:
MainFolder ->
-> File.Summer -> file.summer.p01f1.mp4, file.summer.p01f2.mp4, ..., file.summer.p02f1.mp4, file.summer.p02f2.mp4, and so on.
-> File.Winter -> file.winter.p01f1.mp4, file.winter.p01f2.mp4, ..., file.winter.p02f1.mp4, file.winter.p02f2.mp4, and so on.
What I need is a way to generate these RSS files so that a feed like file.summer.p01.rss contains all the files matching p01f1, p01f2, and so on, then the next one is file.summer.p02.rss, and so forth, for every folder in the path (see the sketch after the script below).
I found a solution that builds a single RSS with all the files in a folder, but I need the RSS to be split into parts matching the file names.
It can be a Windows batch file or a PHP script.
Thank you.
This is the script I have now:
$list="file.summer";
$part="01";
$ser_dir = 'c:\www\php\movs\\'.$list;
$rss_head = '<rss version="2.0">';
$movs_path="../php/movs/";
$iosser = "/iosser/";
$file_out = $ser_dir."\\".$list.".p".$part.".rss";
$xml = $rss_head. "\n";
$xml .="<channel>"."\n";
foreach (glob($ser_dir."\*.p".$part."*.srt" ,GLOB_NOSORT) as $filename) {
$xml .="<item>". "\n";
$xml .= "<title>".basename($filename, ".srt")."</title>\n";
$filename=basename($filename);
$xml .= "<image>".$movs_path.$list."/".substr($filename, 0,-7).".jpg</image>\n";
$xml .= '<source file="'.$iosser.$list."/".basename($filename,".srt").'.mov"/>'."\n";
$xml .="</item>". "\n";
}
$xml .="</channel>"."\n";
$xml .= "</rss>";
file_put_contents($file_out,$xml);
echo $list ."\n";
echo $part;
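For the split-per-part behaviour, one possible extension of that script is sketched below (hypothetical: the folder layout under c:\www\php\movs, the pNN naming convention and the .srt/.jpg/.mov handling are carried over from the script above and may need adjusting):

$main_dir  = 'c:\www\php\movs';
$rss_head  = '<rss version="2.0">';
$movs_path = "../php/movs/";
$iosser    = "/iosser/";

// One pass per sub-folder (File.Summer, File.Winter, ...).
foreach (glob($main_dir . '\*', GLOB_ONLYDIR) as $ser_dir) {
    $list = basename($ser_dir);

    // Collect the distinct part numbers ("01", "02", ...) from the file names.
    $parts = array();
    foreach (glob($ser_dir . '\*.srt', GLOB_NOSORT) as $filename) {
        if (preg_match('/\.p(\d+)f\d+\./', basename($filename), $m)) {
            $parts[$m[1]] = true;
        }
    }

    // Write one .rss per part, containing only that part's files.
    foreach (array_keys($parts) as $part) {
        $file_out = $ser_dir . "\\" . $list . ".p" . $part . ".rss";
        $xml = $rss_head . "\n<channel>\n";
        foreach (glob($ser_dir . "\*.p" . $part . "*.srt", GLOB_NOSORT) as $filename) {
            $filename = basename($filename);
            $xml .= "<item>\n";
            $xml .= "<title>" . basename($filename, ".srt") . "</title>\n";
            $xml .= "<image>" . $movs_path . $list . "/" . substr($filename, 0, -7) . ".jpg</image>\n";
            $xml .= '<source file="' . $iosser . $list . "/" . basename($filename, ".srt") . '.mov"/>' . "\n";
            $xml .= "</item>\n";
        }
        $xml .= "</channel>\n</rss>";
        file_put_contents($file_out, $xml);
        echo $file_out . "\n";
    }
}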
I am using server-side code to fetch photos from an external URL. I am using the Simple HTML DOM library for this, as suggested on SO, but the performance is lacking: for some sites I am not able to get all the photos.
$url below holds an example of an external site that is not giving me all the images.
$url = "http://www.target.com/c/baby-baby-bath-bath-safety/-/N-5xtji#?lnk=nav_t_spc_3_inc_1_1";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed HTML
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    echo $imageUrl = $tag->getAttribute('src');
    echo "<br />";
}
Is it possible to get functionality/accuracy similar to Firefox's
Firefox -> Tools -> Page Info -> Media?
I just want this to be more accurate, as the existing library is not fetching all the images. I also tried file_get_contents, which does not fetch all of the images either.
You need to use regular expressions to get the images' src attributes. DOMDocument builds the whole DOM structure in memory, which you don't need here. Once you have the URLs, use file_get_contents() and write the data to files. Also raise max_execution_time if you are going to parse many pages.
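A minimal sketch of that approach, assuming the $url from the question and a simple <img src="..."> pattern (a regex only sees what is in the downloaded HTML, so images added later by JavaScript will still be missed, and relative src values would need to be resolved against $url first):

$html = file_get_contents($url);

// Grab every src attribute of an <img> tag in the raw HTML.
preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);

set_time_limit(300); // raise the execution limit before downloading many files

foreach ($matches[1] as $i => $imageUrl) {
    // Write each image straight to disk; the target file names are made up here.
    file_put_contents('image_' . $i . '.jpg', file_get_contents($imageUrl));
}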
Download images from remote server
function save_image($sourcePath, $targetPath)
{
    $in = fopen($sourcePath, "rb");
    $out = fopen($targetPath, "wb");
    // Copy the remote file in 8 KB chunks.
    while ($chunk = fread($in, 8192)) {
        fwrite($out, $chunk, 8192);
    }
    fclose($in);
    fclose($out);
}
$src = "http://www.example.com/thumbs/thumbs-t2/1/ts_11083.jpg"; //image source
$target = dirname(__FILE__)."/images/pic.jpg"; //where to save image with new name
save_image($src,$target);
I need help creating an XML file from my PHP code. I wrote a function that builds the XML the way I want it, but I don't know how to save the generated XML to a file.
Also, could you tell me how, next time, I can update the already created XML file?
My code looks like this:
public function create_xml()
{
    $output = "<?xml version=\"1.0\" encoding=\"utf-8\" ?> ";
    $output .= "<data>";
    $output .= "<news>";
    $news_articles = new C7_News();
    $db = C7_Bootstrap::getDb();
    $foxsport_sql = "SELECT headline, link FROM c7_news
        WHERE source = 'foxsports' AND category = 'Breaking News'
        LIMIT 0,4";
    $foxsport_rowset = $db->fetchAll($foxsport_sql);
    $data = array();
    foreach ($foxsport_rowset as $foxsport_row)
    {
        $output .= "<title>";
        $output .= "<description>";
        $output .= "<![CDATA[";
        $output .= $foxsport_row['headline'];
        $output .= "]]>";
        $output .= "<link>";
        $output .= $foxsport_row['link'];
        $output .= "</link>";
        $output .= "</title>";
    }
    $output .= "</news>";
    $output .= "</data>";
}
I might be doing something wrong too, so please let me know the best way to create the XML.
Thank you,
Zeeshan
I'll add some more information here.
Also, could you tell me how, next time, I can update the already created XML file?
You should parse the XML in order to update it; the SimpleXML extension provides an easy-to-use API for parsing and writing XML files.
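For example, a later update could look roughly like this (the element names just mirror the structure built above; SimpleXML escapes the text itself, so no CDATA wrapper is needed):

// Load the previously saved document, append one more entry, write it back.
$xml = simplexml_load_file('file.xml');
$title = $xml->news->addChild('title');
$title->addChild('description', 'New headline');
$title->addChild('link', 'http://example.com/article');
$xml->asXML('file.xml');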
I might be doing something wrong too, so please let me know the best way to create the XML.
There are many ways to do this, but I prefer to use PHP as a "template engine" when writing HTML or XML (or any *ML for that matter).
<?php echo "<?xml version=\"1.0\" encoding=\"utf-8\" ?> " ?>
<?php
$news_articles = new C7_News();
$db = C7_Bootstrap::getDb();
$foxsport_sql = "SELECT headline, link FROM c7_news
WHERE source = 'foxsports' AND category = 'Breaking News'
LIMIT 0,4";
$foxsport_rowset = $db->fetchAll($foxsport_sql);
?>
<data>
<news>
<?php foreach($foxsport_rowset as $foxsport_row): ?>
<title>
<description>
<![CDATA[
<?php echo $foxsport_row['headline'] ?>
]]>
</description>
<link><?php echo $foxsport_row['link'] ?></link>
</title>
<?php endforeach; ?>
</news>
</data>
This will output the XML and is a lot easier to read
but I don't know how to save the generated XML to a file.
As for how to save this to a file, you should have another PHP file that includes this "template" (suppose the template is called xml_template.php):
ob_start();
include dirname(__FILE__) . DIRECTORY_SEPARATOR . 'xml_template.php';
$output = ob_get_clean();
Then you have the XML string in the $output variable again and can proceed as noted by Vlad.P:
// If file is missing, it will create it.
$file = fopen("file.xml", "w");
fwrite($file, $output);
fclose($file);
Add this at the end of your function.
// If file is missing, it will create it.
$file = fopen("file.xml", "w");
fwrite($file, $output);
fclose($file);
Use
file_put_contents( $file, $output );
You might want to pass in the argument for the path of the file:
$stream->create_xml( $file );
or decide that it should be handled by another method/class, e.g.
$stream->create_xml();
$stream->save_xml( $file );
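A hypothetical sketch of that second variant (the class name and the trivial create_xml() body are placeholders for the real code from the question):

class C7_NewsFeed
{
    private $output = '';

    public function create_xml()
    {
        // Build the XML string exactly as in the question and keep it on the object.
        $this->output = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n<data><news><!-- items --></news></data>";
    }

    public function save_xml($file)
    {
        // Persisting the feed is then a one-liner.
        file_put_contents($file, $this->output);
    }
}

$stream = new C7_NewsFeed();
$stream->create_xml();
$stream->save_xml('file.xml');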