I'm trying to take a list of 40,000 items and run a JSON request for each item. The process is super slow, which I suspect is because each JSON request finishes before the next request starts.
$result = mysql_query("SELECT * FROM $tableName");
while($row = mysql_fetch_array($result))
{
checkRank($row['Domain']);
}
checkRank is running something to the effect of:
$json = file_get_contents($jsonurl,0,null,null);
I'm thinking I could run 10 checkRank() calls at a time to speed the process up. Any other ideas/suggestions?
UPDATE:
For example, this loop runs through my array in 27 seconds.
for ($i=0; $i<=100; $i++) {
checkRank($domains[$i]);
$i++;
checkRank($domains[$i]);
$i++;
checkRank($domains[$i]);
$i++;
checkRank($domains[$i]);
$i++;
checkRank($domains[$i]);
$i++;
echo "checking " . $i . "<br/>";
}
The loop below takes over 40 seconds with the same array.
for ($i=0; $i<=100; $i++) {
checkRank($domains[$i]);
echo "checking " . $i . "<br/>";
}
Not sure if this will help much because I don't work with PHP, but I did find this:
How can one use multi threading in PHP applications
Unless there is something else that you haven't mentioned, the best way to do this would be to make one JSON request, pass all of your items to it, and get an equal number of results back. That way you minimize the round trips to the server. I am not sure you want to send all 40,000 items in one go, though; you might want to divide them into batches, but you can test that later.
so your checkRank() would look something like this (note that file_get_contents() can't take a payload as its second argument; this sketch assumes the endpoint accepts a POSTed JSON body):
function checkRank($domainsArray) {
    // a POST body has to go through a stream context
    $context = stream_context_create(array('http' => array('method' => 'POST', 'content' => json_encode($domainsArray))));
    $json = file_get_contents($jsonurl, false, $context);
}
http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
This seems to be a nice way to speed up the processing. Thanks for all the input guys.
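For reference, the core of that article's approach is the curl_multi family: register several handles, drive them concurrently, then collect the responses. A minimal sketch (the $urls list stands in for whatever batch of JSON URLs you're checking):
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture output instead of printing it
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
$running = null;
do {
    curl_multi_exec($mh, $running); // drives all transfers at once
} while ($running > 0);
foreach ($handles as $ch) {
    $json = curl_multi_getcontent($ch); // the response body for this handle
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);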
... it's a shame, I read on Stack Overflow today that PHP could not support threading as it would require fundamental changes to the language ...
https://github.com/krakjoe/pthreads
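pthreads requires a thread-safe (ZTS) PHP build, but the idea looks roughly like this (the rank-check URL below is hypothetical; substitute your real JSON endpoint):
class RankCheck extends Thread {
    private $domain;
    public function __construct($domain) { $this->domain = $domain; }
    public function run() {
        // hypothetical endpoint, stands in for the real $jsonurl
        $json = file_get_contents('http://example.com/rank?domain=' . urlencode($this->domain));
    }
}
$threads = array();
foreach (array_slice($domains, 0, 10) as $domain) {
    $threads[] = $t = new RankCheck($domain);
    $t->start(); // runs run() in its own thread
}
foreach ($threads as $t) {
    $t->join(); // wait for all ten to finish
}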
I am sure this has been asked 100 times, but I cannot find the form of words to get at the answer either here or on Google.
I have a variable number of messages. They arrive as
$_GET['message1']; $_GET['message2']; $_GET['messageX']; etc
where X can be 1 to 100.
I need to test if they exist and then push them out to a DB. I tried
$i=1;
while (isset(parse_str("message$i")))
{
echo parse_str("output=message$i");
echo "<h1>This is test $output </h1>";
$i++;
}
which does not work. I thought the middle part worked, but I just re-tested and that is wrong too.
I am new to parse_str(). I thought I understood it, and I understand the problem (it is a void function, so it cannot be used as a test), but I cannot work out a solution for getting through the variables.
parse_str() parses a query string into variables. What do you expect it to find in the string "message$i"?
If you're sure that all your messages come from $_GET, use $_GET:
$i = 1;
while (isset($_GET['message' . $i])) {
echo $_GET['message' . $i];
$i++;
}
But obviously, for storing such data, arrays are more convenient.
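For example, naming the fields message[] in the query string makes PHP collect them into one array automatically (the URL below is just illustrative):
// e.g. page.php?message[]=first&message[]=second&message[]=third
if (isset($_GET['message']) && is_array($_GET['message'])) {
    foreach ($_GET['message'] as $message) {
        echo $message;
    }
}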
I have made a small script which uses the Twitch API. The API only allows a maximum of 100 results per query. I would like to have this query carry on until there are no more results.
My theory behind this is to run a foreach or while loop and increment the offset by 1 each time.
My problem, however, is that I cannot change the foreach parameters within itself.
Is there any way of executing this efficiently without causing an infinite loop?
Here is my current code:
<?php
$newcurrentFollower = 0;
$offset=0;
$i = 100;
$json = json_decode(file_get_contents("https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=25&offset=".$offset));
foreach ($json->follows as $follow)
{
echo $follow->user->name . ' (' . $newcurrentFollower . ')' . "<br>";
$newcurrentFollower++;
$offset++;
$json = json_decode(file_get_contents("https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=25&offset=".$offset));
}
?>
Using a while loop:
while($i < $total)
{
$json = json_decode(file_get_contents("https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=25&offset=".$offset));
echo $json->follows->user->name . ' (' . $newcurrentFollower . ')' . "<br>";
$newcurrentFollower++;
$offset++;
$i++;
}
It ends up echoing only the counters; no names are successfully being grabbed.
Here is the API part for $json->follows:
https://github.com/justintv/Twitch-API/blob/master/v2_resources/channels.md#get-channelschannelfollows
You can use this:
$offset = 0;
$count = 1;
do {
    $response = json_decode(file_get_contents(
        'https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=100&offset=' . $offset
    ));
    foreach ($response->follows as $follow) {
        echo $follow->user->name . ' (' . ($count++) . ')' . "<br>";
    }
    $offset += 100; // advance by the page size (limit=100) so no page is fetched twice
} while (!empty($response->follows));
You want to use a while loop here, not just a foreach. Basically:
while (the HTTP request returns results)
{
foreach ($json->follows as $follow)
{
do stuff
}
increment offset so the next request returns the next one not already processed
}
The trickiest part is going to be getting the while condition right: it has to become false when the request returns no more results, and exactly how depends on what the API sends back once the list is exhausted.
Also important: the cleanest way would be to have the HTTP request occur as part of the while condition, but if you need to do some complicated computation on the returned JSON to check the condition, you can put an initial HTTP request before the loop and then do another request at the end of each iteration.
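A sketch of that first shape, assuming the API returns an empty follows array once the offset runs past the last follower:
$baseUrl = 'https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=100';
$offset = 0;
while (($page = json_decode(file_get_contents($baseUrl . '&offset=' . $offset)))
        && !empty($page->follows)) {
    foreach ($page->follows as $follow) {
        echo $follow->user->name . "<br>";
    }
    $offset += 100; // matches the limit above so pages don't overlap
}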
The problem is you're only capturing the key, not the value. Place it into a data structure to access the information.
Honestly, I find a recursive function much more effective than an iterative/loop approach: just update a data table or list before the next call. It's simple, uses cursors, is lightweight, and does the job. It's reusable if you use generics on it, too.
This code is in C#, but with minor changes you'll be able to get it working in PHP with ease.
query = //follower object get request//
private void doProcessFollowers(string query)
{
HTTPParse followerData = new HTTPParse(); //custom json wrapper. using the basic is fine. Careful with your cons though
var newRoot = followerData.createFollowersRoot(query); // generates a class populated by json
if (newRoot[0]._cursor != null)
{
populateUserDataTable(newRoot); //update dataset
doProcessFollowers(newRoot[0]._links.next); //recurse
}
}
Anyway, this just allows you to roll through the cursors without needing to worry about indexes, unless you specifically want them for some reason. If you're working with generics, you can reuse this code without issue; find a generic example below. All you need to do to make it reusable is pass the correct class within the <> of the method call. It can work for any custom class that you use to parse JSON data, which is basically what createFollowersRoot() does in the code above, except that one is hard-typed.
Again, I know it's in C# and the topic is PHP, but with a few minor syntax changes you'll get it working easily.
Anyway, hope this helps somebody.
Generic example:
public static List<T> rootSerialize<T>(JsonTextReader reader)
{
    List<T> outputData = new List<T>();
    JsonSerializer serializer = new JsonSerializer(); // one serializer can be reused for every read
    while (reader.Read())
    {
        var tempData = serializer.Deserialize<T>(reader);
        outputData.Add(tempData);
    }
    return outputData;
}
I have written a script for web scraping where I fetch each link from the page and then load that URL in the code. It is working extremely slowly: it takes about 50 seconds for the first output and an age to complete about 100 links. I don't understand why it is so slow. I am thinking about caching (page caching or an opcode cache), but I don't know how that would help.
The code is:
public function searchForum(){
global $wpdb;
$sUrl = $this->getSearchUrl();
$this->logToCrawler();
$cid = $this->getCrawlId();
$html = file_get_dom($sUrl);
$c=1;
foreach($html('div.gridBlobTitle a:first-child') as $element){
$post_page = file_get_dom($element->href);
$post_meta = array();
foreach($post_page('table#mytable img:first-child') as $img){
if(isset($img->src)){
$post_meta['image_main'] = self::$forumurl.$img->src;
}
else{
$post_meta['image_main']=NULL;
}
}
foreach($post_page('table.preferences td:odd') as $elm){
$post_meta[] = strip_tags($elm->getInnerText());
unset($elm);
}
/*Check if can call getPlainText for description fetch*/
$object = $post_page('td.collection',2);
$methodVariable = array($object, 'getPlainText');
if(is_callable($methodVariable, true, $callable_name)){
$post_meta['description'] = utf8_encode($object->getPlainText());
}
else{
$post_meta['description'] = NULL;
}
$methodVariable = array($object, 'getInnerText');
if(is_callable($methodVariable, true, $callable_name)){
/*Get all the images we found*/
$rough_html = $object->getInnerText();
preg_match_all("/<img .*?(?=src)src=\"([^\"]+)\"/si", $rough_html, $matches);
$images = array_map('self::addUrlToItems',$matches[1]);
$images = json_encode($images);
}
if($post_meta[8]=='WTB: Want To Buy'){
$status='buy';
}
else{
$status='sell';
}
$lastdate = strtotime(date('Y-m-d',strtotime("-1 month")));
$listdate = strtotime(date('Y-m-d',strtotime($post_meta[9])));
/*Check for date*/
if($listdate>=$lastdate){
$wpdb->query("INSERT
INTO tbl_scrubed_data SET
keywords='".esc_sql($this->getForumSettings()->search_meta)."',
url_to_post='".esc_sql($element->href)."',
description='".esc_sql($post_meta['description'])."',
date_captured=now(),crawl_id='".$cid."',
image_main='".esc_sql($post_meta['image_main'])."',
images='".esc_sql($images)."',brand='".esc_sql($post_meta[0])."',
series='".esc_sql($post_meta[1])."',model='".esc_sql($post_meta[2])."',
watch_condition='".esc_sql($post_meta[3])."',box='".esc_sql($post_meta[4])."',
papers='".esc_sql($post_meta[5])."',year='".esc_sql($post_meta[6])."',case_size='".esc_sql($post_meta[7])."',status='".esc_sql($post_meta[8])."',listed='".esc_sql($post_meta[9])."',
asking_price='".esc_sql($post_meta[10])."',retail_price='".esc_sql($post_meta[11])."',payment_info='".esc_sql($post_meta[12])."',forum_id='".$this->getForumSettings()->ID."'");
unset($element,$post_page,$images);
} /*END: Check for date*/
}
$c++;
}
Notes:
1) I am using [Ganon DOM Parser][1] for parsing the HTML.
[1]: https://code.google.com/p/ganon/wiki/AccesElements
2) Running on Windows XP with WAMP, MySQL 5.5, PHP 5.3, and 1 GB of RAM.
If you need more info please comment them.
Thanks
You need to figure out which parts of your program are slow. There are two ways to do that.
1) Put in some print statements that print out the time in various places, so you can say, "Hey look, this took 5 seconds to go from here to here."
2) Use a profiler like Xdebug that analyzes your program while it's running, so you know which parts of the code are slow.
Just looking at a program, you can't say, "Oh, that's the slow part to speed up." Without knowing what's slow, you'll probably waste time speeding up parts that aren't the slow parts.
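For option 1, something as simple as this around a suspect section will do (using one line from the scraper above as the example; microtime(true) returns seconds as a float):
$start = microtime(true);
$post_page = file_get_dom($element->href); // suspect: one HTTP request per link
printf("fetch took %.3f s<br/>\n", microtime(true) - $start);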
I have this chunk of code which works on my local XAMPP testing server. The problem is, when pushed to the production server, it breaks. This isn't an issue with the database connection, and PHP and MySQL are both at version 5.3 on the production server, so it isn't the use of old versions of either. I'm thinking it's the use of the foreach loop instead of the more standard while loop; if so, why?
<?php
$res = $mysqli->query('DESCRIBE '.$table);
$columnCount = 0;
echo '<ul>';
foreach($res as $field) {
if(in_array($field["Field"], $sheetData[0])) {
echo '<li>';
//var_dump($field);
echo $field['Field'].' - '.$field['Type'];
echo "</li>\r\n";
$columnCount++;
}
}
echo '</ul>';
?>
EDIT: To clarify, it breaks by not outputting anything at all; when inserting a simple echo statement inside the loop, it seems to not even execute it.
EDIT2: I've added an answer below which goes into slightly more detail about what the problem here actually was and why this code does actually work under certain conditions.
So, since I asked this question ages ago, I figure I should update it with some additional clarification: what I did first with the foreach loop does work. The caveat is that it only works in PHP 5.4+, as that is when the mysqli_result class implemented the Traversable interface, which means you can iterate over it with a foreach loop in later versions of PHP.
This change apparently wasn't super well known at the time I posted my question (mid-2013), likely because so many servers across the internet still ran 5.3 (the latest version of PHP available for Ubuntu 12.x), which limits the technique to recently updated servers. But when you're in an environment that supports it, this is a totally valid technique to use.
Do this instead:
if ($result = $mysqli->query('DESCRIBE ' . $table)) {
$columnCount = 0;
echo '<ul>';
/* fetch associative array */
while ($field = $result->fetch_assoc()) {
if (in_array($field["Field"], $sheetData[0])) {
echo "<li>$field[Field] - $field[Type]</li>\r\n";
$columnCount++;
}
}
echo '</ul>';
/* free result set */
$result->free();
}
If I have this pattern for building IDs:
CX00, where each 0 is replaceable with a single digit, but the following are already in use:
- CX00
- CX02
- CX04
- CX05
- CX07
- CX10
- CX11
- CX12
How can I easily find, either via PHP or MySQL, the values CX01, CX03, CX06, CX08, CX09, CX13, and up as available values?
I don't know how your data is stored, so I'll leave getting the IDs in an array. Once you do, this will find the next available one.
<?php
function make_id($n) {
    return sprintf('CX%02d', $n); // zero-pad to two digits: CX00 .. CX99
}
// Get these from some source
$items = array('CX00', 'CX02', 'CX04', 'CX05');
$id = make_id(0);
for($i=0; in_array($id, $items); $i++)
$id = make_id($i);
echo $id;
?>
This is called a brute-force method, and if you have a lot of IDs there are probably more efficient ways. With 100 maximum, there shouldn't be any problems.
In PHP, simply count up through the IDs using a for loop until you have found enough of the unused ones... how many total IDs do you expect to have?
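If you want the whole list of free IDs at once rather than just the next one, a minimal sketch (assuming $items holds the used IDs, as in the answer above):
// build every possible ID, then subtract the ones already in use
$all = array();
for ($n = 0; $n <= 99; $n++) {
    $all[] = sprintf('CX%02d', $n);
}
$available = array_diff($all, $items);
print_r($available); // CX01, CX03, CX06, ... for the example data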