I have a question. In my PHP Firebase query I have the problem that it only seems to get 20 documents of my database collection.
I fetch all documents' data and push each entry into a separate array in order to finally sort the entries.
While everything is working so far, I only ever get 20 entries each time the code runs on my server.
This is my code for fetching the data:
$tracksCount = 0;

// Fetch every document of the subcollection
$tracksList = $firestore->collection('lists/'.$listId.'/tracks');
$tracksDocuments = $tracksList->documents();

$sortedTracks = [];
foreach ($tracksDocuments as $track) {
    if ($track->exists()) {
        $trackData = $track->data();
        array_push($sortedTracks, $trackData);
    }
}

// Sort the collected tracks by their "index" field
array_multisort(array_column($sortedTracks, "index"), SORT_ASC, $sortedTracks);

foreach ($sortedTracks as $track) {
    // pushing fetched data for output....
    $tracksCount = $tracksCount + 1;
}
This code does work and I am getting all the results I expect, but only ever for 20 documents. (If there are fewer than 20 documents in the collection, it fetches them all; with more than 20, it caps out at 20.)
I cannot find the problem. Maybe somebody can help?
In my case I was fetching Firestore collections with plain REST requests, and I was also only getting 20 objects back.
I was able to get them all by adding ?pageSize=1000 to the query URL, as below:
https://firestore.googleapis.com/v1/projects/<project-name>/databases/(default)/documents/<collection-name>?pageSize=1000
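For example, a minimal PHP sketch of such a request (the project and collection names are placeholders, and I'm assuming the endpoint is readable with your credentials; add an Authorization header if your rules require one):

$url = 'https://firestore.googleapis.com/v1/projects/my-project/databases/(default)/documents/tracks?pageSize=1000';

// Fetch and decode the REST response; documents come back under the "documents" key
$data = json_decode(file_get_contents($url), true);

foreach ($data['documents'] as $document) {
    // each entry carries "name", "fields", "createTime" and "updateTime"
}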
There is no documented hard limit on the number of documents you can retrieve in a single request, although there is likely a practical limit that mostly depends on the memory and bandwidth of your app.
There is a maximum depth of function calls in the security rules for Cloud Firestore.
If you use the list method of the Firestore REST API, you can set the pageSize parameter to specify the maximum number of documents to return, and then page through the rest of the collection in readable chunks using the returned page token.
The IDs of the documents to retrieve can also be passed as an array input (a batch get), which is similar to the workaround you are attempting.
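As a sketch of that pagination (same hypothetical URL as above; the list response includes a nextPageToken whenever more documents remain):

$documents = [];
$pageToken = null;

do {
    // Append the page token when continuing from a previous request
    $url = 'https://firestore.googleapis.com/v1/projects/my-project/databases/(default)/documents/tracks?pageSize=300'
         . ($pageToken !== null ? '&pageToken=' . urlencode($pageToken) : '');

    $page = json_decode(file_get_contents($url), true);
    $documents = array_merge($documents, $page['documents'] ?? []);
    $pageToken = $page['nextPageToken'] ?? null;
} while ($pageToken !== null);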
Check for similar examples below:
How to get all documents where a specific field exists
Is there a workaround for firestore query in limit to 10
Select every document in firestore
Get firestore document with query
I have a programming problem that I'm hoping I can get some assistance with here. Basically, I need to find a way to merge array data that's being returned from 2 different RESTful APIs, sort it, and then paginate through it.
Note that I'm dealing with MLS data (ie: Real Estate listings), and that I'm using PHP for this project (exclusively). Also, these are the 2 different APIs that I'm using:
Spark API
https://sparkplatform.com/docs/overview/api
Bridge RESO Web API
https://bridgedataoutput.com/docs/platform/
The problem that I'm having, specifically, is that each of these APIs has a different limit as to how many records can be retrieved per request. For example, the Spark API allows me to retrieve as many as 1000 records at a time, whereas the Bridge API allows no more than 200. I understand why these limits are in place, and it really hasn't been a problem until now. I say this because I've been asked to find a way to retrieve Real Estate listings from both of these APIs, to then merge all of the returned data from both of them into a single array, .. and to then sort them based on list price (from highest to lowest), .. and then paginate through them (50 listings per page).
This wouldn't be a problem if I were dealing with just one of the 2 APIs, as both of them have features that make it quite easy to sort and paginate through the data. For queries that retrieve only small amounts of data (ie: less than 50 records total from both APIs combined), I've already implemented working methods that let me merge the 2 different data sets and then sort them based on list price:
// Merge the listings from both markets into one array
$mlsdata = array_merge($mlsdatamiami, $mlsdataftlauderdale);

// Compare on ListPrice so that higher-priced listings sort first
function price_compare($a, $b) {
    $t2 = $a['StandardFields']['ListPrice'];
    $t1 = $b['StandardFields']['ListPrice'];
    return $t1 - $t2;
}

usort($mlsdata, 'price_compare');
However, I am unfortunately NOT dealing with small data sets, and could potentially be returning as many as tens of thousands of listings from both APIs combined.
Although I've succeeded at writing some code that allows for me to paginate through my new "merged" data set, this obviously only works when I'm dealing with small amounts of data (less than 1200 records).
$finalarray = array_slice($newarray, $startposition, $perpagelimit);

foreach ($finalarray as $item) {
    ...
}
But again, I'm not dealing with result sets of less than 1200 records. So although it might appear that I'm displaying listings from highest price to lowest price on page #1 of a paginated result set, everything starts to fall apart on page #2, where the list prices are suddenly all over the place.
I've tried running multiple different queries in a for loop, pushing the data to a new array and then merging it with the initial result set ..
$miamimlsdataraw = file_get_contents($apiurl);
$miamimlsdata = json_decode($miamimlsdataraw, true);

// Total number of matching listings, as reported by the OData endpoint
$number_of_miami_listings = $miamimlsdata['#odata.count'];
$miamilistingsarray = array();

if ($miamimlsdata['#odata.count'] > 200) {
    $number_of_miami_queries = floor($number_of_miami_listings / 200);
    $miami_listings_start_number = 200;

    // Fetch the remaining pages of 200 via the $skip parameter
    for ($x = 1; $x <= $number_of_miami_queries; $x++) {
        $paramsextra = $params . "&\$skip=" . $miami_listings_start_number * $x;
        $apiurl = $baseurl . '/' . $dataset . '/' . $endpoint . '?access_token=' . $accesstoken . $paramsextra;

        $miamimlsdataraw = file_get_contents($apiurl);
        $miamimlsdata_extra = json_decode($miamimlsdataraw, true);
        array_push($miamilistingsarray, $miamimlsdata_extra);
    }
}

$miamimlsdata = array_merge($miamilistingsarray, $miamimlsdata);
With this particular experiment, I was only dealing with about 2,700 listings (from only 1 of the APIs) .. and the performance was horrendous. And when I tried writing all of the returned data to a text file on the server (rather than trying to display it in the page), it came in at a whopping 25 MB. Needless to say, I don't think I can reliably use this approach at all.
I've considered perhaps setting this up as a cron job, storing the array data in our database (the site is WordPress based), and then retrieving and paginating through it at runtime .. rather than querying the APIs in real time. But I now strongly suspect that this would be just as inefficient.
So .. I realize that this question was rather long winded, but I honestly didn't know where else to turn. Is what I'm trying to do simply not possible? Or am I perhaps missing something obvious? I welcome all and any suggestions.
-- Yvan
There is no need to merge and sort all the listings from both MLSes. Since you only need 50 listings per page and both APIs use RESO, you can have each API return only the sorted results you need. For example, to get the listings for page 1, you only need:
https://api.bridgedataoutput.com/api/v2/OData/Property?$orderby=ListPrice desc&$top=50
By stepping through both (already sorted) arrays simultaneously in a while loop, you can pick the higher-priced listing at each step and stop as soon as you have the 50 highest-priced listings across both arrays.
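A minimal sketch of that merge, assuming $spark and $bridge each already hold a page of listings sorted by price descending, and that the price sits under a ListPrice key (adjust the field paths to whatever each API actually returns):

$page = [];
$i = 0;
$j = 0;

// Both inputs are sorted descending, so repeatedly take whichever head is larger
while (count($page) < 50 && ($i < count($spark) || $j < count($bridge))) {
    $sparkPrice  = $i < count($spark)  ? $spark[$i]['ListPrice']  : -INF;
    $bridgePrice = $j < count($bridge) ? $bridge[$j]['ListPrice'] : -INF;

    $page[] = ($sparkPrice >= $bridgePrice) ? $spark[$i++] : $bridge[$j++];
}

For page N you would request the top N*50 listings from each API (or use their $skip/offset support), run the same merge, and discard the first (N-1)*50 merged entries.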
I have a formatResults callback function that adds a "custom calculated" field to the entities returned from a model query in my CakePHP app. I would like to sort by this field and use it in a paginate call. Is this possible?
So far I cannot accomplish this, because paginate limits the records fetched, so only the records within the paginator limit get sorted, not the whole result set.
Current code:
$owners = $this->Owners->find('all');

$owners->formatResults(function (\Cake\Collection\CollectionInterface $owners) {
    // Attach the calculated field to every entity...
    $owners = $owners->map(function ($entity) {
        $entity->random = rand(0, 1);
        return $entity;
    });

    // ...then sort the whole collection by it, descending
    return $owners->sortBy(function ($item) {
        return $item->random;
    }, SORT_DESC);
});
This works as expected:
$owners->toArray();
This does not:
$owners = $this->paginate($owners);
$owners->toArray();
Mainly because the callback only processes the first 10 records (the current page); I would like it to process the whole result set.
After digging around I've found a similar topic opened by a previous user at this link; it seems that it is not possible to use pagination sorting on anything other than fields in the database.
As a result, I would suggest:
1 - Either alter your model logic to accommodate your requirements, by creating virtual fields or altering the database schema to include this data.
2 - If the data requires further or live processing and cannot be added or calculated in the database, programming a component that replicates the paginate functionality on a CakePHP collection would be a good option (see the sketch after this list). The downside of this approach is that all records will be returned from the database, which may present performance issues on large result sets.
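A minimal sketch of that second option, assuming $owners is the query from the question with formatResults() already attached (the page/limit handling here is illustrative, not a drop-in component):

// Hypothetical page/limit values; in a controller these would come from the request
$page  = (int)($this->request->getQuery('page') ?? 1);
$limit = 10;

// Execute the query (formatResults runs now), then page the resulting
// collection in PHP so the calculated-field order is preserved
$pageOfOwners = $owners->all()->take($limit, ($page - 1) * $limit);

$this->set('owners', $pageOfOwners);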
I have an array of IDs that I loop over with foreach, searching for each ID in a SOLR index using the PHP Apache SOLR client. It's slow as a dead turtle. Any help with optimizing this is appreciated.
foreach ($f_games as $game_id) {
    $game_type = BKT_PLUGIN_CLASS::tv_regions($game_id);
    //Do my stuff
    $count++;
}
Where
BKT_PLUGIN_CLASS::tv_regions
is my class method for the SOLR API search (which works fine, no issues there).
So it does what I want it to do: it takes each ID, goes to SOLR, brings back the result for that item, and I do what I want to do and increase the count. But with only 200+ IDs, it takes more than 2 minutes to spit out results.
Use Result Grouping in Solr - that way you can get x number of hits for each region, all rolled up into a single response. Tweak the number of groups and number of hits for each group to match your need.
Filter the list with an fq containing all the values, so that it returns only the documents you need, then group by the field you'd normally search on.
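A rough sketch of such a request as raw query parameters (the game_id and region field names, and $solr_base_url, are assumptions about your setup):

// One grouped request replaces the whole per-ID loop
$params = http_build_query([
    'q'           => '*:*',
    'fq'          => 'game_id:(' . implode(' OR ', $f_games) . ')',
    'group'       => 'true',
    'group.field' => 'region',
    'group.limit' => 10,    // hits returned per group
    'wt'          => 'json',
]);

$response = json_decode(file_get_contents($solr_base_url . '/select?' . $params), true);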
Why are you pinging the API for each game? You are losing a lot of time just connecting there...
Can't you just pass all the IDs at once and process the combined result?
I don't know SOLR well, but it would surprise me if that were not possible (so I assume it is doable).
How to do an IN query in Solr?
How can I search on a list of values using Solr/Lucene?
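For reference, a minimal sketch of that kind of "IN" query with the classic Apache Solr PHP client (the field name and core path are assumptions):

// Build one query matching any of the IDs instead of one round-trip per ID
$query = 'game_id:(' . implode(' OR ', $f_games) . ')';

$solr = new Apache_Solr_Service('localhost', 8983, '/solr/games/');
$response = $solr->search($query, 0, count($f_games));

foreach ($response->response->docs as $doc) {
    // process each matched game here
}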
I'm trying to understand the Zend Paginator and would mostly like to make sure it doesn't break my scripts.
For example, I have the following snippet which successfully loads some contacts one at a time:
$offset = 1;
//returns a paginator instance using a dbSelect;
$contacts = $ContactsMapper->fetchAll($fetchObj);
$contacts->setCurrentPageNumber($offset);
$contacts->setItemCountPerPage(1);
$allContacts = count($contacts);
while ($allContacts >= $offset) {
    foreach ($contacts as $contact) {
        //do something
    }
    $offset++;
    $contacts->setCurrentPageNumber($offset);
    $contacts->setItemCountPerPage(1);
}
However, I can have hundreds of thousands of contacts in the database matched by the SELECT I send to the paginator. Can I be sure it only loads one at a time in this example? And how does it do that, does it run a customized query with LIMIT and OFFSET?
From the official documentation: Zend Paginator Usage
Note
Instead of selecting every matching row of a given query, the DbSelect
adapter retrieves only the smallest amount of data necessary for
displaying the current page. Because of this, a second query is
dynamically generated to determine the total number of matching rows.
If you're using Zend\Paginator\Adapter\DbSelect, it will apply limit and offset to the query you're passing it and fetch only the wanted records. This is done in the getItems() function of DbSelect, as you can see in the source code.
You can also read this in the documentation:
This adapter does not fetch all records from the database in order
to count them. Instead, the adapter manipulates the original query to
produce a corresponding COUNT query. Paginator then executes that
COUNT query to get the number of rows. This does require an extra round-trip to the database, but this is many times faster than
fetching an entire result set and using count(), especially with
large collections of data.
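To make that concrete, a minimal sketch of wiring a select through the DbSelect adapter (names are illustrative, following the documented ZF2 API):

use Zend\Paginator\Paginator;
use Zend\Paginator\Adapter\DbSelect;

// $select is a Zend\Db\Sql\Select, $adapter a Zend\Db\Adapter\Adapter
$paginator = new Paginator(new DbSelect($select, $adapter));
$paginator->setItemCountPerPage(1);
$paginator->setCurrentPageNumber(1);

// Iterating fetches only the current page: DbSelect::getItems() adds
// LIMIT/OFFSET to $select, and a separate COUNT query totals the rows
foreach ($paginator as $contact) {
    // do something with $contact
}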
We have a big index of around 1 billion documents. Our application does not allow users to search everything; they have subscriptions and should be able to search only within them.
Our first iteration of the index used attributes, so a typical query looked like this (we are using the PHP API):
$cl->SetFilter('category_id', $category_ids); // array with all user subscriptions
$result = $cl->Query($term,"documents");
This worked without issues, but was very slow. Then we saw this article. The analogy with an un-indexed MySQL query was alarming, and we decided to ditch the attribute-based filter and try a full-text column. So now our category_id is a full_text column. Indeed, our initial tests showed that searching is a lot faster, but when we launched the index into production we ran into an issue. Some users have many subscriptions, and we started to receive this error from Sphinx:
Error: index documents: query too complex, not enough stack (thread_stack_size=337K or higher required)
Our new queries look like this:
user_input #category_id c545|c547|c549|c556|c568|c574|c577|c685...
When there are too many categories, the above error shows up. We thought it would be easy to fix by just increasing thread_stack to a higher value, but it turns out to be limited to 2 MB, and we still have queries exceeding that.
The question is what to do now? We were thinking about splitting the query into smaller queries, but then how will we aggregate the results with the correct limit (we are using $cl->SetLimits($page, $limit); for pagination)?
Any ideas will be welcome.
You can do the 'pagination' in the application; this is sort of how Sphinx does merging when querying distributed indexes.
$upper_limit = ($page_number * $page_size) + 1;

$cl->SetLimits(0, $upper_limit);
foreach ($indexes as $index) {
    $cl->AddQuery($term, $index);
}
$results = $cl->RunQueries();

// Collect the matches from all queries, keyed by document ID
$all = array();
foreach ($results as $result) {
    foreach ($result['matches'] as $id => $match) {
        $all[$id] = $match['weight'];
    }
}

// Best matches first, then cut out just the requested page
arsort($all);
$page_results = array_slice($all, ($page_number - 1) * $page_size, $page_size, true);
(This is just to show the basic procedure.)
... yes, it's wasteful, but in practice most queries are for the first few pages anyway, so it doesn't matter all that much. It's the 'deep' result pages that will be particularly slow.