YouTube Data API v3 pageToken for arbitrary page


Another question on SO revealed that pageTokens are identical for different searches, provided that the page number and maxResults settings are the same.
Version 2 of the API let you go to any arbitrary page by setting a start position, but v3 only provides next and previous tokens. There's no jumping from page 1 to page 5, even if you know there are 5 pages of results.
So how do we work around this?

A YouTube pageToken is six characters long. Here's what I've been able to determine about the format:
char 1: Always 'C' that I've seen.
char 2-3: Encoded start position
char 4-5: Always 'QA' that I've seen.
char 6: 'A' means list items in a position greater than or equal to the start position. 'Q' means list items before the start position.
Due to the nature of character 6, there are two different ways to represent the same page. Given maxResults=1, page 2 can be reached by setting the page token to either "CAEQAA" or "CAIQAQ". The first one means to start at result number 2 (represented by characters 2-3 "AE") and list 1 item. The second means to return one item before result number 3 (represented by characters 2-3 "AI").
Characters 2-3 are a strange base 16 encoding.
Character 3 uses a list from A-Z, then a-z, then 0-9 and increments by 4 in the list for each increase of 1. The series is A,E,I,M,Q,U,Y,c,g,k,o,s,w,0,4,8. Character 2 goes from A to B to C to D and so on. For my purposes, I'm not working with large result sets, so I haven't bothered to see what happens to the second character beyond a couple hundred results. Perhaps someone working with larger sets will provide an update as to how character 2 behaves after that.
Since the string only contains a start position and an option for ">=" or "<", the same string is used in multiple cases. For instance, with 2 results per page, the start position of the second page is result 3. The pageToken for this is "CAIQAA". This is identical to the token for the third page with one result per page.
Since I'm primarily a php person, here's the function I'm using to get the pageToken for a given page:
function token($limit, $page) {
    // zero-based position of the first result on the requested page;
    // working from the offset avoids an index of -1 when the start is a multiple of 16
    $offset = ($page - 1) * $limit;
    $third_chars = array_merge(
        range("A", "Z", 4),
        range("c", "z", 4),
        range(0, 9, 4));
    return 'C'.
        chr(ord('A') + (int) floor($offset / 16)).
        $third_chars[$offset % 16].
        'QAA';
}

$limit = 1;
echo "With $limit result(s) per page...".PHP_EOL;
for ($i = 1; $i < 6; ++$i) {
    echo "The token for page $i is ".token($limit, $i).PHP_EOL;
}
Please test this function in your project and update the rest of us if you find a flaw or an enhancement since YouTube hasn't provided us with an easy way to do this.
Edit: The page token sequence for YouTube API v3 has been changed, and this system will no longer work. For an example of the most up-to-date and working page tokens, see this page.

YouTube's pagetokens can be treated as indices.
Pagetokens for the first 1000 items can be found here.
Pagetokens for every 10th item in range(1, 100000) can be found here.
The highest available pageToken is "CJ-NBhAA", which points to the 100,000th item (position 99,999).
The highest possible value for maxResults is 50.
Use pageToken to specify a starting point and maxResults to specify the number of items.
Examples:
1st item
https://www.googleapis.com/youtube/v3/playlistItems?part=id%2Csnippet&playlistId=<PLAYLISTID>&key=<APIKEY>&maxResults=1&pageToken=CAAQAA
555th item
https://www.googleapis.com/youtube/v3/playlistItems?part=id%2Csnippet&playlistId=<PLAYLISTID>&key=<APIKEY>&maxResults=1&pageToken=CKoEEAA
99999th item
https://www.googleapis.com/youtube/v3/playlistItems?part=id%2Csnippet&playlistId=<PLAYLISTID>&key=<APIKEY>&maxResults=1&pageToken=CJ6NBhAA
10 items starting at 10th item
https://www.googleapis.com/youtube/v3/playlistItems?part=id%2Csnippet&playlistId=<PLAYLISTID>&key=<APIKEY>&maxResults=10&pageToken=CAkQAA
30 items starting at 555th item
https://www.googleapis.com/youtube/v3/playlistItems?part=id%2Csnippet&playlistId=<PLAYLISTID>&key=<APIKEY>&maxResults=30&pageToken=CKoEEAA
50 items starting at 10000th item
https://www.googleapis.com/youtube/v3/playlistItems?part=id%2Csnippet&playlistId=<PLAYLISTID>&key=<APIKEY>&maxResults=50&pageToken=CI9OEAA
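The sample tokens in this thread are all consistent with one encoding: URL-safe base64 (padding stripped) of the bytes 0x08, the zero-based position as a protobuf-style varint, 0x10, and a final flag byte. That is only an inference from the examples above, not anything YouTube documents, and page tokens are meant to be opaque, so the format may change again. Under that assumption, a minimal Python sketch (the names `page_token` and `_varint` are made up for illustration):

```python
import base64


def _varint(n):
    """Encode a non-negative integer as a protobuf-style varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def page_token(position, before=False):
    """Build a token for a zero-based item position (assumed format).

    before=True sets the trailing flag that the first answer describes
    as "list items before the start position".
    """
    raw = b"\x08" + _varint(position) + b"\x10" + _varint(1 if before else 0)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
```

With this sketch, `page_token(554)` reproduces the "CKoEEAA" token used above for the 555th item, and `page_token(99999)` reproduces "CJ-NBhAA".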

Using ^ Quihico's files as a reference point, I had a little fun writing an enhancement to the previous poster's pageToken generator, in JS. If my assumption is correct about how the 4000s place encoding varies past N >= 98304, it should be able to construct a pageToken for a page starting with Nth item, provided N in [0, 4194304). It's only tested up to N = 99999, so YMMV.
Link here: https://github.com/aricearice/youtube-page-token/blob/master/index.js

An addendum to thatthatisis's answer, the linked token list is specifically for maxResults=10. I exported a complete list for the currently maximum allowed value for maxResults, 50:
https://github.com/Koushakur/randomStuff/blob/main/YouTube%20API%20pageTokens%20for%20maxResults%3D50.txt
I call it 'complete' because, after the last token in the list, the API just looped around to the first page, even though I grabbed the tokens from the uploads playlist of a channel with a million videos on it. So it seems the maximum size of a playlist is about 20700 videos.


Excel - Getting the 2nd or nth matched string from your corresponding data

With my previous posts
1. PHPSpreadsheet generates an error "Wrong number of arguments for INDEX() function: 5 given, between 1 and 4 expected"
2. Excel - Getting the Top 5 data of a column and their matching title but produces duplicates
I have found out that the PHPSpreadsheet library for PHP does not yet support AGGREGATE() and other complicated formulas/functions, but I'm in dire need of that functionality.
Going back, I have 2 columns in my Excel file (produced by my web applications built with CodeIgniter and Laravel).
The problem is, the Article Count column (on the right) contains 2 values of 54 which is supposed to belong to 2 different Publications (on the left) but with the use of the formula =INDEX(E$4:E$38,MATCH(M4,J$4:J$38,0)) it just fetches the 1st matched Publication.
The output should look like this:
The original Table:
My question is, what would be the right function or code in Excel so I could retrieve the SECOND Publication of my matched data?
I'm targeting the Publications that have an Article Count of 54, but I want the SECOND ONE, which is the letter D, WITHOUT using Excel's AGGREGATE() function.
Here are my used codes
1) =LARGE(J4:J38,1) - J4:J38 is my range of raw data, I am using this to get the 5 highest numbers in descending order
2) =INDEX(E4:E38,MATCH(M4,J4:J38,0)) - I'm using this to retrieve the Publication Names that matched the Article Count

After communicating in chat, we got this correct formula:
=INDEX(E$2:E$38,IF(M4=M3,MATCH(L3,E$2:E$38,0),0)+MATCH(M4,OFFSET(J$2,IF(M4=M3,MATCH(L3,E$2:E$38,0),0),0,COUNT(J$2:J$38)-IF(M4=M3,MATCH(L3,E$2:E$38,0),0),1),0))
How this works:
IF(M4=M3,MATCH(L3,E$2:E$38,0),0) returns the position of the previous row's publication title in the titles array (E) when the current publication count is the same as the previous one, and 0 otherwise. Let's call this number X. Instead of using J2:J38 for the lookup, we use J(2+X):J38. This trick is done by using OFFSET to cut off the section already used by the previous row. This way, on repeating publication counts, the titles already listed get ignored.

You need to use AGGREGATE's SMALL sub-function to return the smallest matching row number and adjust the k argument to accommodate duplicate rankings.
'in M4
=LARGE(J$4:J$38, ROW(1:1))
'in L4
=INDEX(I:I, AGGREGATE(15, 7, ROW($4:$38)/(J$4:J$38=M4), COUNTIF(M$4:M4, M4)))
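For readers who want the logic outside Excel, here is a rough Python sketch of the same idea: take the n largest counts and, for a repeated count, pick the k-th matching row, where k is how often that count has appeared so far (the role COUNTIF plays above). The function name and the sample data in the usage note are hypothetical, not the asker's sheet.

```python
def top_n_with_titles(titles, counts, n=5):
    """Pair the n largest counts with their titles, advancing on ties."""
    top = sorted(counts, reverse=True)[:n]  # like LARGE(range, 1..n)
    seen = {}
    result = []
    for value in top:
        # how many times this value has appeared so far, like COUNTIF(M$4:M4, M4)
        seen[value] = seen.get(value, 0) + 1
        k = seen[value]
        # row positions whose count equals the value, in sheet order
        rows = [i for i, c in enumerate(counts) if c == value]
        # k-th smallest matching row, like AGGREGATE(15, 7, ..., k)
        result.append((titles[rows[k - 1]], value))
    return result
```

With made-up data such as titles A to F and counts 10, 54, 30, 54, 70, 5, the two rows holding 54 resolve to B first and D second instead of B twice.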

Subset Sum floats Eliminations

I will be happy to get some help. I have the following problem:
I'm given a list of numbers and a target number.
subset_sum([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20)
I need an algorithm that finds all combinations of numbers that sum to the target number (here, 20).
First, find all values that equal 20 on their own.
And next for example the best combinations here are:
11.96 + 8.04
1 + 10 + 9
11.13 + 7.8 + 1.07
9 + 11
Remaining value 15.04.
I need an algorithm that uses each value only once and can use anywhere from 1 to n values to reach the target number.
I tried some recursion in PHP, but it runs out of memory really fast (50k values), so a solution in Python would help (time/memory wise).
I'd be glad for some guidance here.
One possible solution is this: Finding all possible combinations of numbers to reach a given sum
The only difference is that I need to put a flag on elements already used so it won't be used twice and I can reduce the number of possible combinations
Thanks for anyone willing to help.

There are many ways to think about this problem.
If you do recursion make sure to identify your end cases first, then proceed with the rest of the program.
This is the first thing that comes to mind.
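<?php
subset_sum([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20);

function subset_sum($a, $s, $c = array())
{
    // end case: target reached (a small tolerance, because the inputs are floats)
    if (abs($s) < 1e-9) {
        print_r($c);
        return;
    }
    // end cases: overshot the target, or no numbers left to try
    if ($s < 0 || count($a) == 0)
        return;
    foreach ($a as $xd => $xdd) {
        // drop the value so deeper calls cannot use it again
        unset($a[$xd]);
        subset_sum($a, $s - $xdd, array_merge($c, array($xdd)));
    }
}
?>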

This is a possible solution, but it's not pretty:
import itertools
import operator
from functools import reduce

def subset_num(array, num):
    # build every non-empty combination, then keep the ones that sum to num
    subsets = reduce(operator.add, [list(itertools.combinations(array, r)) for r in range(1, 1 + len(array))])
    return [subset for subset in subsets if sum(subset) == num]

print(subset_num([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20))
Output:
[(20,), (11.96, 8.04), (9, 11), (11, 9), (1, 10, 9), (1, 10, 9), (7.8, 11.13, 1.07)]

DISCLAIMER: this is not a full solution, it is a way to just help you build the possible subsets. It does not help you to pick which ones go together (without using the same item more than once and getting the lowest remainder).
Using dynamic programming you can build all the subsets that add up to the given sum, then you will need to go through them and find which combination of subsets is best for you.
To build this archive you can (I'm assuming we're dealing with non-negative numbers only) put the items in a column, go from top to bottom and for each element compute all the subsets that add up to the sum or a lower number than it and that include only items from the column that are in the place you are looking at or higher. When you build a subset you put in its node both the sum of the subset (which may be the given sum or smaller) and the items that are included in the subset. So in order to compute the subsets for an item [i] you need only look at the subsets you've created for item [i-1]. For each of them there are 3 options:
1) the subset's sum is the given sum ---> Keep the subset as it is and move to the next one.
2) the subset's sum is smaller than the given sum but larger than it if item [i] is added to it ---> Keep the subset as it is and move on to the next one.
3) the subset's sum is smaller than the given sum and it will still be smaller or equal to it if item [i] is added to it ---> Keep one copy of the subset as it is and create another one with item [i] added to it (both as a member and added to the sum of the subset).
When you're done with the last item (item [n]), look at the subsets you've created - each one has its sum in its node and you can see which ones are equal to the given sum (and which ones are smaller - you don't need those anymore).
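The table-building steps above could look roughly like this in Python. It is a sketch under the same assumption the answer makes (non-negative numbers only). The function name and the `eps` tolerance for float sums are additions made here for illustration. Like the description, it only builds the candidate subsets; it does not pick which ones go together.

```python
def subsets_reaching(items, target, eps=1e-9):
    """Return every subset (as a tuple) whose sum equals target."""
    # each node holds (sum of the subset, members of the subset)
    nodes = [(0.0, ())]
    for item in items:
        extended = []
        for total, members in nodes:
            # option 3: adding the item still fits, so keep a second copy with it
            if total + item <= target + eps:
                extended.append((total + item, members + (item,)))
            # options 1 and 2: otherwise the subset is kept unchanged
        nodes.extend(extended)
    # after the last item, keep only the subsets that hit the target
    return [members for total, members in nodes if abs(total - target) <= eps]
```

The number of stored nodes can still grow exponentially, so for something like 50k values this would need further pruning or a different representation.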
As I wrote at the beginning - now you need to figure out how to take the best combination of subsets that do not have a shared member between any of them.
Basically you're left with a problem that resembles the classic knapsack problem but with another limitation (not every stone can be taken with every other stone). Maybe the limitation actually helps, I'm not sure.
A bit more about the advantage of dynamic programming in this case
The basic idea of dynamic programming instead of recursion is to trade redundancy of operations with occupation of memory space. By that I mean to say that recursion with a complex problem (normally a backtrack knapsack-like problem, as we have here) normally ends up calculating the same thing a fair amount of times because the different branches of calculation have no concept of each other's operations and results. Dynamic programming saves the results and uses them along the way to build "bigger" results, relying on the previous/"smaller" ones. Because the use of the stack is much more straightforward than in recursion, you don't get the memory problem you get with recursion regarding the maintenance of the function's state, but you do need to handle a great deal of memory that you store (sometimes you can optimise that).
So for example in our problem, trying to combine a subset that would add up to the required sum, the branch that starts with item A and the branch that starts with item B do not know of each other's operations. Let's assume item C and item D together add up to the sum, but either of them added alone to A or B would not exceed the sum, and that A doesn't go with B in the solution (we can have sum=10, A=B=4, C=D=5 and there is no subset that sums up to 2, so A and B can't be in the same group). The branch trying to figure out A's group would (after trying and rejecting having B in its group) add C (A+C=9) and then add D, at which point it would reject this group and backtrack (A+C+D=14 > sum=10). The same would happen to B of course (A=B) because the branch figuring out B's group has no information regarding what just happened to the branch dealing with A. So in fact we've calculated C+D twice, and haven't even used it yet (and we're about to calculate it a third time to realise they belong in a group of their own).
NOTE:
Looking around while writing this answer I came across a technique I was not familiar with and might be a better solution for you: memoization. Taken from wikipedia:
memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.

So I have a possible solution:
from collections import Counter

# compute the difference between 2 lists but keep duplicates
def list_difference(a, b):
    count = Counter(a)  # count items in a
    count.subtract(b)   # subtract items that are in b
    diff = []
    for x in a:
        if count[x] > 0:
            count[x] -= 1
            diff.append(x)
    return diff

# return the first combination of numbers that matches target
def subset_sum(numbers, target, partial=[]):
    s = sum(partial)
    # check if the partial sum equals the target
    if s == target:
        print("--------------------------------------------sum_is(%s)=%s" % (partial, target))
        return partial
    if s >= target:
        return None  # past the target, so why bother to continue
    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        rest = subset_sum(remaining, target, partial + [n])
        if type(rest) is list:
            return rest  # pass the first match back up the call chain
    return None

# repeat until the rest no longer exceeds target or stops changing
def repeatUntil(subset, target):
    currSubset = []
    while sum(subset) > target and currSubset != subset:
        diff = subset_sum(subset, target)
        currSubset = subset
        subset = list_difference(subset, diff)
    return subset
Output:
--------------------------------------------sum_is([11.96, 8.04])=20
--------------------------------------------sum_is([1, 10, 9])=20
--------------------------------------------sum_is([7.8, 11.13, 1.07])=20
--------------------------------------------sum_is([20])=20
--------------------------------------------sum_is([9, 11])=20
[15.04]
Unfortunately this solution only works for a small list. For a big list, I am still trying to break the list into small chunks and calculate, but the answer is not quite correct. You can see it in a new thread here:
Finding unique combinations of numbers to reach a given sum

Scratchcard PHP script

I'm working on a scratchcard script and I was wondering if someone could help me out, if you don't understand odds this may melt your brain a little!
So, the odds can vary: 1/$x; Let's say for now: $x = 36;
So here's what I am trying to understand...
I want to generate 9 random numbers between 1 and 5.
I want the odds of 3 numbers matching equivalent to 1/36.
It must be impossible to generate over 3 duplicate numbers at a time.
I can imagine an array loop of some kind would probably be the correct way of passage?

Sometimes - and this is one of those times - cheating is the best way to do what you want to do.
a) Set up an array of your 9 numbers, and a 2nd frequency array (5 elements) that counts how often each number occurs.
b) Generate a random number 1-5. Set the 1st and 2nd card to this number, and mark this number with 2 in your frequency array.
c) If random(36) < 1 (1/36 probability), set your 3rd card to the same number and mark this number with 3 in your frequency array.
d) Generate the rest of the cards - find a random number, repeat while frequency of the found number >=2, set the next card to number, increase frequency of the found number.
e) When finished, shuffle the cards (generate 2 random numbers between 1 and 9, swap the 2 cards, repeat 20-30 times).
Part d) is what I call cheating - you've put your 1/36 probability in step c), and in d) you just make sure you don't generate another match. e) is used to hide that from the user.
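Steps a) to e) could be sketched like this in Python (the question asks about PHP, but the logic carries over directly). The function and parameter names are made up for illustration, and the sketch uses a library shuffle for step e) instead of the repeated swaps described above.

```python
import random


def make_card(odds=36, symbols=5, size=9, rng=random):
    """Generate one scratchcard; a triple appears with probability 1/odds."""
    # a) frequency table for the numbers 1..symbols
    freq = {n: 0 for n in range(1, symbols + 1)}
    # b) the first two cards share one random number
    first = rng.randint(1, symbols)
    cards = [first, first]
    freq[first] = 2
    # c) with probability 1/odds the third card completes the triple
    if rng.randrange(odds) < 1:
        cards.append(first)
        freq[first] = 3
    # d) fill the remaining cards, never letting another number reach 3
    while len(cards) < size:
        n = rng.randint(1, symbols)
        if freq[n] >= 2:
            continue  # would create or extend a match, so draw again
        cards.append(n)
        freq[n] += 1
    # e) hide the construction order from the user
    rng.shuffle(cards)
    return cards
```

Passing a seeded `random.Random` instance as `rng` makes the cards reproducible for testing.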

Displaying results hyperlink numbers like a search engine

Does anyone know of a good resource on how to create the hyperlink numbers at the bottom of a results page as search engines do to load the next number of results?
The page would load the first 10 results. And then if you click on the number, it loads corresponding results in that 10 number range.
Example:
0-10 -> show no numbers
11-20 -> show 1, 2
21-30 -> 1, 2, 3
up to 50
anything more than 50 does 1,2,3,4,5.....67 [last number].
My thoughts so far (I'm doing this in PHP/mysqli but the logic is more important than the code):
$total = mysqli_num_rows($result); // total number of results from the SQL query
if ($total > 10) {
    $last = (int) ceil($total / 10); // number of the last page
    if ($last <= 5) {
        for ($i = 1; $i <= $last; $i++) {
            // print the numbers as hyperlinks
        }
    } else {
        // print 1 through 5 ... then $last
    }
}
This though is static from only 1-5...last number while the search engines have it so if you click on the number, it remembers that number and bases the new logic on it. So if I click on the 5 in my formula, it should change to something like:
[previous] 3,4,5,6,7....67 [next]
And then I would just pass the number to the page itself again and limit the results based on what number was passed. Any suggestions also on the best way to pass the info?

You are looking for a pagination script. Visit this link. The page is in Arabic, but the fourth post is about pagination, and you can download the source for the English or Arabic version of the pagination script.

Basically, you need two values to create pagination: a limit and an offset.
1. The limit is the number of items you display at a time.
2. The offset is the position where your query starts.
So, let's say you have 5 items on each page and 25 items in total.
In your query, you use LIMIT 5 OFFSET 0 (the number of items and the position where the query starts).
Now, if you divide 25 (total) by 5 (limit), you get 5 (the number of pages).
On page 0 (the start) you get the offset by multiplying the page number by the limit, so 0 (page) * 5 (limit) gives you 0 (the first page starts at offset 0).
On page 3, 3 (page) * 5 (limit) gives 15, which means page 3 (or page 4, if you take into account that you started counting at page 0) displays items 16 to 20.
Finally, page 4 (which to your users is page 5, because they start counting at 1, not 0) displays items 21 to 25, the last items in your query.
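To connect the offset arithmetic with the sliding "[previous] 3,4,5,6,7....67 [next]" links from the question, here is a small Python sketch of the logic (the asker said the logic matters more than the code). Unlike the answer, it numbers pages from 1, and the function name, dictionary keys and default values are assumptions for illustration.

```python
import math


def paginate(total_items, page, per_page=10, window=5):
    """Compute the query offset and the page links to show (pages start at 1)."""
    last = max(1, math.ceil(total_items / per_page))
    page = min(max(page, 1), last)      # clamp bad input
    offset = (page - 1) * per_page      # value for LIMIT per_page OFFSET offset
    # centre the window on the current page, then clamp it to 1..last
    start = max(1, page - window // 2)
    end = min(last, start + window - 1)
    start = max(1, end - window + 1)
    return {
        "offset": offset,
        "pages": list(range(start, end + 1)),
        "last": last,
        "prev": page - 1 if page > 1 else None,
        "next": page + 1 if page < last else None,
    }
```

The current page number can simply be passed back to the script as a query-string parameter, and the returned offset goes into the LIMIT clause.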

Permutations of Varying Size

I'm trying to write a function in PHP that gets all permutations of all possible sizes. I think an example would be the best way to start off:
$my_array = array(1,1,2,3);
Possible permutations of varying size:
1
1 // * See Note
2
3
1,1
1,2
1,3
// And so forth, for all the sets of size 2
1,1,2
1,1,3
1,2,1
// And so forth, for all the sets of size 3
1,1,2,3
1,1,3,2
// And so forth, for all the sets of size 4
Note: I don't care if there's a duplicate or not. For the purposes of this example, all future duplicates have been omitted.
What I have so far in PHP:
function getPermutations($my_array){
    $permutation_length = 1;
    $keep_going = true;
    while($keep_going){
        while($there_are_still_permutations_with_this_length){
            // Generate the next permutation and return it into an array
            // Of course, the actual important part of the code is what I'm having trouble with.
        }
        $permutation_length++;
        if($permutation_length>count($my_array)){
            $keep_going = false;
        }
        else{
            $keep_going = true;
        }
    }
    return $return_array;
}
The closest thing I can think of is shuffling the array, picking the first n elements, seeing if it's already in the results array, and if it's not, add it in, and then stop when there are mathematically no more possible permutations for that length. But it's ugly and resource-inefficient.
Any pseudocode algorithms would be greatly appreciated.
Also, for super-duper (worthless) bonus points, is there a way to get just 1 permutation with the function but make it so that it doesn't have to recalculate all previous permutations to get the next?
For example, I pass it a parameter 3, which means it's already done 3 permutations, and it just generates number 4 without redoing the previous 3? (Passing it the parameter is not necessary, it could keep track in a global or static).
The reason I ask this is because as the array grows, so does the number of possible combinations. Suffice it to say that one small data set with only a dozen elements grows quickly into the trillions of possible combinations and I don't want to task PHP with holding trillions of permutations in its memory at once.

Sorry no php code, but I can give you an algorithm.
It can be done with small amounts of memory and since you don't care about dupes, the code will be simple too.
First: Generate all possible subsets.
If you view a subset as a bit vector, you can see that there is a 1-1 correspondence between subsets and binary numbers.
So if your array had 12 elements, you will have 2^12 subsets (including empty set).
So to generate a subset, you start with 0 and keep incrementing till you reach 2^12. At each stage you read the set bits in the number to get the appropriate subset from the array.
Once you get one subset, you can now run through its permutations.
The next permutation (of the array indices, not the elements themselves) can be generated in lexicographic order like here: http://www.de-brauwer.be/wiki/wikka.php?wakka=Permutations and can be done with minimal memory.
You should be able to combine these two to give yourself a next_permutation function. Instead of passing in numbers, you could pass in an array of 12 elements which contains the previous permutation, plus possibly some more info (little memory again) of whether you need to go to the next subset etc.
You should actually be able to find very fast algorithms which use minimal memory, provide a next_permutation type feature and do not generate dupes: Search the web for multiset permutation/combination generation.
Hope that helps. Good luck!
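The two steps from this answer (pick a subset by its bit mask, then run through that subset's permutations) could be sketched as a Python generator. The function name is made up, and `itertools.permutations` stands in for the lexicographic next-permutation routine the answer links to. Results come out grouped by bit mask rather than by size, and duplicates are kept, which the question says is acceptable.

```python
from itertools import permutations


def all_size_permutations(items):
    """Yield every permutation of every non-empty subset, one at a time."""
    n = len(items)
    for mask in range(1, 1 << n):  # each mask selects one non-empty subset
        subset = [items[i] for i in range(n) if mask & (1 << i)]
        for perm in permutations(subset):
            yield perm
```

Because it is a generator, only the current subset and permutation are held in memory, so the full result set never has to be stored at once.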

The best set of functions I've come up with was the one provided by some user at the comments of the shuffle function on php.net Here is the link It works pretty good.
Hope it's useful.

The problem seems to be trying to give an index to every permutation and having a constant access time. I cannot think of a constant time algorithm, but maybe you can improve this one to be so. This algorithm has a time complexity of O(n) where n is the length of your set. The space complexity should be reducible to O(1).
Assume our set is 1,1,2,3 and we want the 10th permutation. Also, note that we will index each element of the set from 0 to 3. Going by your order, this means the single element permutations come first, then the two element, and so on. We are going to subtract from the number 10 until we can completely determine the 10th permutation.
First up are the single element permutations. There are 4 of those, so we can view this as subtracting one four times from 10. We are left with 6, so clearly we need to start considering the two element permutations. There are 12 of these, and we can view this as subtracting three up to four times from 6. We discover that the second time we subtract 3, we are left with 0. This means the indexes of our permutation must be 2 (because we subtracted 3 twice) and 0, because 0 is the remainder. Therefore, our permutation must be 2,1.
Division and modulus may help you.
If we were looking for the 12th permutation, we would run into the case where we have a remainder of 2. Depending on your desired behavior, the permutation 2,2 might not be valid. Getting around this is very simple, however, as we can trivially detect that the indexes 2 and 2 (not to be confused with the element) are the same, so the second one should be bumped to 3. Thus the 12th permutation can trivially be calculated as 2,3.
The biggest confusion right now is that the indexes and the element values happen to match up. I hope my algorithm explanation is not too confusing because of that. If it is, I will use a set other than your example and reword things.

Inputs: Permutation index k, indexed set S.
Pseudocode:
L = {S_1}
for i = 2 to |S| do
    Insert S_i before L_{k % i}
    k <- k / i
loop
return L
This algorithm can also be easily modified to work with duplicates.
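A direct Python transcription of the pseudocode above might look like this. It assumes a non-empty list, zero-based positions in L (so position i - 1 means "append at the end"), and integer division for k / i. The function name is chosen here for illustration.

```python
def kth_permutation(k, items):
    """Build the k-th permutation of items by successive insertion."""
    result = [items[0]]
    for i in range(2, len(items) + 1):
        # insert the i-th element before position k % i (0 .. i-1)
        result.insert(k % i, items[i - 1])
        k //= i
    return result
```

Each k from 0 to n! - 1 yields a different ordering, so a single permutation can be produced without generating the ones before it.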
