Twitter API 1.1 is not loading all followers - PHP

I have a program that gets the follower details of a person. The code works fine up to a follower count of 3000. I tried it with another person who has 200,000 followers. Unfortunately, it shows only 300 followers. Why does this happen? Is there any way to fix this?
Here is my code:
<?php
ini_set('max_execution_time', '50000000');
ini_set('post_max_size', '100M');
require_once('TwitterAPIExchange.php');
$consumerKey = 'xxxxxxxxxxxxxxxxxx';
$consumerKeySecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx';
$accessToken = 'xxxxxxxxxxxxxxxxxxxxxxxxx';
$accessTokenSecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
$settings = array(
    'oauth_access_token' => $accessToken,
    'oauth_access_token_secret' => $accessTokenSecret,
    'consumer_key' => $consumerKey,
    'consumer_secret' => $consumerKeySecret
);
$i = 0;
$cursor = -1;
$users = array();
do {
    $url = 'https://api.twitter.com/1.1/followers/list.json';
    $getfield = '?cursor='.$cursor.'&screen_name=BeingSalmanKhan&skip_status=true&include_user_entities=false';
    $requestMethod = 'GET';
    $twitter = new TwitterAPIExchange($settings);
    $response = $twitter->setGetfield($getfield)
        ->buildOauth($url, $requestMethod)
        ->performRequest();
    $response = json_decode($response, true);
    // "errors" is only present when the request failed
    $errors = isset($response['errors']) ? $response['errors'] : array();
    if (!empty($errors)) {
        foreach ($errors as $error) {
            $code = $error['code'];
            $msg = $error['message'];
            echo "<br><br>Error " . $code . ": " . $msg;
        }
        $cursor = 0;
    }
    else {
        $users = $response['users'];
        print_r($users);
        echo '<table>';
        echo '<tr>';
        echo '<td>No:</td>';
        echo '<td>Name</td>';
        echo '<td>Profile Description</td>';
        echo '<td>Location</td>';
        echo '<td>Followers Count</td>';
        echo '<td>Website Url</td>';
        echo '<td>Screen Name</td>';
        echo '<td>Favourited Tweets</td>';
        echo '<td>Language</td>';
        echo '<td>Friends Count</td>';
        echo '<td>Status</td>';
        echo '<td>Image</td>';
        echo '</tr>';
        foreach ($users as $user) {
            $thumb = $user['profile_image_url'];
            $name = $user['name'];
            $description = $user['description'];
            $location = $user['location'];
            $followers_count = $user['followers_count'];
            $url = $user['url'];
            $screen_name = $user['screen_name'];
            $favourites_count = $user['favourites_count'];
            $language = $user['lang'];
            $listed_count = $user['listed_count'];
            $friends_count = $user['friends_count'];
            // with skip_status=true "status" is omitted; when present it is a tweet object
            $status = isset($user['status']['text']) ? $user['status']['text'] : '';
            echo '<tr>';
            echo '<td>'.$i.'</td>';
            echo '<td>'.$name.'</td>';
            echo '<td>'.$description.'</td>';
            echo '<td>'.$location.'</td>';
            echo '<td>'.$followers_count.'</td>';
            echo '<td>'.$url.'</td>';
            echo '<td>'.$screen_name.'</td>';
            echo '<td>'.$favourites_count.'</td>';
            echo '<td>'.$language.'</td>';
            echo '<td>'.$friends_count.'</td>';
            echo '<td>'.$status.'</td>';
            echo '<td><img src="'.$thumb.'"></td>';
            echo '</tr>';
            $i++;
        }
        $cursor = $response["next_cursor"];
    }
}
while ($cursor != 0);
if (!empty($users)) {
    echo '<br><br>Total: ' . $i;
}
?>
Here is the output I get:
0 Rao.navneet101#gmail 0
1 shivangi darshan dos 0
2 Renuka.R.K 0
3 md samiullah khan bokaro steel city 0
4 monusahu 0
5 Vivek mishra 0
6 Mezba Alam Dhaka, Bangladesh. 20
7 shiva krishnam raju 0
8 shankar rupani 0
9 Vasu patil 0
10 keerthi 0
11 love_guru 1
12 abhishek tiwari 0
13 Future Care 0
14 harikumar sreekumar 3
15 Shahnawaz Khan 0
16 prakhar bhushan 0
17 Binita Chhaparia 0
18 venkatesan 1
19 Rahul RJ 0
20 emty Abuja Nigeria 12
21 Nil-Akash Chy smart 0
22 ashok kumar 1
23 azhar 0
24 Prarthana 0
25 Anu bibu 0
26 SAMIR SINGH 0
27 Deemag ki Maa Behen. Saket, New Delhi 35
28 abel leo 0
29 Dhananjay Pawar 0
30 Anuradha Choudhary 1
31 maiome.maiome 0
32 rahul hussey 1
33 vishnupriya 3
34 anggi12345 5
35 farheen naaz haora 0
36 aman 4
37 Shubham Verma Varanasi(UP) 0
38 satish kumar jaiswar 0
39 sheikhwasi 0
40 MUHAMMADUMAR 0
41 Gaurav tiwari India 0
42 arjun malviy 0
43 prashanth 0
44 saloni 0
45 Tanvir,Hridoy 0
46 Mahesh Sharma 0
47 Deepak Aswathnarayan 2
48 devender kumar 0
49 Awal 0
50 Sanketa Kamble 0
51 faraz azhar 1
52 Avinash singh 0
53 KUMAR SUMIT 0
54 Mahuya sultana 0
55 hemant chawla 0
56 Hanii andiraa 0
57 mahendra shah AHMEDABAD 5
58 Angel Preet 0
59 kumar gaurav 0
60 atul kumar bangalore 1
61 saurabh singh 0
62 ajaygadhavi 0
63 Prajkta Waditwar Mumbai 21
64 Shruti 1
65 Prabhakar Gupta 1
66 waseem abbas 0
67 Malik Zulqarnain 0
68 Sk Azharuddin 0
69 MOHIT VIJAY 0
70 RadhaKrishnan chennai 22
71 Ruchita Chaudhari 0
72 MANISH RAWAT 0
73 vyasronit vansba 0
74 SURAJ YADAV 1
75 Akanksha Pratik 0
76 Sandeep goswami 0
77 Rupinder kaur 0
78 abhishek pandey 3
79 imad 2
80 Sandeep rao 0
81 sahil khan 0
82 abdulbari 95
83 Binal Chitroda. 0
84 Sexy boy 0
85 Akash chauhan 1
86 qawserftgyhujik 8
87 dhruvil patel 0
88 Barada sahoo 0
89 Banu 2
90 Uddipta kashyap 0
91 Mitul sharma Jammu 0
92 pankaj singh faridabad, haryana 0
93 Sanjeev krishnan 0
94 adnan ahmed 0
95 Ahad sheikh 0
96 manish shah Rajkot 5
97 VISHAL SINGH 1
98 aksahy 1
99 satya prasanna kkd 23
100 rajesh rana 0
101 Jatt Boys 1
102 Zeel Doshi 22
103 nabin regmi 1
104 aneeta awasthi 0
105 navjit k chopra 1
106 Ashim Mallick 0
107 Rajesh Kumar Mishra 3
108 Rahul pagare 0
109 Lingam k 0
110 Abishek bagmar chennai 32
111 Trang Weinman 0
112 muktadesai 0
113 mansur.ali2009#gmail 0
114 Angel Urvashi amritsar 1
115 rangga nurmansyah 2
116 Rajesh Shetty 1
117 Muhammad Sohail Akba 1
118 waroeng sehat HI 0
119 Montibohra 0
120 siddarth 1
121 SHRAVANI KURRA 0
122 suven sarkar 0
123 ajit suryawanshi 0
124 pappu rakade patil Babra Aurangabad Maharastra 0
125 shiyad shereef 8
126 Sachin Ingale Pune 0
127 archana mishra 0
128 vijayjaware 1
129 Alive Soul 0
130 aakash malhotra 0
131 sheikh mohsin 0
132 Sheryll Franca 0
133 Manjeet Mundhe 6
134 khan sania hong kong 0
135 vishaldev 0
136 grewal laddi 1
137 Sanjay kumar 0
138 j aishwarya rao 2
139 didar khan oman.sadah 0
140 SONI SINGH 0
141 mohit khandelwal 2
142 sunny verma 1
143 Mohinurgazi 0
144 Jitender Kumar 1
145 Vinay jayakumar 0
146 solomonrajesh chennai 2
147 k.nagalakshmi 1
148 jeevankiran 9
149 Raghu 0
150 Alive Chatulistiwa 10
151 ses dubey 0
152 Sumit Zadafiya 1
153 majid abass hassanabad rainawari srinagar 3
154 shubh jain 0
155 M Fuad 0
156 poojashadhijha 0
157 manu garg 0
158 Imran Hussain 0
159 Zain jutt 0
160 k.seven new delhi 0
161 Lakhan Wanole 1
162 Olympia Verrell 0
163 majuanwar 0
164 Page 29 Kolkata, India 0
165 aayushmaan 19
166 prashant gupta 3
167 jose santha seelan 19
168 Mahar Ejaz 1
169 kabeer magsi 0
170 PriNcE SaMeeR 0
171 prashant shrivastava 0
172 Vinod Ghorpade 0
173 eno apriliani 1
174 Pratik Wadkar 0
175 FiRu Kuwait 3
176 Jagtap 0
177 Ravi kumar Giridih 0
178 Ajay Khandelwal 1
179 poojamaurya 0
180 Rajat Chouhan 0
181 Ayanna Nelms 0
182 shrimanth kumar 2
183 SATYANARAYNANDWANA 0
184 AJEESH Ayoor,kollam 10
185 Shah Alam 1
186 Flywell2India ON L5A 1W7, Canada 3
187 banti kashyap jaipur 1
188 ayesha pathan India 10
189 Ranjan SP 44
190 AmarDeep 0
191 Manish Chamling Rai 0
192 Arun kumar 0
193 Mitesh Baranwal Varanasi 11
194 AKHTAR ABBAS 1
195 Toufiq Alahi 0
196 Rajbir Singh 2
197 Syed Irfan Hussain A 1
198 Syed Anwaar Ali indore, India 0
199 MD.SOWKAT Bangladesh 1
200 srinivasan 1
201 Pradeep Sheler 1
202 sagar shergill 0
203 mohanvamsi 0
204 Keshav Singh 0
205 Ankit Verma 0
206 vinod kachare 5
207 Faraz Imam 3
208 Prateek Pathak 16
209 Kumar Abhishek 1
210 rishita gupta 8
211 Krishna Kumar Tiwari 1
212 KALIM KHAN 4
213 VIPIN 0
214 mukund vishwakarma 0
215 jitendra mishra 1
216 Amit kumar 0
217 Tariq 1
218 Sonu Jani 0
219 Naveen Malethiya Sri Ganganagar 0
220 shyam rajput 0
221 Progressive Dental 0
222 Aryan sid 0
223 simran galhotra 6
224 jot singh 0
225 vishnukumar.merugu 1
226 sujal khandelwal 1
227 shashank patel 0
228 suhail khan 3
229 Vedant Jogdand Pune 8
230 Irfanvali 0
231 sakshi vinayak 2
232 Amrapali R. Sarodey 0
233 ????? 1
234 Purnendu Sharma delhi 1
235 lhomingdolma lama Kathmandu, Nepal 0
236 humaira khanam 1
237 mohammadjawed akhter 1
238 Himanshu Rao 0
239 sandeep nargunde 1
240 yuvarajc 3
241 rajesh 0
242 Vithika Sheru 0
243 himanshu mumbai, maharashtra 6
244 seedtan 0
245 nobaiah143billa 0
246 saurav dhakal Nepal 0
247 Deep Narayan Rai 1
248 sajjad hussain 1
249 deepak chandra 2
250 Naresh Gorre 0
251 priyamaina assam india 1
252 goravsaini 1
253 sherlock 0
254 Nileshsingh 1
255 Aashiqurrahman 0
256 Anushka Singh 1
257 sumit chhari mumbai 1
258 waquarthakur Gulaothi 0
259 rama karki 1
260 pushpa 0
261 hasrat 0
262 pawan sharma 0
263 josey joo 0
264 govinda kr. yadav 0
265 gurnam singh 1
266 MyBusTickets.in India 216
267 vicarsonmilco 0
268 ANKIT GUPTA 0
269 alamin 0
270 Kishore 1
271 aifazz jr sayed 0
272 Salman khan 0
273 Dipesh Jha 0
274 premkumar 1
275 KD Rahees Saifi 2
276 Shafiq mastoi 0
277 Manvi Agarwal 1
278 sumit 1
279 manishkumar h patel 1
Error 88: Rate limit exceeded
Total: 300
280 ajay desai 0
281 Umang Pandita New Delhi 11
282 anu rathour 1
283 Srinivas Kamath 1
284 Pranay Gavhale 1
285 pranjal Guwahati 1
286 Nani_Tasser 0
287 Aryan siddiqui 0
288 Karina Noren 0
289 samyok subba 0
290 sajid malik 0
291 Ritesh kanwar 0
292 kameshnayak2 raipur 1
293 Ashik Babu 1
294 Jenifer Bubak 0
295 divyansh verma kanpur 53
296 Tothi Monsang Manipur, India. 0
297 vikas pathak 1
298 Mohit Rule 1
299 asgar ali 3
At the end I can see an "Error 88: Rate limit exceeded" message.
Thanks.

At last I found a solution for this. This is the only way I found:
save the last cursor and, after 15 minutes, send it as the first cursor. It will show another 300 followers.
Thanks
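The v1.1 followers/list endpoint returns 20 users per page by default (up to 200 with the count parameter) and allows 15 requests per 15-minute window, which is where the 300 (15 × 20) comes from. A minimal sketch of the cursor-saving approach above, assuming a hypothetical $fetchPage callable that wraps the question's TwitterAPIExchange call and returns the decoded JSON for a given cursor: when error 88 arrives it keeps the cursor, waits, and retries the same page.

```php
<?php
// Sketch: page through followers/list, waiting and retrying the SAME cursor
// when Twitter answers with error 88 (rate limit exceeded).
// $fetchPage is a hypothetical callable: cursor => decoded JSON array.
// $sleep is injected so the wait can be replaced in tests (pass 'sleep' normally).
function fetch_all_pages(callable $fetchPage, callable $sleep, int $waitSeconds = 900): array
{
    $users = [];
    $cursor = -1;
    while ($cursor != 0) {
        $response = $fetchPage($cursor);
        $rateLimited = false;
        foreach ($response['errors'] ?? [] as $error) {
            if ((int) $error['code'] === 88) {
                $rateLimited = true;
            } else {
                throw new RuntimeException("Error {$error['code']}: {$error['message']}");
            }
        }
        if ($rateLimited) {
            $sleep($waitSeconds); // wait for the 15-minute window; $cursor is unchanged
            continue;
        }
        foreach ($response['users'] ?? [] as $user) {
            $users[] = $user;
        }
        $cursor = $response['next_cursor'] ?? 0;
    }
    return $users;
}
```

With the question's code, $fetchPage would build the '?cursor=' . $cursor . '&screen_name=...' getfield, run the setGetfield()/buildOauth()/performRequest() chain and return json_decode($response, true). A fixed 15-minute wait is the simplest choice; if your HTTP client exposes response headers, x-rate-limit-reset gives the exact reset time.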

https://dev.twitter.com/docs/api/1.1/get/followers/ids
Technically, the maximum number of followers returned per request by followers/ids is 5000.
Are you sure the count is really 300 in every case? Did you display the count of the list you have?
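Note that followers/ids returns only numeric IDs, not profiles; the profile fields printed in the question would then come from users/lookup, which accepts up to 100 user IDs per call. A sketch of the batching step, assuming the IDs have already been collected into an array; build_lookup_getfields() is a hypothetical helper producing getfield strings in the same style as the question's setGetfield() call.

```php
<?php
// Sketch: split follower IDs (e.g. collected from followers/ids, up to 5000
// per page) into users/lookup requests of at most 100 IDs each.
// build_lookup_getfields() is hypothetical; the strings mirror the
// '?key=value&...' format passed to setGetfield() in the question.
function build_lookup_getfields(array $ids, int $perRequest = 100): array
{
    $getfields = [];
    foreach (array_chunk($ids, $perRequest) as $chunk) {
        $getfields[] = '?user_id=' . implode(',', $chunk) . '&include_entities=false';
    }
    return $getfields;
}

// Each string would be sent as a GET to
// https://api.twitter.com/1.1/users/lookup.json
```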

Related

How to preserve percent-encoded characters in URLs

So I have these long URLs that enter a PHP script through a GET variable.
<?php
$given_url = $_GET['url'];
echo $given_url;
?>
Let's say I request
http://example.com/index.php?url=http://www.falstad.com/circuit/circuitjs.html?cct=%24+1+0.000005+10.200277308269968+33+5+43%0Ai+112+320+112+192+0+4%0Ai+416+192+416+320+0+6%0Ar+192+320+192+192+0+1%0Ar+208+192+320+192+0+2%0Ar+336+192+336+320+0+3%0Aw+112+192+192+192+0%0Aw+192+192+208+192+0%0Aw+320+192+336+192+0%0Aw+336+192+416+192+0%0Aw+416+320+336+320+0%0Aw+336+320+192+320+0%0Aw+192+320+112+320+0%0A
Then how do I make the page echo literally what comes after http://example.com/index.php?url= ?
Currently it returns www.falstad.com/circuit/circuitjs.html?cct=$ 1 0.000005 10.200277308269968 33 5 43 i 112 320 112 192 0 4 i 416 192 416 320 0 6 r 192 320 192 192 0 1 r 208 192 320 192 0 2 r 336 192 336 320 0 3 w 112 192 192 192 0 w 192 192 208 192 0 w 320 192 336 192 0 w 336 192 416 192 0 w 416 320 336 320 0 w 336 320 192 320 0 w 192 320 112 320 0 as if it decodes the encoded characters in the URL.
Use urlencode()
echo urlencode($given_url);
Try
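<?php
$given_url = $_GET["url"];
// split only at the first "?", so any further "?" stays in the query part
$pieces = explode("?", $given_url, 2);
$part1 = $pieces[0];
// re-add the "?" that explode() removed
$new_url = isset($pieces[1]) ? $part1 . '?' . urlencode($pieces[1]) : $part1;
echo $new_url;
?>
Caveat: urlencode() also encodes "=" and "&", so the query part comes back as cct%3D%24+1... rather than exactly cct=%24+1...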
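The reason for the behaviour is that PHP URL-decodes query parameters before putting them into $_GET, so %24 becomes $, + becomes a space and %0A a newline. The undecoded query string is available in $_SERVER['QUERY_STRING']. A minimal sketch, assuming url= is the last parameter so everything after it belongs to the inner URL; raw_param_after() is a hypothetical helper name.

```php
<?php
// Sketch: return the value of $name exactly as sent, i.e. everything after
// the first "name=" parameter, without URL-decoding it.
// Assumes the parameter is the last one, so later "&", "?" and "=" belong to it.
function raw_param_after(string $queryString, string $name): ?string
{
    $haystack = '&' . $queryString;          // lets the first parameter match "&name="
    $pos = strpos($haystack, '&' . $name . '=');
    if ($pos === false) {
        return null;
    }
    return substr($haystack, $pos + strlen($name) + 2); // skip "&", name and "="
}

// In index.php:
// echo raw_param_after($_SERVER['QUERY_STRING'] ?? '', 'url');
```

Unlike $_GET, this also keeps any "&" inside the inner URL, which PHP would otherwise treat as a parameter separator.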

MySQL: yearly max value with information on the record id

I want to get the yearly max values together with the record id, for further joins with other tables.
Consider the following table:
tur_id Datum SZ Art VW StV TV NSP
189 23.06.2010 09:40:00 S 1 -37 -35 46
7 11.05.2012 08:40:00 S 1 -19,9 -21 45
140 02.07.2011 10:30:00 S 1 -25 -26 45
62 31.07.2013 31.07.2013 S 1 -16 -16 42
136 12.07.2011 11:20:00 S 1 -21,4 -23 41
181 04.08.2010 10:00:00 S 1 -30,1 -28 41
195 24.10.2009 09:40:00 S 1 -45 -47 41
90 22.10.2013 22.10.2013 S 1 -14,2 -16 40
11 16.06.2012 10:50:00 S 1 -17 -18 40
153 13.05.2011 09:25:00 S 1 -27,4 -29 40
1 23.07.2014 23.07.2014 S 1 -13,6 -14 39
56 15.06.2013 15.06.2013 S 1 -17,3 -18 39
45 26.10.2012 26.10.2012 S 1 -17,4 -18 39
.....
The following query returns the yearly max values, but without the record id (in my case tur_id).
SELECT year(datum) rok, max(nsp) FROM turniere GROUP BY year(datum)
Result:
rok max(nsp)
2009 41
2010 46
2011 45
2012 45
2013 42
2014 39
How can I also get the tur_id or the datum value?
You are half-way there. Join the original data back in:
SELECT t.*
FROM turniere t JOIN
(SELECT year(datum) as rok, max(nsp) as maxnsp
FROM turniere
GROUP BY year(datum)
) tt
ON year(t.datum) = tt.rok and t.nsp = tt.maxnsp;
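The join returns every row that ties for a year's maximum, so a year can yield more than one tur_id. The same logic sketched in PHP over rows that have already been fetched, only as an illustration of what the query computes; yearly_max_rows() is hypothetical and assumes Datum strings in the dd.mm.yyyy format shown in the table.

```php
<?php
// Sketch: for each year, collect the tur_id of every row whose nsp equals
// that year's maximum (ties included), like the JOIN on (year, max) above.
// Assumes 'datum' is formatted dd.mm.yyyy, so the year is at offset 6.
function yearly_max_rows(array $rows): array
{
    $max = [];
    foreach ($rows as $row) {
        $year = (int) substr($row['datum'], 6, 4);
        if (!isset($max[$year]) || $row['nsp'] > $max[$year]) {
            $max[$year] = $row['nsp'];
        }
    }
    $result = [];
    foreach ($rows as $row) {
        $year = (int) substr($row['datum'], 6, 4);
        if ($row['nsp'] == $max[$year]) {
            $result[$year][] = $row['tur_id'];
        }
    }
    ksort($result); // order by year, like GROUP BY output usually appears
    return $result;
}
```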

My while statement is not clearing mysqli_query on the second run. Is this normal?

I am trying to create an unordered list of weekly dates, but only for the weeks that actually have entries. In my DB I have entries with dates ranging from 10/17 to 11/5.
The idea is to get the first available date, search for it and create the first week:
<li>date - date</li>
Then add a week, search for a date in that week and print the following week's range, then add another week, etc.
Below is my code:
$fromDt_info = mysqli_query($con, "SELECT * FROM " . $rec_table . " WHERE `date`='" . $from_dt . "'");
echo '<ul>';
while ($row = mysqli_fetch_array($fromDt_info)) {
    echo $row['0'];
    $to_dt = date('Y-m-d', strtotime($from_dt . '+ 6 Days'));
    echo '<li>' . date('m/d/Y', strtotime($from_dt)) . ' - ' . date('m/d/Y', strtotime($to_dt)) . '</li>';
    $from_dt = date('Y-m-d', strtotime($from_dt . '+ 7 Days'));
    // empty array somehow? Prints as many entries as the first date has, but prints the updated dates in sequence.
}
echo '</ul>';
This is the result I am getting:
44 10/17/2014 - 10/23/2014
45 10/24/2014 - 10/30/2014
46 10/31/2014 - 11/06/2014
47 11/07/2014 - 11/13/2014
48 11/14/2014 - 11/20/2014
You can see it's printing two extra lines that I don't have entries for.
When I added echo $row['0']; to print the id, I noticed it's printing all of the entries for the first date (there are five entries on that first date), but the dates printed are sequential and unrelated to the id entry.
Do I need to somehow empty the array at the end of the while statement?
This is the data in my DB:
id date
44 2014-10-17
45 2014-10-17
46 2014-10-17
47 2014-10-17
48 2014-10-17
51 2014-10-20
52 2014-10-20
53 2014-10-20
55 2014-10-20
56 2014-10-20
57 2014-10-20
58 2014-10-21
59 2014-10-21
60 2014-10-21
61 2014-10-21
62 2014-10-21
63 2014-10-21
64 2014-10-21
65 2014-10-22
66 2014-10-22
67 2014-10-22
68 2014-10-22
69 2014-10-23
70 2014-10-23
71 2014-10-23
72 2014-10-24
73 2014-10-24
278 2014-10-27
279 2014-10-27
280 2014-10-27
281 2014-10-27
282 2014-10-27
283 2014-10-27
284 2014-10-27
285 2014-10-28
286 2014-10-28
287 2014-10-28
288 2014-10-29
289 2014-10-29
290 2014-10-29
291 2014-10-29
293 2014-10-30
294 2014-10-30
295 2014-10-30
296 2014-10-30
297 2014-10-30
298 2014-10-30
299 2014-10-31
300 2014-10-31
301 2014-10-31
302 2014-10-31
303 2014-10-31
304 2014-10-31
305 2014-11-03
306 2014-11-03
307 2014-11-03
308 2014-11-03
309 2014-11-04
310 2014-11-04
311 2014-11-04
312 2014-11-04
313 2014-11-04
314 2014-11-05
315 2014-11-05
316 2014-11-05
317 2014-11-05
318 2014-11-05
319 2014-11-05
320 2014-11-05
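The result set returned by mysqli_query() is fixed when the query runs; changing $from_dt inside the loop does not re-run it. The loop therefore runs once per row matching the first date (ids 44 to 48, five rows), and each pass simply advances $from_dt by a week, which produces the five sequential ranges. One way around this, sketched under the assumption that the distinct dates are fetched once (for example with SELECT DISTINCT `date` ... ORDER BY `date`) into a plain array of Y-m-d strings: walk week by week from the first date and emit a range only when some date falls inside it. weekly_ranges() is a hypothetical helper.

```php
<?php
// Sketch: build "m/d/Y - m/d/Y" week ranges, starting at the first date,
// and keep only the weeks that contain at least one of the given dates.
// $dates: Y-m-d strings fetched once, outside any loop.
function weekly_ranges(array $dates): array
{
    if (empty($dates)) {
        return [];
    }
    sort($dates);
    $last = new DateTime(end($dates));
    $start = new DateTime($dates[0]);
    $ranges = [];
    while ($start <= $last) {
        $end = (clone $start)->modify('+6 days');
        $from = $start->format('Y-m-d');
        $to = $end->format('Y-m-d');
        foreach ($dates as $d) {
            // Y-m-d strings compare correctly as plain strings
            if ($d >= $from && $d <= $to) {
                $ranges[] = $start->format('m/d/Y') . ' - ' . $end->format('m/d/Y');
                break;
            }
        }
        $start->modify('+7 days');
    }
    return $ranges;
}

// foreach (weekly_ranges($dates) as $r) { echo '<li>' . $r . '</li>'; }
```

For the dates in the table above this yields three weeks, 10/17/2014 - 10/23/2014, 10/24/2014 - 10/30/2014 and 10/31/2014 - 11/06/2014, without the two empty weeks.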

How can I monitor memory consumption during webserver stress test?

I want to do a stress test on an Apache webserver that I have running on localhost. The test will request the webserver to execute a PHP application that I wrote. I want to see how much memory (RAM) the webserver (and/or the associated PHP process) consumes during the test, or how much it consumed once the test is done.
My OS is Ubuntu 13.10.
I looked at Apache Bench, Apache JMeter, Siege and httperf. None of them seem to provide such information. At most, I can see some CPU load in httperf (which in most cases is 100 %, so not too relevant).
Is there some tool that can provide me with memory consumption information? It doesn't have to be a webserver benchmarking tool; it could also be other Linux software that runs in parallel with the benchmarking tool. I just think that manually monitoring the test via the top command is kind of inaccurate/amateurish. Thank you in advance.
htop may be exactly what you're looking for.
Personally, I recently discovered something called byobu, which gives you a handy readout at the bottom of the screen (which you can configure by pressing F9).
It has become my personal favorite for exactly what you're describing.
Alternatively, you could look into Xdebug and use something like xdebug_memory_usage() in the PHP script you're testing to dump info into a log file at key points in your script.
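A minimal sketch of that logging idea using PHP's built-in memory_get_usage() and memory_get_peak_usage(), so no extension is required. These report memory allocated by the PHP script itself, not the total footprint of the Apache process. log_memory() and the log path are hypothetical.

```php
<?php
// Sketch: append current and peak PHP memory usage to a log file at
// checkpoints in the script under test.
function log_memory(string $label, string $logFile): void
{
    $line = sprintf(
        "%s %s current=%.2f MB peak=%.2f MB\n",
        date('H:i:s'),
        $label,
        memory_get_usage() / 1048576,
        memory_get_peak_usage() / 1048576
    );
    // LOCK_EX avoids interleaved lines when many requests log at once
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// log_memory('after query', '/tmp/memory.log');
```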
I've set up a few PHP cron jobs too, and when I start a script manually from the console I want to see debug output as well.
I put in a method like this:
protected $consoleUpdate;
protected function printMemoryUsage() {
    if ((time() - $this->consoleUpdate) >= 3) {
        $this->consoleUpdate = time();
        echo "Memory: ",
            round(memory_get_usage(true) / (1024 * 1024)),
            " MB",
            "\r";
    }
}
Call this method as often as you like to print the script's memory usage.
Notice the final \r, which returns the cursor to the beginning of the line so the line gets overwritten. If you don't have any other output, the effect is that your screen doesn't scroll; the line just gets updated.
Tools like top, htop, memstat, iotop and mysqltop are all excellent for seeing what is thoroughly cooking your server while you throw siege (and its friend ApacheBench) at it.
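If you want a number for the webserver processes rather than watching top, one option on Linux is to sum the VmRSS field from /proc/<pid>/status for every process with a given name, sampled while the benchmark runs. A sketch, assuming Apache runs as processes named apache2 (the Ubuntu default) with mod_php; parse_status() and total_rss_kb() are hypothetical helpers. Summing RSS counts shared pages once per process, so treat the total as an upper bound.

```php
<?php
// Sketch (Linux only): sum resident memory (VmRSS, in kB) of all processes
// with a given name by reading /proc/<pid>/status.
function parse_status(string $status): array
{
    $name = null;
    $rssKb = 0;
    foreach (explode("\n", $status) as $line) {
        if (strpos($line, 'Name:') === 0) {
            $name = trim(substr($line, 5));
        } elseif (strpos($line, 'VmRSS:') === 0) {
            $rssKb = (int) trim(substr($line, 6)); // "12345 kB" -> 12345
        }
    }
    return [$name, $rssKb];
}

function total_rss_kb(string $processName): int
{
    $total = 0;
    foreach (glob('/proc/[0-9]*/status') ?: [] as $file) {
        $status = @file_get_contents($file); // the process may exit meanwhile
        if ($status === false) {
            continue;
        }
        [$name, $rssKb] = parse_status($status);
        if ($name === $processName) {
            $total += $rssKb;
        }
    }
    return $total;
}

// Run in a second terminal during the test:
// while (true) { echo date('H:i:s'), ' ', total_rss_kb('apache2'), " kB\n"; sleep(1); }
```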
I use vmstat for memory, disk and CPU monitoring. Below are some measurements taken while copying files on a bottom-of-the-range Linux-based Raspberry Pi. I first used vmstat in the 1980s, monitoring DB activity on early Unix systems. More details in:
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Stress%20Tests.htm
vmstat was run either from a separate terminal or in a combined script file.
pi#raspberrypi /mnt $ time sudo sh -c "cp -r 256k /mnt/new2 && sync"
40 samples at 1-second intervals
vmstat 1 40 > vmstatRes.txt
real 0m38.781s
user 0m0.400s
sys 0m8.400s
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 304684 15208 65952 0 0 0 0 1109 319 39 2 59 0
1 0 0 282788 15208 87308 0 0 10292 0 4018 2994 5 57 9 29
1 1 0 256996 15208 112768 0 0 12380 0 4687 3863 3 53 0 44
2 2 0 231452 15212 138028 0 0 12288 40 4781 4024 5 55 0 40
0 1 0 216576 15216 152476 0 0 7004 10512 5649 3580 5 50 0 46
2 2 0 201688 15216 167288 0 0 7144 17488 5341 3527 2 52 0 46
1 0 0 195064 15216 173808 0 0 3192 9016 5909 3214 2 34 0 64
3 0 0 169520 15216 199152 0 0 12304 0 4704 3914 2 60 0 38
2 3 0 149988 15220 218288 0 0 9252 9892 5003 3614 2 52 0 45
0 2 0 131008 15224 237072 0 0 9112 10324 5086 3568 2 54 0 44
1 0 0 120160 15224 247784 0 0 5232 0 4976 2935 0 34 0 66
0 1 0 110424 15224 257404 0 0 4628 12864 5097 3034 4 36 0 60
1 0 0 86556 15224 281120 0 0 11536 0 4965 3874 3 54 0 43
1 1 0 73784 15224 293816 0 0 6188 11592 5545 3514 2 46 0 52
1 1 0 63252 15232 304132 0 0 4968 10320 4617 2748 2 34 0 64
0 1 0 43148 15232 323960 0 0 9652 7184 5126 3749 2 54 0 43
0 1 0 29336 15232 337560 0 0 6596 10036 4311 2796 2 38 0 59
1 1 0 23944 11696 346276 0 0 7480 0 5465 3455 2 46 0 52
2 1 0 23076 9580 349184 0 0 2860 10524 4521 2323 1 35 0 64
2 1 0 24440 5300 351508 0 0 8864 5188 4586 3215 1 66 0 33
0 1 0 24500 3900 352704 0 0 4896 11448 5974 3308 2 49 0 49
1 1 0 24432 3772 352700 0 0 10424 6208 4851 3682 2 60 0 38
1 1 0 23764 3772 353736 0 0 6568 5184 5970 3526 1 45 0 53
1 1 0 24068 3776 353500 0 0 4900 11388 5449 3142 0 40 0 60
0 1 0 24400 3780 352552 0 0 10068 8848 4821 3531 2 57 0 40
1 1 0 24152 3772 352588 0 0 8292 2784 5207 3588 2 50 0 48
1 1 0 23516 3772 353620 0 0 6800 7816 5475 3475 1 49 0 49
0 1 0 24260 3772 352940 0 0 7004 7424 5042 3284 4 43 0 52
2 1 0 24068 3776 353060 0 0 4624 10292 4798 2801 0 39 0 61
2 0 0 23820 3780 353340 0 0 8844 5508 5251 3609 0 56 0 44
2 1 0 24252 3772 352528 0 0 4552 12000 5053 2841 2 44 0 54
1 1 0 23696 3772 353120 0 0 10880 2176 4908 3694 2 58 0 40
1 0 0 24260 3772 352212 0 0 3748 11104 5208 2904 2 34 0 63
3 2 0 24136 3780 352084 0 0 10148 1628 4637 3568 1 55 0 44
0 1 0 24192 3780 352120 0 0 4016 10260 4719 2613 1 31 0 68
1 1 0 24392 3772 352076 0 0 6804 10972 5386 3473 1 52 0 47
1 1 0 24392 3772 351704 0 0 8568 8788 5101 3502 2 61 0 36
0 1 0 24376 3780 351764 0 0 0 30036 6711 1888 0 36 0 64
0 1 0 24252 3780 351928 0 0 28 2072 5629 1354 0 10 0 90
0 0 0 24768 3780 351968 0 0 40 20 1351 579 9 6 13 72
1 0 0 24768 3780 351968 0 0 0 0 1073 55 1 1 98 0
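When vmstat output is saved to a file as above (vmstat 1 40 > vmstatRes.txt), a short script can pull out the figures that matter for a memory test. A sketch that finds the lowest free and the highest cache values, assuming the default vmstat layout with one header row naming the columns and values in KiB; summarize_vmstat() is hypothetical.

```php
<?php
// Sketch: summarize saved vmstat output. The header row containing "free"
// tells us the column positions; only all-numeric rows are treated as samples.
function summarize_vmstat(string $text): array
{
    $columns = null;
    $minFree = null;
    $maxCache = null;
    foreach (explode("\n", $text) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (in_array('free', $fields, true)) {
            $columns = array_flip($fields); // column name => position
            continue;
        }
        if ($columns === null || !ctype_digit(implode('', $fields))) {
            continue; // group header, blank line or command output
        }
        $free = (int) $fields[$columns['free']];
        $cache = (int) $fields[$columns['cache']];
        $minFree = $minFree === null ? $free : min($minFree, $free);
        $maxCache = $maxCache === null ? $cache : max($maxCache, $cache);
    }
    return ['min_free_kb' => $minFree, 'max_cache_kb' => $maxCache];
}

// print_r(summarize_vmstat(file_get_contents('vmstatRes.txt')));
```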

Finding and removing outliers in PHP

Suppose I sample a selection of database records that return the following numbers:
20.50, 80.30, 70.95, 15.25, 99.97, 85.56, 69.77
Is there an algorithm that can be efficiently implemented in PHP to find the outliers (if there are any) from an array of floats based on how far they deviate from the mean?
OK, let's assume you have your data points in an array like so:
<?php $dataset = array(20.50, 80.30, 70.95, 15.25, 99.97, 85.56, 69.77); ?>
Then you can use the following function (see the comments for what is happening) to remove all numbers that fall outside the mean ± the standard deviation times a magnitude you set (defaults to 1):
<?php
function remove_outliers($dataset, $magnitude = 1) {
    $count = count($dataset);
    $mean = array_sum($dataset) / $count; // Calculate the mean
    $deviation = sqrt(array_sum(array_map("sd_square", $dataset, array_fill(0, $count, $mean))) / $count) * $magnitude; // Calculate standard deviation and multiply by magnitude
    return array_filter($dataset, function($x) use ($mean, $deviation) { return ($x <= $mean + $deviation && $x >= $mean - $deviation); }); // Return filtered array of values that lie within $mean +- $deviation.
}
function sd_square($x, $mean) {
    return pow($x - $mean, 2);
}
?>
For your example this function returns the following with a magnitude of 1:
Array
(
[1] => 80.3
[2] => 70.95
[5] => 85.56
[6] => 69.77
)
For a normally distributed set of data, this removes values more than 3 standard deviations from the mean.
<?php
// stats_standard_deviation() is provided by the PECL "stats" extension
function remove_outliers($array) {
    if (count($array) == 0) {
        return $array;
    }
    $ret = array();
    $mean = array_sum($array) / count($array);
    $stddev = stats_standard_deviation($array);
    $outlier = 3 * $stddev;
    foreach ($array as $a) {
        // keep values within 3 standard deviations of the mean
        if (abs($a - $mean) <= $outlier) {
            $ret[] = $a;
        }
    }
    return $ret;
}
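If the PECL stats extension isn't available, the standard deviation can be computed directly. A sketch of the same 3-sigma filter with a hypothetical remove_outliers_sigma() helper, using the population standard deviation (divide by n).

```php
<?php
// Sketch: keep values within $k population standard deviations of the mean,
// without relying on the PECL stats extension.
function remove_outliers_sigma(array $values, float $k = 3.0): array
{
    $n = count($values);
    if ($n === 0) {
        return [];
    }
    $mean = array_sum($values) / $n;
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    $stddev = sqrt($sumSq / $n); // population standard deviation
    return array_values(array_filter($values, function ($v) use ($mean, $stddev, $k) {
        return abs($v - $mean) <= $k * $stddev;
    }));
}
```

Keep in mind that with very small samples a 3-sigma rule cannot fire at all: no value can lie more than sqrt(n - 1) population standard deviations from the mean, which is about 2.45 for the 7 numbers in the question, so all of them are kept.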
Topic: Detecting local, additive outliers in unordered arrays by moving a small window through the array and calculating the standard deviation for each range of values.
Good morning folks,
here is my solution, much too late, but since I was looking for a way to detect outliers in PHP and couldn't find anything basic, I decided to smooth a given dataset covering a 24-hour timeline by moving a window of 5 consecutive items through an unordered array and calculating the local standard deviation to detect the additive outliers.
The first function simply calculates the average and standard deviation of a given array, where $col is the column holding the values (sorry for the name $freegrades: it means degrees of freedom, Freiheitsgrade in German; in an incomplete dataset of 5 values you only have 4 degrees of freedom):
function analytics_stat ($arr,$col,$freegrades = 0) {
    // calculate average called mu
    $mu = 0;
    foreach ($arr as $row) {
        $mu += $row[$col];
    }
    $mu = $mu / count($arr);
    // calculate empiric standard deviation called sigma
    $sigma = 0;
    foreach ($arr as $row) {
        $sigma += pow(($mu - $row[$col]),2);
    }
    $sigma = sqrt($sigma / (count($arr) - $freegrades));
    return [$mu,$sigma];
}
Now it's time for the core function, which moves through the given array and creates a new array with the result. $margin is the factor to multiply the deviation by, since one sigma detects too many outliers, whereas more than 1.7 seems too high:
function analytics_detect_local_outliers ($arr,$col,$range,$margin = 1.0) {
    $count = count($arr);
    if ($count < $range) return false;
    // the initial state of each value is NOT OUTLIER
    $arr_result = [];
    for ($i = 0;$i < $count;$i++) {
        $arr_result[$i] = false;
    }
    $max = $count - $range + 1;
    for ($i = 0;$i < $max;$i++) {
        // calculate mu and sigma for current interval
        // remember that 5 values will determine the divisor 4 for sigma
        // since we only look at a part of the whole data set
        $stat = analytics_stat(array_slice($arr,$i,$range),$col,1);
        // a value in this interval counts if it's found outside our defined sigma interval
        $range_max = $i + $range;
        for ($j = $i;$j < $range_max;$j++) {
            if (abs($arr[$j][$col] - $stat[0]) > $margin * $stat[1]) {
                $arr_result[$j] = true;
                // this would be the place to add a counter to isolate
                // real outliers from sudden steps in our data set
            }
        }
    }
    return $arr_result;
}
And finally comes the test function, with random values in an array of length 24.
For the margin I was curious and chose the golden ratio PHI = 1.618..., since I really like this number, and some Excel tests led me to a margin of 1.7, above which outliers were very rarely detected. The range of 5 is adjustable, but for me it was enough. So for every 5 consecutive values there will be a calculation:
function test_outliers () {
    // create 2 dimensional data array with items [hour,value]
    $arr = [];
    for ($i = 0;$i < 24;$i++) {
        $arr[$i] = [$i,rand(0,500)];
    }
    // set parameter for detection algorithm
    $result = [];
    $col = 1;
    $range = 5;
    $margin = 1.618;
    $result = analytics_detect_local_outliers ($arr,$col,$range,$margin);
    // display results
    echo "<p style='font-size:8pt;'>";
    for ($i = 0;$i < 24;$i++) {
        if ($result[$i]) echo "♦".$arr[$i][1]."♦ "; else echo $arr[$i][1]." ";
    }
    echo "</p>";
}
After 20 calls of the test function, I got these results:
417 140 372 131 449 26 192 222 320 349 94 147 201 ♦342♦ 123 16 15
♦490♦ 78 190 ♦434♦ 27 3 276
379 440 198 135 22 461 208 376 286 ♦73♦ 331 358 341 14 112 190 110 266
350 232 265 ♦63♦ 90 94
228 ♦392♦ 130 134 170 ♦485♦ 17 463 13 326 47 439 430 151 268 172 342
445 477 ♦21♦ 421 440 219 95
88 121 292 255 ♦16♦ 223 244 109 127 231 370 16 93 379 218 87 ♦335♦ 150
84 181 25 280 15 406
85 252 310 122 188 302 ♦13♦ 439 254 414 423 216 456 321 85 61 215 7
297 337 204 210 106 149
345 411 308 360 308 346 ♦451♦ ♦77♦ 16 498 331 160 142 102 ♦496♦ 220
107 143 ♦241♦ 113 82 355 114 452
490 222 412 94 2 ♦480♦ 181 149 41 110 220 ♦477♦ 278 349 73 186 135 181
♦39♦ 136 284 340 165 438
147 311 246 449 396 328 330 280 453 374 214 289 489 185 445 86 426 246
319 ♦30♦ 436 290 384 232
442 302 ♦436♦ 50 114 15 21 93 ♦376♦ 416 439 ♦222♦ 398 237 234 44 102
464 204 421 161 330 396 461
498 320 105 22 281 168 381 216 435 360 19 ♦402♦ 131 128 66 187 291 459
319 433 86 84 325 247
440 491 381 491 ♦22♦ 412 33 273 256 331 79 452 314 485 66 138 116 356
290 190 336 178 298 218
394 439 387 ♦80♦ 463 369 ♦104♦ 388 465 455 ♦246♦ 499 70 431 360 ♦22♦
203 280 241 319 ♦34♦ 238 439 497
485 289 249 ♦416♦ 228 166 217 186 184 ♦356♦ 142 166 26 91 70 ♦466♦ 177
357 298 443 307 387 373 209
338 166 90 122 442 429 499 293 ♦41♦ 159 395 79 307 91 325 91 162 211
85 189 278 251 224 481
77 196 37 326 230 281 ♦73♦ 334 159 490 127 365 37 57 246 26 285 468
228 181 74 ♦455♦ 119 435
328 3 216 149 217 348 65 433 164 473 465 145 341 112 462 396 168 251
351 43 320 123 181 198
216 213 249 219 ♦29♦ 255 100 216 181 233 33 47 344 383 ♦94♦ 323 440
187 79 403 139 382 37 395
366 450 263 160 290 ♦126♦ 304 307 335 396 458 195 171 493 270 434 222
401 38 383 158 355 311 150
402 339 382 97 125 88 300 332 250 ♦86♦ 362 214 448 67 114 ♦354♦ 140 16
♦354♦ 109 0 168 127 89
450 5 232 155 159 264 214 ♦416♦ 51 429 372 230 298 232 251 207 ♦322♦
160 148 206 293 446 111 338
I hope this will help someone now or in the future.
Greetings
P.S. To further improve this algorithm, you could add a counter that ensures a value must be flagged, for instance, at least 2 times, that is, in 2 different intervals or windows, before it is labeled an outlier. That way a sudden jump in the following values does not make the first value the villain. Let me give you an example:
In 3,6,5,9,37,40,42,51,98,39,33,45 there is an obvious step from 9 to 37 and an isolated value 98. I would like to detect 98, but not 9 or 37.
The first interval 3,6,5,9,37 would detect 37, the second interval 6,5,9,37,40 would not. So we would not report 37, since there is only one problematic interval, i.e. one match. Now it should be clear that 98 counts in 5 intervals and is therefore an outlier. So let's declare a value an outlier if it "counts" at least 2 times.
As so often, we have to look closely at the borders, since those values belong to only one interval, and make an exception for them.
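The counter idea from the P.S. could look like the following sketch. It is a simplified, hypothetical variant of analytics_detect_local_outliers() that works on a flat array of numbers instead of [hour, value] rows, counts how many windows flag each position, and requires min($minHits, number of windows containing that position) hits, which covers the border exception. It uses the sample standard deviation (divisor n - 1), matching $freegrades = 1 above.

```php
<?php
// Sample mean and standard deviation (divisor n - 1) of one window.
function window_stats(array $values): array
{
    $n = count($values);
    $mu = array_sum($values) / $n;
    $sq = 0.0;
    foreach ($values as $v) {
        $sq += ($v - $mu) ** 2;
    }
    return [$mu, sqrt($sq / ($n - 1))];
}

// Hypothetical counted variant: a value is an outlier only if at least
// $minHits windows flag it (fewer for border values that sit in fewer windows).
function detect_local_outliers_counted(array $values, int $range, float $margin = 1.618, int $minHits = 2)
{
    $values = array_values($values);
    $count = count($values);
    if ($range < 2 || $count < $range) {
        return false;
    }
    $hits = array_fill(0, $count, 0);    // windows that flagged each index
    $windows = array_fill(0, $count, 0); // windows that contain each index
    for ($i = 0; $i <= $count - $range; $i++) {
        [$mu, $sigma] = window_stats(array_slice($values, $i, $range));
        for ($j = $i; $j < $i + $range; $j++) {
            $windows[$j]++;
            if (abs($values[$j] - $mu) > $margin * $sigma) {
                $hits[$j]++;
            }
        }
    }
    $result = [];
    for ($j = 0; $j < $count; $j++) {
        $result[$j] = $hits[$j] >= min($minHits, $windows[$j]);
    }
    return $result;
}

// $flags = detect_local_outliers_counted([3, 6, 5, 9, 37, 40, 42, 51, 98, 39, 33, 45], 5);
```

With a window of 5 and a margin of 1.618, only 98 in the example series ends up flagged: 37 and 9 are each flagged by a single window, so the step is not reported.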
