I have a unique problem: I need to pull specific attributes for every game being played, every five minutes. The two main issues I have are:
Parsing data from a website that displays it interactively, e.g. MLB.com, ESPN, or CBS Sports.
Finding a source that shows live-updated box scores in a text format.
I have done significant Googling and have looked at possible solutions for scraping data off of MLB and CBS Sports, but I haven't had much luck. It's a bit difficult right now because I don't have any fresh data to play with, but I've been looking for possible solutions and haven't come to any resolution.
To my knowledge there isn't an open database I can query that contains live score updates; otherwise I could piggyback off of that or build a similar system.
Check out this forum question on another site. It looks like there are a few sources out there that will let you get CSVs of their data. I'm not sure how much of it could be automated.
http://ask.metafilter.com/120399/MLB-API
Another is http://www.baseball-reference.com/. I'm not sure if they do box scores, but they have stats on all the players, games, etc. They might have something you can use as well.
Finally you could check out http://www.strat-o-matic.com/ they might have something or be willing to create an API for you.
If you look at Yahoo, you'll notice they get their stats from STATS LLC. I have no idea what it costs, but you should check out their real-time data delivery service.
Scrape the MLB gameday server. It is updated in real time during games. If you want the box score, scrape boxscore.xml (for example).
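A minimal sketch of that approach in Python. The gameday directory layout and the boxscore attribute names used here are assumptions for illustration; check the actual XML the server returns for your date and game before relying on any field name:

```python
import urllib.request
import xml.etree.ElementTree as ET

def boxscore_url(year, month, day, gid):
    # Hypothetical gameday path layout -- verify against the live server.
    return ("http://gd2.mlb.com/components/game/mlb/"
            f"year_{year:04d}/month_{month:02d}/day_{day:02d}/{gid}/boxscore.xml")

def parse_boxscore(xml_text):
    """Pull a few attributes out of a boxscore.xml document.

    The attribute names below are placeholders; map them to whatever
    the real document actually contains.
    """
    root = ET.fromstring(xml_text)
    return {
        "away": root.get("away_fname"),
        "home": root.get("home_fname"),
        "status": root.get("status_ind"),
    }

def fetch_boxscore(year, month, day, gid):
    with urllib.request.urlopen(boxscore_url(year, month, day, gid)) as resp:
        return parse_boxscore(resp.read())

# Offline demonstration with a made-up document:
SAMPLE = ('<boxscore away_fname="Boston Red Sox" '
          'home_fname="New York Yankees" status_ind="F"/>')
print(parse_boxscore(SAMPLE))
```

Run something like `fetch_boxscore(...)` from a cron job every five minutes and diff the result against the previous poll to detect score changes.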
I have a dilemma that I need to figure out.
I am building a website where people can watch a competitive game (such as Counter-Strike: Global Offensive), either through a Twitch TV stream or through the streaming service the game itself offers (in this example, CS:GO TV). While watching, members can place "bets" on which team will win, using some form of credits with no real value. The issue, of course, is that the site needs to pull the score from the game and update it in real time. So, sticking with the CS:GO example: is there a part of the Steamworks API that allows real-time pulling of a game's score, through some kind of PHP or JavaScript method?
I'm sorry to tell you that you can't, for now.
The API wiki's description of CS:GO Competitive Match Information says:
It would be interesting to be able to find out competitive match information -- exactly like what DOTA 2 has. It could contain all the players in the map, with their steamids and competitive ranks, the score at half time/full time. There are probably a few more bits of info that could also be included. Pigophone2 16:54, 14 September 2013 (PDT)
To answer your question, there is no Steam developed API that does this.
However, many websites still do exactly what you are looking for.
My guess is that they use a regularly updated script which parses websites like ESEA and ESL and pulls data about those matches. After all, those are the sites that host almost all the big games people care about.
You'll need to keep up to date with private leagues yourself, though, as they don't typically publish live stats in an easily parseable format. GOSU Gamers can help you track any new players that come to the big-league table.
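A rough sketch of such a polling script in Python. The URL and the `class="score"` markup below are invented placeholders; you would adapt the pattern to whatever the ESEA or ESL match pages actually serve:

```python
import re
import time
import urllib.request

MATCH_PAGE = "http://example.com/match/12345"  # placeholder URL

# Invented markup pattern -- inspect the real page and adjust.
SCORE_RE = re.compile(r'class="score">(\d+)\s*:\s*(\d+)<')

def extract_score(html):
    """Return (team_a, team_b) scores, or None if no score is present."""
    m = SCORE_RE.search(html)
    return (int(m.group(1)), int(m.group(2))) if m else None

def poll(interval=60):
    """Fetch the match page on a fixed interval and report the score."""
    while True:
        html = urllib.request.urlopen(MATCH_PAGE).read().decode("utf-8")
        score = extract_score(html)
        if score:
            print("Team A %d : %d Team B" % score)
        time.sleep(interval)

# The extractor works on any snippet containing the pattern:
print(extract_score('<span class="score">16 : 9</span>'))
```

In practice you would also want an HTML parser rather than a bare regex once the markup gets nested, but the polling structure stays the same.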
I'm looking for an algorithm in PHP that would let me get the most-searched terms (articles) which are not yet in Wikipedia ("red links"), or in one of its subprojects, using the Wikipedia API or the Wikipedia pagecount dumps. I already know about the stats.grok.se statistics (Henrik, the maintainer of that project, does not respond on his Wikipedia page), but it does not provide any information about red links. I would like statistics on the situation where a user types a word into Wikipedia's search page and Wikipedia proposes creating the page, because that word is not yet in Wikipedia.
EDIT: Actually, Wikimedia's Bugzilla already has this reported: Bug 6373 — "Provide a list of unsuccessful searches", registered in 2006, but the last activity on that bug was 2012-04-02 18:58 UTC... So it's going to be a long wait for a fix, I think. Perhaps somebody has found a workaround for this problem?
You should file a bug to request that this information be exposed somewhere on wikistats.
Alternatively, start a discussion on wikitech-l as I'm sure other people are interested in getting this sort of data.
How about keeping track of "searched but not found" queries in a DB table, with the number of times each was searched in a separate field?
This can be done very easily. You will then have to handle variations in the titles people search for, or simply split queries into words and keep track of the words only (ignoring grammatical prepositions, etc.).
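The table suggested above is only a few lines to set up. A sketch with SQLite (table and column names are my own invention), normalising queries by lowercasing and collapsing whitespace so near-identical titles count together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE missed_search (
        term  TEXT PRIMARY KEY,
        hits  INTEGER NOT NULL DEFAULT 1
    )""")

def record_missed_search(term):
    # Normalise so "Foobar  Protocol" and "foobar protocol" are one term.
    term = " ".join(term.lower().split())
    conn.execute(
        "INSERT INTO missed_search (term) VALUES (?) "
        "ON CONFLICT(term) DO UPDATE SET hits = hits + 1",
        (term,))

for q in ("Foobar Protocol", "foobar  protocol", "Quux Theorem"):
    record_missed_search(q)

top = conn.execute(
    "SELECT term, hits FROM missed_search ORDER BY hits DESC").fetchall()
print(top)  # [('foobar protocol', 2), ('quux theorem', 1)]
```

A periodic `SELECT ... ORDER BY hits DESC LIMIT 100` then gives you exactly the "most wanted missing articles" list the question asks for.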
There is a list maintained by User:West.andrew.g, which for the time being may be the best resource to get that information. The page is updated every week. You can extract data from that page, or implement the same approach as he did if you want different parameters (a higher update frequency, red links with less than 1k views/week, etc.). He seems to be getting the data from the Wikimedia dumps and querying the servers for each entry above the 1k views/week threshold.
By the way, it turns out stats.grok.se does collect stats on red links (example), although it presents no compiled list of such pages.
I was thinking about an idea for auto-generated answers; well, the answer would actually be a URL instead of an actual answer, but that's not the point.
The idea is this:
On our app we've got a reporting module which basically shows page views, clicks, conversions, and details about visitors like where they're from; pretty much a similar thing to Google Analytics, but much more simplified.
And now I was thinking: instead of making users select things like countries, traffic sources, etc. from dropdown menus (those controls would still be available as well), it would be pretty cool to let them type in questions which would result in a link to the expected part of the report. An example:
How many conversions did I have from Japan on variant 3? (one page can have many variants)
would result in:
/campaign/report/filter/campaign/(current campaign id they're on)/country/Japan/variant/3/
It doesn't seem too hard to do it myself, but it's just that it would take quite a while to make it accurate enough.
I've tried Googling but had no luck finding an existing script, so maybe you guys know of something like my idea that's open source and reliable/flexible enough to suit my needs.
Thanks!
You are talking about natural language processing, an artificial intelligence topic. This can never be perfect, and it eventually boils down to the system only responding to a finite number of permutations of one question.
That said, if that is fine with you - then you simply need to identify "tokens". For example,
how many - evaluates to a count
conversions - evaluates to all "conversions"
from - apply a filter...
japan - ...using Japan
etc.
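The token approach above can be sketched in a few lines. This is a toy, assuming a small hand-built lookup table of country names; the example question maps to exactly the URL shown in the question (the "how many → count" token selects the report type and does not appear in that URL, so it is omitted here):

```python
import re

COUNTRIES = {"japan", "germany", "brazil"}  # hypothetical lookup table

def question_to_url(question, campaign_id):
    """Map a free-text question onto a report-filter URL."""
    q = question.lower()
    parts = [f"/campaign/report/filter/campaign/{campaign_id}"]
    # Country token: any word that appears in the lookup table.
    for word in re.findall(r"[a-z]+", q):
        if word in COUNTRIES:
            parts.append(f"country/{word.capitalize()}")
    # Variant token: the word "variant" followed by a number.
    m = re.search(r"variant\s+(\d+)", q)
    if m:
        parts.append(f"variant/{m.group(1)}")
    return "/".join(parts) + "/"

print(question_to_url("How many conversions I had from Japan on variant 3.", 42))
# -> /campaign/report/filter/campaign/42/country/Japan/variant/3/
```

Each new filter (traffic source, date range, ...) is just another token rule, which is also why accuracy takes "quite a while": the rule set only ever covers the phrasings you thought of.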
I look after a large site and have been studying other similar sites, in particular flickr and deviantART. I have noticed that although they say they have a whole lot of data, they only display a portion of it.
I presume this is for performance reasons, but does anyone have an idea of how they decide what to show and what not to show? Classic example: go to flickr and search a tag. Note the number of results stated just under the page links, calculate which page the last result would be on, and go to that page. You will find there is no data on that page. In fact, in my test, flickr said there were 5,500,000 results but only displayed 4,000. What is this all about?
Do larger sites get so big that they have to start bringing old data offline? Deviantart has a wayback function, but I'm not quite sure what that does.
Any input would be great!
This is a type of performance optimisation. You don't need to scan the full table if you've already got 4,000 results; no user will go to page 3,897. When flickr runs a search query it finds the first 4,000 results and then stops, so it doesn't spend CPU time and I/O time finding additional results nobody will look at.
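The cutoff is easy to sketch. A toy version with SQLite (table and column names invented here): pagination is served with `LIMIT`/`OFFSET`, and any page past the cap is simply refused, which is roughly the behaviour described above:

```python
import sqlite3

RESULT_CAP = 4000   # stop serving results past this point, flickr-style
PER_PAGE = 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, tag TEXT)")
conn.executemany("INSERT INTO photos (tag) VALUES (?)",
                 [("cat",)] * 5000)   # pretend 5,000 photos match the tag

def fetch_page(tag, page):
    offset = (page - 1) * PER_PAGE
    if offset >= RESULT_CAP:          # past the cap: no data, as observed
        return []
    limit = min(PER_PAGE, RESULT_CAP - offset)
    rows = conn.execute(
        "SELECT id FROM photos WHERE tag = ? ORDER BY id LIMIT ? OFFSET ?",
        (tag, limit, offset)).fetchall()
    return [r[0] for r in rows]

print(len(fetch_page("cat", 1)))    # page 1 is full
print(fetch_page("cat", 401))       # page 401 is beyond the 4,000 cap: empty
```

The "5,500,000 results" number, meanwhile, can come from a cheap count or index estimate that never touches the rows themselves, which is why the stated total and the browsable total disagree.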
I guess in a way it makes sense. If, after a search, the user does not click on any link by page 400 (assuming each page has 10 results), then either the user is a moron or a crawler is involved in some way.
Seriously speaking, if no favourable result is yielded by page 40, the company concerned might need to fire its whole search team and adopt Lucene or Sphinx :)
What I mean is they will be better off trying to improve their search accuracy than battling infrastructure problems trying to show more than 4000 search results.
I recently started a new community. The forum software is phpBB3, and so far so good. In an attempt to make my community more unique and interesting, I had the idea of adding user achievements. Let me give you a quick run-down.
Each user has achievements that they can earn (these will probably be shared across all users), for example an achievement for when a user hits 1,000 posts, when they upload an avatar, when one of their topics gets 1,000 views, and so on. Each achievement is worth points; for example, uploading an avatar might be worth 10 points and reaching 10,000 posts might grant 50 achievement points. If anyone here plays World of Warcraft you may see where I'm getting the ideas from. :)
What I'm struggling to get my head around, though, is exactly how to code this... I could keep a record of all user activity in a special database table and then check via cron every minute or so whether any user has met achievement criteria. But I also want it controllable through the ACP so I can easily add new achievements, change their points, and so on. My mind goes pretty blank when it comes to anything but the simplest things.
What I really posted here for was feedback on the idea and on how you all think I should go about doing it. The coding part should be pretty simple for me once I get my head around how phpBB MODs need to be written.
Thanks for reading, and I look forward to your replies. :)
Have you checked out this mod?
http://www.phpbb.com/community/viewtopic.php?f=70&t=1696785
It's in beta at the moment, but it looks like it's roughly what you're trying to accomplish. Even if it isn't, you can always take it and make something else out of it; I have heavily modified existing MODs to suit my site. It takes a little while to get your head around how things are done in phpBB3, but it's easy once you start doing it.
As for creating your own, I don't think this has to be done via crontab. You could simply inject a function into the relevant parts of the code.
With post counts, there is already a function that updates the rank shown under users' avatars based on certain post numbers; you could probably add an extra update_achievement() call there, and the same goes for when an avatar is updated. Taking this approach, you won't be able to edit the achievements entirely from the ACP, but you could have an interface that enables/disables certain achievements.
You will obviously need an extra table or two for this. Without thinking about it too much, I would have one table with two columns, user_id and achievement_id, and another table that just lists the achievement IDs, descriptions, etc.
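A sketch of that two-table layout and the hook function, in Python with SQLite so it runs standalone (a phpBB MOD would do the same thing in PHP against the forum database; all names here are illustrative). The key design point is making the award idempotent, so the hook can fire on every post or avatar change without double-counting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE achievements (
        achievement_id INTEGER PRIMARY KEY,
        title          TEXT NOT NULL,
        points         INTEGER NOT NULL
    );
    CREATE TABLE user_achievements (
        user_id        INTEGER NOT NULL,
        achievement_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, achievement_id)
    );
""")
conn.executemany("INSERT INTO achievements VALUES (?, ?, ?)",
                 [(1, "Uploaded an avatar", 10),
                  (2, "Reached 1,000 posts", 50)])

def update_achievement(user_id, achievement_id):
    # Idempotent: awarding the same achievement twice is a no-op,
    # so it is safe to call from any hook (avatar upload, post count, ...).
    conn.execute(
        "INSERT OR IGNORE INTO user_achievements VALUES (?, ?)",
        (user_id, achievement_id))

def total_points(user_id):
    (points,) = conn.execute(
        "SELECT COALESCE(SUM(a.points), 0) "
        "FROM user_achievements ua "
        "JOIN achievements a USING (achievement_id) "
        "WHERE ua.user_id = ?", (user_id,)).fetchone()
    return points

update_achievement(7, 1)
update_achievement(7, 1)   # duplicate award is silently ignored
update_achievement(7, 2)
print(total_points(7))     # 60
```

Because the points live only in the achievements table, the ACP can re-price an achievement with a single UPDATE and every user's total changes accordingly.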