I’m working on a personal project that ranks uploaded songs by how frequently they are voted on within a certain amount of time, plus maybe some sort of base score (probably how many likes or plays something has). I’ve seen some trending/hot algorithms, but the problem is that they rely on downvoting, which my system does not have. Can anyone give me insight on how to do something like this in PHP?
Simple answer: Figure out the system first, then figure out how to do it in PHP.
Write down your system in English (or your chosen language; you know what I mean). Make all the decisions now, even if you change them later.
Once you’ve defined your rules, it’s basic word-problem solving. Take each step/rule and reduce it to code. When you get to the end, you’ve got your code.
Well, the max just ensures that the minimum score is 0: it keeps the value at 1 or higher, and ln(1) = 0, whereas ln(0) is undefined.
Flooring the ln is interesting, though, because it essentially sets major thresholds at e^y.
Translation: Your $s values get flattened into groups at roughly e^y, so… 2, 7, 20, 54, 148, 403, 1096, 2980, 8103, 22026, 59874, etc.
Essentially it becomes harder (exponentially so) to reach the next ‘level’ of score.
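To make the grouping concrete, here is a minimal sketch. It assumes the score has the shape floor(ln(max($s, 1))) as described above; the function name `trendGroup` is made up for illustration, and $s is whatever raw score you settle on.

```php
<?php
// Hypothetical helper: maps a raw score $s to its "group".
// max() keeps the argument of log() at 1 or above, so the result is never negative.
// log() with one argument is the natural logarithm in PHP.
function trendGroup(float $s): int
{
    return (int) floor(log(max($s, 1)));
}

// Print the group for a few raw scores on either side of the thresholds.
foreach ([0, 1, 2, 3, 7, 8, 20, 21, 1000] as $s) {
    echo $s, ' => ', trendGroup($s), PHP_EOL;
}
```

Note that the numbers listed above are e^y rounded down. If $s is always a whole number, the first value that actually lands in group y is the next integer up (3, 8, 21, and so on).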
This behavior has a couple of interesting points: once you get to around group 44 on a 64-bit system, $s no longer fits in an integer and overflows into a float. (At this point, however, you’re dealing with a needed increase of $s in the range of 3x10^18 points, which is not likely to happen. If every person who viewed also downloaded and voted, and only did so once, you would need 1,194,980,949 people to do so - in other words, the entire population of China or India having clicked the three buttons. You’d also have a song that was more popular than any YouTube video ever except for Gangnam Style - and YouTube counts multiple views.) Once you reach group 48, you’ve recorded a view, download, and vote for every man, woman, and child on the planet - and to reach group 49, they’d all have to have done so 3 times over. So there are significant limitations in place. (Generally you can say 0 <= $TrendScore < 49.)
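If you want to check the overflow point yourself, a quick sketch like the following should do it. It assumes a 64-bit PHP build, where PHP_INT_MAX is 9223372036854775807; on a 32-bit build the numbers are much smaller.

```php
<?php
// Integer overflow in PHP does not wrap around: the result becomes a float.
var_dump(is_float(PHP_INT_MAX + 1));

// Highest group that a raw score can reach while still fitting in an integer.
// ln(PHP_INT_MAX) is about 43.67 on a 64-bit build, so this prints int(43).
var_dump((int) floor(log(PHP_INT_MAX)));
```

So a raw score has to be a float before it can reach group 44, which matches the rough boundary described above.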
I feel like if I stored the dateTime of each play, download, and vote, it would take up way too much space. I feel like I should store just one dateTime: when the song was most recently liked.
As Mittineague says, a single timestamp is not a trend. The number of timestamps per unit of time can be a trend (a popularity trend), but comparing single timestamps isn’t a trend - it’s a “freshness”. If 100,000,000 people viewed one video yesterday, and 1 person viewed another video today, which is more trending?
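Here is a rough sketch of the difference between the two measures. The function name `eventsPerHour`, the sample data, and the 24-hour window are all assumptions for illustration; timestamps are plain Unix timestamps.

```php
<?php
// Hypothetical helper: number of events per hour inside the trailing window.
function eventsPerHour(array $timestamps, int $now, int $windowSeconds): float
{
    $cutoff = $now - $windowSeconds;
    $recent = array_filter(
        $timestamps,
        fn (int $t): bool => $t > $cutoff && $t <= $now
    );
    return count($recent) / ($windowSeconds / 3600);
}

$now   = 1700000000;                           // fixed "current time" so the example is repeatable
$songA = array_fill(0, 500, $now - 20 * 3600); // 500 votes, all 20 hours ago
$songB = [$now - 60];                          // 1 vote, one minute ago

// Freshness (latest timestamp only) puts song B first...
echo max($songB) > max($songA) ? "B is fresher" : "A is fresher", PHP_EOL;

// ...but the vote rate over the last 24 hours puts song A first.
echo eventsPerHour($songA, $now, 86400) > eventsPerHour($songB, $now, 86400)
    ? "A is trending more" : "B is trending more", PHP_EOL;
```

A single stored dateTime can only answer the first question, never the second.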
Trending can only be done by taking snapshots of data at particular times over time. Some refer to the process as data warehousing (OLAP). Of course, the Wikipedia article is talking about enterprise kind of DW, but the concept is the same. To get a trend, you have to have a “picture of the moment” and store it. Then you can analyse those pictures of data to see if what you are looking at is changing (for good or bad). Straight OLTP data won’t get you this alone. You need OLAP.
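As a rough illustration of the snapshot idea: instead of storing a dateTime for every play, you could store one cumulative count per song at a fixed interval and compare snapshots. The array layout, the hourly interval, and the function name `growthPerHour` below are assumptions, not a prescribed schema.

```php
<?php
// Hypothetical snapshot rows for one song:
// Unix time the snapshot was taken => cumulative play count at that time.
$snapshots = [
    1700000000 => 120,
    1700003600 => 150,
    1700007200 => 155,
    1700010800 => 400,
];

// Growth per hour between the oldest snapshot inside the window and the newest one.
function growthPerHour(array $snapshots, int $windowSeconds): float
{
    ksort($snapshots);
    $times  = array_keys($snapshots);
    $latest = end($times);
    $inWindow = array_filter(
        $times,
        fn (int $t): bool => $t >= $latest - $windowSeconds
    );
    $first = min($inWindow);
    if ($first === $latest) {
        return 0.0; // only one snapshot in the window, so there is nothing to compare
    }
    return ($snapshots[$latest] - $snapshots[$first]) / (($latest - $first) / 3600);
}

echo growthPerHour($snapshots, 2 * 3600), PHP_EOL; // growth over the last two hours
echo growthPerHour($snapshots, 3 * 3600), PHP_EOL; // growth over the last three hours
```

This keeps storage at one row per song per interval, no matter how many plays, downloads, or votes arrive in between.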