Game Statistics |
Posted: 06 Oct 2020 19:59 | |||
Posts: 16 Joined: 2020 | Hey there, I am very new to development, and on my journey I have discovered web scraping. But I doubt that what I am doing is entirely legal according to the terms and conditions of scoreboard.com. I am not doing anything for profit - I just intend to scrape the outcomes of various sporting events. At the moment, I went with the popular choice and picked the English Premier League (thinking this would have the most data available). I am happy to keep scraping away, as I doubt I am causing scoreboard.com any issues. But I figured there may be a more legal avenue, and here I stumbled upon The Sports DB. The type of data I am looking for is up to 20 statistics at the end of the game and could include: Goals (home), Goals (away), Possession, Goal Attempts, Shots on Goal, Shots off Goal, Blocked Shots, Free Kicks, Corner Kicks, Offsides, Goalkeeper Saves, Fouls, Red Cards, Yellow Cards, Total Passes, Tackles, Attacks, Dangerous Attacks, Goals Against. This is stuff I've easily scraped, but if I can receive this data by other, legal methods, I'd be happy to learn how. Is there any such thing here? Or somewhere you could direct me to? As this is not for profit (it's a project for the course I am on), I am on an extreme budget of £0 (which is €0 as well as $0). Maybe that changes at a later stage, who knows. But right now, I am in that financial situation. Happy to hear if you know of something that could suit my needs. Thanks kindly. sorewinner | ||
Posted: 06 Oct 2020 20:14 | |||
zag Posts: 3,515 Joined: 2020 | Welcome. We don't offer those stats yet for soccer, but they are easy enough to add (and on the todo list), and we could probably add them to the free tier. Let me have a look at our source APIs. I don't think there's anything wrong with scraping personally, but an API is more solid as it should never change. | ||
Posted: 06 Oct 2020 20:16 | |||
zag Posts: 3,515 Joined: 2020 | Something like this? | ||
Posted: 06 Oct 2020 20:20 | |||
Posts: 16 Joined: 2020 | Yes, exactly like that one | ||
Posted: 06 Oct 2020 20:26 | |||
Posts: 16 Joined: 2020 | The only thing with scraping is that it may go against the T&Cs of the site, for example the T&Cs of scoreboard.com. Also, their headers change extremely regularly, which sometimes makes scraping a manual exercise. | ||
Posted: 06 Oct 2020 21:50 | |||
zag Posts: 3,515 Joined: 2020 | OK since this was already on the todo list and I like new users requesting things (in the hope they stick around and help out) I've made some progress: Event Example https://www.thesportsdb.com/event/1032723 API Example https://www.thesportsdb.com/api/v1/json/1/lookupeventstats.php?id=1032723 NOTE: For Moderators, you can manually sync the event using the sync icon next to the Event Statistics header on the event page for Soccer. This is all still alpha and subject to change if needed, but it looks good to me. pauld0051 | ||
Posted: 07 Oct 2020 07:13 | |||
Posts: 16 Joined: 2020 | Great! Thank you for this! I will try and get this integrated as soon as I can (will aim for before this weekend's matches). I will definitely try to stick around, but as I am quite new to this journey of coding, I might not be much help to begin with. Would love to learn though! Thanks again - looking forward to checking this out. | ||
Posted: 07 Oct 2020 07:32 | |||
Posts: 16 Joined: 2020 | As an aside, and this is where my noobyness comes in, I'm not quite sure how I am meant to get the data. I have two ways in mind. First, I can just manually hit the "world" button by each of the events and scrape from that (this is fine for me, I like this idea; it's a little manual, but at least I am sitting within copyright rules). Or, what is option B? I have to be honest, I have done very little with APIs outside of Google Maps, and that was being led like a baby the whole way. If I went with the scraping idea, is this OK to begin with? Honestly, I will get better. | ||
Posted: 07 Oct 2020 08:19 | |||
Posts: 346 Joined: 2020 | I have to be honest, I have done very little with APIs outside of Google Maps, and that was being led like a baby the whole way. If I went with the scraping idea, is this OK to begin with? Honestly, I will get better. What language are you using? Be happy to give some pointers for API fetching. | ||
Posted: 07 Oct 2020 08:36 | |||
Posts: 346 Joined: 2020 | Event Example https://www.thesportsdb.com/event/1032723 API Example https://www.thesportsdb.com/api/v1/json/1/lookupeventstats.php?id=1032723 NOTE: For Moderators, you can manually sync the event using the world icon next to the Event Statistics header on the event page for Soccer. This is all still alpha and subject to change if needed, but it looks good to me. Amazing, been wanting this for ages! It doesn't look to be working with my API key. Is this because it's in alpha and locked down to "1"? | ||
Posted: 07 Oct 2020 08:53 | |||
zag Posts: 3,515 Joined: 2020 | Yes, only test key "1" works until the data is finalized. Normal users such as yourself @pauld0051 cannot force a manual event stats sync; only moderators can. I'll try to remember to do the Premiership matches this weekend, but others can also do it. I'll also see if I can write a script to do previous events. Whatever language you use, it should have a JSON decode function, which is a one-liner that turns the data into a normal array you can process. In PHP it is json_decode, and in Python you need to 'import json' and then use the json.loads function. It should be pretty easy, much easier than scraping anyway. | ||
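For illustration, a minimal sketch of the Python one-liner described above. The response text is a made-up, shortened stand-in (the real reply carries more fields, and the numbers here are hypothetical); the 'eventstats', 'intHome' and 'intAway' keys are the ones used further down this thread.

```python
import json

# Hypothetical, shortened response text; not real match data.
raw = '{"eventstats": [{"intHome": "11", "intAway": "8"}]}'

data = json.loads(raw)             # JSON text -> Python dict
first = data["eventstats"][0]      # first entry of the stats list
home_value = int(first["intHome"]) # int() works whether the value is "11" or 11
away_value = int(first["intAway"])
print(home_value, away_value)
```

If the response comes from the requests library, calling .json() on the response object does the same decoding step.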
Posted: 07 Oct 2020 08:58 | |||
Posts: 16 Joined: 2020 | Great. I will look up how to do this. I am going to use Python for this. If it is a one-liner, that will be easier than what I was planning to do. Unless you feel like giving me a hint on how to do this in Python (which, I realise, is asking too much already). My last API call was in JS, so this will be my first in Python. Will be happy to try and test this for you too. | ||
Posted: 07 Oct 2020 09:07 | |||
Posts: 16 Joined: 2020 | What language are you using? Be happy to give some pointers for api fetching Sorry... I didn't see this message here. Oh, yes, I would LOVE some pointers! I am going to use Python - coding on VS Code. So I don't actually know what modules or libraries I need yet. Haven't looked anything up at all. But very keen to get this started for the coming weekend's games as a test run. | ||
Posted: 07 Oct 2020 09:17 | |||
Posts: 346 Joined: 2020 | https://repl.it/repls/SpecializedDevotedApplet I've made a quick sample, at the link above, of how to get the JSON response from that API in Python. Let me know if that's enough to get you started! pauld0051, zag | ||
Posted: 07 Oct 2020 09:50 | |||
Posts: 16 Joined: 2020 | I've made a quick sample here of how to get the JSON response from that api in Python in the link above. let me know if thats enough to get you started! Oh, that's really nice and simple. So, when it comes to the weekend's games, how do I get all 10 games after the last one finishes? Do I do a separate page for each game and use the unique ID? Or is there a set way that would acquire all the IDs of the games and just find the data for each? I have yet to learn how to use the data I'm getting in JSON format. At the moment I am not even sure how I plan on displaying this data. But getting each stat into a value would be good, such as: aston_villa-shots_on_goal = 11 Then I can print that to the data for the team along with the other stats (I will also need goals for and against, which aren't on this particular list). | ||
Posted: 07 Oct 2020 10:04 | |||
Posts: 346 Joined: 2020 | You'd first need to pull the list of events for that day from the https://www.thesportsdb.com/api/v1/json/1/eventsday.php?d=2020-10-10 API. Then filter that JSON object so you've just got the idEvent values left over. Then create a for loop and call the new stats API once per ID. Some useful links below: https://www.w3schools.com/python/numpy_array_filter.asp https://www.w3schools.com/python/python_for_loops.asp Feel free to DM me on Twitter (@cydalby) if you need any more help! GOAviator | ||
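The three steps above could be sketched roughly as follows. This is only an outline under stated assumptions: the helper names (fetch_json, event_ids, stats_for_day) are made up for illustration, the code uses the standard library instead of requests so it runs without extra installs, and it assumes the day listing holds its events under an 'events' key, as the lookupevent.php reply later in this thread does.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://www.thesportsdb.com/api/v1/json"

def fetch_json(endpoint, api_key="1", **params):
    """GET one endpoint and decode the JSON body (hypothetical helper)."""
    url = f"{BASE}/{api_key}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def event_ids(day_payload):
    """Step 2: keep only the idEvent of each event in the day listing."""
    events = day_payload.get("events") or []  # tolerate a missing or null list
    return [event["idEvent"] for event in events]

def stats_for_day(date, api_key="1"):
    """Steps 1-3: list the day's events, then fetch stats once per ID."""
    day = fetch_json("eventsday.php", api_key, d=date)
    return {
        event_id: fetch_json("lookupeventstats.php", api_key, id=event_id)
        for event_id in event_ids(day)
    }

# Offline check of the filtering step with a made-up payload.
sample_day = {"events": [{"idEvent": "111"}, {"idEvent": "222"}]}
ids = event_ids(sample_day)
print(ids)
```

Calling stats_for_day("2020-10-10") would then make one request for the day plus one per event, so a short pause between calls may be polite on a free key.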
Posted: 07 Oct 2020 10:21 | |||
Posts: 16 Joined: 2020 | Then filter that json object so you've just got the idEvent left over. Then create a for loop, and loop over the new stats api passing in a different id each time. Some useful links below: https://www.w3schools.com/python/numpy_array_filter.asp https://www.w3schools.com/python/python_for_loops.asp Feel free to dm me on twitter (@cydalby) if you need any more help! Thanks... I'm not much of a Twitter buff, to be fair. But I will look at those links and see what I can do there. This is really awesome stuff. Do you use Discord? paulyd#7399 | ||
Posted: 07 Oct 2020 18:05 | |||
Posts: 16 Joined: 2020 | So I tried a few things. A couple have worked, but this one hasn't (as of yet): event_day = requests.get( f'https://www.thesportsdb.com/api/v1/json/{apiKey}/eventsday.php?id={date}') I have apiKey = 1 and date = 2020-10-10, but I cannot seem to get this to print out. Any ideas? The other ones work (for a given ID): r = requests.get( f'https://www.thesportsdb.com/api/v1/json/{apiKey}/lookupeventstats.php?id={id}') arr = np.array([r.json()]) | ||
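One likely cause, judging only from the eventsday.php URL posted earlier in the thread: that endpoint takes the date as d=..., whereas the failing call passes id=.... A second thing worth checking is that the date is a quoted string, because in Python an unquoted 2020-10-10 is a subtraction. A small sketch of the corrected URL (events_day_url is a made-up helper name):

```python
api_key = "1"
date = "2020-10-10"  # quoted: unquoted, 2020-10-10 is arithmetic and equals 2000

def events_day_url(api_key, date):
    """Build the day-listing URL using the d= parameter from the earlier post."""
    return (
        "https://www.thesportsdb.com/api/v1/json/"
        f"{api_key}/eventsday.php?d={date}"
    )

url = events_day_url(api_key, date)
print(url)
```

The resulting string can be passed to requests.get exactly as in the calls that already work.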
Posted: 07 Oct 2020 18:12 | |||
Posts: 16 Joined: 2020 | My second issue is the massive amount of data, and sorting out just the bits I want. I have done a little in the past on sorting JSON files in JS, but never really in Python. And all the W3Schools examples seem to use un-nested arrays. But I am sure it is pretty simple. So for example, nested here I have 'strEvent': 'Aston Villa vs Liverpool' I'd like to take that and split it so that I get the variables home_team = "Aston Villa" and away_team = "Liverpool" This seems pretty straightforward. I can probably build the rest if I get a push in that direction. | ||
Posted: 07 Oct 2020 18:22 | |||
Posts: 50 Joined: 2020 | So for example, nested here I have 'strEvent': 'Aston Villa vs Liverpool' So I'd like to take that, split it so Aston Villa gives the variable home_team = "Aston Villa" and away_team = "Liverpool" This seems pretty straight forward. I can probably build the rest if I get a push in that direction If you look further down the event JSON reply, you will find the teams already split into home team and away team. | ||
Posted: 07 Oct 2020 18:25 | |||
Posts: 16 Joined: 2020 | If you go ahead in the event json reply, you will find splitted teams in home team and away team Is that the one I am currently having trouble getting? | ||
Posted: 07 Oct 2020 19:08 | |||
Posts: 16 Joined: 2020 | So it is not the one I am having trouble with, I can see home and away team, but I don't know the code to extract that. I experimented with this: r2 = requests.get( f'https://www.thesportsdb.com/api/v1/json/{apiKey}/lookupevent.php?id={id}') arr2 = np.array([r2.json()]) #print(arr2) home_team = arr2['events'][0][1]['strHomeTeam'] print("Home team is " + home_team) But it really didn't work - I kept getting: IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices Any advice? | ||
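A note on that IndexError: np.array([r2.json()]) wraps the decoded dict in a one-element array, and an array cannot be indexed with a string such as 'events'. Indexing the array first (arr2[0]['events'][0]['strHomeTeam']) works, and so does skipping NumPy altogether and indexing the dict directly. The extra [1] is not needed either, since ['events'][0] is already the event's dict. A minimal sketch with a made-up, trimmed payload standing in for r2.json():

```python
# Hypothetical, trimmed stand-in for r2.json() from lookupevent.php.
payload = {
    "events": [
        {
            "strEvent": "Aston Villa vs Liverpool",
            "strHomeTeam": "Aston Villa",
            "strAwayTeam": "Liverpool",
        }
    ]
}

event = payload["events"][0]      # first (and here only) event dict
home_team = event["strHomeTeam"]  # no splitting of strEvent required
away_team = event["strAwayTeam"]
print("Home team is " + home_team)
```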
Posted: 08 Oct 2020 14:43 | |||
Posts: 16 Joined: 2020 | Yay, I got it! But the way I am doing it feels long winded. Is there a way to take this and make it a bit shorter?
event_id = arr_events[0]['events'][0]['idEvent']
home_team = arr_events[0]['events'][0]['strHomeTeam']
away_team = arr_events[0]['events'][0]['strAwayTeam']
home_score = arr_events[0]['events'][0]['intHomeScore']
away_score = arr_events[0]['events'][0]['intAwayScore']
home_shots_on_goal = arr_stats[0]['eventstats'][0]['intHome']
away_shots_on_goal = arr_stats[0]['eventstats'][0]['intAway']
home_shots_off_goal = arr_stats[0]['eventstats'][1]['intHome']
away_shots_off_goal = arr_stats[0]['eventstats'][1]['intAway']
home_total_shots = arr_stats[0]['eventstats'][2]['intHome']
away_total_shots = arr_stats[0]['eventstats'][2]['intAway']
home_blocked_shots = arr_stats[0]['eventstats'][3]['intHome']
away_blocked_shots = arr_stats[0]['eventstats'][3]['intAway']
home_shots_inside_box = arr_stats[0]['eventstats'][4]['intHome']
away_shots_inside_box = arr_stats[0]['eventstats'][4]['intAway']
home_outside_box = arr_stats[0]['eventstats'][5]['intHome']
away_outside_box = arr_stats[0]['eventstats'][5]['intAway']
home_corners = arr_stats[0]['eventstats'][6]['intHome']
away_corners = arr_stats[0]['eventstats'][6]['intAway']
home_offsides = arr_stats[0]['eventstats'][7]['intHome']
away_offsides = arr_stats[0]['eventstats'][7]['intAway']
home_possession = arr_stats[0]['eventstats'][8]['intHome']
away_possession = arr_stats[0]['eventstats'][8]['intAway']
home_yellow_cards = arr_stats[0]['eventstats'][9]['intHome']
away_yellow_cards = arr_stats[0]['eventstats'][9]['intAway']
home_red_cards = arr_stats[0]['eventstats'][10]['intHome']
away_red_cards = arr_stats[0]['eventstats'][10]['intAway']
home_saves = arr_stats[0]['eventstats'][11]['intHome']
away_saves = arr_stats[0]['eventstats'][11]['intAway']
home_total_passes = arr_stats[0]['eventstats'][12]['intHome']
away_total_passes = arr_stats[0]['eventstats'][12]['intAway']
home_accurate_passes = arr_stats[0]['eventstats'][13]['intHome']
away_accurate_passes = arr_stats[0]['eventstats'][13]['intAway']
home_passes_percent = arr_stats[0]['eventstats'][14]['intHome']
away_passes_percent = arr_stats[0]['eventstats'][14]['intAway']
home_fouls = arr_stats[0]['eventstats'][15]['intHome']
away_fouls = arr_stats[0]['eventstats'][15]['intAway']
That's pretty much all the data I need. zag | ||
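One possible way to shorten this, sketched under the assumption that the stats always arrive in the order used above: keep the sixteen names in a list and let a loop build a dict, so each value is looked up by name instead of living in its own variable. STAT_NAMES and flatten_stats are made-up names, and the sample payload is hypothetical and cut down to two entries.

```python
# Names in the same order as the eventstats indices 0-15 used in the post above.
STAT_NAMES = [
    "shots_on_goal", "shots_off_goal", "total_shots", "blocked_shots",
    "shots_inside_box", "outside_box", "corners", "offsides",
    "possession", "yellow_cards", "red_cards", "saves",
    "total_passes", "accurate_passes", "passes_percent", "fouls",
]

def flatten_stats(stats_payload):
    """Turn the eventstats list into {'home_<name>': ..., 'away_<name>': ...}."""
    flat = {}
    # zip pairs each name with the entry at the same position and stops
    # at the shorter of the two sequences.
    for name, row in zip(STAT_NAMES, stats_payload["eventstats"]):
        flat[f"home_{name}"] = row["intHome"]
        flat[f"away_{name}"] = row["intAway"]
    return flat

# Hypothetical two-entry payload standing in for the decoded stats reply
# (the plain dict, i.e. arr_stats[0] in the post above).
sample = {
    "eventstats": [
        {"intHome": "11", "intAway": "8"},
        {"intHome": "4", "intAway": "6"},
    ]
}
stats = flatten_stats(sample)
print(stats["home_shots_on_goal"], stats["away_shots_off_goal"])
```

Relying on position is fragile if the API ever reorders or omits a statistic, so if each entry also carries a name field, keying on that would be the safer choice.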
Posted: 08 Oct 2020 20:07 | |||
Posts: 50 Joined: 2020 | I am sorry I can’t help with python. I would like to learn it. | ||
Posted: 08 Oct 2020 21:13 | |||
Posts: 16 Joined: 2020 | You're welcome to copy and paste - the rest of the code is further up. Happy to share. | ||