Scraping football match data using Python


I am extracting football results together with their odds. I have written two for loops: one extracts the team names and results, the other extracts the odds. Each loop works on its own, but I don't know how to combine them to get the correct output.

I'll show you the code, the current output and the desired output.

This is my code:

from requests_html import HTMLSession
from tabulate import tabulate
 
matchlink = 'https://www.betexplorer.com/football/england/league-one/results/'
 
session = HTMLSession()
 
r = session.get(matchlink)
 
 
allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
matchodds = r.html.find('.table-main__odds')
 
odds = [matchodd.text for matchodd in matchodds]
 
 
 
for match, res in zip(allmatch, results):   #works
    for i in range(0, len(odds), 3): 
        if res.text == 'POSTP.':
            continue
 
    print(match.text, res.text, odds[i:i+3])

This is a part of my current output:

Wigan - Peterborough 2:1 ['2.09', '3.45', '3.37']
Reading - Bristol Rovers 1:1 ['2.09', '3.45', '3.37']
Shrewsbury - Bolton 0:2 ['2.09', '3.45', '3.37']
Fleetwood - Blackpool 3:3 ['2.09', '3.45', '3.37']
Derby - Northampton 4:0 ['2.09', '3.45', '3.37']
Lincoln - Oxford Utd 0:2 ['2.09', '3.45', '3.37']

and this is my desired output:

Wigan - Peterborough 2:1 ['2.97', '3.57', '2.23']
Reading - Bristol Rovers 1:1 ['2.57', '3.51', '2.54']
Shrewsbury - Bolton 0:2 ['3.71', '3.47', '1.98']
Fleetwood - Blackpool 3:3 ['3.26', '3.25', '2.22']
Derby - Northampton 4:0 ['1.54', '3.91', '6.24']
Lincoln - Oxford Utd 0:2 ['3.28', '3.18', '2.25']

Is it also possible to print the odds without the square brackets? I'd like to remove them.

Thanks

Answer by Matt Pitkin (best answer)

Try switching:

for match, res in zip(allmatch, results):   #works
    for i in range(0, len(odds), 3): 
        if res.text == 'POSTP.':
            continue
 
    print(match.text, res.text, odds[i:i+3])

to be:

idx = 0
for match, res in zip(allmatch, results):
    if res.text == "POSTP.":
        continue

    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    # increment index by 3
    idx += 3

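This uses an f-string and the join method of a string, so the odds print as a comma-separated string without square brackets. In the original code the print call sits outside the inner loop, so i always holds its final value and every match shows the last three odds. A single running index that advances by 3 per printed match avoids that.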
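
A minimal offline sketch of both loop patterns may help to compare them. It uses three rows from the question as hard-coded sample data; the plain lists stand in for the scraped elements and are an assumption for illustration, not live output from the site.

```python
# Sample data copied from the question; plain strings stand in for scraped elements.
matches = ["Wigan - Peterborough", "Reading - Bristol Rovers", "Shrewsbury - Bolton"]
results = ["2:1", "1:1", "0:2"]
odds = ["2.97", "3.57", "2.23", "2.57", "3.51", "2.54", "3.71", "3.47", "1.98"]

# Original pattern: the inner loop finishes before the row is built,
# so i is always the last start index and every row shows the same odds.
buggy_rows = []
for match, res in zip(matches, results):
    for i in range(0, len(odds), 3):
        pass
    buggy_rows.append(f"{match} {res} {odds[i:i + 3]}")

# Fixed pattern: one running index, advanced by 3 per match.
fixed_rows = []
idx = 0
for match, res in zip(matches, results):
    fixed_rows.append(f"{match} {res} {', '.join(odds[idx:idx + 3])}")
    idx += 3

print("\n".join(buggy_rows))
print("\n".join(fixed_rows))
```

In the buggy rows every line ends with the last three odds in the list, which matches the repeated values in the question's current output.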

Update

Note that with requests_html v0.10.0 I had to read the odds from the data-odd attribute, rather than from the element text, to get this to work:

from requests_html import HTMLSession
 
matchlink = 'https://www.betexplorer.com/football/england/league-one/results/'
 
session = HTMLSession()
 
r = session.get(matchlink)

allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
# search for elements containing "data-odd" attribute
matchodds = r.html.find('[data-odd]')

odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

idx = 0
for match, res in zip(allmatch, results):
    if res.text == 'POSTP.':
        continue

    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    idx += 3
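
The running index relies on two assumptions: postponed matches contribute no odds, and every played match has exactly three. The sketch below makes those assumptions explicit and fails loudly if the counts disagree. The helper name pair_matches_with_odds and the postponed "Team A - Team B" fixture are hypothetical, introduced only for illustration; they are not part of requests_html or the site.

```python
def pair_matches_with_odds(rows, odds, per_match=3, skip_result="POSTP."):
    """Pair (match, result) rows with consecutive groups of odds.

    Assumes rows whose result equals skip_result have no odds on the page.
    """
    played = [(m, r) for m, r in rows if r != skip_result]
    if len(odds) != per_match * len(played):
        raise ValueError(
            f"expected {per_match * len(played)} odds, got {len(odds)}"
        )
    # Slice the flat odds list into consecutive groups of per_match.
    groups = [odds[i:i + per_match] for i in range(0, len(odds), per_match)]
    return [(m, r, g) for (m, r), g in zip(played, groups)]


# Sample data: two rows from the question plus a made-up postponed fixture.
rows = [
    ("Derby - Northampton", "4:0"),
    ("Team A - Team B", "POSTP."),  # hypothetical postponed match
    ("Lincoln - Oxford Utd", "0:2"),
]
odds = ["1.54", "3.91", "6.24", "3.28", "3.18", "2.25"]

for match, result, group in pair_matches_with_odds(rows, odds):
    print(match, result, ", ".join(group))
```

With the scraped elements, the rows could be built as [(m.text, r.text) for m, r in zip(allmatch, results)]. A ValueError then signals that the selectors and the page layout no longer line up, instead of silently printing shifted odds.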