I am trying to scrape the MLB daily lineup information from here: https://www.rotowire.com/baseball/daily-lineups.php
I am trying to use python with requests, BeautifulSoup and pandas.
My ultimate goal is to end up with two pandas data frames.
First is a starting pitching data frame:
| date | game_time | pitcher_name | team | lineup_throws |
|---|---|---|---|---|
| 2024-03-29 | 1:40 PM ET | Spencer Strider | ATL | R |
| 2024-03-29 | 1:40 PM ET | Zack Wheeler | PHI | R |
Second is a starting batter data frame:
| date | game_time | batter_name | team | pos | batting_order | lineup_bats |
|---|---|---|---|---|---|---|
| 2024-03-29 | 1:40 PM ET | Ronald Acuna | ATL | RF | 1 | R |
| 2024-03-29 | 1:40 PM ET | Ozzie Albies | ATL | 2B | 2 | S |
| 2024-03-29 | 1:40 PM ET | Austin Riley | ATL | 3B | 3 | R |
| 2024-03-29 | 1:40 PM ET | Kyle Schwarber | PHI | DH | 1 | L |
| 2024-03-29 | 1:40 PM ET | Trea Turner | PHI | SS | 2 | R |
| 2024-03-29 | 1:40 PM ET | Bryce Harper | PHI | 1B | 3 | L |
This would cover all games for a given day.
I've tried adapting this answer to my needs but can't quite get it to work: Scraping Web data using BeautifulSoup
Any help or guidance is greatly appreciated.
Here is the code from the linked answer that I am trying to adapt, but I can't seem to make progress:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/baseball/daily-lineups.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

weather = []
for tag in soup.select(".lineup__bottom"):
    header = tag.find_previous(class_="lineup__teams").get_text(
        strip=True, separator=" vs "
    )
    rain = tag.select_one(".lineup__weather-text > b")
    forecast_info = rain.next_sibling.split()
    temp = forecast_info[0]
    wind = forecast_info[2]
    weather.append(
        {"Header": header, "Rain": rain.text.split()[0], "Temp": temp, "Wind": wind}
    )

df = pd.DataFrame(weather)
print(df)
The information I want seems to be contained in `lineup__main`, not in `lineup__bottom`.
You have to iterate over the lineup boxes and select each of the features you want from them.
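Here is a sketch of that approach. To keep it self-contained it parses an embedded HTML sample rather than the live page; the sample's structure and all the class names used in the selectors (`lineup__main`, `lineup__abbr`, `lineup__time`, `lineup__list`, `lineup__player-highlight`, `lineup__player`, `lineup__pos`, `lineup__throws`, `lineup__bats`) are assumptions inferred from the selectors already seen in the question, so verify each one against the live page's HTML before relying on it:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Minimal HTML mimicking the ASSUMED structure of one lineup box on the
# daily-lineups page. Every class name here is a guess modeled on the
# lineup__* naming in the question -- check them against the real page.
SAMPLE_HTML = """
<div class="lineup is-mlb">
  <div class="lineup__teams">
    <div class="lineup__abbr">ATL</div>
    <div class="lineup__abbr">PHI</div>
  </div>
  <div class="lineup__time">1:40 PM ET</div>
  <div class="lineup__main">
    <ul class="lineup__list is-visit">
      <li class="lineup__player-highlight">
        <a>Spencer Strider</a><span class="lineup__throws">R</span></li>
      <li class="lineup__player">
        <div class="lineup__pos">RF</div><a>Ronald Acuna</a>
        <span class="lineup__bats">R</span></li>
      <li class="lineup__player">
        <div class="lineup__pos">2B</div><a>Ozzie Albies</a>
        <span class="lineup__bats">S</span></li>
    </ul>
    <ul class="lineup__list is-home">
      <li class="lineup__player-highlight">
        <a>Zack Wheeler</a><span class="lineup__throws">R</span></li>
      <li class="lineup__player">
        <div class="lineup__pos">DH</div><a>Kyle Schwarber</a>
        <span class="lineup__bats">L</span></li>
    </ul>
  </div>
</div>
"""

def parse_lineups(html, date):
    """Walk each lineup box and collect pitcher and batter rows."""
    soup = BeautifulSoup(html, "html.parser")
    pitchers, batters = [], []
    for box in soup.select("div.lineup.is-mlb"):
        teams = [t.get_text(strip=True) for t in box.select(".lineup__abbr")]
        game_time = box.select_one(".lineup__time").get_text(strip=True)
        # One <ul> per side; pair each with its team abbreviation.
        for side, team in zip(("is-visit", "is-home"), teams):
            lineup = box.select_one(f"ul.lineup__list.{side}")
            pitcher = lineup.select_one(".lineup__player-highlight")
            pitchers.append({
                "date": date,
                "game_time": game_time,
                "pitcher_name": pitcher.a.get_text(strip=True),
                "team": team,
                "lineup_throws": pitcher.select_one(".lineup__throws").get_text(strip=True),
            })
            # Batting order is just the position in the list.
            for order, player in enumerate(lineup.select(".lineup__player"), start=1):
                batters.append({
                    "date": date,
                    "game_time": game_time,
                    "batter_name": player.a.get_text(strip=True),
                    "team": team,
                    "pos": player.select_one(".lineup__pos").get_text(strip=True),
                    "batting_order": order,
                    "lineup_bats": player.select_one(".lineup__bats").get_text(strip=True),
                })
    return pd.DataFrame(pitchers), pd.DataFrame(batters)

pitching_df, batting_df = parse_lineups(SAMPLE_HTML, "2024-03-29")
print(pitching_df)
print(batting_df)
```

To run it against the live page, replace `SAMPLE_HTML` with `requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text` (some sites reject the default requests user agent) and adjust the selectors to whatever the real markup turns out to be.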