I know very little about programming. I know that web scraping exists, but I can't write the code for it myself, so I tried it with the help of ChatGPT.
I want to scrape the information at https://www.fangraphs.com/tools/wpa-inquirer. There are five dropdowns, from Run environment to Run differential, and the values on the gray background below them change with the selection. I would like to collect those values for every combination of the five conditions.
I asked ChatGPT a question, got the code, and ran it. An example of the result I want is shown in the following screenshot.
However, when I ran the code I did not get the desired result. The best outcome was that the code ran without errors, but the Home/Away win rate and LI values never changed, possibly because the dropdown values were not actually being changed.
The code using Selenium was roughly similar to the following.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import pandas as pd

# Default Chrome options (chrome_options was not defined in the snippet as posted)
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.fangraphs.com/tools/wpa-inquirer'
driver.get(url)

def get_leverage_index(run_env, base_situation, inning, outs, run_differential):
    # Set the five dropdowns, then read the cell next to the "Leverage Index" label
    select_dropdown('rcbRun', run_env)
    select_dropdown('rcbBase', base_situation)
    select_dropdown('rcbInning', inning)
    select_dropdown('rcbOuts', outs)
    select_dropdown('rcbScore', run_differential)
    leverage_index = driver.find_element(By.XPATH, '//td[text()="Leverage Index"]/following-sibling::td').text
    return leverage_index

def select_dropdown(dropdown_id, value):
    # Writes the text straight into the dropdown's input element via JavaScript
    input_element = driver.find_element(By.CSS_SELECTOR, f"#{dropdown_id}_Input")
    driver.execute_script("arguments[0].value = arguments[1];", input_element, value)

data = []
run_env_values = ['3.0', '3.5', '4.0', '4.5', '5.0', '5.5', '6.0', '6.5']
base_situation_values = ['_ _ _', '1 _ _', '_ 2 _', '1 2 _', '_ _ 3', '1 _ 3', '_ 2 3', '1 2 3']
inning_values = ['1 (Top)', '1 (Bottom)', '2 (Top)', '2 (Bottom)', '3 (Top)', '3 (Bottom)',
                 '4 (Top)', '4 (Bottom)', '5 (Top)', '5 (Bottom)', '6 (Top)', '6 (Bottom)',
                 '7 (Top)', '7 (Bottom)', '8 (Top)', '8 (Bottom)', '>= 9 (Top)', '>= 9 (Bottom)']
outs_values = ['0', '1', '2']
run_differential_values = ['-10', '-9', '-8', '-7', '-6', '-5', '-4', '-3', '-2', '-1', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

# Total number of dropdown combinations, used for the progress display
total = (len(run_env_values) * len(base_situation_values) * len(inning_values)
         * len(outs_values) * len(run_differential_values))
progress_count = 0
for run_env in run_env_values:
    for base_situation in base_situation_values:
        for inning in inning_values:
            for outs in outs_values:
                for run_differential in run_differential_values:
                    progress_count += 1
                    print(f'({progress_count}/{total})')
                    leverage_index = get_leverage_index(run_env, base_situation, inning, outs, run_differential)
                    data.append([run_env, base_situation, inning, outs, run_differential, leverage_index])

driver.quit()
df = pd.DataFrame(data, columns=['Run Environment', 'Base Situation', 'Inning', 'Outs', 'Run Differential', 'Leverage Index'])
df.to_excel('leverage_index_data.xlsx', index=False)
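For reference, the five nested loops above visit every combination of the five dropdown values. Here is a minimal sketch of the same enumeration with `itertools.product`, assuming the value lists are exactly the ones in the code above. The names `situations` and `total` are introduced here only for illustration.

```python
from itertools import product

run_env_values = ['3.0', '3.5', '4.0', '4.5', '5.0', '5.5', '6.0', '6.5']
base_situation_values = ['_ _ _', '1 _ _', '_ 2 _', '1 2 _',
                         '_ _ 3', '1 _ 3', '_ 2 3', '1 2 3']
# Build the 18 inning labels ('1 (Top)', '1 (Bottom)', ..., '>= 9 (Bottom)')
inning_values = [f'{inning} ({half})'
                 for inning in ['1', '2', '3', '4', '5', '6', '7', '8', '>= 9']
                 for half in ('Top', 'Bottom')]
outs_values = ['0', '1', '2']
run_differential_values = [str(d) for d in range(-10, 11)]

# product() yields tuples in the same order as the nested for-loops:
# the last list (run differential) varies fastest.
situations = list(product(run_env_values, base_situation_values,
                          inning_values, outs_values, run_differential_values))
total = len(situations)

print(total)          # number of page states that would have to be scraped
print(situations[0])  # first combination
```

The count makes it clear how many times the page has to be updated and read, which is worth knowing before starting a long run.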
When you are ready to extract the data, you can get the HTML content of the table.
Please let me know if this is what you are looking for:
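As one possible sketch of that idea: once the page shows the values you want, the table's HTML can be parsed into rows of cell text. The example below uses only Python's standard `html.parser`. The `TableRows` class, the `table_rows` helper, and the sample markup (including its numbers) are hypothetical placeholders, not the real structure of the FanGraphs page. With Selenium, the input string would typically come from `driver.page_source`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read (None outside <tr>)
        self._cell = None  # text pieces of the cell being read (None outside a cell)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the table cells found in `html` as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup and values, for illustration only
sample = """
<table>
  <tr><td>Leverage Index</td><td>0.87</td></tr>
  <tr><td>Home Win Expectancy</td><td>54.2%</td></tr>
</table>
"""
rows = table_rows(sample)
print(rows)
```

Parsing the whole table at once avoids a separate XPath lookup for each value, but the cell labels and layout would need to be checked against the actual page.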