Interactive chart scraping

I want to obtain historical data from charts that are available online and turn it into features for model training. I am aware of the APIs on various platforms, but free APIs usually do not allow fetching long periods of historical data. The data I want to scrape is on this website: https://coincodex.com/market-cap/ By inspecting the page I was able to find the data, copy it, and paste it into a file for later processing.

But I would like to automate this process and (hopefully, some beautiful day) I would love to be able to repeat it for multiple different charts.

import requests, json
import pandas as pd

# Not defined in the original snippet; assumed to be the page mentioned above.
coincodex_url = 'https://coincodex.com/market-cap/'

params = {'charts': 'ALL',
          'samples': 'md',
          'assets': 'SUM_ALL_COINS',
          'include': 'market_cap',
          't': '5693598'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0'}
data = requests.get(url=coincodex_url, params=params, headers=headers)
data.text

In this data.text I was able to find ...SUM_ALL_COINS.... The dict it belongs to contained only a small number of points, so I assume this is not the right approach. When I tried to request the chart URL directly via requests, it did not work. I think that is because it calls the API, which I cannot reach without a key.

data = requests.get(url=r'https://coincodex.com/api/v1/assets/get_charts?charts=ALL&samples=md&assets=SUM_ALL_COINS&include=market_cap&t=5693614', params=params, headers=headers)

Could somebody suggest a way of getting that data?

1 answer

Andrej Kesely

Here is an example of how you can load the data from the URL into a pandas DataFrame:

import pandas as pd
import requests

api_url = "https://coincodex.com/api/v1/assets/get_charts"

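# The key read from the response must match the requested asset. "BTC" matches
# the output below; the question's "SUM_ALL_COINS" is assumed to work the same way.
asset = "BTC"

params = {
    "charts": "ALL",
    "samples": "md",
    "assets": asset,
    "include": "market_cap",
    "t": "5693725",
}

data = requests.get(api_url, params=params).json()

# Each point is read as [unix timestamp, value, market cap].
df = pd.DataFrame(data[asset]["ALL"], columns=["Date", "Value", "Cap"])
df["Date"] = pd.to_datetime(df["Date"], unit="s")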

print(df.tail())

Prints:

                   Date         Value           Cap
295 2023-12-13 00:00:00  42775.265180  8.372833e+11
296 2023-12-25 00:00:00  43159.144454  8.452821e+11
297 2024-01-18 00:00:00  41201.735785  8.078289e+11
298 2024-01-30 00:00:00  43823.356362  8.597468e+11
299 2024-02-16 21:09:41  51790.060000  1.016620e+12
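
To repeat this for several charts and keep the results in a file, the parsing step can be separated from the request. The following is only a sketch under assumptions: it assumes the response is a dict shaped like response[asset][chart] with each point being [unix seconds, value, market cap], as the column names above suggest. The helper names chart_rows, point_counts and save_rows are made up for illustration, and the sample payload contains invented numbers, not real market data. It uses only the standard library so the parsing can be tried without a network call.

```python
import csv
from datetime import datetime, timezone


def chart_rows(payload, asset, chart="ALL"):
    """Turn payload[asset][chart] into a list of dicts, one per data point.

    Assumes each point is [unix_seconds, value, market_cap].
    """
    rows = []
    for timestamp, value, cap in payload[asset][chart]:
        rows.append(
            {
                "asset": asset,
                "date": datetime.fromtimestamp(timestamp, tz=timezone.utc),
                "value": value,
                "cap": cap,
            }
        )
    return rows


def point_counts(payload):
    """Map (asset, chart) to the number of points, to spot truncated series."""
    return {
        (asset, chart): len(points)
        for asset, charts in payload.items()
        for chart, points in charts.items()
    }


def save_rows(rows, path):
    """Write the rows to a CSV file for later processing."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["asset", "date", "value", "cap"])
        writer.writeheader()
        writer.writerows(rows)


# Invented payload that only mimics the assumed shape.
sample = {
    "BTC": {"ALL": [[1700000000, 100.0, 2000.0], [1700086400, 110.0, 2200.0]]},
    "ETH": {"ALL": [[1700000000, 10.0, 300.0]]},
}

all_rows = []
for name in sample:
    all_rows.extend(chart_rows(sample, name))

counts = point_counts(sample)
```

With a real response, the dict returned by requests.get(...).json() would take the place of sample, and save_rows(all_rows, "charts.csv") would write everything to one file. The list of dicts can also be passed directly to pd.DataFrame if you prefer to stay in pandas.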