I stumbled upon the oEmbed spec, and found they also have a providers.json file where you can find all their known oEmbed providers. It's basically one big array, with objects like these:
{
    "provider_name": "Vimeo",
    "provider_url": "https://vimeo.com/",
    "endpoints": [
        {
            "schemes": [
                "https://vimeo.com/*",
                "https://vimeo.com/album/*/video/*",
                "https://vimeo.com/channels/*/*",
                "https://vimeo.com/groups/*/videos/*",
                "https://vimeo.com/ondemand/*/*",
                "https://player.vimeo.com/video/*"
            ],
            "url": "https://vimeo.com/api/oembed.{format}",
            "discovery": true
        }
    ]
},
{
    "provider_name": "YouTube",
    "provider_url": "https://www.youtube.com/",
    "endpoints": [
        {
            "schemes": [
                "https://*.youtube.com/watch*",
                "https://*.youtube.com/v/*",
                "https://youtu.be/*",
                "https://*.youtube.com/playlist?list=*",
                "https://youtube.com/playlist?list=*",
                "https://*.youtube.com/shorts*"
            ],
            "url": "https://www.youtube.com/oembed",
            "discovery": true
        }
    ]
},
I'd like to make use of this in my JavaScript project, but I'm very unsure how to use it efficiently. Say you have a function which is given some URL, and now you need to find which provider (if any) this URL matches. How would you do that?
A brute-force way could of course be to just loop through each block, convert each entry of schemes into a regex, and test it until you either find a match or reach the end of the list. This feels like it's going to be very slow, though. Are there ways to speed it up somehow? Are there, for example, more efficient ways of matching those wildcard schemes than creating regex instances and testing with those?
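For reference, the brute-force approach described above could be sketched roughly like this. It is only a sketch under a few assumptions: the `providers` array is a hand-trimmed excerpt rather than the full file, `schemeToRegExp` and `findProviderBruteForce` are made-up names, and each `*` is treated as "any run of characters", which may be more permissive than intended. Compiling the regexes once up front at least avoids rebuilding them on every lookup.

```javascript
// Hand-trimmed excerpt of providers.json (assumption: the real file is
// loaded some other way, e.g. fetched or imported).
const providers = [
  {
    provider_name: "Vimeo",
    endpoints: [{
      schemes: ["https://vimeo.com/*", "https://player.vimeo.com/video/*"],
      url: "https://vimeo.com/api/oembed.{format}",
    }],
  },
  {
    provider_name: "YouTube",
    endpoints: [{
      schemes: ["https://*.youtube.com/watch*", "https://youtu.be/*"],
      url: "https://www.youtube.com/oembed",
    }],
  },
];

// Turn a wildcard scheme into an anchored RegExp: escape every regex
// metacharacter except "*", then let each "*" match any characters.
function schemeToRegExp(scheme) {
  const escaped = scheme.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Compile every scheme once instead of on every lookup.
// "?? []" guards against endpoints without a schemes array.
const compiled = providers.flatMap((provider) =>
  provider.endpoints.flatMap((endpoint) =>
    (endpoint.schemes ?? []).map((scheme) => ({
      provider,
      endpoint,
      regex: schemeToRegExp(scheme),
    }))
  )
);

// Linear scan: return the first matching entry, or null if none match.
function findProviderBruteForce(url) {
  return compiled.find((entry) => entry.regex.test(url)) ?? null;
}
```

Calling `findProviderBruteForce("https://vimeo.com/12345")` would return the entry whose `provider.provider_name` is "Vimeo". The cost per lookup is still one regex test per scheme in the worst case.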
It looks like '*' cannot contain '/', and in your examples a leading '*' always stands for a subdomain. -> You could extract the domain from all schemes and use the domains as keys in a HashMap, which reduces checking for a specific match to only a few tests.
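A minimal sketch of that idea, assuming a hand-trimmed `providers` excerpt and made-up helper names (`hostKey`, `byHost`, `findProvider`): each scheme is indexed by its host with any leading `*.` stripped, and a lookup only tests the schemes filed under the URL's hostname or one of its parent domains. The regex here still lets `*` match any characters; if `*` really never spans a `/`, the `.*` could be tightened to `[^/]*`.

```javascript
// Hand-trimmed excerpt of providers.json (assumption, not the full file).
const providers = [
  {
    provider_name: "Vimeo",
    endpoints: [{
      schemes: ["https://vimeo.com/*", "https://player.vimeo.com/video/*"],
      url: "https://vimeo.com/api/oembed.{format}",
    }],
  },
  {
    provider_name: "YouTube",
    endpoints: [{
      schemes: ["https://*.youtube.com/watch*", "https://youtu.be/*"],
      url: "https://www.youtube.com/oembed",
    }],
  },
];

// Escape regex metacharacters except "*", then map "*" to ".*".
function schemeToRegExp(scheme) {
  const escaped = scheme.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Host part of a scheme, with a leading "*." dropped, so that
// "https://*.youtube.com/watch*" is filed under "youtube.com".
function hostKey(scheme) {
  const host = scheme.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").split("/")[0];
  return host.replace(/^\*\./, "").toLowerCase();
}

const byHost = new Map(); // host -> entries for that host
const fallback = [];      // schemes whose host still contains a "*"

for (const provider of providers) {
  for (const endpoint of provider.endpoints) {
    for (const scheme of endpoint.schemes ?? []) {
      const entry = { provider, endpoint, regex: schemeToRegExp(scheme) };
      const key = hostKey(scheme);
      if (key.includes("*")) {
        fallback.push(entry);
      } else {
        if (!byHost.has(key)) byHost.set(key, []);
        byHost.get(key).push(entry);
      }
    }
  }
}

function findProvider(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return null; // not a parseable absolute URL
  }
  // For "www.youtube.com", try "www.youtube.com", "youtube.com", "com".
  const labels = hostname.split(".");
  for (let i = 0; i < labels.length; i++) {
    const candidates = byHost.get(labels.slice(i).join(".")) ?? [];
    const hit = candidates.find((entry) => entry.regex.test(url));
    if (hit) return hit;
  }
  // Last resort: schemes that could not be indexed by host.
  return fallback.find((entry) => entry.regex.test(url)) ?? null;
}
```

With this layout, `findProvider("https://www.youtube.com/watch?v=abc")` only runs the regexes filed under "youtube.com" instead of every scheme in the file. The index is built once, so the per-lookup cost is a few map lookups plus a handful of regex tests.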