I want to use js beautify on some source, but it has no way to detect what type of source it is. Is there any way, crude or not, to detect whether the source is CSS, HTML, JavaScript or none of these?
Looking at their site, they have this function, which looks like it will figure out whether the source is HTML:
function looks_like_html(source) {
    // <foo> - looks like html
    // <!--\nalert('foo!');\n--> - doesn't look like html
    var trimmed = source.replace(/^[ \t\n\r]+/, '');
    var comment_mark = '<' + '!-' + '-';
    return (trimmed && (trimmed.substring(0, 1) === '<' && trimmed.substring(0, 4) !== comment_mark));
}
I just need to see whether it's CSS, JavaScript or neither. This is running in Node.js.
So this code would need to tell me it's JavaScript:
var foo = {
    bar : 'baz'
};
whereas this code needs to tell me it's CSS:
.foo {
    background : red;
}
So a function to test this would return the type:
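function getSourceType(source) {
    // isJs, isHtml and isCss stand for the detection checks I'm looking for
    if (isJs(source)) {
        return 'js';
    }
    if (isHtml(source)) {
        return 'html';
    }
    if (isCss(source)) {
        return 'css';
    }
    return 'none';
}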
There will be cases where other languages such as Java are used, and I need to ignore those, but for CSS/HTML/JS I can use the beautifier.
It depends on whether you are allowed to mix languages, as mentioned in the comments (i.e. having embedded JS and CSS in your HTML), or whether those are separate files that you need to detect for some reason.
A rigorous approach would be to build a tree from the file, where each node is a statement (in Perl, you can use HTML::TreeBuilder). You could then walk the tree and compare it with the original source, and apply eliminating regexes to weed out chunks of code and split the languages apart.
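Since this runs in Node.js, a lighter variant of the parsing idea is to ask the JavaScript engine itself whether the text compiles. This is only a sketch and not a tree builder: `parsesAsJs` is a hypothetical helper name, and it relies on the standard `Function` constructor, which compiles its argument as a function body without executing it.

```javascript
// Hypothetical helper (not part of js-beautify): returns true if `source`
// compiles as a JavaScript function body. new Function() only compiles the
// text; the code is never called, so nothing in `source` runs here.
function parsesAsJs(source) {
  if (!source.trim()) {
    return false; // treat empty input as "not JavaScript"
  }
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false; // the engine could not parse it as JavaScript
    }
    throw err; // anything else is unexpected, so let it surface
  }
}
```

Treat the result as a hint rather than proof. Module syntax (`import`/`export`) is rejected because the text is compiled as a function body, and a CSS rule whose opening brace sits on its own line can happen to parse as a labelled statement.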
Another way would be to search for language-specific patterns. For example, CSS uses "=" only inside attribute selectors (such as [type="text"] or [href*="foo"]), so a bare " = " outside square brackets most likely means JavaScript, embedded or not. For HTML, you can certainly detect the tags with a regex.
You would then also need to handle trickier cases, such as JavaScript that is used to output HTML code.
Then again, it really depends on the situation you are facing and on how the languages are mixed.
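The pattern-based idea could be sketched roughly as follows for the unmixed case. Everything here is a crude heuristic under stated assumptions: `looksLikeHtml` mirrors the check quoted in the question, while `looksLikeCss`, `looksLikeJs` and the exact regexes are hypothetical choices for illustration, not anything js-beautify provides.

```javascript
// Same idea as the check quoted in the question: a leading '<' that does
// not open an HTML comment.
function looksLikeHtml(source) {
  var trimmed = source.replace(/^\s+/, '');
  return trimmed.charAt(0) === '<' && trimmed.substring(0, 4) !== '<!--';
}

// Assumption: CSS is mostly "selector { property: value; }" rules and has
// no '=' outside attribute selectors such as [href*="foo"].
function looksLikeCss(source) {
  var noComments = source.replace(/\/\*[\s\S]*?\*\//g, '');
  // Drop [ ... ] so the '=' inside attribute selectors is not counted.
  var noAttrs = noComments.replace(/\[[^\]]*\]/g, '');
  var hasRule = /[^{}]+\{\s*[\w-]+\s*:[^{}]*\}/.test(noAttrs);
  var hasAssignment = noAttrs.indexOf('=') !== -1;
  return hasRule && !hasAssignment;
}

// Assumption: a declaration keyword or an arrow is enough of a hint.
function looksLikeJs(source) {
  return /\b(?:var|let|const|function)\b|=>/.test(source);
}

// Returns 'html', 'css', 'js' or 'none'. CSS is tested before JavaScript
// so that CSS custom properties like var(--x) are not taken for JS.
function getSourceType(source) {
  if (looksLikeHtml(source)) {
    return 'html';
  }
  if (looksLikeCss(source)) {
    return 'css';
  }
  if (looksLikeJs(source)) {
    return 'js';
  }
  return 'none';
}
```

With the two snippets from the question, `getSourceType` gives 'js' for the `var foo = {...}` example and 'css' for the `.foo {...}` rule. It is easy to fool: a bare call with an object literal such as `foo({ a: 1 })` looks like a CSS rule to it, and CSS containing a URL with a query string (`url(x?a=b)`) is rejected because of the `=`.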