How to get the content or title of a Wikipedia page using Erlang?

-module(wikipedia).
-export([main/0]).
-define(Url, "http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=Chicago").
-define(Match, "^[A-Za-z]+[A-Za-z0-9]*$").

main() ->
    inets:start(),
    %% Start ssl application
    ssl:start(),
    {ok, {_Status, _Header, Body}} = httpc:request(?Url),
    T = re:run(Body, ?Match, [{capture, all_but_first, binary}]),
    io:format("~s~n",[T]).

I want to store the content of the Wikipedia page in T using the regular expression Match, and then fetch the title from it. But the code above returns nomatch. I don't understand how to fetch the title of a Wikipedia page using Erlang. Please help (I am new to Erlang). I want something like this: https://stackoverflow.com/questions/13459598/how-to-get-titles-from-a-wikipedia-page

Answer by codeadict:

First, I think the title is already in your URL ("Chicago"); if that is the case, just pattern match the URL to obtain the title. If it is not, I suggest using an XML parsing module like xmerl:

-module(parse_title).
-include_lib("xmerl/include/xmerl.hrl").

-export([main/0]).

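main() ->
  %% httpc is part of inets; ssl is only needed for HTTPS requests
  inets:start(),
  ssl:start(),
  U = "http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=Chicago",
  {ok, {_, _, Body}} = httpc:request(U),
  %% Parse the XML response body into an xmerl document tree
  {Xml, _} = xmerl_scan:string(Body),
  %% Collect the value of the title attribute on the <parse> element
  [Title|_] = [Value || #xmlAttribute{value = Value} <- xmerl_xpath:string("//api/parse/@title", Xml)],
  Title.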
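
For reference, the regular expression in the question is anchored with ^ and $, so it can only match when the whole response body is a single alphanumeric word, which an XML document never is; that is why re:run returns nomatch. Both routes suggested above can be tried without any network access. The following is only a sketch: the module name wiki_title and the functions from_url/1 and from_xml/1 are made up for illustration, and it assumes OTP 21 or later for uri_string.

```erlang
-module(wiki_title).
-include_lib("xmerl/include/xmerl.hrl").

-export([from_url/1, from_xml/1]).

%% Route 1: read the title from the "page" query parameter of the API URL.
%% Returns undefined when the URL has no query string or no "page" key.
%% (from_url/1 is a hypothetical helper, not part of any library.)
from_url(Url) ->
    case uri_string:parse(Url) of
        #{query := Query} ->
            case uri_string:dissect_query(Query) of
                Pairs when is_list(Pairs) ->
                    proplists:get_value("page", Pairs);
                _Error ->
                    undefined
            end;
        _NoQueryOrError ->
            undefined
    end.

%% Route 2: read the title attribute of <parse> from a response body.
%% Body must be a string (list), which is what httpc returns by default.
%% Returns undefined when the response has no /api/parse element.
%% (from_xml/1 is a hypothetical helper, not part of any library.)
from_xml(Body) ->
    {Xml, _Rest} = xmerl_scan:string(Body),
    case xmerl_xpath:string("//api/parse/@title", Xml) of
        [#xmlAttribute{value = Title} | _] -> Title;
        [] -> undefined
    end.
```

In the shell, wiki_title:from_url/1 applied to the URL from the question should give "Chicago". wiki_title:from_xml/1 can be tried on a hand-written, abbreviated stand-in for the real response such as "<api><parse title=\"Chicago\" pageid=\"6886\"><sections/></parse></api>". Returning undefined instead of crashing on a failed match is a design choice of this sketch; the original answer's [Title|_] = ... pattern would raise a badmatch error in the same situation.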