A shell script that I run in IPython returns the following object:
results = ['{"url": "https://url.com", "date": "2020-10-02T21:25:20+00:00", "content": "mycontent\nmorecontent\nmorecontent", "renderedContent": "myrenderedcontent", "id": 123, "username": "somename", "user": {"username": "somename", "displayname": "some name", "id": 123, "description": "my description", "rawDescription": "my description", "descriptionUrls": [], "verified": false, "created": "2020-02-00T02:00:00+00:00", "followersCount": 1, "friendsCount": 1, "statusesCount": 1, "favouritesCount": 1, "listedCount": 1, "mediaCount": 1, "location": "", "protected": false, "linkUrl": null, "linkTcourl": null, "profileImageUrl": "https://myprofile.com/mypic.jpg", "profileBannerUrl": "https://myprofile.com/mypic.jpg"}, "outlinks": [], "outlinks2": "", "outlinks3": [], "outlinks4": "", "replyCount": 0, "retweetCount": 0, "likeCount": 0, "quoteCount": 0, "conversationId": 123, "lang": "en", "source": "<a href=\\"mysource.com" rel=\\"something\\">Sometext</a>", "media": [{"previewUrl": "smallpic.jpg", "fullUrl": "largepic.jpg", "type": "photo"}], "forwarded": null, "quoted": null, "mentionedUsers": [{"username": "name1", "displayname": "name 1", "id": 345, "description": null, "rawDescription": null, "descriptionUrls": null, "verified": null, "created": null, "followersCount": null, "friendsCount": null, "statusesCount": null, "favouritesCount": null, "listedCount": null, "mediaCount": null, "location": null, "protected": null, "linkUrl": null, "link2url": null, "profileImageUrl": null, "profileBannerUrl": null}]}', ...]
where the `...` indicates more entries like the previous one. According to type(), this is an slist. According to the documentation of the aforementioned shell script, this is a jsonlines file.
Ultimately, I would like to convert this into a CSV in which the keys are the column names and each entry (like the one shown above) is a row holding the corresponding values. So something like:
url date content ...
https://url.com 2020-10-02T21:25:20+00:00 mycontent ...
I have tried the solution proposed here, but the resulting data frame contains key-value pairs in its cells:
import pandas as pd
df = pd.DataFrame(data=results)
df = df[0].str.split(',',expand=True)
df = df.rename(columns=df.iloc[0])
Although your example data contains several issues, if you fix those, this works:
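A minimal sketch of such a conversion for a single entry, assuming the standard-library json module and a shortened, hypothetical version of the record in which the inner quotes and the newline are already escaped:

```python
import json

# One entry of `results`, shortened for illustration (hypothetical subset of
# the fields), with the inner quotes and the newline already escaped.
line = (
    '{"url": "https://url.com", "content": "mycontent\\\\nmorecontent", '
    '"user": {"username": "somename", "id": 123}, '
    '"source": "<a href=\\"mysource.com\\" rel=\\"something\\">Sometext</a>"}'
)

record = json.loads(line)

print(list(record))          # top-level keys, i.e. the future column names
print(record["user"]["id"])  # nested objects are parsed into nested dicts
print(record["source"])      # the escaped quotes come back as plain quotes
```

The top-level keys are what would become the column headers. Nested objects such as user stay nested dicts unless they are flattened separately.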
Note: the example data has been fixed in this example. Changes: `"` was properly escaped in 'source'; `\n` was escaped as `\\\\n` (it could be `\\n` as well, but I don't think you want the newlines in your csv).

If results is a list of these:
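One way this could look, sketched with the standard-library json and csv modules rather than pandas. The two entries are shortened, hypothetical stand-ins for the real data, and the output goes to an in-memory buffer instead of a file:

```python
import csv
import io
import json

# Hypothetical, shortened stand-ins for the entries of `results`
# (already valid JSON, one object per string).
results = [
    '{"url": "https://url.com", "date": "2020-10-02T21:25:20+00:00", '
    '"content": "mycontent\\\\nmorecontent", "id": 123}',
    '{"url": "https://url.com/2", "date": "2020-10-03T08:00:00+00:00", '
    '"content": "other", "id": 124}',
]

# Each string is one JSON object, so each one becomes one row.
rows = [json.loads(line) for line in results]

# Use the union of all keys, in first-seen order, as the CSV header.
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()  # swap for open("out.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Nested values (such as the user object in the full data) would be written as their Python string representation here, so they may need flattening first.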
If the errors in your input are limited to the above, you could fix them like this:
This unescapes the previously escaped `\\"` inside `<>`, then replaces all `"` inside `<>` with `\\"`, and it also 'fixes' the newlines. If you have trouble understanding why the regexes work the way they do, that's probably a separate question.

The whole thing:
Note: this uses the third-party `regex` instead of `re`, since it uses a variable-length lookbehind.
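For readers who would rather stay within the standard library, a similar repair can be sketched with `re` and a replacement callback, which avoids the lookbehind altogether. This is a different technique from the one described above, not a reproduction of it. It assumes angle brackets only occur as non-nested tags, and the helper names `repair` and `_escape_quotes_in_tag` are made up for this illustration:

```python
import json
import re


def _escape_quotes_in_tag(match):
    # Normalise the quotes inside one <...> span.
    tag = match.group(0)
    tag = tag.replace('\\"', '"')   # undo quotes that were already escaped
    return tag.replace('"', '\\"')  # then escape every quote uniformly


def repair(line):
    # Assumption: angle brackets only appear as tags and are never nested.
    line = re.sub(r"<[^<>]*>", _escape_quotes_in_tag, line)
    # Raw newlines become a literal backslash-n in the parsed value;
    # use "\\n" instead to keep real newlines.
    return line.replace("\n", "\\\\n")


# Shortened, hypothetical entry showing both problems from the question.
broken = (
    '{"content": "mycontent\nmorecontent", '
    '"source": "<a href=\\"mysource.com" rel=\\"something\\">Sometext</a>"}'
)

record = json.loads(repair(broken))
print(record["content"])
print(record["source"])
```

Applying `repair` to every entry before `json.loads` would then give a list of dicts that can be written out as rows.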