I'm working with the Confluence loader in LangChain. I need to change the output of the Confluence API by passing it a value for expand, so that it expands the desired properties, as explained here.
The Confluence loader has a method called load:
def load(
    self,
    space_key: Optional[str] = None,
    page_ids: Optional[List[str]] = None,
    label: Optional[str] = None,
    cql: Optional[str] = None,
    include_restricted_content: bool = False,
    include_archived_content: bool = False,
    include_attachments: bool = False,
    include_comments: bool = False,
    content_format: ContentFormat = ContentFormat.STORAGE,
    limit: Optional[int] = 50,
    max_pages: Optional[int] = 1000,
    ocr_languages: Optional[str] = None,
    keep_markdown_format: bool = False,
    keep_newlines: bool = False,
) -> List[Document]:
and I call it like this:
documents = loader.load(
    space_key="KB",
    include_attachments=False,
    keep_newlines=True,
    keep_markdown_format=True,
)
The load function also contains this if condition:
if space_key:
    pages = self.paginate_request(
        self.confluence.get_all_pages_from_space,
        space=space_key,
        limit=limit,
        max_pages=max_pages,
        status="any" if include_archived_content else "current",
        expand=content_format.value,
    )
    docs += self.process_pages(
        pages,
        include_restricted_content,
        include_attachments,
        include_comments,
        content_format,
        ocr_languages=ocr_languages,
        keep_markdown_format=keep_markdown_format,
        keep_newlines=keep_newlines,
    )
The expand argument uses the content_format value from this class:
class ContentFormat(str, Enum):
    """Enumerator of the content formats of Confluence page."""

    EDITOR = "body.editor"
    EXPORT_VIEW = "body.export_view"
    ANONYMOUS_EXPORT_VIEW = "body.anonymous_export_view"
    STORAGE = "body.storage"
    VIEW = "body.view"

    def get_content(self, page: dict) -> str:
        return page["body"][self.name.lower()]["value"]
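To make the two roles of the enum concrete, here is a small self-contained sketch. It copies part of the class above and runs get_content against a hand-written dict. The dict (fake_page) is an assumption that only imitates the shape of a page returned by Confluence; it is not real API output.

```python
from enum import Enum


class ContentFormat(str, Enum):
    """Trimmed copy of the enum above, so the snippet runs on its own."""

    STORAGE = "body.storage"
    VIEW = "body.view"

    def get_content(self, page: dict) -> str:
        # The lookup key comes from the member *name* ("STORAGE" -> "storage"),
        # not from the member value ("body.storage").
        return page["body"][self.name.lower()]["value"]


# Hypothetical page, shaped the way get_content expects it.
fake_page = {"body": {"storage": {"value": "<p>Hello</p>"}}}

expand_value = ContentFormat.STORAGE.value          # what load() passes as expand
content = ContentFormat.STORAGE.get_content(fake_page)

print(expand_value)
print(content)
```

So the member value only decides what is sent as expand, while the member name decides which part of the returned body is read.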
The if statement above sends us to the get_all_pages_from_space function, which calls the Confluence API. The function looks like this:
def get_all_pages_from_space(
    self,
    space,
    start=0,
    limit=50,
    status=None,
    expand=None,
    content_type="page",
):
    """
    Get all pages from space
    :param space:
    :param start: OPTIONAL: The start point of the collection to return. Default: None (0).
    :param limit: OPTIONAL: The limit of the number of pages to return, this may be restricted by
                  fixed system limits. Default: 50
    :param status: OPTIONAL: list of statuses the content to be found is in.
                   Defaults to current is not specified.
                   If set to 'any', content in 'current' and 'trashed' status will be fetched.
                   Does not support 'historical' status for now.
    :param expand: OPTIONAL: a comma separated list of properties to expand on the content.
                   Default value: history,space,version.
    :param content_type: the content type to return. Default value: page. Valid values: page, blogpost.
    :return:
    """
    return self.get_all_pages_from_space_raw(
        space=space, start=start, limit=limit, status=status, expand=expand, content_type=content_type
    ).get("results")
I don't understand how I can set a custom value for the expand argument further down the chain; there are no *args/**kwargs I can pass through from the initial load function.
Update: Until someone gives me a better solution, I have created a new class mimicking ContentFormat, like this (the only change is the STORAGE value, which now also asks for version):
class _ContentFormat(str, Enum):
    """Enumerator of the content formats of Confluence page."""

    EDITOR = "body.editor"
    EXPORT_VIEW = "body.export_view"
    ANONYMOUS_EXPORT_VIEW = "body.anonymous_export_view"
    STORAGE = "body.storage,version"
    VIEW = "body.view"

    def get_content(self, page: dict) -> str:
        return page["body"][self.name.lower()]["value"]
It is not pretty, but it works for now.
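As far as I can tell, the reason this works is that the extra ",version" only ends up in the expand string, while get_content still looks up "storage" from the member name. Here is a minimal sketch of that reasoning. The function fake_fetch is a hypothetical stand-in I wrote for the API call; it just returns one page containing whatever properties were listed in expand, with made-up payloads.

```python
from enum import Enum


class _ContentFormat(str, Enum):
    """Trimmed copy of the workaround enum."""

    STORAGE = "body.storage,version"

    def get_content(self, page: dict) -> str:
        return page["body"][self.name.lower()]["value"]


def fake_fetch(expand: str) -> dict:
    """Hypothetical stand-in for the Confluence call (not the real API)."""
    page: dict = {}
    for prop in expand.split(","):
        if prop.startswith("body."):
            # "body.storage" -> page["body"]["storage"]
            fmt_key = prop.split(".", 1)[1]
            page.setdefault("body", {})[fmt_key] = {"value": "<p>Hello</p>"}
        else:
            # Any other expanded property gets a placeholder payload.
            page[prop] = {"expanded": True}
    return page


fmt = _ContentFormat.STORAGE
page = fake_fetch(expand=fmt.value)   # expand="body.storage,version"
content = fmt.get_content(page)       # still reads page["body"]["storage"]

print(sorted(page))
print(content)
```

The same trick would break if a member were renamed, since the name, not the value, has to match the key under body.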