Get PDFs to jump to their pages again #1283

Merged 4 commits on Feb 17, 2024
3 changes: 2 additions & 1 deletion app/backend/app.py
@@ -94,10 +94,11 @@ async def content_file(path: str):
This is also slow and memory hungry.
"""
# Remove page number from path, filename-1.txt -> filename.txt
+ # This shouldn't typically be necessary as browsers don't send hash fragments to servers
if path.find("#page=") > 0:
path_parts = path.rsplit("#page=", 1)
path = path_parts[0]
- logging.info("Opening file %s at page %s", path)
+ logging.info("Opening file %s", path)
blob_container_client = current_app.config[CONFIG_BLOB_CONTAINER_CLIENT]
try:
blob = await blob_container_client.get_blob_client(path).download_blob()
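
The new comment about hash fragments can be checked on the client side with the standard `URL` API. The sketch below is only an illustration: `requestTarget`, `fragmentOf`, and the example.com URL are hypothetical and not part of this PR.

```typescript
// Hypothetical helpers for illustration; not part of this PR.

// Returns the part of a URL that an HTTP request carries: path plus query.
function requestTarget(url: string): string {
    const parsed = new URL(url);
    return parsed.pathname + parsed.search;
}

// Returns the fragment (including the leading "#"), which stays in the browser.
function fragmentOf(url: string): string {
    return new URL(url).hash;
}

// Example citation URL (made up for this sketch).
const exampleCitation = "https://example.com/content/report.pdf#page=3";
console.log(requestTarget(exampleCitation), fragmentOf(exampleCitation));
```

Since the fragment never reaches the server in the normal case, the `#page=` stripping in `content_file` acts as a safety net rather than the main code path.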
2 changes: 1 addition & 1 deletion app/backend/approaches/chatapproach.py
@@ -37,7 +37,7 @@ class ChatApproach(Approach, ABC):
Make sure the last question ends with ">>".
"""

- query_prompt_template = """Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge.
+ query_prompt_template = """Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.
Collaborator Author

This was caught by a reader of my blog post today: https://blog.pamelafox.org/2024/02/rag-techniques-cleaning-user-questions.html

Doubt it affects anything, as GPTs don't mind lil errors.

You have access to Azure AI Search index with 100's of documents.
Generate a search query based on the conversation and the new question.
Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.
9 changes: 8 additions & 1 deletion app/frontend/src/components/AnalysisPanel/AnalysisPanel.tsx
@@ -34,12 +34,19 @@ export const AnalysisPanel = ({ answer, activeTab, activeCitation, citationHeigh
const fetchCitation = async () => {
const token = client ? await getToken(client) : undefined;
if (activeCitation) {
+ // Get hash from the URL as it may contain #page=N
Collaborator

This makes sense since we download the full object URL. Appreciate your fix.

+ // which helps browser PDF renderer jump to correct page N
+ const originalHash = activeCitation.indexOf("#") ? activeCitation.split("#")[1] : "";
const response = await fetch(activeCitation, {
method: "GET",
headers: getHeaders(token)
});
const citationContent = await response.blob();
- const citationObjectUrl = URL.createObjectURL(citationContent);
+ let citationObjectUrl = URL.createObjectURL(citationContent);
+ // Add hash back to the new blob URL
+ if (originalHash) {
+ citationObjectUrl += "#" + originalHash;
+ }
setCitation(citationObjectUrl);
}
};
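
A side note on the hash handling in this hunk: `indexOf` returns -1 (which is truthy) when the string contains no "#", so the ternary takes the `split` branch in that case and `originalHash` ends up `undefined`. The later `if (originalHash)` check still skips it, so the merged code appears to behave as intended. A minimal sketch of a more explicit variant follows, assuming the same goal of carrying `#page=N` over to the blob URL. `splitFragment` and `withFragment` are hypothetical names for illustration, not functions in this repo.

```typescript
// Hypothetical helpers for illustration; not part of this PR.

// Split a citation URL into the part to fetch and its fragment (without "#").
function splitFragment(url: string): { base: string; fragment: string } {
    const hashIndex = url.indexOf("#");
    // indexOf returns -1 when there is no "#", so compare explicitly.
    if (hashIndex === -1) {
        return { base: url, fragment: "" };
    }
    return { base: url.slice(0, hashIndex), fragment: url.slice(hashIndex + 1) };
}

// Re-attach the fragment to a freshly created object URL, if there was one.
function withFragment(objectUrl: string, fragment: string): string {
    return fragment ? `${objectUrl}#${fragment}` : objectUrl;
}
```

In `fetchCitation`, this would presumably mean fetching `base`, creating the object URL from the response blob, and passing it with `fragment` to `withFragment` before calling `setCitation`.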
4 changes: 2 additions & 2 deletions app/frontend/src/components/AnalysisPanel/ThoughtProcess.tsx
@@ -12,9 +12,9 @@ interface Props {
export const ThoughtProcess = ({ thoughts }: Props) => {
return (
<ul className={styles.tList}>
- {thoughts.map(t => {
+ {thoughts.map((t, ind) => {
return (
- <li className={styles.tListItem}>
+ <li className={styles.tListItem} key={ind}>
Collaborator Author

We were getting a bunch of React errors locally from missing key= values. An index isn't the best thing to use, but it should be okay for our situation.

<div className={styles.tStep}>{t.title}</div>
{Array.isArray(t.description) ? (
<SyntaxHighlighter language="json" wrapLongLines className={styles.tCodeBlock}>
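
On the review comment about index keys: if the list were ever reordered or filtered, keys derived from the item content would stay attached to the same item, while index keys would shift. The sketch below shows one way to derive such keys. `contentKeys` is a hypothetical helper, not something this PR adds, and it assumes the items are plain strings.

```typescript
// Hypothetical helper for illustration; not part of this PR.

// Build one key per item from its content. A per-content counter is always
// appended so that duplicate items still receive distinct keys.
function contentKeys(items: string[]): string[] {
    const seen = new Map<string, number>();
    return items.map(item => {
        const count = seen.get(item) ?? 0;
        seen.set(item, count + 1);
        return `${item}#${count}`;
    });
}
```

For a list that is rendered once and never reordered, as the comment suggests is the case here, the index is likely sufficient.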
@@ -6,32 +6,23 @@ interface Props {
supportingContent: string[] | { text: string[]; images?: { url: string }[] };
}

- interface SupportingItemProps {
- title: string;
- content: string;
- }

export const SupportingContent = ({ supportingContent }: Props) => {
const textItems = Array.isArray(supportingContent) ? supportingContent : supportingContent.text;
const imageItems = !Array.isArray(supportingContent) ? supportingContent?.images : [];
return (
<ul className={styles.supportingContentNavList}>
- {textItems.map(c => {
+ {textItems.map((c, ind) => {
Collaborator Author

Ditto, this was just fixing the missing key with an index. I also inlined the HTML into SupportingContent, as I couldn't deal with writing the correct TypeScript for the separate component. :)

const parsed = parseSupportingContentItem(c);
- return <TextSupportingContent {...parsed} />;
+ return (
+ <li className={styles.supportingContentItem} key={ind}>
+ <h4 className={styles.supportingContentItemHeader}>{parsed.title}</h4>
+ <p className={styles.supportingContentItemText} dangerouslySetInnerHTML={{ __html: parsed.content }} />
+ </li>
+ );
})}
- {imageItems?.map(i => {
- return <img className={styles.supportingContentItemImage} src={i.url} />;
+ {imageItems?.map((img, ind) => {
+ return <img className={styles.supportingContentItemImage} src={img.url} key={ind} />;
})}
</ul>
);
};

- export const TextSupportingContent = ({ title, content }: SupportingItemProps) => {
- return (
- <li className={styles.supportingContentItem}>
- <h4 className={styles.supportingContentItemHeader}>{title}</h4>
- <p className={styles.supportingContentItemText} dangerouslySetInnerHTML={{ __html: content }} />
- </li>
- );
- };