This repository has been archived by the owner on Jul 23, 2024. It is now read-only.
Not sure if I will go with this project in the end, but I've had an idea for an application for a while now. I want to create a chatbot that helps researchers query vast amounts of "difficult-to-process" sources.
Context
More specifically, I am experimenting with historical newspapers such as the "Jüdische Zeitung", a pre-war Jewish newspaper from Vienna. The documents are printed in the "Fraktur" typeface (screenshot). They offer a rich resource for historians but are really hard to work with, because Fraktur is difficult to read for modern German speakers.
Goals
Accessibility: Studying newspapers in an antiquated typeface takes a lot of mental effort and time. Read-throughs take up a large chunk of a historian's work. A tool that could support this would make a huge difference.
Search by topics: Sources such as the "Jüdische Zeitung" were published weekly. As historians read these sources with a specific question in mind, vast amounts of the information won't be relevant. The task is to find the articles that are.
Uncover hidden gems: Newspapers suffered under censorship during wartime. Editors had to find more subtle ways to report on sensitive issues. Historians are aware of these "codes", but in the original sources they are easily overlooked. An LLM-RAG-based chatbot would be very helpful for finding these.
Potential Challenges
(Optical) Character Recognition: I am not sure how to deal with the "Fraktur" typeface yet. Documents are uploaded, but I don't know how to check whether the text was detected fully. Are there ways to evaluate and improve the text inputs?
Checking for Quality and Utilisation: When I query for articles about the region "Galicia", I expect the app to show me the most relevant articles first but still return all articles that fit that theme. What methods are there to achieve this?
Preference for sources: Alongside the original sources, I use literature/excerpts about the topic. When I query, I always prefer original sources, though. The literature should only provide additional context. How can I set up such preferences?
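To make the last two questions more concrete, here is a rough sketch of what I imagine, not a finished solution. It assumes the retriever already returns a similarity score per document (higher is better) and that each document is tagged as either an original source or literature. The names `Hit`, `rank_hits`, `min_score` and `source_boost`, the labels "source" and "literature", and the document IDs in the example are all made up for illustration. A score threshold keeps every article that fits the theme, and a fixed bonus pushes original sources ahead of literature with a similar score.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str   # hypothetical identifier, e.g. newspaper issue or book excerpt
    kind: str     # "source" (original newspaper) or "literature" (assumed labels)
    score: float  # similarity from whatever retriever is used, higher is better


def rank_hits(hits, min_score=0.3, source_boost=0.2):
    """Keep every hit above min_score and sort by score plus a source bonus."""
    # Keep all hits that fit the theme at all, instead of only a fixed top-k.
    kept = [h for h in hits if h.score >= min_score]

    def boosted(hit):
        # Original sources get a fixed bonus so they outrank similar literature.
        return hit.score + (source_boost if hit.kind == "source" else 0.0)

    return sorted(kept, key=boosted, reverse=True)


if __name__ == "__main__":
    example = [
        Hit("lit-1", "literature", 0.80),
        Hit("jz-1915-03", "source", 0.70),
        Hit("jz-1916-11", "source", 0.35),
        Hit("lit-2", "literature", 0.10),
    ]
    for hit in rank_hits(example):
        print(hit.doc_id, hit.kind, hit.score)
```

Both the threshold and the bonus are arbitrary values here and would need tuning against queries where the relevant articles are already known.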
If you have any ideas or potential solutions to tackle these challenges, comment below! It would be very much appreciated!
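As a starting point for the OCR question, one common way to evaluate text recognition is the character error rate (CER): transcribe a small sample of pages by hand, then divide the edit distance between that transcription and the OCR output by the length of the transcription. Below is a minimal sketch in plain Python. The function names are my own, and it assumes a hand-checked reference text is available for the sample.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if the characters match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the hand-checked reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # One wrong character out of four.
    print(character_error_rate("Wien", "Wicn"))
```

A lower value means the OCR output is closer to the reference. Comparing CER before and after changing the OCR model or the preprocessing would show whether the text input actually improved.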