I have a html link in a paragraph that cannot be converted in PDF link #505
Comments
As already stated in this comment:
If you do not provide us with some minimal Python code, we won't be able to help you much. I was able to execute the following code without reproducing the issue you mentioned:
from fpdf import fpdf, html
class PDF(fpdf.FPDF, html.HTMLMixin):
    pass
pdf = PDF()
pdf.add_page()
pdf.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
pdf.add_font("Kanit", style="I", fname="fonts/Kanit-Italic.ttf")
pdf.set_font("Kanit", size=24)
pdf.write_html('<p class="text_obisnuit">Intr-un articol precedent, <a href="https://neculaifantanaru.com/dupa-toate-regulile-artei.html"><em>Dupa toate regulile artei</em></a>, v-am povestit despre tanarul print Hamlet</p>')
pdf.output("issue_498.pdf") |
This is the complete Python code.
1. Note also that the colon characters (:) are lost in the PDF, as is the uppercase letter at the beginning of the line. For example:
in the PDF it looks like this:
2. The link problem, as I showed above.
3. The tag inside the paragraph, as I showed in the previous bug. HTML: in the PDF the second tag is still there, like this.
Here is an example of one of my HTML pages. Copy it into an HTML file and test it. You can duplicate this HTML code across as many pages as you want, because I also wrote a PDF merge step in Python (which works great): https://hastebin.com/puxecelivi.http
My Python code:
from fpdf import fpdf, html
import os
import re
from PyPDF2 import PdfFileMerger
def read_text_from_file(file_path):
    """
    Return the contents of a file.
    file_path: path to the file to read from
    """
    # the with block closes the file, so no explicit close() is needed
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
    return text
def write_to_file(text, file_path):
    """
    Write text to a file.
    text: the text to write
    file_path: path to the file to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))
# replacement table for Romanian diacritics
dict_simboluri = dict()
dict_simboluri['ă'] = 'a'
dict_simboluri['â'] = 'a'
def save_to_pdf(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".html"):
                file_path = root + os.sep + file_name
                file_content = read_text_from_file(file_path)
                # create the PDF file
                class PDF(fpdf.FPDF, html.HTMLMixin):
                    pass
                pdf = PDF()
                pdf.add_page()
                pdf.set_font('helvetica', size=12)
                # extract the article title
                den_articol = re.search('<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>', file_content)
                if den_articol is None:
                    print("Could not find --- article title --- in file --- {} ---.".format(file_path))
                else:
                    den_articol = den_articol.group(1)
                    for simbol in dict_simboluri.keys():
                        den_articol = den_articol.replace(simbol, dict_simboluri[simbol])
                    pdf.set_text_color(204, 0, 0)  # red
                    pdf.set_font('helvetica', size=14, style="B")
                    pdf.multi_cell(w=190, txt=den_articol)
                    pdf.ln()
                pdf.set_font('helvetica', size=12)
                # extract the date
                date = re.search('<td class="text_dreapta">(.*?), in <a', file_content)
                if date is None:
                    print("Could not find --- date --- in file --- {} ---.".format(file_path))
                else:
                    date = date.group(1)
                    pdf.set_text_color(0, 102, 204)  # blue
                    pdf.set_font('helvetica', size=8, style="B")
                    pdf.cell(txt=date)
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                pdf.set_text_color(0, 0, 0)  # black (default)
                pdf.set_font('helvetica', size=12)
                # extract the article text
                articol = re.search(r'<!-- ARTICOL START -->([\s\S]*?)<!-- ARTICOL FINAL -->', file_content)
                if articol is None:
                    print("Could not find --- ARTICOL START/FINAL --- in file --- {} ---.".format(file_path))
                else:
                    articol = articol.group(1)
                    articol = articol.replace("&quot;", "\"")
                    articol = articol.replace("’", "'")
                    # paragraphs
                    par_regex = re.compile('<p class="text_obisnuit.*?">.*?</p>')
                    pars = re.findall(par_regex, articol)
                    pars_text = list()
                    if len(pars) == 0:
                        print("Could not find -- text_obisnuit paragraphs -- in file --- {} ---.".format(file_path))
                    else:
                        for i in range(0, len(pars)):
                            if '<p class="text_obisnuit">' in pars[i]:
                                # match the text_obisnuit class and take its text
                                content = re.findall('<p class="text_obisnuit">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # put the text in a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    pdf.multi_cell(w=190, txt=content[0])
                                    # add an empty line between paragraphs
                                    pdf.ln()
                            elif '<p class="text_obisnuit2">' in pars[i]:
                                # match the text_obisnuit2 class and take its text
                                content = re.findall('<p class="text_obisnuit2">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # switch to the bold font
                                    pdf.set_font('helvetica', size=12, style="B")
                                    # put the text in a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    pdf.multi_cell(w=190, txt=content[0])
                                    # add an empty line between paragraphs
                                    pdf.ln()
                                    # reset the font
                                    pdf.set_font('helvetica', size=12)
                            else:
                                continue
                # add the source link
                pdf.ln()
                pdf.ln()
                pdf.set_font('helvetica', size=12, style="B")
                pdf.cell(txt="Source:")
                pdf.set_font('helvetica', size=12)
                pdf.set_text_color(0, 102, 204)  # blue
                pdf.cell(w=40, txt="https://neculaifantanaru.com/{}".format(file_name), link="https://neculaifantanaru.com/{}".format(file_name))
                den_fisier = file_path.split('.')[0] + '.pdf'
                pdf.output(den_fisier)
        # break
# function that merges several PDF files
def merge_pdf_files(directory_path):
    merger = PdfFileMerger()
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".pdf"):
                print("PDF: ", file_name)
                file_path = root + os.sep + file_name
                merger.append(file_path)
        merger.write(root + os.sep + "articles.pdf")
        merger.close()
        # stop after the top-level directory
        break
save_to_pdf("c:\\Folder5\\")
merge_pdf_files("c:\\Folder5\\") |
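As a side note on the dict_simboluri lookup table in the code above: a minimal sketch of a different technique, Unicode normalization, which strips diacritics without listing each character by hand. The function name strip_diacritics and the sample string are illustrative assumptions and do not come from the thread.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining accent marks removed (for example 'ă' becomes 'a')."""
    # NFKD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep every other character unchanged
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical sample word, used only for illustration
print(strip_diacritics("tânăr"))
```

This only matters when the text is drawn with a core font such as helvetica; with a Unicode TrueType font registered through add_font, the replacement step may not be needed at all.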
Hi @me-suzy! If I understood correctly, the issue arises because you don't seem to use
def issue():
    file_path = "./issue.html"
    file_content = read_text_from_file(file_path)
    # create the PDF file
    class PDF(fpdf.FPDF, html.HTMLMixin):
        pass
    pdf = PDF()
    pdf.add_page()
    # extract the article title
    den_articol = re.search('<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>', file_content)
    if den_articol is None:
        print("Could not find --- article title --- in file --- {} ---.".format(file_path))
    else:
        den_articol = den_articol.group(1)
        for simbol in dict_simboluri.keys():
            den_articol = den_articol.replace(simbol, dict_simboluri[simbol])
        pdf.set_text_color(204, 0, 0)  # red
        pdf.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
        pdf.set_font('Kanit', size=14)
        pdf.multi_cell(w=190, txt=den_articol)
        pdf.ln()
    pdf.output("issue.pdf")
the header is shown like this
Changing pdf.multi_cell(w=190, txt=den_articol) to pdf.write_html(text=f'<h1 class="den_articol" itemprop="name">{den_articol}</h1>'), the header seems to be shown correctly.
Pay attention also that with the |
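The switch from multi_cell to write_html described in this reply could be prepared along these lines. This is only a sketch: the helper name build_title_html and the sample page fragment are hypothetical, and the block stops at building the HTML string, so it runs without fpdf2 or any font files.

```python
import re

# Same title pattern as in the code above
TITLE_PATTERN = re.compile(r'<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>')


def build_title_html(file_content):
    """Return an <h1> fragment for the article title, or None if no title is found."""
    match = TITLE_PATTERN.search(file_content)
    if match is None:
        return None
    # Wrap the extracted title in the same <h1> markup used in the reply above
    return '<h1 class="den_articol" itemprop="name">{}</h1>'.format(match.group(1))


# Hypothetical page fragment, used only for illustration
sample_page = '<tr><td><h1 class="den_articol" itemprop="name">Dupa toate regulile artei</h1></td></tr>'
title_html = build_title_html(sample_page)
print(title_html)
```

The resulting string is what would be handed to pdf.write_html(...) in place of the pdf.multi_cell(...) call.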
Thank you for jumping in with this great answer @RedShy! @all-contributors please add @RedShy for question |
I've put up a pull request to add @RedShy! 🎉 |
Providing a screenshot of your IDE with a line of code in red is not very helpful... Also, you did not provide any minimal code associated with the last errors you faced: Other As for myself, I'm sorry but I won't try to figure out what the problem is without seeing any code, nor take the time to read through all the previous 150+ lines of code you provided. I'll be glad to help you if you take the time to provide a minimal reproducible example and the associated full stacktrace |
|
So, I changed all the fonts (Arial, Times, Kanit), and I get the same error:
|
After updating my code with the new font and modifying those 2 lines, I get this error (I didn't have this error before the change):
This is my latest version of the Python code:
|
When you add a bold version of a font, you need to put also Also add |
Almost perfect! Except for one thing: the bold font does not stand out, even though I set it up in my Python code. Maybe it is because of the Kanit font itself? In the HTML, the first line looks like this:
Code version 5 (almost perfect):
|
If you want both Bold and Italic you need to add the corresponding font. Also it doesn't work that you set You could modify |
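A minimal sketch of registering one font file per style, as the reply above suggests. The helper register_font_family is hypothetical, and the Bold and BoldItalic file names are assumptions; only the Regular and Italic paths appear earlier in this thread. The add_font keyword arguments (style, fname) are the ones already used in the code at the top of the thread.

```python
# Style codes: "" = regular, "B" = bold, "I" = italic, "BI" = bold italic.
# The Bold and BoldItalic file names below are assumptions.
KANIT_FILES = {
    "": "fonts/Kanit-Regular.ttf",
    "B": "fonts/Kanit-Bold.ttf",
    "I": "fonts/Kanit-Italic.ttf",
    "BI": "fonts/Kanit-BoldItalic.ttf",
}


def register_font_family(pdf, family, files):
    """Call pdf.add_font() once per style so that every variant is available."""
    for style, fname in files.items():
        pdf.add_font(family, style=style, fname=fname)


# Usage with a real document (needs fpdf2 and the font files on disk):
#     pdf = PDF()
#     register_font_family(pdf, "Kanit", KANIT_FILES)
#     pdf.set_font("Kanit", style="B", size=12)
```

Keeping the style-to-file mapping in one dictionary makes it easy to see at a glance which variants have been registered when bold or italic text does not render as expected.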
OK, it works. One more thing: I also have another kind of tag inside a paragraph. Example:
It must look like this in the PDF ("My name is James:" in bold and the rest of the words as normal text): My name is James: and I want to go home by Night. Please tell me where and how to change my code to make this work. |
Currently, as written in the documentation, For example you could use
I would put that line before everything else, just after opening the file, because I view it as pre-processing the file before using it with |
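The pre-processing step suggested here could look roughly like the sketch below. It assumes the unsupported markup is a span with class text_obisnuit2 inside the paragraph; that tag, the helper name spans_to_bold, and the sample HTML are assumptions for illustration, since the exact markup is not preserved in this thread.

```python
import re


def spans_to_bold(html_text, css_class="text_obisnuit2"):
    """Replace <span class="..."> wrappers with <b> tags before rendering."""
    # Non-greedy match so that each span is converted on its own
    pattern = re.compile(
        r'<span class="' + re.escape(css_class) + r'">(.*?)</span>',
        re.DOTALL,
    )
    return pattern.sub(r"<b>\1</b>", html_text)


# Hypothetical paragraph modeled on the example in the question above
sample = (
    '<p class="text_obisnuit"><span class="text_obisnuit2">My name is James:</span>'
    " and I want to go home by Night.</p>"
)
converted = spans_to_bold(sample)
print(converted)
```

Running the substitution once on the whole file content, right after reading it, keeps the rest of the script unchanged.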
Brilliant. Thanks. |
I made a short tutorial with my code, which you helped me finish. Thanks for your help. Maybe someone needs a complete code example for the fpdf library. |
<p class="text_obisnuit">Intr-un articol precedent, <a href="https://neculaifantanaru.com/dupa-toate-regulile-artei.html"><em>Dupa toate regulile artei</em></a>, v-am povestit despre tanarul print Hamlet</p>
should look like this in the PDF:
Intr-un articol precedent, Dupa toate regulile artei, v-am povestit despre tanarul print Hamlet
Instead, this is how it looks in the PDF (also, as you can see below, the :// characters disappeared in the PDF):
href=https