I am trying to use multithreading to insert raw_html into postgres but I can't get the data to insert and keep getting
LINE 1: INSERT INTO raw_html VALUES b'<!DOCTYPE html><html lang="en"...
^
Error: syntax error at or near "b'<!DOCTYPE html
I can't figure out what the b is doing in front of each of these strings (nor can I find anything online explaining what the b is) and I am assuming that's why the insert is failing. I also am trying to get the raw_html string to convert to a tuple because I think it needs to be a tuple for sql to accept it. I can't figure out what to try next to fix these two issues. Any direction appreciated:
UPDATE:
def execute_values(LST):
DB_HOST = "localhost"
DB_NAME = "yellow_pages_scraper"
DB_USER = "justinbenfit"
DB_PASS = "postgres"
DB_URL = "mydburl"
conn = psycopg2.connect(dbname=DB_NAME, user=DB_USER, password=DB_PASS, host=DB_HOST)
cur = conn.cursor()
try:
LST = tuple(LST)
for L in LST:
cur.execute("INSERT INTO raw_html VALUES (?)", (L.decode('utf-8'),))
conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
print("Error: %s" % error)
conn.rollback()
return 1
print(f"the dataframe is inserted")
def main(LINKS):
t0 = time.time()
download_raw(LINKS)
execute_values(LST)
t1 = time.time()
print(f"{t1-t0} seconds to download {len(LST)} raw_html.")