I am working on a personal project that uses the chessdotcom Public API package. I can currently store the PGN (Portable Game Notation) of the daily puzzle in a variable; the PGN is the required input for creating a chess GIF (https://www.chess.com/gifs).
I wanted to use requests and an HTML parser to essentially fill out the form on the GIF site and create a GIF from my Python script. I made a request to the GIF website, and response.text returns a huge HTML string (thousands of lines), which I am parsing with html5lib. I am currently getting "html5lib.html5parser.ParseError: Unexpected character after attribute value." and I can't figure out where in this giant response the issue is. What are some tips/tricks to debug this? Where do I even begin looking for this unexpected character?
import requests as req
import html5lib
from datetime import datetime
from chessdotcom import Client, get_player_profile, get_player_game_archives, get_player_stats, get_current_daily_puzzle, get_player_games_by_month

Client.request_config['headers']['User-Agent'] = 'PyChess Program for Automated YouTube Creation'

class ChessData:
    def __init__(self, name):
        self.player = get_player_profile(name)
        self.archives = get_player_game_archives(name)
        self.stats = get_player_stats(name)
        self.games = get_player_games_by_month(name, datetime.now().year, datetime.now().month)
        self.puzzle = get_current_daily_puzzle()
        self.html_parser = html5lib.HTMLParser(strict=True, namespaceHTMLElements=True, debug=True)

    def organize_puzzles(self, puzzles):
        # dict_keys(['title', 'url', 'publish_time', 'fen', 'pgn', 'image'])
        portableGameNotation = puzzles['pgn']
        html_data = req.get('https://www.chess.com/gifs')
        print(html_data.text)
        self.html_parser.parse(html_data.text.replace('&', '&amp;'))

    def get_puzzles(self):
        self.organize_puzzles(self.puzzle.json['puzzle'])
I initially had issues with a "Name Entity Expected. Got None" error, which I temporarily bypassed by replacing all instances of & with the &amp; entity.
Traceback (most recent call last):
  File "C:/ChessProgram/ChessTop.py", line 17, in <module>
    main()
  File "C:/ChessProgram/ChessTop.py", line 14, in main
    ChessResults.get_puzzles()
  File "C:\ChessProgram\ChessData.py", line 32, in get_puzzles
    self.organize_puzzles(self.puzzle.json['puzzle'])
  File "C:\ChessProgram\ChessData.py", line 29, in organize_puzzles
    self.html_parser.parse(html_data.text.replace('&', '&amp;'))
  File "C:\ChessProgram\lib\site-packages\html5lib\html5parser.py", line 284, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "C:\ChessProgram\lib\site-packages\html5lib\html5parser.py", line 133, in _parse
    self.mainLoop()
  File "C:\ChessProgram\lib\site-packages\html5lib\html5parser.py", line 216, in mainLoop
    self.parseError(new_token["data"], new_token.get("datavars", {}))
  File "C:\ChessProgram\lib\site-packages\html5lib\html5parser.py", line 321, in parseError
    raise ParseError(E[errorcode] % datavars)
html5lib.html5parser.ParseError: Unexpected character after attribute value.
I tried replacing & with &amp; to fix the entity name issue, and I manually searched through the HTML response, checking the different attributes for anything out of place.
HTML may use & in many places - e.g. &gt;, &lt;, &copy;, etc. - so replacing every & with &amp; may create wrong values.
You would need requests.post() to send data to the page - to simulate the filled-in form. Later you may need to parse the page with the result. But maybe it will only be necessary to find the link to the GIF using standard string functions - without parsing all of the HTML.
The page uses cookies and it sends a unique token in the form. You may need to use requests.Session and first get the page with the form to obtain the cookies. But the site may have a more complex system for blocking scripts/bots, and then you may need Selenium to control a real web browser.
I used beautifulsoup to get the token, and later I used a normal text.find() to get the URL of the image.
You can also use html5lib.HTMLParser() without parameters (or with strict=False), and it will work without .replace('&', '&amp;').
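To illustrate the point about replacing every & with &amp;, here is a small sketch using only the standard library. The snippet string is made up for the example and is not taken from the chess.com page; it only shows that markup which already contains entities gets escaped a second time by a blanket replace.

```python
import html

# Made-up markup that already contains valid entities (not from the real page).
snippet = '<a href="/play?a=1&amp;b=2">Tom &amp; Jerry &copy; 2023</a>'

def blind_escape(markup):
    """The blanket replacement from the question: every & becomes &amp;."""
    return markup.replace("&", "&amp;")

blind = blind_escape(snippet)

# Decoding the untouched markup gives the intended characters.
print(html.unescape(snippet))

# Decoding the "fixed" markup only gets back to the entity text itself,
# so a browser or parser would show literal "&amp;" and "&copy;".
print(html.unescape(blind))
```

After the blanket replace, one round of decoding returns the original entity text instead of the characters it stood for, which is why the values can end up wrong.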
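The token and text.find() steps can be sketched roughly as follows. This is only an illustration under assumptions: it uses the standard library's html.parser in place of beautifulsoup so that it runs without extra packages, and the field name example_token and both HTML strings are hypothetical, not what chess.com actually sends. With the real site you would fetch the form page through a requests.Session first and then post the collected fields together with the PGN.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden" and "name" in attributes:
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(page):
    """Return the hidden form fields (e.g. a token) found in the page text."""
    collector = HiddenInputCollector()
    collector.feed(page)
    return collector.fields


def first_url_ending_with(text, suffix):
    """Use plain str.find/rfind to pull out the first quoted URL ending in suffix."""
    end = text.find(suffix + '"')
    if end == -1:
        return None
    start = text.rfind('"', 0, end) + 1
    return text[start:end + len(suffix)]


# Hypothetical pages, standing in for the real form page and result page.
form_page = '<form method="post"><input type="hidden" name="example_token" value="abc123"><textarea name="pgn"></textarea></form>'
result_page = '<div><img src="https://example.com/out/board.gif" alt="game"></div>'

print(hidden_fields(form_page))
print(first_url_ending_with(result_page, ".gif"))
```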
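As for where to start looking for the unexpected character: a non-strict html5lib parser does not raise, but it still records every problem in parser.errors together with the position in the stream. A minimal sketch, assuming html5lib is installed; the sample markup is invented to trigger the same error, and list_parse_errors is just a helper name for this example.

```python
import html5lib
from html5lib.constants import E  # maps error codes to message templates


def list_parse_errors(markup):
    """Parse leniently and return (line, column, code, message) per recorded error."""
    parser = html5lib.HTMLParser()  # strict defaults to False, so nothing is raised
    parser.parse(markup)
    found = []
    for (line, column), code, datavars in parser.errors:
        # Same formatting html5lib uses when it raises ParseError in strict mode.
        found.append((line, column, code, E[code] % datavars))
    return found


# Invented sample: no space between the href value and the next attribute.
sample = '<p>fine</p>\n<a href="x"class="y">link</a>'

for line, column, code, message in list_parse_errors(sample):
    print(f"line {line}, col {column}: {code}: {message}")
```

Running the same function on html_data.text (without the replace) should list the line and column of each complaint, so you can jump straight to that part of the response instead of reading it all.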