1

I am trying to download data that an API returns as an XML file, using the following URL:

URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'

When I open the URL in my web browser, the downloaded XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<OASISReport xmlns="http://www.caiso.com/soa/OASISReport_v1.xsd">
<MessageHeader>
<TimeDate>2018-04-06T15:17:51-00:00</TimeDate>
<Source>OASIS</Source>
<Version>v20131201</Version>
</MessageHeader>
<MessagePayload>
<RTO>
<name>CAISO</name>
<REPORT_ITEM>
<REPORT_HEADER>
<SYSTEM>OASIS</SYSTEM>
<TZ>PPT</TZ>
<REPORT>PRC_FUEL</REPORT>
<UOM>US$</UOM>
<INTERVAL>ENDING</INTERVAL>
<SEC_PER_INTERVAL>3600</SEC_PER_INTERVAL>
</REPORT_HEADER>
<REPORT_DATA>
<DATA_ITEM>FUEL_PRC</DATA_ITEM>
<RESOURCE_NAME>CISO</RESOURCE_NAME>
<OPR_DATE>2013-09-19</OPR_DATE>
<INTERVAL_NUM>24</INTERVAL_NUM>

However, when I download it with a Python script, the result is very different.

Python script:

r=requests.get(URL)
r.encoding="UTF-8"
with open ('data.xml','wb') as file:
    file.write(r.content)

Downloaded file:

PK[...]20130919_20130928_PRC_FUEL_N_20180406_08_44_40_v1.xml[... remainder of binary output omitted ...]

I am assuming it's an encoding issue, but I am struggling with the solution.

Thanks in advance for your help!

2
  • I get a zip file from your URL, not an XML file (the XML is inside the zip), so you need to unzip it first and then read the XML. Commented Apr 6, 2018 at 15:52
  • Thanks! That was what was wrong. Commented Apr 6, 2018 at 16:18
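
The diagnosis in the comments can also be checked in code. The downloaded bytes start with "PK", which is the signature zip archives begin with, and the standard library can confirm whether a payload is a zip before anything is written to disk. This is only a minimal sketch; the helper name looks_like_zip is made up for illustration, and in the script from the question you would pass it r.content.

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the given bytes form a readable zip archive."""
    # zipfile.is_zipfile accepts a file-like object, so wrap the raw bytes
    return zipfile.is_zipfile(io.BytesIO(payload))
```

If this returns True for the response body, the file should be saved with a .zip extension (or unpacked in memory) rather than written out as data.xml.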

2 Answers

2

This should help. The URL you mentioned returns a zip archive, not a plain XML file. You can download the archive and extract it to get your XML.

Example:

import io
import zipfile

import requests

URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'
r = requests.get(URL)
r.raise_for_status()  # stop early on an HTTP error
# The response body is a zip archive; wrap its bytes in a file-like object
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()  # writes the XML file(s) into the current directory
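
If you would rather not write anything to disk, the XML can be parsed straight out of the in-memory archive. The following is a rough sketch rather than a tested recipe for this API: the function name read_first_xml is hypothetical, and it assumes the archive contains at least one member whose name ends in .xml, as the file name visible in the question's binary output suggests.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_first_xml(zip_bytes: bytes) -> ET.Element:
    """Parse the first .xml member of an in-memory zip and return its root."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        if not xml_names:
            raise ValueError("archive contains no .xml member")
        # archive.open returns a binary file object that ElementTree can read
        with archive.open(xml_names[0]) as member:
            return ET.parse(member).getroot()
```

With the response from above, root = read_first_xml(r.content) gives the OASISReport element. Note that the report declares a default namespace, so lookups need it spelled out, for example root.find(".//{http://www.caiso.com/soa/OASISReport_v1.xsd}OPR_DATE").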

1

You could also use urllib

from urllib.request import urlretrieve

# Save the zip archive returned by the API to a local file
urlretrieve(
    "http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1",
    "oasis.zip")
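
This saves the archive but does not unpack it, so the XML still has to be extracted from oasis.zip afterwards. A small sketch of that step, using only the standard library (the helper name extract_archive and the default destination directory are assumptions for illustration):

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of the zip at zip_path and return the output paths."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return [Path(dest_dir) / name for name in archive.namelist()]
```

Calling extract_archive("oasis.zip") after the download should leave the XML file in the current directory and return its path.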
