
I can currently scrape JavaScript data from a POST request that I send with requests and then parse with BeautifulSoup. However, I only want to scrape the product plu, sku, description and brand, and I am struggling to find a way to print just the data I need rather than the whole script. This is the text that is printed after I extract the script using soup. I will be scraping more than one product from multiple POST requests, so the chunk idea is not really suitable.

    <script type="text/javascript">
    var dataObject = {
        platform: 'desktop',
        pageType: 'basket',
        orderID: '',
        pageName: 'Basket',
        orderTotal: '92.99',
        orderCurrency: 'GBP',
        currency: 'GBP',
        custEmail: '',
        custId: '',
        items: [
            {
                plu: '282013',
                sku: '653460',
                category: 'Footwear',
                description: 'Mayfly Lite Pinnacle Women&#039;s',
                colour: '',
                brand: 'Nike',
                unitPrice: '90',
                quantity: '1',
                totalPrice: '90',
                sale: 'false'
            }
        ]
    };

As you can see, it is far too much information.


How about this:

  1. Assign the captured text to a new multiline string variable called "chunk".
  2. Make a list of the keys you are looking for.
  3. Loop over each line, check whether the text before the colon is one of the keys you want, and if so print that line:

    chunk = '''
    <script type="text/javascript">
    var dataObject = {
    .........blah blah.......
      plu: '282013',
      sku: '653460',
      category: 'Footwear',
      description: 'Mayfly Lite Pinnacle Women&#039;s',
      colour: '',
      brand: 'Nike',
      ..... blah .......
      };'''
    
    keys = ['plu', 'sku', 'description', 'brand']
    
    # Keep only the lines whose text before the first colon is one of the wanted keys
    for line in chunk.splitlines():
        if line.split(':')[0].strip() in keys:
            print(line.strip())
    

Result:

plu: '282013',
sku: '653460',
description: 'Mayfly Lite Pinnacle Women&#039;s',
brand: 'Nike',

You could obviously clean up the result using similar applications of split, strip, replace, etc.
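For example, a minimal sketch of that clean-up, assuming the script text is already in a string (the `chunk` below is a trimmed, hypothetical sample) and that each field sits on its own line as in the question. The helper name `extract_fields` is made up for illustration. It splits each line on the first colon only, strips the trailing comma and the single quotes, and uses `html.unescape` to turn `&#039;` back into an apostrophe, returning a dict instead of printing raw lines:

```python
import html

# Hypothetical sample: in practice this is the script text extracted with BeautifulSoup.
chunk = '''
var dataObject = {
  items: [
    {
      plu: '282013',
      sku: '653460',
      category: 'Footwear',
      description: 'Mayfly Lite Pinnacle Women&#039;s',
      brand: 'Nike',
    }
  ]
};'''

keys = ['plu', 'sku', 'description', 'brand']


def extract_fields(text, wanted):
    """Return {key: value} for the wanted keys (assumes one 'key: value' per line)."""
    result = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so values containing ':' survive
        key, sep, value = line.partition(':')
        key = key.strip()
        if sep and key in wanted:
            # Drop the trailing comma and surrounding quotes, then decode HTML entities
            value = value.strip().rstrip(',').strip("'")
            result[key] = html.unescape(value)
    return result


print(extract_fields(chunk, keys))
```

Because the lookup is keyed on the field names rather than the values, it does not matter what the plu or description happens to be. Note, though, that with several products in one script each key would overwrite the previous one, so this version only suits a single item.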


3 Comments

That does work, but I will be extracting multiple products, so the names etc. will change, if that makes sense.
In your terms, is "name" a key in my list of keys? or the text after the colon for each key?
Can I send you the full script to have a look at? It may help you understand what the problem is. Thanks for helping!
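Regarding the concern about multiple products: one possible approach (a swapped-in regex technique, not the line-based method from the answer) is to treat each innermost `{ ... }` block inside `items` as one product and collect its `key: 'value'` pairs into a dict. This is a sketch under stated assumptions: product objects contain no nested braces, values never contain a raw single quote (the page appears to encode them as `&#039;`), and the second product in the sample is made-up data purely for illustration. The names `script_text` and `extract_products` are hypothetical. You would call the function once per POST response and accumulate the results.

```python
import html
import re

# Hypothetical sample shaped like the question's dataObject; the second item is invented.
script_text = '''
var dataObject = {
platform: 'desktop',
pageType: 'basket',
items: [
    {
        plu: '282013',
        sku: '653460',
        category: 'Footwear',
        description: 'Mayfly Lite Pinnacle Women&#039;s',
        brand: 'Nike',
        unitPrice: '90'
    },
    {
        plu: '111111',
        sku: '222222',
        category: 'Footwear',
        description: 'Example Trainer',
        brand: 'Example',
        unitPrice: '50'
    }
]
};'''

WANTED = ('plu', 'sku', 'description', 'brand')

# Innermost {...} blocks only: [^{}]* cannot span the outer dataObject braces
ITEM_RE = re.compile(r"\{([^{}]*)\}")
# key: 'value' pairs; assumes values never contain a raw single quote
PAIR_RE = re.compile(r"(\w+):\s*'([^']*)'")


def extract_products(text, wanted=WANTED):
    """Return one dict per product block, keeping only the wanted keys."""
    products = []
    for block in ITEM_RE.findall(text):
        fields = dict(PAIR_RE.findall(block))
        if 'plu' not in fields:
            continue  # skip any brace block that is not a product
        products.append({k: html.unescape(fields.get(k, '')) for k in wanted})
    return products


for product in extract_products(script_text):
    print(product)
```

Grouping by block keeps each product's fields together no matter how many items a basket contains. If the real pages ever nest objects inside an item, a regex like this will stop matching, and parsing the object properly would be safer.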
