2

An array of objects is a convenient format for persisting data. I'd be extremely grateful if a JavaScript guru could help me work out how to read this HTML table with vanilla JavaScript and transfer its data into the array of objects shown below.

I have written tons of code, mostly comparing two arrays of objects. Unfortunately, I haven't come any closer to a solution.

The table to scrape data from:

<table>
  <tbody>
    <tr>
      <td colspan="3">Canada</td>
    </tr>
    <tr>
      <td>Toronto</td>
      <td>Montreal</td>
      <td>Vancouver</td>
    </tr>
    <tr>
      <td colspan="3">USA</td>
    </tr>
    <tr>
       <td>New York</td>
       <td>Chicago</td>
       <td>Boston</td>
    </tr>
    <tr>
       <td>Washington</td>
       <td>Detroit</td>
       <td>Los Angeles</td>
    </tr>
  </tbody>
</table>

The expected outcome looks like this:

 [
 {"country":"Canada","city":"Toronto"},
 {"country":"Canada","city":"Montreal"},
 {"country":"Canada","city":"Vancouver"},
 {"country":"USA","city":"New York"},
 {"country":"USA","city":"Chicago"},
 {"country":"USA","city":"Boston"},
 {"country":"USA","city":"Washington"},
 {"country":"USA","city":"Detroit"},
 {"country":"USA","city":"Los Angeles"}
 ]

The code below is syntactically valid, but the approach does not produce the expected result:

let theResult = [];
    arrayOfCountriesAndCitiesObjects.forEach((item, iIndex) => {
        arrayOfCitiesObjects.forEach((elem, eIndex) => {
            if(item.city !== elem.city && item.iIndex < elem.eIndex) theResult.push(copy(elem, item)); 
        });
    });
    function copy(firstObj) {
      for (let i = 1; i < arguments.length; i++) {
        let arg = arguments[i];
        for (let key in arg) {
          firstObj[key] = arg[key];
        }
      }
      return firstObj;
    }
4
  • Is it possible for you to change the markup, e.g. by adding CSS classes to the table rows? Commented May 6, 2019 at 12:53
  • Absolutely. In fact, I have simplified the table; the real one is full of CSS classes around both countries and cities. Commented May 6, 2019 at 13:08
  • Does @Nina Scholz's answer fit your needs? Otherwise I can provide one using some class-selector logic. Commented May 6, 2019 at 13:13
  • @MaksymDudyk: If you end up dealing with much larger data and performance matters, you might want to check out my answer below, as it gives you a certain advantage in that regard, although a for-loop solution can be much faster on a small input table. Commented May 7, 2019 at 4:24

6 Answers

3

You could store the content of any cell with colSpan === 3 as the country and push all other cell values as cities to the result set.

This works with plain JavaScript without any libraries.

var result = [],
    country = '';

document
    .querySelectorAll('table td')
    .forEach(td => {
        if (td.colSpan === 3) {
            country = td.innerHTML;
            return;
        }
        result.push({ country, city: td.innerHTML.trim() });
    });

console.log(result);
<table>
  <tbody>
    <tr>
      <td colspan="3">Canada</td>
    </tr>
    <tr>
      <td>Toronto</td>
      <td>Montreal</td>
      <td>Vancouver</td>
    </tr>
    <tr>
      <td colspan="3">USA</td>
    </tr>
    <tr>
       <td>New York</td>
       <td>Chicago</td>
       <td>Boston</td>
    </tr>
    <tr>
       <td>Washington</td>
       <td>Detroit</td>
       <td>Los Angeles</td>
    </tr>
  </tbody>
</table>


3 Comments

Looks clean and elegant, Nina! I will try out your solution in real life. Thank you very much, so far.
Nina, let's suppose I need to read the last 'td' of each row and add it as an extra key/value pair to each object. Taking Eddie's code, I'd just add var lastCell = td[2].innerHTML inside the second loop. How would I do that in your code?
Then you need to select the rows and iterate over the cells of each row. It would be easier if you added the problem to the question, or better, asked a new question, because this question is already answered as it stands.
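A rough sketch of the row-by-row idea from the comment above, not taken from any of the answers. The helper name `rowsToRecords` and the extra key `lastCell` are made up for illustration, and it assumes that country rows start with a cell whose `colSpan` is 3, as in the sample table. Each row is passed in as a plain array of cells, so the function only needs the `colSpan` and `textContent` properties that real `td` elements have.

```javascript
// Hypothetical helper: `rows` is an array of rows, each row being an array of
// cell-like objects exposing `colSpan` and `textContent` (as <td> elements do).
function rowsToRecords(rows) {
  const records = [];
  let country = '';
  for (const cells of rows) {
    if (cells.length === 0) continue;          // skip rows without cells
    if (cells[0].colSpan === 3) {              // assumed marker of a country row
      country = cells[0].textContent.trim();
      continue;
    }
    // Text of the last cell in this row, stored under the made-up key `lastCell`.
    const lastCell = cells[cells.length - 1].textContent.trim();
    for (const cell of cells) {
      records.push({ country, city: cell.textContent.trim(), lastCell });
    }
  }
  return records;
}

// In a browser, the rows can be collected from the page like this:
if (typeof document !== 'undefined') {
  const rows = Array.from(document.querySelectorAll('table tr'), tr => Array.from(tr.cells));
  console.log(rowsToRecords(rows));
}
```

Working per row rather than over a flat list of cells is what makes "the last cell of this row" available at all.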
2

You can use a for loop to go through each tr and find the td elements in it. If there is only one, store its text in the currentCountry variable. If there is more than one, push an object for each cell to the result variable.

var currentCountry = "";
var result = [];

var tr = document.querySelectorAll('table tr');

for (var i = 0; i < tr.length; i++) {
  var td = tr[i].querySelectorAll('td');

  if (td.length === 1) currentCountry = td[0].innerHTML;
  else if (td.length > 1) {
    for (var a = 0; a < td.length; a++) {
      result.push({country: currentCountry,city: td[a].innerHTML});
    }
  }
}

console.log(result);
<table>
  <tbody>
    <tr>
      <td colspan="3">Canada</td>
    </tr>
    <tr>
      <td>Toronto</td>
      <td>Montreal</td>
      <td>Vancouver</td>
    </tr>
    <tr>
      <td colspan="3">USA</td>
    </tr>
    <tr>
      <td>New York</td>
      <td>Chicago</td>
      <td>Boston</td>
    </tr>
    <tr>
      <td>Washington</td>
      <td>Detroit</td>
      <td>Los Angeles</td>
    </tr>
  </tbody>
</table>

4 Comments

And what happens if there is an entry for a country where only a single city is given? According to your logic, this city would be identified as a country.
OMG. I missed that there is no jQuery. I have converted the jQuery code to vanilla. I guess we'll let the OP decide whether that scenario is possible.
I am trying to scrape data from a page overloaded with tags and CSS. Your solution seems very easy to readjust to my case. Thank you, Eddie.
Happy to hear about that :)
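One possible way around the single-city case raised in the comments, as a sketch only: decide whether a row is a country heading by the cell's `colSpan` rather than by the number of cells alone. The helper names `isCountryRow` and `collectCities` are made up here, and the sketch assumes heading cells always span more than one column, as in the sample table.

```javascript
// Hypothetical check: a row is a country heading only when its single cell
// spans several columns, not merely because the row has one cell.
function isCountryRow(cells) {
  return cells.length === 1 && cells[0].colSpan > 1;
}

// `rows` is an array of rows, each an array of cell-like objects with
// `colSpan` and `textContent` (the properties real <td> elements expose).
function collectCities(rows) {
  const result = [];
  let currentCountry = '';
  for (const cells of rows) {
    if (isCountryRow(cells)) {
      currentCountry = cells[0].textContent.trim();
    } else {
      for (const cell of cells) {
        result.push({ country: currentCountry, city: cell.textContent.trim() });
      }
    }
  }
  return result;
}

// In a browser, feed it the table rows:
if (typeof document !== 'undefined') {
  const rows = Array.from(document.querySelectorAll('table tr'), tr => Array.from(tr.cells));
  console.log(collectCities(rows));
}
```

With this check, a row holding a single city in a cell without a colspan is still recorded as a city.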
1

var country = null, result = [];
var tds = Array.from(document.querySelectorAll("#myTable tbody tr td"));
for (var i = 0; i < tds.length; i++) {
	let item = tds[i];
	if (item.getAttribute("colspan") == "3") {
		country = item.innerText;
		continue;
	}
	
	result.push({ country: country, city: item.innerText });
}
console.log(result);
<table id="myTable">
	<tbody>
		<tr>
			<td colspan="3">Canada</td>
		</tr>
		<tr>
			<td>Toronto</td>
			<td>Montreal</td>
			<td>Vancouver</td>
		</tr>
		<tr>
			<td colspan="3">USA</td>
		</tr>
		<tr>
			<td>New York</td>
			<td>Chicago</td>
			<td>Boston</td>
		</tr>
		<tr>
			<td>Washington</td>
			<td>Detroit</td>
			<td>Los Angeles</td>
		</tr>
	</tbody>
</table>

1 Comment

Thank you, Marcelo, I'll try out your solution.
1

Using reduce

 const items = document.querySelectorAll('table tbody td')

 const results = [...items].reduce((allItems, item)=>{
   if(item.getAttribute('colspan') === '3'){
     allItems['country'] = item.textContent
     return allItems
   }
   allItems.push({country: allItems['country'],city:item.textContent})
   return allItems
 },[])

2 Comments

Dennis, your script isn't working correctly: it spits out an extra "country: USA" entry. The problem, I guess, is here: allItems['country'] = item.textContent
Oh yeah! I was too lazy to create the country variable outside the reduce loop, but @U25lYWt5IEJhc3RhcmQg's answer below does that. I chose this because array iteration (forEach, for...of and the like) skips named properties on an array, and they are not counted in the array's length.
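For reference, a sketch of what the comment above describes: keeping the current country in a variable outside the reduce callback, so the accumulator stays a plain array with no extra named property. The function name `reduceCells` is made up for illustration, and the check uses the `colSpan` property (a number on real `td` elements) instead of `getAttribute`.

```javascript
// Hypothetical variant: `cells` is any iterable of cell-like objects with
// `colSpan` and `textContent`, e.g. the NodeList from querySelectorAll.
function reduceCells(cells) {
  let country = '';                            // lives outside the accumulator
  return Array.from(cells).reduce((records, cell) => {
    if (cell.colSpan === 3) {                  // country cell in the sample table
      country = cell.textContent.trim();
      return records;
    }
    records.push({ country, city: cell.textContent.trim() });
    return records;
  }, []);
}

// In a browser:
if (typeof document !== 'undefined') {
  console.log(reduceCells(document.querySelectorAll('table tbody td')));
}
```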
1

You need to assign a special class to every <tr> that contains a country name. Then select the rows with querySelectorAll and go through them with a forEach loop.

const tr = document.querySelectorAll('tr');

const arr = []
let count = '';

tr.forEach(x => {
  if(x.classList.contains('head')){
    count = x.children[0].innerHTML
  }
  else{
    let child = [...x.querySelectorAll('td')]
    arr.push(...child.map(a => ({country:count,city:a.innerHTML})))
  }
})

console.log(arr)
<table>
  <tbody>
    <tr class="head">
      <td  colspan="3">Canada</td>
    </tr>
    <tr>
      <td>Toronto</td>
      <td>Montreal</td>
      <td>Vancouver</td>
    </tr>
    <tr class="head" >
      <td colspan="3">USA</td>
    </tr>
    <tr>
       <td>New York</td>
       <td>Chicago</td>
       <td>Boston</td>
    </tr>
    <tr>
       <td>Washington</td>
       <td>Detroit</td>
       <td>Los Angeles</td>
    </tr>
  </tbody>
</table>

2 Comments

@MaksymDudyk Fixed. This was because I was pushing the result of map() without the spread operator (...).
Maheer, your code resulted in five container arrays instead of one. But the idea of using classList.contains as a check is cool!
1

Not that elegant, but to me a slightly more comprehensive reduce() solution (while also being the fastest for larger input data samples):

const result = [...document.getElementsByTagName('td')].reduce(
  (res, item) => (
    item.getAttribute('colspan') == 3
      ? res.country = item.textContent
      : res.obj = [...(res.obj || []), {country: res.country, city: item.textContent}],
    res
  ),
  {}
).obj;

console.log(result);
<table>
  <tbody>
    <tr>
      <td colspan="3">Canada</td>
    </tr>
    <tr>
      <td>Toronto</td>
      <td>Montreal</td>
      <td>Vancouver</td>
    </tr>
    <tr>
      <td colspan="3">USA</td>
    </tr>
    <tr>
       <td>New York</td>
       <td>Chicago</td>
       <td>Boston</td>
    </tr>
    <tr>
       <td>Washington</td>
       <td>Detroit</td>
       <td>Los Angeles</td>
    </tr>
  </tbody>
</table>

1 Comment

Super! Thank you!
