28

I wanted to query for answers where the user account has been deleted, but the answer is still up. According to this link (https://stackoverflow.com/help/deleting-account), deleting your account will make all content associated with you (like your answers) show up as posted by an anonymous user.

I thought it would be possible to query for surviving answers with deleted users in the PostsWithDeleted table.

I found a question whose submitter had their account deleted (How do BitTorrent magnet links work?). However, when I tried to query for this question on PostsWithDeleted, shockingly enough, the question didn't even show up.

So, instead, I tried to query for the accepted answer, which was by an account that was still alive. The query returned a result for the answer.

So now, I am super confused. I can query for answers to questions, but I can't query those questions themselves? But I can access the question link? That doesn't make sense. If the question was outright deleted, I would understand that, but in this case, the question is up and available on the site, but I can't query for it via PostsWithDeleted.

Further testing has confirmed that the same thing is happening to answers where the answer itself is still up and alive, but the answerer's account has since been deleted, for whatever reason.

3
  • @Jeremy Ty vm. I am about to sleep now, but I will try to get a bug post made tomorrow or the day after, unless someone else beats me to it. Commented Jan 7 at 6:50
  • 2
    Since this post has already been tagged for moderator review, I've edited it a bit to re-tag as a bug report. You may want to un-mark my answer as accepted, and save that for a staff response later. (Or you can roll-back by edit if you'd prefer to leave this as a support question and create a different bug post instead.) Commented Jan 7 at 17:00
  • @Jeremy Thanks for the time and effort. I have unaccepted your answer. Commented Jan 7 at 18:50

3 Answers

12

We investigated and fixed this back in January (the fixes were deployed between January 7th and 13th). While the actual fix took less than a week from the report, we missed giving a public response, so here I am.

Background information

When a user is removed, for example, due to deletion or a DSR (Data subject request), we remove the row from the Users table and set the UserId values in related tables to one of the following, depending on whether NULLs are allowed:

  • If not allowed, we set it to -1 (which is the community user/bot).
  • If allowed, we set it to NULL (as in Posts, for example).

The problem

When updating how we filter out content, we incorrectly set those filters and missed the fact that Posts could have a NULL OwnerUserId. Because of the way SQL Server handles NULLs, we also filtered out NULL values. So, in the next SEDE refresh, we inadvertently filtered out 1.62% of the non-deleted posts (~1MM) since they had a NULL value for OwnerUserId. The first report of this problem we had was this post, so thank you for that.
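The NULL behavior described here is standard SQL three-valued logic, so it can be reproduced outside SQL Server. The sketch below uses Python's built-in sqlite3 module with a made-up Posts table. The actual filter used for SEDE is not shown in this answer, so the `NOT IN` predicate and the ID values are assumptions chosen only to illustrate the failure mode.

```python
import sqlite3

# Toy stand-in for the Posts table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, OwnerUserId INTEGER)")
conn.executemany(
    "INSERT INTO Posts (Id, OwnerUserId) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, None)],  # post 3 belongs to a removed user (NULL owner)
)

# Hypothetical filter: exclude posts owned by user 20.
# For post 3, "NULL NOT IN (20)" evaluates to NULL, which WHERE treats as not true.
naive = [row[0] for row in conn.execute(
    "SELECT Id FROM Posts WHERE OwnerUserId NOT IN (20) ORDER BY Id"
)]

# NULL-safe variant: keep rows with no owner explicitly.
null_safe = [row[0] for row in conn.execute(
    "SELECT Id FROM Posts "
    "WHERE OwnerUserId IS NULL OR OwnerUserId NOT IN (20) ORDER BY Id"
)]

print(naive)      # [1]
print(null_safe)  # [1, 3]
```

The naive predicate silently drops the unowned post even though nothing in the filter mentions it, which matches the symptom reported in the question.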

The data dumps

Because the Data Dumps are generated based on the SEDE refreshes, this also affected the data dump at the time (December 2024). We’ve double-checked and no other data dumps were affected by this issue. There have been some questions about Post 34 and the Q3 data dump, and the post is there:

[Screenshot: Windows PowerShell showing that PostId 34 was found in the 2024-10-03 data dump]

I don’t know what led to this being considered missing, but as far as we can tell, Post 34 is there, and no other data dumps have been affected by this error. In fact, it’s kind of impossible for other data dumps to be affected by this error since nothing was changed in filtering userIDs other than this change we made in December 2024.

We’ve since fixed the mistake. The missing posts have been available on SEDE since January and will be present in the next data dumps.

1
  • 1
    Confirmed this appears to be fixed in the March 2025 data dumps. Commented Apr 10 at 17:47
23
+1000

Update: as pointed out by an anonymous suggestion, this appears to have been fixed in the Data Explorer. However, there hasn't been a fixed re-release of the Data Dump yet.


This seems like a serious bug, causing a lot of expected data to be missing. The process for the Data Explorer and the Data Dumps is similar, and it looks like these posts are missing from both.

To validate your report, here's a query that can find the answers but not the missing question.

How many posts are affected?

The undocumented deactivatedusers search filter on the site can find us many other questions with deleted users. It says there that there are "525,715 questions from deactivated users" on Stack Overflow, before we even count answers or other sites on the network.

Update: It was difficult to confidently determine the full impact while the data was missing, but now that this is apparently fixed we can query the Data Explorer for the number of unowned posts and answers to unowned posts, which should pretty closely reflect what the impact was.

  • Stack Overflow:
    • 525,640 unowned questions of 24,235,690 total (2.2%)
    • 450,467 unowned answers of 35,952,729 total (1.3%)
    • 854,499 owned answers to unowned questions of 35,952,729 total (2.4%)
    • 1,830,606 affected questions and answers of 60,188,419 total (3.0%)
  • Mathematics:
    • 79,878 unowned questions of 1,681,206 total (4.8%)
    • 88,808 unowned answers of 2,193,826 total (4.0%)
    • 109,134 owned answers to unowned questions of 2,193,826 total (5.0%)
    • 277,820 affected questions and answers of 3,875,032 total (7.2%)
  • Super User:
    • 16,631 unowned questions of 512,140 total (3.2%)
    • 22,320 unowned answers of 742,592 total (3.0%)
    • 28,641 owned answers to unowned questions of 742,592 total (3.9%)
    • 67,592 affected questions and answers of 1,254,732 total (5.4%)
  • Meta Stack Exchange:
    • 4,824 unowned questions of 100,140 total (4.8%)
    • 7,411 unowned answers of 151,456 total (4.9%)
    • 8,753 owned answers to unowned questions of 151,456 total (5.8%)
    • 20,988 affected questions and answers of 251,596 total (8.3%)
  • Meta Stack Overflow:
    • 1,734 unowned questions of 50,466 total (3.4%)
    • 2,265 unowned answers of 66,483 total (3.4%)
    • 3,140 owned answers to unowned questions of 66,483 total (4.7%)
    • 7,139 affected questions and answers of 116,949 total (6.1%)

Answers to unowned questions are considered affected because, although they weren't missing, an answer on its own loses most of its value without the context of its question. This doesn't consider other post types or comments because those are less significant and the Data Explorer was giving me timeouts when I tried. This doesn't consider deleted questions (the PostsWithDeleted table) because I don't think the necessary information is available.
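For anyone who wants to re-derive the "affected" totals, here is a small sketch of the arithmetic using the Stack Overflow figures listed above. The function name and parameter names are made up for illustration; only the input numbers come from this answer.

```python
def affected_share(unowned_q, unowned_a, orphaned_a, total_q, total_a):
    """Combine the three affected categories and express them as a share of all posts.

    orphaned_a = owned answers whose question is unowned.
    """
    affected = unowned_q + unowned_a + orphaned_a
    total = total_q + total_a
    return affected, total, 100 * affected / total


# Stack Overflow figures from the list above.
affected, total, pct = affected_share(
    unowned_q=525_640,
    unowned_a=450_467,
    orphaned_a=854_499,
    total_q=24_235_690,
    total_a=35_952_729,
)
print(f"{affected:,} of {total:,} ({pct:.1f}%)")  # 1,830,606 of 60,188,419 (3.0%)
```

The three categories do not overlap (an answer is either unowned or owned), so they can simply be summed.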

For anyone who saw the estimates I previously made while the bug was still present: my approach there was flawed, and although the results were in the right ballpark in some cases, they were very far off for others.

What tables are affected?

Here's a query that identifies how many rows using a given post ID are present in each table. It suggests that these are missing from Posts, PostsWithDeleted, and PostHistory but the associated rows are still present in PostFeedback, PostLinks, PostNotices, PostTags, Votes, SuggestedEdits, and ReviewTasks (there are a few other less-frequently-used tables I'm unsure of). Comments seems to be inconsistent - they're present for an answer I looked at, but not a question. Here are some other posts from deleted users for testing:

How long has this been happening?

Because the Stack Overflow data dumps are very large and slow to work with, I'm checking for the presence of questions 5138 and 21000 on 3D Printing (a small site) in recent Data Dump releases to see where this started happening. The posts are not present in the latest (December 2024) release, but they are present in every other recent release I've checked, even though the users are not, suggesting that this is a new/recent change.

# December 2024
$ curl -LSs https://archive.org/download/stackexchange_20241231/stackexchange_20241231/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'


# September 2024
$ curl -LSs https://archive.org/download/stackexchange_20240930/stackexchange_20240930/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'
8333:  <row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="404" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#xA;&#xA;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="&lt;3d-design&gt;&lt;abs&gt;&lt;quality&gt;&lt;mechanics&gt;" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0" />
9685:  <row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="0" ViewCount="344" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&quot;https://i.sstatic.net/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.sstatic.net/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#xA;" OwnerDisplayName="user31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2024-06-23T21:00:45.183" Title="Vertical repeating pattern" Tags="&lt;print-quality&gt;" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0" />

# June 2024 (fixed)
$ curl -LsS https://archive.org/download/stackexchange_20240630_revised/stackexchange_20240630_revised/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'
8245:  <row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="404" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#xA;&#xA;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="&lt;3d-design&gt;&lt;abs&gt;&lt;quality&gt;&lt;mechanics&gt;" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0" />
9573:  <row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="0" ViewCount="319" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&quot;https://i.sstatic.net/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.sstatic.net/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#xA;" OwnerDisplayName="user31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2024-06-23T21:00:45.183" Title="Vertical repeating pattern" Tags="&lt;print-quality&gt;" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0" />

# June 2024 (original)
$ curl -LsS https://archive.org/download/stackexchange_20240630/stackexchange_20240630/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'
3587:  <row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="403" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#xA;&#xA;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="&lt;3d-design&gt;&lt;abs&gt;&lt;quality&gt;&lt;mechanics&gt;" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0" />
14380:  <row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="0" ViewCount="304" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&quot;https://i.sstatic.net/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.sstatic.net/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#xA;" OwnerDisplayName="user31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2024-06-23T21:00:45.183" Title="Vertical repeating pattern" Tags="&lt;print-quality&gt;" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0" />

# April 2024 (fixed)
$ curl -LsS https://archive.org/download/stackexchange_20240402_bis/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'
3580:  <row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="394" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#xA;&#xA;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="|3d-design|abs|quality|mechanics|" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0" />
14386:  <row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="0" ViewCount="224" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#xA;" OwnerDisplayName="user31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2024-02-24T20:07:10.807" Title="Vertical repeating pattern" Tags="|print-quality|" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0" />

# April 2024 (original)
$ curl -LsS https://archive.org/download/stackexchange_20240402/3dprinting.stackexchange.com.7z/Posts.xml |
    iconv -f utf-16 -t utf-8 | egrep -n '<row Id="(5138|21000)"'
4142:<row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="394" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#x0A;&#x0A;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#x0A;&#x0A;&lt;ul&gt;&#x0A;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#x0A;&lt;/ul&gt;&#x0A;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="|3d-design|abs|quality|mechanics|" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0"/>
17010:<row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="0" ViewCount="219" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#x0A;&lt;p&gt;&lt;a href=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#x0A;" OwnerDisplayName="user31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2024-02-24T20:07:10.807" Title="Vertical repeating pattern" Tags="|print-quality|" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0"/>

# March 2024
$ curl -LsS https://archive.org/download/stackexchange_20240305/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'
3580:  <row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="388" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#xA;&#xA;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="&lt;3d-design&gt;&lt;abs&gt;&lt;quality&gt;&lt;mechanics&gt;" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0" />
14399:  <row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="0" ViewCount="197" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#xA;" OwnerDisplayName="user31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2024-02-24T20:07:10.807" Title="Vertical repeating pattern" Tags="&lt;print-quality&gt;" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0" />

# December 2023
$ curl -LsS https://archive.org/download/stackexchange_20231208/3dprinting.stackexchange.com.7z/Posts.xml |
    egrep -n '<row Id="(5138|21000)"'
3580:  <row Id="5138" PostTypeId="1" CreationDate="2017-12-16T15:41:51.290" Score="7" ViewCount="375" Body="&lt;p&gt;Are 3D printed gears applicable for industrial use? &lt;/p&gt;&#xA;&#xA;&lt;p&gt;I want to print some gears with ABS. &lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What will their lifespan be? How long will they last if I use them, for example, every day? &lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerDisplayName="user9350" LastEditorUserId="4762" LastEditDate="2018-08-08T00:10:26.543" LastActivityDate="2018-08-08T00:10:26.543" Title="Are 3D printed gears applicable for industrial use?" Tags="&lt;3d-design&gt;&lt;abs&gt;&lt;quality&gt;&lt;mechanics&gt;" AnswerCount="2" CommentCount="8" ContentLicense="CC BY-SA 4.0" />
14428:  <row Id="21000" PostTypeId="1" CreationDate="2023-05-30T11:04:53.463" Score="1" ViewCount="91" Body="&lt;p&gt;I've got these strange vertical grooves on my print. They are, at some locations well feelable and at some locations they are completely gone. The printer is an Ender 5 Plus or based on it at least. I use a MicroSwiss Dual Gear Bowden extruder instead of the stock one. Are they from the extruder or from the belts? Or something completely different?&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; rel=&quot;nofollow noreferrer&quot;&gt;&lt;img src=&quot;https://i.stack.imgur.com/2zXcP.jpg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;&#xA;" OwnerUserId="31236" LastEditorUserId="5740" LastEditDate="2023-05-30T17:51:02.050" LastActivityDate="2023-10-27T19:05:30.643" Title="Vertical repeating pattern" Tags="&lt;print-quality&gt;" AnswerCount="1" CommentCount="1" ContentLicense="CC BY-SA 4.0" />

However, in the comments below Zoe indicates that they've found Stack Overflow posts that are missing in these same releases, so maybe it was already happening but to a more limited extent. (Update: this appears to have been a mix-up; previous data dumps were not affected.)
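The same presence check can be done on a locally extracted Posts.xml without relying on the exact text layout that egrep matches. This is a minimal sketch, assuming the dump keeps the `<row Id="…" …/>` structure shown in the output above. The `find_rows` helper and the two-row sample are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def find_rows(source, wanted_ids):
    """Return {Id: attributes} for every <row> whose Id is in wanted_ids.

    source may be a file path or a binary file-like object.
    """
    wanted = {str(i) for i in wanted_ids}
    found = {}
    # iterparse streams the file, so large dumps are not loaded into memory at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            if elem.get("Id") in wanted:
                found[elem.get("Id")] = dict(elem.attrib)
            elem.clear()  # drop the element's data once it has been inspected
    return found


# Tiny invented sample standing in for a real Posts.xml.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="5138" PostTypeId="1" OwnerDisplayName="user9350" />
  <row Id="5139" PostTypeId="2" OwnerUserId="4762" />
</posts>"""

hits = find_rows(io.BytesIO(SAMPLE), [5138, 21000])
missing = sorted({"5138", "21000"} - set(hits))
print(sorted(hits), missing)  # ['5138'] ['21000']
```

For a real check, pass the path of the extracted Posts.xml instead of the in-memory sample. A row that is present with `OwnerDisplayName` but no `OwnerUserId` is the "unowned" case discussed in this answer.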

40
  • 5
    Rhetorical question: is it a bug, or is it just another way that SE, inc. is trying to cut-off our data dump rights? Good answer, by the way. Commented Jan 7 at 6:41
  • 2
    @security_paranoid normally I'd say you are living up to your name, but I've seen other companies do worse for less, then hide behind their priority schedule when people call it out. I'll hold no assumptions for now, and just focus on raising the issue. Commented Jan 7 at 6:44
  • 2
    This affects the December 2024 data dump as well. Got a copy pending for the September data dump to verify if it's there too, but due to Life:tm:, it'll take a couple hours to verify if it's a problem there too. Commented Jan 7 at 8:31
  • 7
    Never mind the couple hours, the copy completed ahead of time. The September 2024 data dump is partly bad as well; Id="34" is missing, but Id="3844502" is not. Commented Jan 7 at 8:47
  • 5
    @davidalayachew that, or more simply a case of "仏の顔も三度まで" - "Buddha's face up to three times". First time you assume good faith, second time you assume couldn't-care-less attitude, third time you are left with malice. AKA don't be surprised by the userbase paranoia, that was carefully cultivated from years since Monica's incident. Commented Jan 7 at 10:14
  • 6
    June 2024 (revised) is also bad, same problem as September 2024. June 2024 (original) lacks Id=34 and ParentId=34, and lacks Id=3844502 but has ParentId=3844502. All data dumps following the refactor contain bad data Commented Jan 7 at 12:15
  • 12
    fwiw this doesn’t look like malice to me: it’s easy to see how this kind of bug could have been created by accident. But it’s certainly reflective of the lack of resourcing and QA that has been put towards community things. Given the company’s overt malevolence in the past I can’t blame folks for assuming the worst, but I think the individuals who’ve been working on the data dump and SEDE have good intentions, but aren’t well-supported. Commented Jan 7 at 14:16
  • 7
    Given it's present in all four data dumps since the refactor, odds are good it's accidental. That said, I don't believe any of their intentions are good; this particular bug does not appear to have been intentional, but a system that worked and that was improved rather dramatically back in April was ripped out for no other reason than redoing everything and moving hosting inhouse. The first round of improvements in april were in good faith (and aside one bad dump using UTF-16 or whatever, problem-free), and did lead to measurable improvements, but (1/2) Commented Jan 7 at 15:43
  • 5
    the whole of the data dump restrictions and related "improvements" for the june data dump are an outright insult to the community. Another mod has spent the past 6 months trying to request the entire data dump, and not gotten anything in spite of SE's pinkie promise that they'll offer the full data dump on request, aside weekly (or however frequent the pings were) promises that it'll surely be Soon:tm:. They ripped out an improved system and made it objectively worse on all levels just so they have taken action against AI scrapers on paper, while those restrictions do nothing in practice Commented Jan 7 at 15:45
  • 4
    Back to data dump testing news, PostHistory has the same problem. PostHistory for the December 2024 data dump lacks PostId=34 and PostId=3844502, meaning there's no path to recovery using just the data available. Commented Jan 7 at 16:10
  • 6
    Last update: Comments.xml has also been confirmed affected. Paradoxically, Votes.xml is not affected, and posts appear there just fine. The bug has also been reproduced on a network site, though the weird behaviour around SO/3844502 remains unclear. This means all the files containing actual content (posts, posthistory, and comments) are broken by this bug. Commented Jan 7 at 16:49
  • 3
    @ꓢPArcheon Thanks for the context. I am ignorant about the Monica incident, but I did hear about community staff members being treated poorly, as well as Codidact being created. I've kind of drifted away from this site as a result, and only really stick around to help people/myself, as opposed to trying to support this site and help it grow, like I used to. I'll probably do more, purely because this site has almost become a public service for the programming community, so its maintenance benefits literal millions, but it sucks that it's spearheaded by people willing to do rotten things. Commented Jan 7 at 19:06
  • 1
    @davidalayachew my pleasure. All I wanted to get thru is that despite what others sometime seem to think, most frustrated comments don't come from user that get some weird sort of pleasure by being "antagonistic" or just plain evil to seed discord. They are just.. frustrated, their faith tired out by mishap after mishap, empty promises of fast resolutions or more often... no reply altogether. Please, just bear with them, I know it may be noisy but being noisy sometime is all they have left. I don't know if this is a deliberate move... but I can't bring me to blame them for thinking so. Commented Jan 8 at 9:03
  • 2
    @davidalayachew The problematic major changes they're referring to were announced here. The new system introduced lots of bugs, some of which were addressed here. The last release before the new system also introduced some bugs, some of which are discussed here, but those resulted from an effort to make data dumps available more quickly, so folks were relatively understanding, while they're less understanding of the new bugs introduced while restricting the data. Commented Jan 8 at 17:26
  • 3
    @MetaAndrewT. Considering the large number of users involved, and the fact a good chunk of the tests ran involve users that have been deleted for several years, that's exceedingly unlikely. It's certainly possible, but I strongly doubt it applies to this many users, and especially users who self-deleted before GDPR was even a thing. Many users coming back years later to delete their questions fully is too much of a coincidence. Also, I would imagine GDPR removals to also apply to main, not just the data dump. Applying it just to the data dump but not the site proper is just a weird decision Commented Jan 9 at 22:40
6

PostHistory table also seems to miss MANY records for deleted users.

According to the schema wiki, the convention for PostHistory is that UserId is populated for extant users. For a deleted user, UserId is NULL and UserDisplayName is set like "user1234567". This query on math.SE counts the number of "Initial Title"-type records where one or both of the fields are NULL:

uid_null udn_null neither_null both_null 
-------- -------- ------------ --------- 
100      1000511  887          0         
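The linked query is not reproduced here, but a count of this shape can be sketched with conditional aggregation. The example below runs against a toy PostHistory table in SQLite via Python. The sample rows are invented, and treating the four columns as mutually exclusive buckets is an assumption about how the original query defines them. `PostHistoryTypeId = 1` is the "Initial Title" type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE PostHistory (
           Id INTEGER PRIMARY KEY,
           PostHistoryTypeId INTEGER,
           PostId INTEGER,
           UserId INTEGER,
           UserDisplayName TEXT)"""
)
conn.executemany(
    "INSERT INTO PostHistory VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 100, 42, None),              # extant user: UserId set, no display name
        (2, 1, 101, None, "user1234567"),   # deleted user: NULL UserId, display name kept
        (3, 1, 102, 7, "Some Name"),        # both fields populated
        (4, 1, 103, None, None),            # both fields NULL
        (5, 2, 100, None, "user1234567"),   # Initial Body row, excluded by the filter
    ],
)

QUERY = """
SELECT
  SUM(CASE WHEN UserId IS NULL     AND UserDisplayName IS NOT NULL THEN 1 ELSE 0 END) AS uid_null,
  SUM(CASE WHEN UserId IS NOT NULL AND UserDisplayName IS NULL     THEN 1 ELSE 0 END) AS udn_null,
  SUM(CASE WHEN UserId IS NOT NULL AND UserDisplayName IS NOT NULL THEN 1 ELSE 0 END) AS neither_null,
  SUM(CASE WHEN UserId IS NULL     AND UserDisplayName IS NULL     THEN 1 ELSE 0 END) AS both_null
FROM PostHistory
WHERE PostHistoryTypeId = 1
"""
counts = conn.execute(QUERY).fetchone()
print(counts)  # (1, 1, 1, 1)
```

Under this bucketing, `uid_null` is the figure that tracks deleted users, which is why a value of only 100 on a site the size of math.SE stands out.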

I compare this with my DB, which I built from a couple of Dumps, most recent being Sep/Oct 2024:

 uid_null | udn_null | neither_null | both_null 
----------+----------+--------------+-----------
    50733 |  1001649 |          888 |         5

Even allowing for some discrepancy between the data sets, only 100 deleted users looks very strange. Extending the range raises that to only 174 users (query).

Here is an example of a post that is not deleted but has no PostHistory records: post, query

Update (2025-01-23). Seems to be fixed as of now. SEDE query output:

uid_null udn_null neither_null both_null 
-------- -------- ------------ --------- 
63197    1184380  888          5         
3
  • 1
    Thanks. I hope to hear back from the StackExchange team soon. This seems to be a massive problem, so I hope this gets triaged soon and given top priority. Commented Jan 14 at 3:32
  • 2
    @davidalayachew This is a bugfix, not a hypable AI feature release. This was never going to get top priority. Getting any priority above 0 alone would be a miracle. We're 1.5 weeks in, and it hasn't as much as been acknowledged as seen by staff. Best-case scenario, it'll be fixed in time for the next data dump. Realistically, it'll probably take longer Commented Jan 18 at 18:14
  • 3
    @Zoe-Savethedatadump I was pleasantly surprised by the index add from last month. They got on that very quickly. Commented Jan 19 at 0:55
