memtest completed and now I'm going full on stress testing: playing fullscreen video, 3D games, processing a bunch of shit in the background, some VMs. you know, the usual stuff I do on an average day

I probably shouldn't try to run the VM that has 16gb of RAM allocated to it, though

Well that lasted all of 30 minutes before the system hung.
Fuck. It's not just the RAM.

So, it's probably one of motherboard, cpu, or PSU. At a stretch, it could be the GPU.

I have another spare GPU I could swap in. I have a near-identical CPU that I could swap in (it's in use, but I can temporarily borrow it).
PSU and mobo are trickier.
So, I'll have to try the easy ones first. Swap the GPU and see if windows still hard crashes like that, then the cpu, then start working on the others.

If you'd like to help me get back online (and gay cats, of course), donations would help. I'm kinda broke and not having a working computer is not going to help.

ko-fi.com/fooneturing

okay today's first test: I yanked out my GPU and I'm running on just the internal GPU. I'm gonna load up some videos, VMs, 3D games, and a bunch of browser tabs. See if this falls over too

I am not getting a large number of frames, and I only have one game running at the moment.

somehow I got my youtube video playing over my actual speakers but one of the games playing out the HDMI and the little speakers on the monitor. that's weird.

I AM STRESS TESTING THIS MACHINE AT NOT A LOT OF FRAMES A SECOND

okay I've made it an hour running sans-GPU. That doesn't mean it's the GPU though. This machine is using way less power without the GPU... so it could still be a PSU related problem.

so after running fine for about 3 hours with no GPU, I've gone out and bought a new... power supply.

yeah I don't think it's the GPU. And a flakey PSU could easily fail with the GPU and not without, since the power usage is way lower without a GPU in there

okay new PSU is in. That took way longer than it should.
Apparently between the RM650x and the RM850x, Corsair redesigned their modular cables, so I couldn't just swap the PSU and reuse the cables. So now I have a cable management nightmare, but it's running. Let's put the stress-testing pants on

the worst part is that I forgot to double-check that the new PSU would come with the right cables to let me hook up my floppy drives. Thankfully, it did.

well my 3.5" floppy drive is working. That's a good test

changing my PSU seems to have confused Satisfactory into thinking I'm a different person and now I'm sitting on the floor of my own base. WHO ARE YOU?

sticking in a different GPU.
Swapping out my Asus GeForce RTX 3070 for an EVGA GeForce GT 1030

ran an hour and 30 minutes on the other GPU with no crashes.

god damn it, IS it my GPU?

taking a work meeting from the system under test
this is known as "living dangerously"

okay. on the new PSU, with old GPU back in, but in a different PCIe slot. Let's see how this goes

no crash in an hour with it in a different slot? weird.

19 hours in the different slot, no crashes. Very strange.
So, theories:
1. that slot was just bad/dirty. Possible, I guess? The other GPU worked fine in that slot, though.
2. The GPU might be running at 8x PCIe instead of 16x PCIe. Maybe that pushes it over some timing/temperature threshold and makes it not crash?

okay, GPU-z says it is indeed running at x8 3.0 speed, when it's capable of x16 4.0.
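For a sense of scale, here's a rough sketch of the theoretical link bandwidth for those two configurations. The per-lane transfer rates and 128b/130b encoding are the standard PCIe 3.0 and 4.0 figures; the function and table names are made up for illustration, and real-world throughput will be lower than these numbers.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_per_s(generation: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link, in GB/s."""
    transfer_rate_gt, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer moves one bit per lane; divide by 8 to convert bits to bytes.
    return transfer_rate_gt * efficiency * lanes / 8


for generation, lanes in [("3.0", 8), ("4.0", 16)]:
    bandwidth = link_bandwidth_gb_per_s(generation, lanes)
    print(f"PCIe {generation} x{lanes}: {bandwidth:.2f} GB/s")
```

So the x8 3.0 link has about a quarter of the theoretical bandwidth of x16 4.0: half the lanes at half the per-lane rate.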

So, how much you wanna bet that if I fix that, the system will start crashing again?

So it turns out I can't get it to do 16x in the other slot. My motherboard has 4 16x slots, but it does them in sequential order: if the first one is full, it gets 16x. if the first and second are full, they get 16x. if the second one is full and the first isn't, the second gets 8x.
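Written out as a lookup, the slot behaviour I described looks something like this. It's only a sketch of the three cases above: the names are hypothetical, and any other combination of slots is left as unknown rather than guessed.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Slot numbers populated -> link width per populated slot.
# Only the three cases described in the post are encoded here.
DESCRIBED_CASES: Dict[FrozenSet[int], Dict[int, int]] = {
    frozenset({1}): {1: 16},            # only slot 1 full: it gets x16
    frozenset({1, 2}): {1: 16, 2: 16},  # slots 1 and 2 full: both get x16
    frozenset({2}): {2: 8},             # only slot 2 full: it gets x8
}


def lane_widths(populated_slots: Iterable[int]) -> Optional[Dict[int, int]]:
    """Return the described link widths, or None if the case wasn't described."""
    return DESCRIBED_CASES.get(frozenset(populated_slots))


print(lane_widths([2]))     # the GPU alone in slot 2
print(lane_widths([1, 2]))  # slots 1 and 2 both populated
print(lane_widths([3]))     # not described, so None
```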

yeah.

So I swapped the card back to slot 1.
Interestingly, GPU-z says it's at 2.0 now instead of 3.0. Not sure why that is.

I'm gonna run my stress test with some performance logging on to see if it's overheating.
I did realize my card has a physical switch for "high fan" vs "quiet fan" and switched it to "high fan".
I can't imagine that'd be why it was crashing but maybe.

also while looking around my BIOS, I realized I could clock my RAM faster. It's running at 2133MHz and could go up to 3200MHz, supposedly.

I didn't test that out for obvious reasons.

over an hour with the GPU back in the Crashy Slot and no crashes.

huh. Maybe it was temperature based?
My GPU isn't getting THAT hot, my fans aren't even maxing out.
GPU temp hit a max of 65C with a hot spot of 76C.
Those aren't out of range for a GPU under load, and they're not trending upward at all, it's stable.
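One way to check "not trending upward" from a performance log, rather than eyeballing it, is to fit a straight line through the readings and look at the slope. This is a minimal sketch: the sample readings are made up for illustration (they are not my actual log), and the 0.1 degrees-per-sample threshold is an arbitrary assumption.

```python
from typing import Sequence


def slope_per_sample(readings: Sequence[float]) -> float:
    """Least-squares slope of the readings against their sample index."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def is_trending_up(readings: Sequence[float], threshold: float = 0.1) -> bool:
    """True if the fitted slope exceeds the threshold (degrees C per sample)."""
    return slope_per_sample(readings) > threshold


# Made-up example readings in degrees C, one per logging interval.
steady = [64, 65, 65, 64, 65, 65, 64, 65]
climbing = [60, 62, 64, 66, 68, 70, 72, 74]

print("steady trending up:", is_trending_up(steady))
print("climbing trending up:", is_trending_up(climbing))
```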

or it's still just a memory corruption problem and it's just VERY random and I need to test for longer

it's now been nearly 5 hours with no crashes.
what the fuck?

(temps are about the same)

So it's now run for 23 hours... no system-crash.

minecraft did crash, but it's modded minecraft. it might have just done that on its own

I also stuck the "bad" set of ram sticks into a spare computer and ran it on memtest.
16 hours, 9 full passes of memtest86+, zero errors.

@mos_8502 if it is, it's generating the wrong voltages in the motherboard, because I did replace the PSU

@foone I was thinking about the voltage settings in the BIOS/UEFI menu, but that's also possible. Bad regulators suck.

@mos_8502 so I've checked those and according to them, they're outputting the right voltages. the RAM is running at the 1.2V it expects for non-overclocked usage

@foone @mos_8502 maybe there's a sensitive timing issue going on. Maybe it's doing its cycles just a bit sooner than expected, or there's a bit of lag on the address/data lines. At that speed I can't imagine what could go imperceptibly wrong enough to cause trouble.

I suppose one way to experiment would be to manually set the latency timings to be a bit more generous than the SPD implies, but at that point it's already a sign of a larger fault looming if it started cropping up now...
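To put numbers on "a bit more generous": a timing given in clock cycles can be converted to nanoseconds from the memory's data rate. The sketch below assumes the 2133 figure mentioned earlier in the thread is the DDR data rate in MT/s, and the CL15 and CL16 values are purely hypothetical examples, not what this RAM's SPD actually reports.

```python
def cycles_to_ns(cycles: int, data_rate_mt_s: float) -> float:
    """Convert a timing in memory clock cycles to nanoseconds."""
    # DDR moves data twice per clock, so the clock in MHz is half the data rate.
    clock_mhz = data_rate_mt_s / 2
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return cycles * 1000 / clock_mhz


# Hypothetical CAS latencies at a 2133 MT/s data rate.
for cas_latency in (15, 16):
    latency_ns = cycles_to_ns(cas_latency, 2133)
    print(f"CL{cas_latency} at 2133 MT/s: {latency_ns:.2f} ns")
```

Loosening a timing by one cycle at that speed adds a little under a nanosecond, which gives an idea of how small the margins being discussed are.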
