PS3:Syscon Diagnosis

Diagnosing issues that prevent the PS3 from booting, such as the Yellow Light of Death (YLOD), is made much easier by the PS3's System Controller (syscon) chip. The syscon stores error codes that the system encounters during boot or normal operation.

Reading that information requires access to the syscon, which we'll get through a serial connection and a Linux terminal.

Prerequisites

 * A PS3 to open and diagnose
 * A Linux computer with Python installed (Windows can be used, but this guide does not cover it)
 * A USB to TTL Serial Converter Cable
 * Some spare wire, specifically 30 AWG single-core
 * A soldering iron, and the experience to use it
 * Various other electronics tools; a multimeter is a good idea
 * This Python script

Identify your syscon type
Your console has either a Mullion or a Sherwood syscon, and the difference matters for this guide.

PS3 models A through K (COK-001 to DIA-002) have a Mullion; all PS3s from the L models onward have a Sherwood.

Connecting the syscon to the serial connection
The solder points on the PS3's motherboard differ between models. Refer to this GitHub repository to identify your board and the points where you will connect the serial converter.

It is recommended to use wires with connectors, so you can detach the serial converter once you are done without having to desolder your wires. This saves work now and makes it easier to diagnose any future errors and issues.

In the end, you should have 4 wires: one each for the RxD point on the board, the TxD point, Ground (GND), and DIAG (Sherwood models do not have DIAG).

RxD should be connected to the TX pin on the USB serial converter. TxD to the RX pin, Ground to Ground, and leave DIAG unconnected for now.
 * The metal shield around the motherboard, or the copper edges can be used as Ground.

Your converter needs to be set to 3.3V mode, otherwise you could seriously damage the console.

Differences between Internal and External command mode
Mullion SCs have two command modes: external and internal.

External mode is quick and easy to access if you just need to read the errors, but offers little else.

Internal mode is much more capable and has more commands. For the purposes of this guide, we'll start in external mode, but we'll quickly switch to internal.

Sherwood SCs use internal mode by default, and therefore, no DIAG pin will be needed.

Opening a serial connection to the syscon
Now that we've done the necessary solder work to connect the syscon to our serial converter, we can interface with it via a terminal.

Plug your serial converter into your PC and open a terminal. You'll need to find out which device file the converter has been assigned; usually it's /dev/ttyUSB0. You can use dmesg to find this out:

 * $ sudo dmesg | less (you need root privileges to use dmesg)
 * Because dmesg prints the entire kernel log buffer, its output is too long to reproduce in this guide. Look near the end for a line reporting that the converter was attached to a ttyUSB device.
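If scrolling through dmesg is tedious, the candidate device files can also be listed directly. The following is a minimal Python sketch, assuming the converter appears under /dev as ttyUSB* (typical for FTDI, CP210x and CH340 chips) or ttyACM*; the function name find_serial_devices is hypothetical and not part of the syscon script.

```python
import glob
import os


def find_serial_devices(dev_dir="/dev"):
    """Return candidate USB serial device paths under dev_dir, sorted by name.

    Assumes the converter appears as ttyUSB* or ttyACM*; other names are ignored.
    """
    candidates = []
    for pattern in ("ttyUSB*", "ttyACM*"):
        candidates.extend(glob.glob(os.path.join(dev_dir, pattern)))
    return sorted(candidates)


if __name__ == "__main__":
    found = find_serial_devices()
    if found:
        print("\n".join(found))
    else:
        print("No ttyUSB/ttyACM devices found; check dmesg instead.")
```

Unplugging and replugging the converter and running it again shows which entry belongs to it.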

Once you know which device file to use, you can run the Python script.

Before running it, make sure the PS3's power supply is connected to the board and plugged in. If you have a PHAT, turn on the power switch on the back.


 * $ sudo ./ps3_syscon_uart_script.py /dev/ttyUSB0 CXR
 * Replace "/dev/ttyUSB0" with the path of your serial converter if it is somewhere else.
 * The last argument selects the mode. CXR tells the script to use external mode; Mullion SCs start with this, and once we enable internal mode, we'll use CXRF instead.
 * For Sherwood SCs, use SW for that argument.
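To keep the three mode values straight, the choice can be summarised in a small helper. This is a hypothetical sketch (script_command is not part of the script), assuming the script takes the device path followed by the mode, as in the command above.

```python
def script_command(port, syscon, internal=False):
    """Build the argument list for running ps3_syscon_uart_script.py.

    syscon:   "mullion" or "sherwood"
    internal: Mullion only; True once DIAG is grounded (internal mode).
    """
    if syscon == "sherwood":
        mode = "SW"  # Sherwood SCs use internal mode by default
    elif syscon == "mullion":
        mode = "CXRF" if internal else "CXR"  # CXR = external, CXRF = internal
    else:
        raise ValueError("syscon must be 'mullion' or 'sherwood'")
    return ["sudo", "./ps3_syscon_uart_script.py", port, mode]


print(" ".join(script_command("/dev/ttyUSB0", "mullion")))
# sudo ./ps3_syscon_uart_script.py /dev/ttyUSB0 CXR
```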

Afterwards, you should be presented with a >$ prompt!

Mullion SCs
Type in AUTH or auth. It may be case-sensitive, so try the other if one doesn't work. You should get "Auth successful" once you do so.
 * If you get "Auth1 response invalid", your RX and TX wires may be connected backwards; swap them. Otherwise, check your wiring.
 * Any time you change the connection, exit the script, switch the PS3 off and back on, then rerun the script.

For repair purposes, the ERRLOG command retrieves the last 32 error codes stored in the SC. Note that every new error, including ones you trigger while experimenting, overwrites the oldest entry to make room. So be careful how you proceed when hunting errors.

To get the most recent error, type 'ERRLOG GET 00'. Replace 00 with any hexadecimal value up to 1F to walk back through the error history.
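The 32 slot numbers run from 00 to 1F in hexadecimal. This short Python sketch prints every ERRLOG GET command so you can copy them into the prompt one at a time; the helper name is hypothetical, and it assumes two-digit uppercase hex as in the example above.

```python
def errlog_get_commands():
    """Return the 32 external-mode commands ERRLOG GET 00 .. ERRLOG GET 1F."""
    # Slots are numbered in two-digit uppercase hexadecimal, 0x00 to 0x1F.
    return [f"ERRLOG GET {slot:02X}" for slot in range(0x20)]


for cmd in errlog_get_commands():
    print(cmd)
```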

Each error code is followed by a timestamp, but in external mode the timestamp is in a hard-to-read format and isn't very useful. For that reason, we will now switch to internal mode.

Switching to Internal Mode
Remember, only Mullion SCs need to do this.

Log into the syscon in external mode as usual, then type the command "EEP GET 3961 01". It should return "00000000 FF"; if not, make sure you typed the command correctly.

Then enter "EEP SET 3961 01 00". MAKE SURE YOU TYPE THIS CORRECTLY! We are changing very sensitive memory and could severely mess things up by changing the wrong bits.

Afterwards, run the first command again; it should now return all zeros.
 * Your console probably won't boot now, because the syscon's EEPROM checksum is now invalid. We will fix this later!

Switch your console off and exit the script. Now we can finally ground the DIAG pin and switch the PS3 back on. Remember, your PS3 probably won't boot; it will flash red repeatedly. This is normal, and we will fix it.

Rerun the python script, but change the CXR to a CXRF, as now we can use internal mode.

Re-authenticate with the auth command as before; you should get "Auth successful". If not, your wiring is wrong.

To fix the checksum, we have to write to more memory. Run "eepcsum", look for any line that says "sum:" followed by a hexadecimal number, and note down the line right below it.

This output is only an example; your results will differ.
 * > eepcsum
 * Addr:0x000032fe should be 0x528c
 * Addr:0x000034fe should be 0x7115
 * sum:0x0100
 * Addr:0x000039fe should be 0x0038
 * Addr:0x00003dfe should be 0x00ff
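Picking the right lines out of a long eepcsum output by eye is error-prone. The following minimal Python sketch takes the pasted output as text and returns the address and expected value from each Addr line directly after a sum: line. It assumes the line format shown in the example above; find_checksum_mismatches is a hypothetical name.

```python
import re

# Matches e.g. "Addr:0x000039fe should be 0x0038"
ADDR_RE = re.compile(r"Addr:0x([0-9a-fA-F]+) should be 0x([0-9a-fA-F]+)")


def find_checksum_mismatches(output):
    """Return (address, expected) pairs for each Addr line directly after a sum: line."""
    mismatches = []
    after_sum = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("sum:"):
            after_sum = True  # the next Addr line holds the bad checksum
            continue
        match = ADDR_RE.search(line)
        if after_sum and match:
            mismatches.append((int(match.group(1), 16), int(match.group(2), 16)))
        after_sum = False
    return mismatches


sample = """> eepcsum
Addr:0x000032fe should be 0x528c
Addr:0x000034fe should be 0x7115
sum:0x0100
Addr:0x000039fe should be 0x0038
Addr:0x00003dfe should be 0x00ff
"""
print([(hex(a), hex(v)) for a, v in find_checksum_mismatches(sample)])
# [('0x39fe', '0x38')]
```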

The line we're looking for comes right after that sum line: "Addr:0x000039fe should be 0x0038" is where the mismatch is.

If we enter "r 3900 100", we get a table of the syscon's memory starting at address 0x3900. Remember that 8 bits make a byte, and each byte is shown as 2 hexadecimal digits. The table displays 16 bytes per row, the first row covering 0x3900 to 0x390F. The row is set by every digit of the address except the last, so the next row covers 0x3910 to 0x391F, and the last digit selects the column within the row. In our example, we're looking for address 0x39FE, which currently reads 0x02 when it should be 0x38.

Note that the byte order (endianness) must be swapped when we replace the current value with the correct one. Endianness is simply the order in which the bytes of a value are stored. When we write the correct value to the EEPROM, it will look like we're entering it backwards, but don't fret, this is correct.
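The row-and-column arithmetic can be checked with a few lines of Python. This sketch assumes the r output prints 16 bytes per row starting from the requested address, as described above; locate_in_dump is a hypothetical name.

```python
def locate_in_dump(address, dump_start=0x3900, length=0x100):
    """Return (row_start, column) of a byte in the table printed by `r dump_start length`.

    Assumes 16 bytes per row, with the first row beginning at dump_start.
    """
    offset = address - dump_start
    if not 0 <= offset < length:
        raise ValueError("address is outside the dumped range")
    row_start = dump_start + (offset // 16) * 16  # e.g. 0x39FE -> row 0x39F0
    column = offset % 16                          # 0-based column within the row
    return row_start, column


row, col = locate_in_dump(0x39FE)
print(f"row 0x{row:04X}, column {col}")  # row 0x39F0, column 14
# The 16-bit checksum occupies this byte and the next one (0x39FF, column 15).
```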

We can set the correct byte with this command:
 * > w 39FE 38 00
 * Remember to use the values from YOUR eepcsum output. The ones used here are only examples and will not match yours.

Verify that the write was successful by reading that memory again: 0x39FE should now read 38 00. You can also rerun eepcsum and check that there are no more "sum:" lines. If there are, repeat these steps for each remaining incorrect checksum, changing the values to match each result.


 * Note: In some cases, eepcsum shows an expected value with 4 bytes, the first 2 both being FF. Ignore those and write only the last 2 bytes, remembering to swap the byte order.
 * Example:
 * > eepcsum
 * Addr:0x000032fe should be 0x1596
 * sum:0xf812
 * Addr:0x000034fe should be 0xffff3aee

We can now write the correct bytes:
 * > w 34FE ee 3a
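Both write examples follow the same rule: drop a leading FFFF, then write the low byte first. The following minimal Python sketch builds the w command from an eepcsum address and expected value. write_command is hypothetical, and its check that any upper bytes are FFFF reflects only the case described in the note above.

```python
def write_command(address, expected):
    """Build the internal-mode `w` command that stores a 16-bit checksum.

    The value is written little-endian (low byte first). An expected value
    printed with a leading 0xFFFF (e.g. 0xffff3aee) is reduced to 16 bits.
    """
    if expected > 0xFFFF:
        if expected >> 16 != 0xFFFF:
            raise ValueError("unexpected upper bytes; check the eepcsum output")
        expected &= 0xFFFF  # keep only the last 2 bytes
    low, high = expected & 0xFF, expected >> 8
    return f"w {address:X} {low:02x} {high:02x}"


print(write_command(0x39FE, 0x0038))      # w 39FE 38 00
print(write_command(0x34FE, 0xFFFF3AEE))  # w 34FE ee 3a
```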

Turn the console off and on again. It should now boot, unless there is a hardware fault, which we can now hunt down.

Sherwood SCs
When you authenticate with the syscon, use the lowercase auth command, as it gives you access to the internal commands. From here on, this guide is the same for both Mullion and Sherwood SCs.

Getting Error Codes from the Syscon
To get a full picture of what could be wrong with our system, we're going to run a couple of commands before we start looking at the error logs.

This shows the total count of startups and shutdowns, and the total power-on time. It gives you a good idea of just how old your console is.
 * > becount

This will try to start up the console and show a log of the power up sequence. If there is a hardware error preventing the system from booting, you'll see an error here which could help you figure out what's going on.
 * > bringup

Now finally, we can look at the error log.


 * > errlog

This outputs the 32 error codes that the syscon has stored, each as 4 bytes followed by another 4: the first 4 are the error code and the last 4 are the timestamp. The timestamp is recorded in J2000 format; converting it to UNIX time is possible, but won't be covered in depth here. The bytes are in little-endian order, meaning you need to read them from right to left (like manga).
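As a rough starting point only, the byte-order step and the epoch arithmetic can be sketched in Python. This assumes the 4 timestamp bytes are printed in the order they are stored (little-endian), that the value counts seconds, and that the epoch is J2000.0 taken as 2000-01-01 12:00:00 UTC (ignoring the roughly one-minute difference between TT and UTC). All of these are assumptions, and the function names are hypothetical; treat the result as approximate.

```python
from datetime import datetime, timezone

# Assumed epoch: J2000.0 as 2000-01-01 12:00:00 UTC (TT/UTC offset ignored).
J2000_UNIX = 946728000


def le_hex_to_int(hex_bytes):
    """Interpret hex bytes as printed (e.g. "80 51 01 00") as a little-endian integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def j2000_to_unix(seconds):
    """Convert seconds since the assumed J2000 epoch to a UNIX timestamp."""
    return J2000_UNIX + seconds


ts = le_hex_to_int("80 51 01 00")  # bytes read right to left: 0x00015180
print(ts)  # 86400
print(datetime.fromtimestamp(j2000_to_unix(ts), tz=timezone.utc).isoformat())
# 2000-01-02T12:00:00+00:00
```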

The error codes have their own format: A R ST C ERR. Each letter stands for one hexadecimal digit, so ST is two digits and ERR is three.
 * A = This is always "A".
 * R (Reserved) = Values 0 - E have an unknown meaning.
 * R = F marks a frequent error (for example, from damaged TOKIN caps).
 * ST = Step number of the boot sequence, 00 - 7F. A value of 80 means the error occurred when the system was fully booted.
 * ST = 90 means the error occurred while the console was powering off.
 * ST = A0 means the error occurred when you plugged the console in. Because the console supports features such as being turned on with a controller, some circuitry on the motherboard is always powered while the console is plugged in. This includes the syscon and its clock, the Bluetooth and Wi-Fi card, the thermal monitoring ICs, etc. If anything in that circuitry has a problem (like the incorrect checksum on Mullion SCs earlier), you'll get an A0 step number.
 * C = Category: 1 for System Error, 2 for Fatal Error, 3 for Booting Error, 4 for Data Error.
 * ERR = The specific error code. For example, System Error 002 (1002) means that the RSX (GPU) VRMs failed.

The same 3-digit error code can appear in other categories with a different meaning: System Error 001 and Fatal Error 001 are not the same thing.
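The field layout above can be expressed as a small decoder. This Python sketch assumes the code is given as the 8 hex digits in reading order (starting with A, after any byte swapping); decode_error_code and the example code A0801002 are made up for illustration.

```python
import re

CATEGORIES = {1: "System Error", 2: "Fatal Error", 3: "Booting Error", 4: "Data Error"}
SPECIAL_STEPS = {0x80: "fully booted", 0x90: "powering off", 0xA0: "plugged in (standby)"}


def decode_error_code(code):
    """Split an 8-digit syscon error code (A R ST C ERR) into its fields."""
    code = code.strip().upper()
    if not re.fullmatch(r"A[0-9A-F]{7}", code):
        raise ValueError("expected 8 hex digits starting with 'A'")
    reserved = int(code[1], 16)
    step = int(code[2:4], 16)
    category = int(code[4], 16)
    if step <= 0x7F:
        step_text = f"boot step 0x{step:02X}"
    else:
        step_text = SPECIAL_STEPS.get(step, "unknown step")
    return {
        "frequent": reserved == 0xF,  # R = F marks a frequent error
        "step": step_text,
        "category": CATEGORIES.get(category, "unknown category"),
        "error": code[5:8],  # only meaningful together with its category
    }


print(decode_error_code("A0801002"))
# {'frequent': False, 'step': 'fully booted', 'category': 'System Error', 'error': '002'}
```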

Because of the complexity of the PS3's hardware, you should be very familiar with it before attempting an actual repair. The errors provided by the syscon won't tell you exactly what's wrong; rather, they point you in the right direction. Beware of red herrings: some errors may not correspond to your issue, or may be symptoms rather than causes.

Error Codes
The list of syscon error codes is far too big to put on this wiki; check this link for a comprehensive list of error codes and what (usually) causes them.

Some examples:


 * 14FF (Check Stop):


 * This is similar to a Machine Check Exception on x86 processors (the PS3's CELL is PowerPC-based): the CPU, GPU, or some other hardware found something in an "impossible" state (impossible unless there is a hardware fault). It is typically caused by a broken solder joint underneath the GPU, CPU, or some other chip, though the component raising the error may itself be faulty. This error is fairly generic and is usually logged alongside something more specific.


 * 1701 (CELL BE ATTENTION)


 * BE ATTENTION is loosely comparable to the NMI line on x86, except that it runs the other way: the CELL drives it high to request an operation from the Syscon. The Syscon then reads a status register to determine why the attention signal was sent. The signal remains high until software clears the condition that caused it.


 * BE ATTENTION shouldn't be driven high after the system has booted. If it is, that's an immediate notice from the CPU to the Syscon that a serious error has occurred, and the Syscon reacts by immediately shutting the system off. It then logs the attention signal, and any error that caused it, to the error log.


 * A damaged RSX, CELL, or other component could cause this, but at least one user has reported this error being caused by a damaged hard drive. For that reason, this error is regarded as generic.


 * 1001 (Power CELL)


 * This error is caused by insufficient voltage filtering on the CPU's VDDC line, which supplies the core voltage the CPU needs to run. That line carries a large amount of voltage noise, which must be filtered out so the CPU gets a stable, smooth supply. Numerous components are involved in filtering the CPU's VCore, but your first suspect should be the NEC/TOKIN capacitors near the CPU, as these are known to go bad with age. Otherwise, this error could simply mean the system didn't shut off properly.


 * 1200 (Thermal CELL)


 * The CPU is overheating. First, replace the thermal compound between the CPU and the heatsink, as the factory compound has probably dried out and become useless by now. Use a good compound such as Arctic Silver or IC Diamond. If that doesn't help, make sure the fan is working, and replace it if it isn't. Otherwise, the cause could be a damaged thermal monitor chip (IC1101) or a dying CELL.