Enum or TinyInt

Hi

When you create a user table with a column indicating whether the user is active or inactive or perhaps a payment table with the various payment status viz. “Payment received”, “Payment pending”, “payment cancelled”

what datatype do you prefer for the status column, do you keep it ENUM so that you can store the status as strings or do you keep the datatype as tinyint/int and store numbers representing each number as a status?

Thanks

For just two states, I always do a boolean (is_active could be true for active or false for inactive).

For multiple ones, it depends:

  • If I know beyond a shadow of a doubt that there will never ever ever be another option I may need to add, I’ll use ENUM and specify each one of my choices.
  • If I think I might one day need to add another option, I use an int (depending what exactly it is I may use a smaller or larger version of int) which is an ID for another table where I list all of them.

So, for your payment example, I’d have two tables like this:


= payments
- payment_id (int)
- payment_status_id (int)
- ... # rest of the values

= payment_statuses
- payment_status_id (int)
- name (varchar)

Dont you think that if we have a table to maintain the statuses, we would need to make muti-joins making the query slower?

Instead, why not define constants with their respective value?

for example:

define(‘PAYMENT_COMPLETED’,‘Payment Completed’);
define(‘PAYMENT_CANCELLED’,‘Payment Cancelled’);

and so on… and use them where ever required?

I am just trying to learn the best practice so please bear with me :slight_smile:

Thanks

You’re mixing up two different languages. =p

MySQL databases should be able to stand 100% on their own without the need for PHP intervention.

Yes, you will have to do some joins, but with properly set foreign keys (in this case, payments.payment_status_id = payment_statuses.payment_status_id) and indexes (if they’re set up as keys they’ll automatically be indexed) the time to do the join is virtually zero until you’re joining thousands or millions of rows in one query.

The other problem with constants is they are constant, which is something you don’t always want. Sometimes you need to be able to declare new things on the fly (in this scenario, probably not, but others you may). If the list is truly constant and would never change, I would use enum (or boolean). The two tables is only for scenarios where the possible states are not constant.

I got you.

Another question. Suppose you need to fetch all records from the payments table where the payment_status column is “Payment Completed”, how would you fire the query with following php code?

<?php 

$sql = "select * from payments WHERE payment_status=?????";

mysql_query($sql);

?>

Thanks

There are two ways. The first way would be to simply know what the ID of that specific payment status was (that would be the most reliable, since it wouldn’t change even if you changed the name):


<?php

$complete_id = 1;

$sql = "SELECT * FROM payments WHERE payment_status_id = '$complete_id'";
?>

The second would be to do a join looking for the name.


SELECT * FROM payments p
LEFT JOIN payment_statuses s
ON p.payment_status_id = s.payment_status_id
WHERE s.name = "Payment Completed"

The LEFT JOIN llinks the two tables together based ON rows that share that value and then gets them for those that have “Payment Completed” in the name field.

If you were me, which method would you prefer and why?

Many thanks for all your replies.

I can’t really make that call because I don’t know what you’re making. However, if you’re fairly confident that they will never change and you will never need to add another, I’d make it an enum.

If you think there is a possibility that they may change, I would use the two table approach. I would also create the initial values with PHP so I have control over the ID of the important one(s), and then store those where I can reference them without having to do a join based on name.

So does that mean its okay to do something like:

<?php

define(‘PAYMENT_COMPLETED’, 1);

$sql = “SELECT * FROM payments WHERE payment_status_id = '”. PAYMENT_COMPLETED .“'”;
?>

Yes, that would be acceptable. Just be sure to put the define in some config file so you can adjust it if necessary.

Sure thank u so much.

i think you are under the assumption that a table has to have a numeric id – this is not the case

i would still create a table of valid statuses –

CREATE TABLE payment_statuses
( payment_status VARCHAR(37) NOT NUYLL PRIMARY KEY
);
INSERT INTO payment_statuses 
VALUES ( 'pending' ),( ' completed' );

but then to update a payment, i would use the status like this –

UPDATE payments
   SET payment_status = 'completed'
 WHERE orderno = 937

this way, there is no extra join required to retrieve payments by status –

SELECT * FROM payments 
 WHERE payment_status = 'pending'

and yet the relational integrity of the foreign key is still enforced

In that case, how are you making use of the “payment_statuses” table?

as a reference table for the foreign key payment_status in the payments table, thus preventing an invalid status from being entered

Will this work on myisam based tables or just innodb?

just innodb

i keep forgetting that the crippled version of mysql is so darned popular

I usually use CHAR(1) for statuses so they are not completely meaningless numbers but still 1-byte in length. So for example “Payment received”, “Payment pending”, “payment cancelled” will be R, P and C respectively. Then in the column comment I store an explanation of what each letter means, e.g. “payment Pending; payment Received; payment Cancelled” so that I don’t get confused when viewing the table directly.

I know r937’s method is the most ‘proper’ one but I’m not keen on having so many separate tables for every possible status column as there can get many of them.