Edit AVI metadata with PHP

Avi file structure

The screenshots show output from the following tools: abcAVI tag editor and Free HEX editor Neo. Let’s start.

AVI is a binary file format with a well-defined structure. In this article I’m not going to explain in detail how AVI works. For editing the INFO metadata tags it is enough to know the structure. An AVI file is divided into several containers. There are basically three types of headers: the top one is the RIFF container, the next type is called LIST, and the last type is called CHUNK.

RIFF AVI container

The RIFF container is the main container in which the complete video is stored. Its header is very simple and consists of 12 bytes. The first 4 bytes are the word “RIFF”. The next 4 bytes are an integer (unsigned 32-bit, little-endian) that holds the size of the RIFF container in bytes. The last 4 bytes are the word “AVI ” (including the trailing space). The size does not count itself or the preceding word “RIFF”, so it equals the size of the whole file in bytes minus 8.
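To make the header layout concrete, here is a minimal PHP sketch that reads those 12 bytes. The function name parseRiffHeader is hypothetical and is not part of the script later in this article. The size is read with the "V" format of unpack() because RIFF stores it as an unsigned 32-bit little-endian integer.

```php
<?php
// Hypothetical helper: parse the 12-byte RIFF header of an AVI file.
// Assumes $header holds at least the first 12 bytes of the file.
function parseRiffHeader(string $header): array
{
    if (strlen($header) < 12) {
        throw new InvalidArgumentException('Need at least 12 bytes');
    }
    // a4 = 4-byte string, V = unsigned 32-bit little-endian integer
    $h = unpack('a4id/Vsize/a4form', $header);
    if ($h['id'] !== 'RIFF' || $h['form'] !== 'AVI ') {
        throw new UnexpectedValueException('Not a RIFF AVI file');
    }
    return $h;
}

// Example: synthetic header of a 1000-byte file (size field = 1000 - 8).
$h = parseRiffHeader('RIFF' . pack('V', 1000 - 8) . 'AVI ');
echo $h['size'] + 8, PHP_EOL; // prints 1000
```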

For our PHP script only the 4-byte integer matters, because the size of the RIFF container changes when the metadata tags are edited.

This header is mandatory and must be the first 12 bytes of the AVI file.

LIST hdrl container

LIST container headers have a structure similar to the main RIFF container header. They are 12 bytes long. The first 4 bytes are the word “LIST”. The next 4 bytes are an integer holding the size of the LIST in bytes (again, these first 8 bytes are not counted in the size). The last 4 bytes are the name of the list, which in this case is “hdrl” (in lower case).
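The following sketch shows how such a LIST container could be assembled with pack(). buildList is a hypothetical helper, and the 56-byte payload is dummy data rather than a real hdrl. The sketch shows that the size field counts the list name plus the payload, but not the first 8 bytes.

```php
<?php
// Hypothetical helper: build "LIST" + 4-byte size + 4-byte list name + payload.
// The size field counts the list name and the payload, not "LIST" and the size itself.
function buildList(string $name, string $payload): string
{
    if (strlen($name) !== 4) {
        throw new InvalidArgumentException('List name must be a 4-character code');
    }
    return 'LIST' . pack('V', 4 + strlen($payload)) . $name . $payload;
}

// Dummy 56-byte payload, only to show the size arithmetic.
$list = buildList('hdrl', str_repeat("\0", 56));
$size = unpack('V', substr($list, 4, 4))[1];
echo strlen($list), ' ', $size, PHP_EOL; // prints 68 60
```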

This container is not important for this PHP script, so I did not study its contents very closely. It contains some important tags that describe the format of the video and audio streams. The PHP script copies this whole container and does not change anything in it. However, some of the header index information seems to be important for the AVI2 structure, and I have not completed PHP tag editing for AVI2 yet. Maybe one day I will complete that, and the info here 🙂

This container is mandatory and must follow the initial RIFF header, so its header occupies bytes 13 to 24 of the AVI file.

LIST movi container

The movi container is of type LIST, so its header has 12 bytes. The first 4 bytes are the word “LIST”. The next 4 bytes are an integer holding the size of this LIST. The last 4 bytes contain the name of the list, which in this case is the word “movi” (in lower case).

This container holds the video, audio and subtitle data, and again it is not important for our PHP script. We are not going to change it, just copy it all.

The movi container is again mandatory, but its position does not matter. It might be located anywhere in the AVI file, as long as the mandatory positions of the other containers are respected.

CHUNK idx1 container

Here we have the first CHUNK container. Its structure is slightly different from LIST containers, because its header consists of only 8 bytes. The first 4 bytes are the name of the CHUNK, which in this case is the word “idx1” in lower case. They are followed by 4 bytes holding the size of the CHUNK in bytes (again not including the 4 name bytes and the 4 size bytes).
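One reader can handle both the 12-byte LIST header and the 8-byte CHUNK header. The sketch below uses a hypothetical readSubHeader function that works on an in-memory string. It also skips one pad byte after odd-sized data. That behaviour is an assumption based on the general RIFF word-alignment rule and is not something this article relies on.

```php
<?php
// Hypothetical helper: read the header of the container starting at $offset.
// LIST headers are 12 bytes (LIST, size, name); CHUNK headers are 8 bytes (id, size).
function readSubHeader(string $data, int $offset): array
{
    $id = substr($data, $offset, 4);
    $size = unpack('V', substr($data, $offset + 4, 4))[1];
    $isList = ($id === 'LIST');
    return [
        'type'   => $isList ? 'LIST' : 'CHUNK',
        'name'   => $isList ? substr($data, $offset + 8, 4) : $id,
        'size'   => $size,
        'header' => $isList ? 12 : 8,
        // size excludes the first 8 bytes; odd-sized data is followed by one pad byte
        'next'   => $offset + 8 + $size + ($size % 2),
    ];
}

// Example: an empty INFO list followed by a 4-byte idx1 chunk.
$data = 'LIST' . pack('V', 4) . 'INFO' . 'idx1' . pack('V', 4) . "\0\0\0\0";
$first = readSubHeader($data, 0);
$second = readSubHeader($data, $first['next']);
echo $first['name'], ' ', $second['name'], ' ', $second['next'], PHP_EOL; // prints INFO idx1 24
```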

This container holds the index of the video, audio and subtitle data, and it is not important for our PHP script. Again, we are not going to change it. We will just copy it completely.

The idx1 container is mandatory and must be placed right after the movi container.

LIST INFO container

The INFO container is of type LIST, so its header is 12 bytes long. The first 4 bytes are the word “LIST”. The next 4 bytes hold the size of INFO. The last 4 bytes are the word “INFO” (in upper case).

This container is the important one for editing metadata. It contains CHUNKs whose 4-byte codes (fourCC) identify the metadata tags and whose data hold the tag values. Each tag has its own 4-byte abbreviation; for example, the “Title” tag is represented as “INAM”. You can see the complete mapping in this table (the original article is from abcavi.kibi.ru).

Each tag is of CHUNK type, so its header consists of 8 bytes. The first 4 bytes contain the tag abbreviation (for example INAM), and the last 4 bytes hold the length of the CHUNK, not including the initial 8 bytes.

There is one very important thing we need to take into account: the size of the CHUNK must always be an even number. So if, for example, a title has an odd number of characters (like “zavisko”), we need to add a NULL byte to it to make the size even. This NULL byte can be added either at the beginning or at the end. AVI players will not read this byte and will show just the original title. I found this bug (or feature) mainly on Windows machines with Windows Media Player. Some other players or readers might also read tags with an odd number of characters.
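Here is a minimal sketch of building INFO tags under the even-size rule described above. buildInfoTag and buildInfoList are hypothetical helpers, and the tag values are example data. The sketch appends the NULL byte at the end of the value and counts it in the size, as described here. (The general RIFF rule instead writes a pad byte after odd-sized data without counting it in the size.)

```php
<?php
// Hypothetical helper: one INFO tag CHUNK; an odd-length value gets a trailing
// NULL byte so that the CHUNK size is even (the byte is counted in the size).
function buildInfoTag(string $fourcc, string $value): string
{
    if (strlen($fourcc) !== 4) {
        throw new InvalidArgumentException('Tag must be a 4-character code');
    }
    if (strlen($value) % 2 === 1) {
        $value .= "\0"; // pad to an even size
    }
    return $fourcc . pack('V', strlen($value)) . $value;
}

// Hypothetical helper: wrap several tags into a LIST INFO container.
function buildInfoList(array $tags): string
{
    $payload = '';
    foreach ($tags as $fourcc => $value) {
        $payload .= buildInfoTag((string) $fourcc, $value);
    }
    return 'LIST' . pack('V', 4 + strlen($payload)) . 'INFO' . $payload;
}

$info = buildInfoList(['INAM' => 'zavisko', 'ICRD' => '2000']);
echo strlen($info), PHP_EOL; // prints 40: 12 + (8 + 8) + (8 + 4)
```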

The INFO container is optional, and it might be located anywhere in the AVI file, as long as the mandatory positions of the other containers are respected. BUT I found a bug (or feature?) in some Linux tools, like libextractor or ffmpeg: they will not read the INFO tags when the container is not located right after the “hdrl” container. Other tools, like mediainfo, read the tags just fine no matter where they are located. I have sent a message to the ffmpeg developers asking them to take a look at this and possibly improve it in a future release.

CHUNK JUNK container

JUNK is a CHUNK, so its header has 8 bytes: the word “JUNK” followed by a 4-byte size (check idx1 for more info). This is also how the fRiffStructure function below parses it.

This container is not important at all and is completely ignored. You can put anything you want into it, and it will not be displayed, because it is just filler. Some tools use this container to align the positions of other containers in the file or to insert some hidden data (like “this AVI was ripped with XYZ software”). We might also consider using this container to reserve space for INFO tags. Check the code logic part below for details.
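If JUNK is used to reserve space, a filler of a given total size can be generated as shown below. buildJunk is a hypothetical helper. It writes JUNK as a CHUNK with an 8-byte header, which is how fRiffStructure parses it, and it requires an even total size.

```php
<?php
// Hypothetical helper: a JUNK chunk occupying exactly $totalBytes bytes
// (header included), e.g. to reserve room for a later, larger INFO container.
function buildJunk(int $totalBytes): string
{
    if ($totalBytes < 8 || $totalBytes % 2 !== 0) {
        throw new InvalidArgumentException('Size must be an even number >= 8');
    }
    // "JUNK" + 4-byte size (excluding these 8 bytes) + zero filler
    return 'JUNK' . pack('V', $totalBytes - 8) . str_repeat("\0", $totalBytes - 8);
}

$junk = buildJunk(1024);
echo strlen($junk), ' ', unpack('V', substr($junk, 4, 4))[1], PHP_EOL; // prints 1024 1016
```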

The JUNK container can be placed anywhere in the file, and it does not have to respect the mandatory positions of the other containers. It can also be placed inside any of the containers mentioned above.

Some other containers

There are also some other containers, but I have not seen them used anywhere except in the abcAVI tag editor. The first is LIST MID (“MID ” with a trailing space). It has the same structure and usage as the INFO LIST; basically it is the same container with a different name and different tags. It has far fewer tags, so it is much less functional than the original RIFF INFO or Extended INFO tags. The second is IDIVX. This container is completely different: it has a fixed-size structure (each tag has its own fixed size) and offers fewer tags. Check this link for more information.

As stated, these containers are optional. I did not find (admittedly, I have not searched 🙂) any statement about their position inside the AVI file. When a file is edited with the abcAVI tag editor, these containers are put at the end of the file (first MID, then IDIVX).

PHP code

Code logic

First, the most important part. I could not find any information on how to insert bytes inside a binary file (at the beginning or in the middle). Every tool I found could only change bytes in place or append new ones at the end. This is quite a big problem when you want to change or add metadata. If you know a good way to insert bytes into a file, please let me know (drop a comment here or send me a mail ;).

Because of this small issue I first went this way: I put the INFO container at the end of the file, and with each edit the file is truncated to remove the old INFO container and a new one is appended. At first sight this seemed the best and quite efficient way. However, it has one problem: not all tools can read this container when it is not located at the beginning of the file, right after the “hdrl” container (ffmpeg and libextractor, for example). I was so unlucky that I had already rebuilt my whole video library this way before finding out that ffmpeg would not read my tags :(. Otherwise it works very well. Another drawback is that if an INFO container already exists elsewhere in the file, you have no choice but to rebuild the whole file without it (see below).

Another possible approach is to rebuild the whole AVI file. At first this seems a waste of resources, because to edit a tag you have to copy the whole file, meaning 1 or 2GB of data just to change 1kB of it. But thinking about it now, it is not that bad. First, you will not edit the tags over and over; most of the time you will edit them just once. Second, this way you will not face the issues mentioned above with tools that do not read metadata when it is not in the expected position. The drawback is that every later edit requires copying the whole file again.
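Inserting bytes in the middle of a file in PHP comes down to exactly this kind of rebuild. You copy the head, write the new bytes, copy the tail into a temporary file, and swap that file in. Below is a minimal sketch with a hypothetical insertBytes function. It assumes there is enough free disk space for a second copy of the file. stream_copy_to_stream() copies between the streams without loading the whole file into a PHP string.

```php
<?php
// Hypothetical helper: "insert" $bytes at $offset by rebuilding the file as
// head [0, offset) + new bytes + tail [offset, end) in a temporary file.
// Assumes enough free disk space for a second copy of the file.
function insertBytes(string $file, int $offset, string $bytes): void
{
    clearstatcache(true, $file);
    $size = filesize($file);
    if ($offset < 0 || $offset > $size) {
        throw new InvalidArgumentException('Offset outside the file');
    }
    $tmp = $file . '_tmp';
    $in = fopen($file, 'rb');
    $out = fopen($tmp, 'wb');
    if ($offset > 0) {
        stream_copy_to_stream($in, $out, $offset); // head
    }
    fwrite($out, $bytes);                          // inserted data
    fseek($in, $offset);
    stream_copy_to_stream($in, $out);              // tail
    fclose($in);
    fclose($out);
    rename($tmp, $file);                           // swap the rebuilt file in
}

// Usage on a throw-away file.
$f = tempnam(sys_get_temp_dir(), 'ins');
file_put_contents($f, 'ABCDEF');
insertBytes($f, 3, 'xyz');
echo file_get_contents($f), PHP_EOL; // prints ABCxyzDEF
unlink($f);
```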

Another possibility is to adapt the previous approach: rebuild the file completely, but pad every tag with NULL bytes up to a maximum length. For example, when adding the INAM (Title) tag you make its size 50, and when you want to call it “zavisko” you insert 43 NULL bytes after it. Later, when you decide to change the title to something longer, you can do it easily, because PHP is able to change bytes in place. The only drawback is that each file carries some junk (overhead) data. Considering all of the above, this approach seems the best, short of one that could insert new bytes into the file directly :).
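The padding itself is a one-liner. Below is a sketch with a hypothetical fitToSlot helper. It assumes the slot size is even, so the even CHUNK size rule from above still holds.

```php
<?php
// Hypothetical helper: cut a tag value to $slotSize bytes and fill the rest
// with NULL bytes, so a later edit can overwrite the slot in place.
function fitToSlot(string $value, int $slotSize): string
{
    return str_pad(substr($value, 0, $slotSize), $slotSize, "\0");
}

$slot = fitToSlot('zavisko', 50);
$chunk = 'INAM' . pack('V', 50) . $slot; // INAM chunk with a fixed 50-byte value
echo strlen($slot), ' ', substr_count($slot, "\0"), ' ', strlen($chunk), PHP_EOL; // prints 50 43 58
```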

Read the file structure

For reading the AVI file structure I have created two functions. The first one (fAviStructure) is the main one. It reads the top-level containers and then calls the second function (fRiffStructure), which reads the containers inside them.

The first (TOP) function is called fAviStructure, and it has only one input parameter: the complete path to the file to be read. The function looks like this:

function fAviStructure ($vFile) {
    $vReturn = array();
    $vPosition = 0;
    $vFileSize = filesize($vFile);
    $vHandleR = fopen($vFile, "rb");

    while ($vPosition < $vFileSize) {
        $vList = array();
        fseek($vHandleR, $vPosition);
        $vList['tag'] = fread($vHandleR, 4);
        $vList['start'] = $vPosition;
        $vSizeBytes = fread($vHandleR, 4);
        if (strlen($vSizeBytes) < 4) break; // truncated file
        // sizes are unsigned 32-bit little-endian ("V"), not machine-order signed "l"
        $vListSize = unpack("V", $vSizeBytes);
        $vList['size'] = $vListSize[1];
        $vList['name'] = fread($vHandleR, 4);
        $vList['value'] = fRiffStructure($vFile, $vList['start'], $vList['size']);
        array_push($vReturn, $vList);
        // the size does not count the first 8 bytes (tag + size)
        $vPosition = $vPosition + 8 + $vListSize[1];
    }
    fclose($vHandleR);
    return $vReturn;
}

This function returns an array with one element per top-level container, and each element has five keys: tag, start, size, name and value. I will not repeat the details here; read the first part of this article for detailed information.

  • parameter “tag”: A string containing the top container tag, i.e. the first 4 bytes of the container. A usual AVI file contains only one element, and its value is “RIFF”. An AVI2 file has more elements, all with the value “RIFF”.
  • parameter “start”: An integer containing the starting position of the container, i.e. the byte where the tag starts. For the first element this is 0.
  • parameter “size”: An integer containing the size of the container in bytes, i.e. the second 4-byte group of the container.
  • parameter “name”: A string containing the name of the container, i.e. the third 4-byte group of the container. For an AVI file this is “AVI ”. An AVI2 file has more elements, and the additional ones might have the value “AVIX”.
  • parameter “value”: An array describing the containers inside the RIFF container. It stores similar information as above about every container in RIFF, like hdrl, INFO, movi, idx1, …

The second (INSIDE) function is called fRiffStructure, and it has three input parameters. The first is the complete path to the file to be read. The second is the starting position of the container to be read. The third is the size of that container in bytes, as stored in its size field. It looks like this:

function fRiffStructure ($vFile, $vStart, $vSize) {
    $vReturn = array();
    $vPosition = $vStart + 12;     // skip the 12-byte RIFF header
    $vEnd = $vStart + 8 + $vSize;  // first byte after this RIFF container
    $vHandleR = fopen($vFile, "rb");

    while ($vPosition + 8 <= $vEnd) {
        $vList = array();
        fseek($vHandleR, $vPosition);
        $vTag = fread($vHandleR, 4);
        $vListSize = unpack("V", fread($vHandleR, 4));
        if ($vTag == "LIST") {
            // LIST: 12-byte header, the list name follows the size
            $vList['tag'] = $vTag;
            $vList['start'] = $vPosition;
            $vList['size'] = $vListSize[1];
            $vList['name'] = fread($vHandleR, 4);
        } else {
            // CHUNK: 8-byte header, its 4-byte id is used as the name
            $vList['tag'] = "";
            $vList['start'] = $vPosition;
            $vList['size'] = $vListSize[1];
            $vList['name'] = $vTag;
        }
        // skip header + data; odd-sized data is followed by one pad byte (RIFF rule)
        $vPosition = $vPosition + 8 + $vListSize[1] + ($vListSize[1] % 2);
        array_push($vReturn, $vList);
    }
    fclose($vHandleR);
    return $vReturn;
}

This function returns an array with one element per container, and each element has four keys: tag, start, size and name. For CHUNKs the tag is an empty string and the name is the CHUNK id. Check the top function (fAviStructure) for more details on these keys.

Both functions use the same logic. A loop reads the tag, size and name. Based on the current position and the size, it calculates where the next container starts, stores that in the position variable, and repeats until the end of the container has been reached. At the end it returns the array with all collected data. fRiffStructure also includes an IF statement that distinguishes between LIST and CHUNK containers, because their headers differ (12 versus 8 bytes).

Check if file has the correct (desired) structure

This is a helper function, used to check whether a file has the correct structure, meaning that it can be edited by our other functions. I called it fAviStructOK, and it has only one input parameter: the complete path to the file to be read. The function looks like this:

function fAviStructOK ($vFile) {
    $vAviStructure = fAviStructure($vFile);
    $vHdrl = 5; $vJunk = 0; $vReturn = 0; $vInfoLen = 0;
    foreach ($vAviStructure as $vEntry) {
        if ($vEntry['tag'] == "RIFF") {
            foreach ($vEntry['value'] as $vSubEntry) {
                if ($vSubEntry['name'] == "hdrl") $vHdrl = 0;
                if ($vSubEntry['name'] == "JUNK") $vJunk = 5;
                if ($vSubEntry['name'] == "INFO") {
                    $vInfo = $vSubEntry['start'];
                    $vInfoLen = $vSubEntry['size'];
                }
            }
            // if JUNK does exist return 1
            if ($vJunk !== 0) $vReturn = 1;
            // if INFO is not positioned right after hdrl LIST return 5
            if (isset($vInfo)) {
                foreach ($vEntry['value'] as $vSubEntry) {
                    if (($vSubEntry['name'] !== "hdrl") && ($vSubEntry['start'] < $vInfo)) $vReturn = 5;
                }
                // if length of INFO is not equal to info.bin return 5
                if (($vInfoLen + 8) !== filesize("../scripts/info.bin")) $vReturn = 5;
            } else {
                // if INFO does not exist return 3
                $vReturn = 3;
            }
            // only the first RIFF container is checked
            break;
        }
    }
    // if hdrl LIST does not exist return 3
    if ($vHdrl !== 0) $vReturn = 3;
    // if there are more RIFF tags (AVI2) return 4
    if (count($vAviStructure) > 1) $vReturn = 4;
    return $vReturn;
}

The logic is as follows. The function loops through the containers and looks for the one named RIFF. When it is found, an inner loop goes through the containers inside it and looks especially for those named “hdrl”, “INFO” and “JUNK”. When all data are collected, the function performs the checks:

  • Only one RIFF container exists, meaning the AVI is not version 2.
  • The “hdrl” container exists.
  • The “INFO” container is positioned right after “hdrl” and has the correct length (the size of info.bin).

If any of these conditions is not met, the function returns an error code (3, 4 or 5), meaning that the file can not be edited. The last check looks for a “JUNK” container. If one is found, the function returns a warning (RC=1), meaning that the file can be edited but the structure is not optimal. If all conditions are met and no JUNK is found, the function returns success (0).
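The return codes are easy to mix up, so a small lookup can help when printing messages. This is a hypothetical helper and not part of the original script. It uses match, so it assumes PHP 8 or later. The editability rule (codes 0 and 1) mirrors the `> 1` check in the metadata script below.

```php
<?php
// Hypothetical helper (not part of the original script): translate the
// return codes of fAviStructOK into messages. Requires PHP 8 (match).
function aviStructMessage(int $rc): string
{
    return match ($rc) {
        0 => 'OK: structure is as desired',
        1 => 'Warning: JUNK found, file can be edited but structure is not optimal',
        3 => 'Error: hdrl or INFO container is missing',
        4 => 'Error: more than one RIFF container (AVI2)',
        5 => 'Error: INFO is not right after hdrl or has the wrong length',
        default => 'Unknown return code ' . $rc,
    };
}

// Tags can only be written in place for codes 0 and 1.
function aviIsEditable(int $rc): bool
{
    return $rc === 0 || $rc === 1;
}

echo aviStructMessage(1), PHP_EOL;
```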

Rebuild the file to correct (desired) structure

Next is the function fAviClean, which rebuilds the AVI file into the desired structure (see above). It has only one input parameter: the file name with its complete path. It looks like this:

function fAviClean ($vFile) {
    if (fAviStructOK($vFile) == 0) {
        echo "STRUCTURE IS OK NO CLEANING NECESSARY<br>".chr(10);
    } else {
        // date() always prints 000000 for "u"; DateTime keeps the microseconds
        echo "Start: ".(new DateTime())->format("H:i:s.u")."<br>".chr(10);
        $vFileTmp = $vFile."_tmp";
        $vFileInfo = "../scripts/info.bin";
        $vHandleR = fopen($vFile, "rb");
        $vHandleW = fopen($vFileTmp, "wb");
        $vHandleI = fopen($vFileInfo, "rb");
        $vAviStructure = fAviStructure($vFile);
        $vRiffSize = 4; $vLoop = 1;
        $vBlock = 10485760; // copy big containers in 10MB blocks

        foreach ($vAviStructure as $vEntry) {
            if ($vEntry['tag'] == "RIFF") {
                // write RIFF header (size of the first RIFF is patched at the end)
                if ($vLoop == 1) fwrite($vHandleW, "RIFF0000AVI ");
                else fwrite($vHandleW, "RIFF".pack("V", $vEntry['size'])."AVIX");
                foreach ($vEntry['value'] as $vSubEntry) {
                    // copy hdrl LIST and write the empty INFO LIST right after it
                    if ($vSubEntry['name'] == "hdrl") {
                        fseek($vHandleR, $vSubEntry['start']);
                        fwrite($vHandleW, fread($vHandleR, $vSubEntry['size'] + 8));
                        fwrite($vHandleW, fread($vHandleI, filesize($vFileInfo)));
                        if ($vLoop == 1) $vRiffSize = $vRiffSize + filesize($vFileInfo) + $vSubEntry['size'] + 8;
                    }

                    // copy movi LIST and idx1 CHUNK in 10MB blocks
                    if (($vSubEntry['name'] == "movi") || ($vSubEntry['name'] == "idx1")) {
                        $vHeader = ($vSubEntry['name'] == "movi") ? "LIST" : "idx1";
                        fwrite($vHandleW, $vHeader.pack("V", $vSubEntry['size']));
                        fseek($vHandleR, $vSubEntry['start'] + 8);
                        $vRemaining = $vSubEntry['size'];
                        // never call fread() with length 0 (an error since PHP 8)
                        while ($vRemaining > 0) {
                            $vReadLen = min($vRemaining, $vBlock);
                            fwrite($vHandleW, fread($vHandleR, $vReadLen));
                            $vRemaining = $vRemaining - $vReadLen;
                        }
                        if ($vLoop == 1) $vRiffSize = $vRiffSize + $vSubEntry['size'] + 8;
                    }
                }
                $vLoop++;
            }
        }

        fclose($vHandleR);
        fclose($vHandleW);
        fclose($vHandleI);

        // replace orig file with new file
        unlink($vFile);
        rename($vFileTmp, $vFile);

        // insert new RIFF size information
        $vHandleW = fopen($vFile, "cb");
        fseek($vHandleW, 4);
        fwrite($vHandleW, pack("V", $vRiffSize));
        fclose($vHandleW);

        echo "OK<br>".chr(10);
        echo "End: ".(new DateTime())->format("H:i:s.u").chr(10);
    }
}

And now some explanations. The function first checks the structure. If it is OK, nothing is done; otherwise it initializes some variables that are used later. The variable “$vFileInfo” contains the path to the binary file info.bin, which holds the empty INFO container that is copied into each AVI file.

This function copies the old AVI file into a temporary file “*.avi_tmp” with the desired structure. When the copy is complete, the original file is removed and the temporary file is renamed to “*.avi”.

As for the code, the outer loop goes through all RIFF containers. For now it has no real purpose, but I created it to be ready for an AVI2 implementation. The initial 12-byte RIFF header is written at the beginning of this loop, and then the inner loop starts. It goes through the inner containers and copies the important ones. Right after the header it copies the complete hdrl container, then the clean INFO container from the info.bin file, then the movi container, and finally the idx1 container. movi and idx1 are copied in 10MB blocks to avoid memory issues. All other (unimportant) containers, like JUNK, are not copied.

The final step of the whole process is to write the new RIFF size into the 12-byte header, because the size has changed.
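The RIFF size bookkeeping in fAviClean can be written as one formula. Below is a sketch with a hypothetical newRiffSize function. The inputs are the 'size' values reported by fRiffStructure and the byte size of info.bin, and the example numbers are made up.

```php
<?php
// Hypothetical helper: the RIFF size that fAviClean accumulates in $vRiffSize.
// 4 bytes for "AVI " + each copied container with its 8-byte header;
// info.bin is counted whole because it already contains its own header.
function newRiffSize(int $hdrlSize, int $infoBinBytes, int $moviSize, int $idx1Size): int
{
    return 4
        + ($hdrlSize + 8)
        + $infoBinBytes
        + ($moviSize + 8)
        + ($idx1Size + 8);
}

// Made-up example sizes; the file itself is 8 bytes larger than the RIFF size.
$riff = newRiffSize(4000, 1764, 1000000, 16000);
echo $riff, ' ', $riff + 8, PHP_EOL; // prints 1021792 1021800
```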

I also print a timestamp at the beginning and at the end of the process to see how long the whole conversion took.

Write the metadata

To edit the metadata I use an HTML form that POSTs the data to a PHP script similar to this one:

// path of the AVI file passed in the URL (not validated here)
$vFile = $_GET["file"];
$vFileInfo = "../scripts/info.bin";
if (fAviStructOK($vFile) > 1) {
    echo '<p style="color:#ff0000;">AVI has incorrect structure!</p>'.chr(10);
} else {
    // codes 0 and 1 guarantee that INFO is the 2nd container, right after hdrl
    $vAviStructure = fAviStructure($vFile);
    $vInfoStart = $vAviStructure[0]['value'][1]['start'];
    $vHandleI = fopen($vFileInfo, "rb");
    $vHandleW = fopen($vFile, "cb");
    // clean all tags
    fseek($vHandleW, $vInfoStart);
    fwrite($vHandleW, fread($vHandleI, filesize($vFileInfo)));
    fclose($vHandleI);
    // write title - INAM
    fseek($vHandleW, $vInfoStart + 20);
    fwrite($vHandleW, substr($_POST["inam"], 0, 64));
    // write year - ICRD
    fseek($vHandleW, $vInfoStart + 92);
    fwrite($vHandleW, substr($_POST["icrd"], 0, 4));
    // write rating - IKEY
    fseek($vHandleW, $vInfoStart + 104);
    fwrite($vHandleW, substr($_POST["ikey"], 0, 4));
    // write director - IART
    fseek($vHandleW, $vInfoStart + 116);
    fwrite($vHandleW, substr($_POST["iart"], 0, 64));
    // write genre - IGNR
    fseek($vHandleW, $vInfoStart + 188);
    fwrite($vHandleW, substr($_POST["ignr"], 0, 64));
    // write actors - ISTR
    fseek($vHandleW, $vInfoStart + 260);
    fwrite($vHandleW, substr($_POST["istr"], 0, 256));
    // write language - ILNG
    fseek($vHandleW, $vInfoStart + 524);
    fwrite($vHandleW, substr($_POST["ilng"], 0, 32));
    // write country - ICNT
    fseek($vHandleW, $vInfoStart + 564);
    fwrite($vHandleW, substr($_POST["icnt"], 0, 32));
    // write poster link - ILGU
    fseek($vHandleW, $vInfoStart + 604);
    fwrite($vHandleW, substr($_POST["ilgu"], 0, 128));
    // write comments - ICMT
    fseek($vHandleW, $vInfoStart + 740);
    fwrite($vHandleW, substr($_POST["icmt"], 0, 1024));
    // close the file
    fclose($vHandleW);
}

The code here is very simple. Before writing the changes, we check the structure one last time. If it is OK, we clear the old tags by copying the clean INFO container from the info.bin file over them. Then we cut the strings to their maximum length and write the POSTed data into the respective tag slots. Close the file and we are finished :).
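The offsets (+20, +92, …, +740) follow from the layout of info.bin. The sketch below recomputes them with a hypothetical infoOffsets function. It assumes that info.bin holds the tags in the same order as the script and that their data sizes equal the substr() limits. Under that assumption, it reproduces every offset used above.

```php
<?php
// Hypothetical helper: derive the write offset of each tag's data inside the
// INFO container. Assumes info.bin holds the tags in this order with these sizes.
function infoOffsets(array $slots): array
{
    $offsets = [];
    $pos = 12; // "LIST" + size + "INFO"
    foreach ($slots as $fourcc => $len) {
        $offsets[$fourcc] = $pos + 8; // skip the tag chunk's id and size
        $pos += 8 + $len;
    }
    return $offsets;
}

$offsets = infoOffsets([
    'INAM' => 64, 'ICRD' => 4, 'IKEY' => 4, 'IART' => 64, 'IGNR' => 64,
    'ISTR' => 256, 'ILNG' => 32, 'ICNT' => 32, 'ILGU' => 128, 'ICMT' => 1024,
]);
echo $offsets['INAM'], ' ', $offsets['ICMT'], PHP_EOL; // prints 20 740
```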
