Review Board

beta

Fix keyword expansion in files from SVN repositories

Submitted
Updated 8 months, 4 weeks ago

Christian Hammond Reviewers
trunk reviewboard
362
None Review Board SVN
Fixed issues with SVN keyword expansion. If a file had $Id$ or $Date$ or something and the keyword was defined in that file's svn:keywords property, svn cat would return the file with the keyword expanded. This broke patching for us and essentially meant that any patches modifying anything near a keyword expansion would break.

We now collapse the keyword by turning $Keyword:.*$ into $Keyword$. We only do this for keywords actually defined in that file's svn:keywords, in order to prevent collapsing things that aren't actually defined keywords but look like them.

This also supports known keyword aliases.
Unit tests succeed! Debug output shows we're collapsing correctly.
Posted 9 months ago (February 22nd, 2008, 9:11 p.m.)
This is a great start!  I imagine that this alone will fix the issue for most of my users, so let's please get this in.  However, there are a few corner cases I can think of that the current test doesn't expose:

1. Fixed-length substitution:  By using a double-colon, you can tell SVN the exact length of the field.  e.g.,
    $Revision::        $
becomes
    $Revision:: 42     $

2. Keyword aliases:  There are only 5 keywords that SVN understands, but several have equivalent aliases.  The worst part is that expansion still happens if the file uses a different alias than what's in svn:keywords.

3. Non-keywords:  The property can contain any arbitrary keywords, and any invalid keywords are silently ignored.  Your code will happily collapse anything.  Admittedly, it would be a quite malformed repo for someone to run into this.
  1. Christian Hammond 9 months ago (February 23rd, 2008, 1:28 a.m.)
    That regex should handle case #1, shouldn't it? As long as "$Keyword:" exists, anything after that and up until a "$" should be condensed. We should test for this in the unit tests, though.
    
    #2 is a problem. I'll have to cover that at some point. Hopefully it's less common?
    
    #3 I'm less concerned about, as it's sort of user error.
  2. David Trowbridge 9 months ago (February 23rd, 2008, 1:52 a.m.)
    This looks pretty good.  I agree with you that the regex covers #1.  According to the svnbook, there's actually a small number of aliases:
    
    Date -> LastChangedDate
    Revision -> LastChangedRevision, Rev
    Author -> LastChangedBy
    HeadURL -> URL
    
    It should be easy enough to just create a small table and iterate over a few regexes.
  3. Josh Stone 9 months ago (February 23rd, 2008, 6:01 p.m.)
    The difference with the double-colon (#1) is that the unexpanded version in the diff will still have the colons and spaces.  Then in expansion it doesn't touch the width at all; it just replaces the spaces with field contents.
    
    I think we can handle both cases in re.sub() with a function instead of a replacement string, i.e.:
    
    def repl(m):
        if m.group(2):
            return "$%s::%s$" % (m.group(1), " "*len(m.group(3)))
        return "$%s$" % m.group(1)
    data = re.sub(r"\$(%s):(:?)([^\$]+)\$" % keyword_names, repl, data)
  4. Christian Hammond 9 months ago (February 24th, 2008, 2:56 p.m.)
    Excellent, worked like a charm.
    
    Going to commit this and leave aliases to another day.
/trunk/reviewboard/scmtools/svn.py (Diff revision 1)
65
                keyword_names = keywords[normpath].replace(",", "|")
The docs on svn:keywords specify that they are delimited by whitespace, not commas.
  1. Christian Hammond 9 months ago (February 23rd, 2008, 1:28 a.m.)
    Oops, thanks!
I sure wish we didn't have to re-implement all of SVN's crazy keyword rules to make this robust.  The most painful part is that SVN checkouts do have an unexpanded copy on the client side in .svn/text-base/foobar.svn-base!  I can't find any way to capture that using pysvn though, unless you actually create a temporary working directory. (yuck)
  1. Christian Hammond 9 months ago (February 23rd, 2008, 1:28 a.m.)
    Yeah, this was my last resort approach. I couldn't find anything anywhere that allowed us to do what we wanted gracefully. Shame svn cat doesn't have a "Don't expand keywords" flag.
  2. David Trowbridge 9 months ago (February 23rd, 2008, 1:52 a.m.)
    Maybe we should file a bug :)
Posted 9 months ago (February 22nd, 2008, 9:11 p.m.)
This is a great start!  I imagine that this alone will fix the issue for most of my users, so let's please get this in.  However, there are a few corner cases I can think of that the current test doesn't expose:

1. Fixed-length substitution:  By using a double-colon, you can tell SVN the exact length of the field.  e.g.,
    $Revision::        $
becomes
    $Revision:: 42     $

2. Keyword aliases:  There are only 5 keywords that SVN understands, but several have equivalent aliases.  The worst part is that expansion still happens if the file uses a different alias than what's in svn:keywords.

3. Non-keywords:  The property can contain any arbitrary keywords, and any invalid keywords are silently ignored.  Your code will happily collapse anything.  Admittedly, it would be a quite malformed repo for someone to run into this.
I sure wish we didn't have to re-implement all of SVN's crazy keyword rules to make this robust.  The most painful part is that SVN checkouts do have an unexpanded copy on the client side in .svn/text-base/foobar.svn-base!  I can't find any way to capture that using pysvn though, unless you actually create a temporary working directory. (yuck)
Posted 8 months, 4 weeks ago (February 24th, 2008, 4:51 p.m.)

   

  
/trunk/reviewboard/scmtools/svn.py (Diff revision 2)
14
    keyword_aliases = {
15
        'Date':     ['LastChangedDate'],
16
        'Revision': ['LastChangedRevision', 'Rev'],
17
        'Author':   ['LastChangedBy'],
18
        'HeadURL':  ['URL'],
19
    }
This table needs to have the reverse mappings as well.
/trunk/reviewboard/scmtools/svn.py (Diff revision 2)
108
        keywords = keyword_str.split(" ")
109
110
        # Get any aliased keywords
111
        keywords += [alias
112
                     for keyword in keywords
113
                     for alias in self.keyword_aliases.get(keyword, [])]
If you make keyword_aliases include self-references (e.g., "Id": ["Id"], "URL": ["URL", "HeadURL"], etc.), and then build the final list purely from the alias table, instead of appending, then unknown keywords will be ignored (corner case #3).

You'll also prevent any weird values from going into '|'.join(keywords) which might break the regular expression.

I know these cases are unlikely to crop up in real repositories, but it's pretty easy to be protective here.
/trunk/reviewboard/scmtools/tests.py (Diff revision 2)
261
        newfile = patch(diff, file, filename) # Throws an exception!
269
        print "Resulting collapsed file: [%s]" % file
Debug statement...
With those tweaks, this will cover all the cases that I can think of.  Thanks!
Ship it!
Posted 8 months, 4 weeks ago (February 24th, 2008, 4:53 p.m.)

   

  
Ship it!
Posted 8 months, 4 weeks ago (February 24th, 2008, 10:47 p.m.)

   

  
/trunk/reviewboard/scmtools/svn.py (Diff revision 4)
15
    keywords = {
Need "Id" and "Rev" in here also.
Looks good - please commit!