Coding for Source Citations
Does anyone understand the coding for source citations in GEDCOM files? Both Ancestry and Legacy Family Tree produce GEDCOM coding for sources and source citations. Not only do the two systems not look anything alike, but there doesn't seem to be any rhyme or reason to the coding either. Is there supposed to be a pattern?
|
Re: Coding for Source Citations
When you say "coding" what exactly do you mean? The GEDCOM spec is very specific and available on the internet for your review.
That said, a "citation" as specified can be one of two types: 1) inline, 2) linked.
An "inline" citation has all citation and source information embedded within the (for example) EVEN.SOUR tag structure.
A "linked" citation contains only the citation information embedded within the EVEN.SOUR tag structure, plus a link or pointer to a separate Source record.
If you are not familiar with GEDCOM this probably does not make sense. Note that the value of a GEDCOM tag has no internal divisions. The specification requires a sub-tag for each additional data point.
So. An example event based citation of type 2 would look like this:
1 EVEN
2 TYPE .....Type of event
2 SOUR .....link to Source record
3 PAGE ......WHERE_WITHIN_SOURCE
3 EVEN ......EVENT_TYPE_CITED_FROM
4 ROLE ......ROLE_IN_EVENT
3 DATA
4 DATE ......ENTRY_RECORDING_DATE
4 TEXT .......TEXT_FROM_SOURCE
5 [CONC|CONT] ......TEXT_FROM_SOURCE
3 ....MULTIMEDIA_LINK
3 ....NOTE_STRUCTURE
3 QUAY .....CERTAINTY_ASSESSMENT
The DATA tag has two sub-tags, DATE and TEXT. The DATA tag itself carries no value; it only groups those sub-tags.
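To make the level numbering concrete, here is a minimal Python sketch of reading such lines into a nested structure, so that the SOUR sub-tags end up attached to their event. The names `Node`, `LINE_RE`, and `parse_gedcom` are made up for illustration, and the regular expression is a simplification of the GEDCOM line grammar, not a full implementation of the spec.

```python
import re
from dataclasses import dataclass, field

# LEVEL [@XREF@] TAG [VALUE] -- a simplified reading of one GEDCOM line.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")

@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: str = ""
    children: list = field(default_factory=list)

def parse_gedcom(text):
    """Build a tree of Nodes from GEDCOM text using the level numbers."""
    roots, stack = [], []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # skip blank or malformed lines
        level, xref, tag, value = m.groups()
        node = Node(int(level), tag, value or "", xref or "")
        # Pop back up until the top of the stack is this node's parent.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots

sample = """1 BIRT
2 DATE 23 Nov 1922
2 SOUR @S69@
3 PAGE accessed 01/21/2011), Fritze.
3 QUAY 3"""

birt = parse_gedcom(sample)[0]
sour = next(c for c in birt.children if c.tag == "SOUR")
print(sour.value, [c.tag for c in sour.children])
```

With a tree like this, "everything from SOUR through QUAY" is simply the SOUR node and its children.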
Hope this helps!
|
Re: Coding for Source Citations
The "citation" begins with the SOUR tag and continues through the QUAY tag. Only the first two tags are not part of the citation. Obviously I would need to know what the citation is to tell you what information to place where.
|
Re: Coding for Source Citations
When I say “coding” I mean the text representation coded in the GEDCOM file just as you’ve described. I am very familiar with the GEDCOM standard and have referenced it many, many times. The standard is anything but specific.
The context for my use is to read the GEDCOM file produced by Legacy Family Tree and enter the data in a SQL database on my desktop computer. I wrote another program that uses the SQL database to produce HTML coding for web pages for the family website I created. I’ve found that formatting source citations from Legacy’s coding almost comes down to a separate subroutine for each source. Ancestry is better but still not ideal. The purpose of my question is to gain insight so that I can refine the format of my citations on the website to a more proper form.
Let’s look at one citation and contrast the coding from Legacy with the coding in the file produced by Ancestry.com. Both are of the linked variety. Both describe a birth from the Ancestry database, California Birth Index, 1905-1995. I believe a proper reference is described on the Progenealogists website’s Citation Guide.
Database developer or compiler, "Title of Database in quotes," Title of Website in italics (Online: Internet publisher, Internet published date) [Original data publisher, original published date if applicable], URL of database, web page access date.
The source citation might look like:
Ancestry.com Operations, Inc., “California Birth Index, 1905-1995,” Ancestry.com (The Generations Network, Inc., 2005) [Original published Sacramento, CA, State of California Department of Health Services], URL-Ancestry.com, accessed 01/21/2011.
Legacy produces the following statements for the 0 @S@ SOUR section:
0 @S69@ SOUR
1 ABBR California Birth Index, 1905-1995
1 TITL "California Birth Index, 1905-1995," database, The Generati
2 CONC ons Network, Inc., \i Ancestry.com\i0
1 AUTH State of California Department of Health Services
1 PUBL (URL-Ancestry.com : 2005)
Ancestry produces:
0 @S-2059594242@ SOUR
1 REPO @R-2139264844@
1 TITL California Birth Index, 1905-1995
1 AUTH Ancestry.com
1 PUBL Online publication - Provo, UT, USA: Ancestry.com Operations Inc, 2005.Original data - State of California. California Birth Index, 1905-1995. Sacramento, CA, USA: State of California Department of He
2 CONC alth Services, Center for Health Statistics.Original dat
Legacy produces the following in the EVEN.SOUR section:
1 BIRT
2 DATE 23 Nov 1922
2 PLAC Sacramento, Sacramento, CA
2 SOUR @S69@
3 PAGE accessed 01/21/2011), Fritze.
3 QUAY 3
Ancestry produces:
1 BIRT
2 DATE 23 Nov 1922
2 PLAC Sacramento, Sacramento, California
2 SOUR @S-2059594242@
3 PAGE Birthdate: 23 Nov 1922; Birth County: Sacramento.
3 NOTE URL-trees.ancestry.com/rd?f=sse&db=cabirth1905&h=944471&ti=0&indiv=try&gss=pt
3 NOTE
3 DATA
4 TEXT Birth date: 23 Nov 1922 Birth place: Sacramento, California
Notice the difference in the content of the SOUR.PUBL statement between the two systems in the Source Description and the difference in the content of the BIRT.SOUR.PAGE statements. It’s like there are two different standards in use. In fairness, Ancestry claims to be using GEDCOM version 5.5 while Legacy is using 5.5.1. Both standards describe the contents of the SOUR.PUBL statement as “When and where the record was created. For published works, this includes information such as the city of publication, name of the publisher, and year of publication.” I’m at a loss as to why the coding is so different.
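Whatever each program puts into TITL and PUBL, long values are split across CONC lines (joined with no separator, which is why Legacy's title breaks mid-word at "Generati") or CONT lines (joined with a line break). A minimal Python sketch of folding those continuation lines back together before loading the values into a database might look like this. The function name `join_continuations` is hypothetical, and the code assumes each continuation line directly follows the line it extends.

```python
def join_continuations(lines):
    """Fold CONC/CONT lines into the value of the line they continue.

    Returns a list of [level, tag, value] entries.
    """
    merged = []
    for raw in lines:
        # Keep trailing spaces: they matter when CONC splits between words.
        line = raw.lstrip().rstrip("\r\n")
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag.startswith("@"):
            # Record line such as "0 @S69@ SOUR": keep the xref as the value.
            tag, value = value, tag
        if tag == "CONC" and merged:
            merged[-1][2] += value          # concatenate with no separator
        elif tag == "CONT" and merged:
            merged[-1][2] += "\n" + value   # continue on a new line
        else:
            merged.append([int(level), tag, value])
    return merged

# The first lines of the Legacy source record quoted above.
legacy = [
    "0 @S69@ SOUR",
    "1 ABBR California Birth Index, 1905-1995",
    r'1 TITL "California Birth Index, 1905-1995," database, The Generati',
    r"2 CONC ons Network, Inc., \i Ancestry.com\i0",
]

entries = join_continuations(legacy)
print(entries[2][2])
```

This only reassembles the text. It does nothing about the fact that the two programs put different information into the same tag.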
|
Re: Coding for Source Citations
Donald,
First it needs to be pointed out that the definition of "Citation" as it pertains to GEDCOM is very different than "Citation" with regard to academic and scholarly papers. A citation in GEDCOM is not complete and formatted to the standards set out in (for example) the Chicago Style, MLA or Elizabeth Shown Mills. To build a citation to these standards you need to pull information from multiple records in GEDCOM not just the "source_citation" tag-set.
EDIT: GEDCOM has its own "Style" that has many of the components of the other styles but not all.
Second, the GEDCOM "Style" is somewhat lax in the way they specifically support online resources. This is to say that they don’t define their tags with regard to the vast amount of “sourcing” people do today via online resources. So you have to be a little creative to make it work.
Third, most software programs put little or no thought into creating good GEDCOM and never collaborate in the grey areas of creating good GEDCOM, sourcing being one example.
As a reader of the GEDCOM specification you are already aware of the definition of the SOUR.PUBL tag, but for those who are not familiar with it, the GEDCOM tag PUBL is defined as follows:
"SOURCE_PUBLICATION_FACTS: When and where the record was created. For published works, this includes information such as the city of publication, name of the publisher, and year of publication. For an unpublished work, it includes the date the record was created and the place where it was created. For example, the county and state of residence of a person making a declaration for a pension or the city and state of residence of the writer of a letter."
That said, the information that goes into the SOUR.PUBL tag for a published work should be, in no particular order:
1) city of publication
2) name of publisher
3) year of publication
The GEDCOM specification does not define any sub-tags for the PUBL tag, so you cannot, via GEDCOM, pick out the "city", "publisher" or "year" individually as they relate to the source material.
In my opinion the "Publisher" of a census is not Ancestry.com but the government agency that the census was produced by/for, but this is just my thought. Ancestry.com is the "presenter" of the image. For example, the following template for the scholarly paper citation could be:
Online presenter, "name of Image and description" type of image, Name of Website (Online: online publisher, online publish date), specific page number or citation within the website, url or image, (original publisher, original publish date), web page accessed date.
So from the SOUR.PUBL tag you would only gather the information for the “Original Publisher” and “Original Publish Date” in the above example.
The “page” information can only come from the EVEN.SOUR.PAGE tag.
The online publisher information and website name should come from the “Repository” record (REPO.NAME). This is a record type that is rarely used but is specific to “where the document was found”, for example a library or online location.
The type of media (type of image) comes from the SOUR.REPO.CALN.MEDI tag.
Yes, you can make the Source record the Ancestry source, but the information can be found in other places, so I don’t like doing it this way.
In Conclusion: The scholarly citation, the written footnote, end note, or bibliography data can come from 3 different places.
1) ???.EVEN.SOUR (for example: an individual/family record event source tag)
2) SOURce Record
3) Repository
Depending on the type of source and the citation style you are using the order that they are put together will change. So you can’t just plop it in one field and expect that to be your “citation”. I have not outlined all tags that can be used to create a complete scholarly citation, but I’m sure that if you look at the GEDCOM standard you can pick them out. I don’t expect any software company to do it the same way that any other company does it. So comparing them is just going to make your head spin.
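As a rough illustration of pulling the three places together, here is a minimal Python sketch that takes the tag values already extracted from the event's SOUR block, the Source record, and the Repository record, and joins them in one fixed order. Everything here is an assumption for illustration: the function name `build_citation`, the use of plain dicts keyed by tag, and the ordering itself, which a real style guide would change per source type. The repository name in the sample is a placeholder, since the REPO record was not shown in this thread.

```python
def build_citation(citation, source, repository, accessed=""):
    """Join values from the three GEDCOM locations into one citation string.

    citation   -- sub-tags of the event's SOUR block (e.g. PAGE)
    source     -- tags of the linked SOUR record (AUTH, TITL, PUBL)
    repository -- tags of the linked REPO record (NAME)
    """
    title = source.get("TITL", "")
    publ = source.get("PUBL", "")
    parts = [
        source.get("AUTH", ""),
        f'"{title}"' if title else "",
        repository.get("NAME", ""),
        f"({publ})" if publ else "",
        citation.get("PAGE", "").rstrip(". "),
        f"accessed {accessed}" if accessed else "",
    ]
    # Missing pieces are skipped rather than leaving empty slots.
    return ", ".join(p for p in parts if p) + "."

text = build_citation(
    citation={"PAGE": "Birthdate: 23 Nov 1922; Birth County: Sacramento."},
    source={"AUTH": "Ancestry.com", "TITL": "California Birth Index, 1905-1995"},
    repository={"NAME": "Ancestry.com"},  # placeholder, not taken from a real REPO record
    accessed="01/21/2011",
)
print(text)
```

Keeping the pieces separate until the last step is what lets the order change per citation style without touching the stored data.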
Since I have my own software that is not commercially available, I follow the GEDCOM standard more closely, and where grey areas occur I try to follow the spirit of GEDCOM, if not the exact letter, since the letter does not encompass every modern possibility.
|
Re: Coding for Source Citations
Let's not forget that the companies producing the genealogy software are in it to make money, and hence their database formats tend to be proprietary. The GEDCOM import/export is an add-on to give us the illusion that it doesn't matter which software we use, as we can always transfer our data via GEDCOM. This is simply not true. However, this does not address your problem. Your mention of SQL made me think of Rootsmagic, which has a wiki regarding accessing Rootsmagic data via SQL: http://sqlitetoolsforrootsmagic.wikispaces.com/How+to+Query+... Rootsmagic can directly import a Legacy database without the need to go via GEDCOM. I confess I do not know whether it is necessary to purchase the full Rootsmagic to do this or whether it can be done with Rootsmagic essentials, but it is a route which might be worth exploring. David
|
Re: Coding for Source Citations
I use SQL and have a program that is not listed on this board. I suspect this user is rolling his own report writer or book production from GEDCOM. But maybe not.
|
Re: Coding for Source Citations
Thanks. I hope the insight I gave you regarding "Citations" helps a little. I realize that sourcing is one of the areas where most "out of the box" software solutions have gone their own way. Very little is transferable from one program to another in this area.
So your "Coding Question" would be product and source type specific. You will have to either:
1) roll your own sourcing software disconnected from your primary recording program,
2) write extensive and somewhat brittle parsing code to merge the various layouts, or
3) go with a pure GEDCOM based collection and storage program that supports your needs.
I went with #3.
|
Re: Coding for Source Citations
I'm not familiar with GEDCOM based collection programs. May I ask the name of the one you use?
|